Introduction
Version 2 EE IIT, Kharagpur 1
Lesson 1
Introduction to Real Time Embedded Systems Part I
Pre-Requisite
Digital Electronics, Microprocessors
Introduction
In day-to-day life we come across a wide variety of consumer electronic products, and we are accustomed to using them easily and flawlessly to our advantage. Common examples are TV remote controls, mobile phones, FAX machines, Xerox machines etc. However, we seldom ponder over the technology behind each of them. Each of these devices has one or more programmable devices waiting to interact with the environment as effectively as possible. These are a class of embedded systems, and they provide service in real time, i.e. we do not have to wait long for the action. Let us see how an embedded system is characterized and how complex it can be. Take the example of a mobile telephone (Fig. 1.1).
When we want to purchase one of them, what do we look for? Let us see what choices are available (cells marked "--" were not specified):

Feature       Phone 1                        Phone 2                                    Phone 3
Price         Rs 5000/-                      Rs 6000/-                                  Rs 5000/-
Weight/Size   88.1 x 47.6 x 23.6 mm, 116 g   89 x 49 x 24.8 mm, 123 g                   --
Screen        TFT(1) 65k-color, 96x32        TFT 65k-color, 176x220                     backlit, 176 x 208 pixels, 4096 colors
Games         --                             Stuntman & Monopoly included; more         downloadable Symbian and Java games,
                                             downloadable J2ME games                    or packaged on MMC cards
Camera        Yes, 4x zoom                   --                                         --
Radio         No                             No                                         FM Stereo
Ring tones    Polyphonic                     --                                         --
Memory        No                             --                                         --
Besides the above tabulated facts about the mobile handset, as a student of technology you may also like to know the following:
Network type: GSM(2) or CDMA(3) (bandwidth)
Battery: type and ampere-hour rating
Talk time per charge, standby time
1. Short for thin film transistor, a type of LCD flat-panel display screen in which each pixel is controlled by one to four transistors. TFT technology provides the best resolution of all flat-panel techniques, but it is also the most expensive. TFT screens are sometimes called active-matrix LCDs.
2. Short for Global System for Mobile Communications, one of the leading digital cellular systems. GSM uses narrowband Time Division Multiple Access (TDMA), which allows eight simultaneous calls on the same radio frequency. GSM was first introduced in 1991. As of the end of 1997, GSM service was available in more than 100 countries, and it has become the de facto standard in Europe and Asia.
3. Short for Code-Division Multiple Access, a digital cellular technology that uses spread-spectrum techniques. Unlike competing systems such as GSM, which use TDMA, CDMA does not assign a specific frequency to each user. Instead, every channel uses the full available spectrum, and individual conversations are encoded with a pseudo-random digital sequence. CDMA is a military technology first used during World War II by the Allies to foil German attempts at jamming transmissions. The Allies decided to transmit over several frequencies, instead of one, making it difficult for the Germans to pick up the complete signal.
From the above specifications it is clear that a mobile phone is a very complex device which houses a number of miniature gadgets functioning coherently in a single device. Moreover, each of these embedded gadgets, such as the digital camera or the FM radio, along with the telephone itself, has a number of operating modes. For example:
- you may like to adjust the zoom of the digital camera,
- you may like to reduce the screen brightness,
- you may like to change the ring tone,
- you may like to relay a specific song from your favorite FM station to a friend using your mobile,
- you may like to use it as a calculator, address book, emailing device etc.
These variations in functionality can only be achieved by a very flexible device. This flexible device sitting at the heart of the circuits is none other than a customized microprocessor, better known as an Embedded Processor, and the mobile phone housing a number of functionalities is known as an Embedded System. Since it satisfies the requirements of a number of users at the same time (you and your friend, you and the radio station, you and the telephone network etc.), it works within a time constraint, i.e. it has to satisfy everyone with the minimum acceptable delay. We call this working in Real Time. This is unlike your holidaying attitude, when you take the clock in your stride. We can also say that it does not make us wait long to take our words and relay them, as well as receive them, unlike an email server, which might take days to receive or deliver your message when the network is congested or slow. Thus we can call the mobile telephone a Real Time Embedded System (RTES).
Definitions
Now we are ready for some definitions.
Real Time
Real time usually means time as prescribed by external sources, for example the time struck by a clock (however fast or slow it might be), or the timings generated by your requirements. You may like to call someone at midnight and send him a picture. These external timing requirements imposed by the user constitute the real time for the embedded system.
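The idea of an externally imposed deadline can be sketched in C. This is a minimal illustration, not part of any real RTOS API; the structure and function names are our own, and times are kept as plain millisecond counts for simplicity.

```c
#include <stdbool.h>

/* A real-time task is "correct" only if it completes within the window
   imposed by the external world. Times are in milliseconds. */
typedef struct {
    long release_ms;   /* when the external request arrived */
    long deadline_ms;  /* latest acceptable completion time */
} rt_task;

/* Returns true when the task met its externally imposed deadline. */
bool deadline_met(const rt_task *t, long finish_ms)
{
    return finish_ms >= t->release_ms && finish_ms <= t->deadline_ms;
}
```

A task finishing at 80 ms against a 100 ms deadline is correct; the same result arriving at 120 ms is a real-time failure, even though the computed value is identical.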
Embedded (Embodiment)
Embodied phenomena are those that by their very nature occur in real time and real space. In other words, a number of systems coexist to discharge a specific function in real time. Thus a Real Time Embedded System (RTES) is precisely the union of subsystems discharging a specific task coherently. Henceforth we shall call them RTES. RTES as a generic term may mean a wide variety of systems in the real world. However, we will be concerned with those which use programmable devices such as microprocessors or microcontrollers and have specific functions. We shall characterize them as follows.
Tightly Constrained
The constraints on the design and marketability of an RTES are more rigid than those of their non-real-time, non-embedded counterparts. Time-domain constraints are the first thing taken care of while developing such a system. Size, weight, power consumption and cost(4) are the other major factors.
4. Very few in India will be interested in buying a mobile phone if it costs Rs 50,000/-, even if it provides a faster processor with 200 MB of memory to store your addresses and your favorite mp3 music and play it, acts as a small-screen TV whenever you desire, and takes your calls intelligently. In the USA, however, the majority can afford it!
System
Subsystems
Components
= interfaces   = key interface   = uses open standards
Fig. 1.2 The System Interface and Architecture

The red and grey spheres in Fig. 1.2 represent interface standards. When a system is assembled, it starts with some chassis or a single subsystem. Subsequently, subsystems are added to it to make it a complete system.

Let us take the example of a desktop computer. Though not an embedded system, it gives us a nice example of assembling a system from its subsystems. You can start assembling a desktop computer (Fig. 1.3) with the chassis, then add the SMPS (switched mode power supply) and motherboard, followed by the hard disk drive, CD-ROM drive, graphics cards, Ethernet cards etc. Each of these subsystems consists of several components, e.g. Application Specific Integrated Circuits (ASICs), microprocessors, analog as well as digital VLSI circuits, miniature motors and their control electronics, multilevel power supply units, crystal clock generators, surface-mounted capacitors and resistors etc. In the end you close the chassis and connect the keyboard, mouse, speakers, visual display unit, Ethernet cable, microphone, camera etc., fitting them into certain well-defined sockets.

As we can see, each of the subsystems inside or outside the desktop has cables fitting into the slots meant for them. These cables and slots are uniform for almost any desktop you choose to assemble. The connection of one subsystem to another is known as interfacing. Assembly is so easy because the interfaces are all standardized. Therefore, standardization of the interfaces is essential for the universal applicability of a system and its compatibility with other systems. There can be open standards, which let a subsystem exchange information with products from other companies, and there can be key standards, which are meant only for the specific company which manufactures them.
Fig. 1.3 Inside a Desktop Computer

A desktop computer will have more open standards than an embedded system. This is because of the level of integration in the latter. Many of the components of an embedded system are integrated onto a single chip. This concept is known as System-on-Chip (SoC) design. Thus there are only a few subsystems left to be connected. Following the assembly process of a desktop, let us comparatively assess the possible subsystems of a typical RTES. One such segregation is shown in Fig. 1.4. The various parts are explained as follows:
User Interface: for interacting with users. May consist of a keyboard, touch pad etc.
ASIC (Application Specific Integrated Circuit): for specific functions such as motor control, data modulation etc.
Microcontroller (uC): a family of microprocessors
Real Time Operating System (RTOS): contains all the software for system control and the user interface
Digital Signal Processor (DSP): a typical family of microprocessors
DSP assembly code: code for the DSP, stored in program memory
Dual-ported memory: data memory accessible by two processors at the same time
CODEC: compressor/decompressor of the data
User Interface Process: the part of the RTOS that runs the software for user-interface activities
Controller Process: the part of the RTOS that runs the software for timing and control amongst the various units of the embedded system; it executes the overall control algorithm for the external process
Fig. 1.4 Architecture of an Embedded System

The above architecture represents a hypothetical embedded system (we will see more realistic ones in subsequent examples). More than one microprocessor (2 DSPs and 1 uC) is employed here to carry out different tasks. As we will learn later, the uC is generally meant for simpler and slower jobs, such as carrying out a Proportional-Integral (PI) control action or interpreting user commands. The DSP is a heavier-duty processor capable of real-time signal processing and control. The two DSPs, along with their operating systems and code, are independent of each other. They share the same memory without interfering with each other; this kind of memory is known as dual-ported memory or two-way post-box memory. The Real Time Operating System (RTOS) controls the timing requirements of all the devices. It executes the overall control algorithm of the process while diverting more complex tasks to the DSPs. It also specifically controls the uC for the necessary user interactivity. The ASICs are specialized units capable of specialized functions such as motor control, voice encoding, modulation/demodulation (MODEM) action etc. They can be digital, analog or mixed-signal VLSI circuits. CODECs are generally used for interfacing low-power serial Analog-to-Digital Converters (ADCs). The analog signals from the controlled process can be monitored through an ADC interfaced through this CODEC.
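The "post-box" behaviour of dual-ported memory can be sketched in C. This is an illustrative model only: real dual-port RAMs are hardware parts, and the structure, flag protocol and function names below are our own assumptions, not taken from any particular device.

```c
#include <stdbool.h>
#include <stdint.h>

/* One mailbox per direction, so the uC and a DSP never write the same
   word: the producer owns `data` until `full` is set, the consumer owns
   it afterwards. */
typedef struct {
    volatile uint32_t data;
    volatile bool     full;  /* set by producer, cleared by consumer */
} mailbox;

/* Producer side (e.g. the uC). Returns false while the previous
   message is still unread. */
bool mailbox_post(mailbox *mb, uint32_t word)
{
    if (mb->full)
        return false;
    mb->data = word;
    mb->full = true;   /* flag set last, so data is valid when seen */
    return true;
}

/* Consumer side (e.g. a DSP). Returns false when the box is empty. */
bool mailbox_fetch(mailbox *mb, uint32_t *word)
{
    if (!mb->full)
        return false;
    *word = mb->data;
    mb->full = false;
    return true;
}
```

Because each processor only ever writes its own side of the flag, the two never interfere, which is exactly the property the dual-ported memory provides in hardware.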
Embedded System
An embedded system is a special-purpose system in which the computer is completely encapsulated by the device it controls. Unlike a general-purpose computer, such as a personal computer, an embedded system performs pre-defined tasks, usually with very specific requirements. Since the system is dedicated to a specific task, design engineers can optimize it, reducing the size and cost of the product. Embedded systems are often mass-produced, so the cost savings may be multiplied by millions of items.
Handheld computers or PDAs are generally considered embedded devices because of the nature of their hardware design, even though they are more expandable in software terms. This line of definition continues to blur as devices expand.

Q2. Write five advantages and five disadvantages of embodiment.
Ans:
Five advantages:
1. Smaller size
2. Smaller weight
3. Lower power consumption
4. Lower electromagnetic interference
5. Lower price
Five disadvantages:
1. Lower mean time between failures
2. Repair and maintenance are not possible
3. Faster obsolescence
4. Unmanageable heat loss
5. Difficult to design
Q3. What do you mean by "reactive" in real time? Cite an example.
Ans: Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to the speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop computer system typically focuses on computations, with relatively infrequent (from the computer's perspective) reactions to input devices. In addition, a delay in those computations, while perhaps inconvenient to the user, typically does not result in a system failure.

Q4. Give at least five examples of embedded systems you use or see in your day-to-day life.
Ans: (i) Mobile telephone (ii) Digital camera (iii) A programmable calculator (iv) An iPod (v) A digital blood pressure machine
iPod: The iPod is a brand of portable media players designed and marketed by Apple Computer. Devices in the iPod family are designed around a central scroll wheel (except for the iPod shuffle) and provide a simple user interface. The full-sized model stores media on a built-in hard drive, while the smaller iPods use flash memory. Like many digital audio players, iPods can serve as external data storage devices when connected to a computer.
Q5. Write the model number and detailed specification of your/a friend's mobile telephone.
Manufacturer:
Model:
Network types: EGSM / GSM / CDMA
Form factor: the industry standard that defines the physical, external dimensions of a particular device; the size, configuration, and other specifications used to describe hardware.
Battery life, talk (hrs):
Battery life, standby (hrs):
Battery type:
Measurements -- Weight: / Dimensions:
Display -- Display type (colour or black & white): / Display size (px): / Display colours:
General options -- Camera: / Mega pixel: / Email client: / Games: Yes / High-speed data: / MP3 player: / PC sync: Yes / Phonebook: / Platform series: / Polyphonic ring tones: / Predictive text: / Streaming multimedia: / Text messages: / Wireless internet: Opera
Other options -- Alarm: / Bluetooth: / Calculator: / Calendar: / Data capable: / EMS: / FM radio: / Graphics (custom): / Infrared: / Speaker phone: / USB: / Vibrate:
Module 1
Introduction
Lesson 2
Introduction to Real Time Embedded Systems Part II
Pre-Requisite
Digital Electronics, Microprocessors
Home appliances: microwave ovens, answering machines, thermostats, home security systems, washing machines, lighting systems etc.
Business equipment: electronic cash registers, curbside check-in terminals, alarm systems, card readers, product scanners, and automated teller machines.
Automobiles: the Electronic Control Unit (ECU), which includes transmission control, cruise control, fuel injection, antilock brakes, and active suspension in the same or separate modules.
Mobile Phone
Let us take the same mobile phone discussed in Lesson 1 as an example to illustrate the typical architecture of an RTES. In general, a cell phone is composed of the following components:
- Circuit board (Fig. 2.2)
- Antenna
- Microphone
- Speaker
- Liquid crystal display (LCD)
- Keyboard
- Battery
Fig. 2.3 The block diagram

A typical mobile phone handset (Fig. 2.3) includes standard I/O devices (keyboard, LCD), plus a microphone, speaker and antenna for wireless communication. The Digital Signal Processor (DSP) performs the signal processing, and the microcontroller handles the user interface, battery management, call setup etc. The performance specification of the DSP is crucial, since the conversion has to take place in real time. This is why almost all cell phones contain such a special processor dedicated to digital-to-analog (DA) and analog-to-digital (AD) conversion and to real-time processing such as modulation and demodulation. The Read Only Memory (ROM) and flash memory (electrically erasable and programmable memory) chips provide storage for the phone's operating system (RTOS) and various data such as phone numbers, calendar information, games etc.
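The kind of per-sample work a phone's DSP performs in real time can be illustrated with a short FIR filter applied to each incoming audio sample. This is a generic sketch, not the algorithm of any actual handset; the 4-tap moving-average coefficients are an illustrative choice.

```c
#include <stddef.h>

/* A 4-tap FIR filter: each new output is a weighted sum of the last
   four input samples. This must finish before the next sample arrives,
   which is what makes the processing "real time". */
#define TAPS 4

typedef struct {
    float  hist[TAPS];  /* most recent input samples */
    size_t idx;         /* next slot to overwrite */
} fir_state;

float fir_step(fir_state *s, float in)
{
    /* Moving-average coefficients, chosen for illustration. */
    static const float coeff[TAPS] = {0.25f, 0.25f, 0.25f, 0.25f};
    s->hist[s->idx] = in;
    s->idx = (s->idx + 1) % TAPS;
    float acc = 0.0f;
    for (size_t k = 0; k < TAPS; k++)
        acc += coeff[k] * s->hist[k];
    return acc;
}
```

At an 8 kHz voice sampling rate, fir_step has 125 microseconds to complete; a DSP's multiply-accumulate hardware exists precisely to make such loops fast.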
1. Microprocessor
This is the heart of any RTES. The microprocessors used here are different from general-purpose microprocessors like the Pentium or Sun SPARC. They are designed to meet specific requirements. For example, the Intel 8048 is a special-purpose microprocessor found in the keyboard of your desktop computer. It is used to scan the keystrokes and send them in a synchronous manner to the PC. Similarly, mobile phones and digital cameras use special-purpose processors for voice and image processing. A washer-dryer may use yet another type of processor for real-time control and instrumentation.
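The keystroke-scanning job mentioned above can be sketched as follows. A keyboard controller drives one matrix row at a time and reads back which columns are pulled low. The code below models only the decoding step; the matrix dimensions and scan-code scheme are illustrative assumptions, not the actual 8048 firmware.

```c
#include <stdint.h>

/* Hypothetical key matrix: 8 rows x 16 columns. While one row is
   driven, the controller reads a 16-bit column bitmap; a set bit means
   the key at that row/column is pressed. */
#define ROWS 8
#define COLS 16

/* Decode a column bitmap for a given row into a scan code
   (row * COLS + col), or -1 when no key in that row is pressed. */
int decode_row(int row, uint16_t col_bits)
{
    for (int col = 0; col < COLS; col++)
        if (col_bits & (1u << col))
            return row * COLS + col;
    return -1;
}
```

The real firmware repeats this for every row, debounces the result, and then clocks the scan code out to the PC synchronously.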
2. Memory
The microprocessor and memory must co-exist on the same printed circuit board (PCB) or the same chip. Compactness, speed and low power consumption are the characteristics required of the memory used in an RTES. Therefore, very low-power semiconductor memories are used in almost all such devices. Read Only Memory (ROM) is used for housing the operating system. The program or data loaded might exist for a considerable duration. It is like changing the setup of your desktop computer; similar user-defined setups exist in an RTES. For example, you may like to change the ring tone of your mobile and keep it for some time, or you may like to change the screen color. In these cases the memory should be capable of retaining the information even after the power is removed. In other words, the memory should be non-volatile, and it should be easily programmable too. This is achieved by using flash(1) memories.
1. A memory technology similar in characteristics to EPROM (Erasable Programmable Read Only Memory), with the exception that erasing is performed electrically instead of via ultraviolet light and, depending upon the organization of the flash memory device, erasing may be accomplished in blocks (typically 64 kbytes at a time) instead of over the entire device.
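The programming rule behind this footnote can be sketched in software. In flash, programming can only clear bits (1 to 0); restoring a bit to 1 requires erasing the whole block. The sizes and function names below are illustrative assumptions, modelling the rule rather than driving any real device.

```c
#include <stdint.h>
#include <stdbool.h>

/* A toy flash block. Real devices erase ~64 kbytes at a time; we use a
   small block so the model is easy to follow. */
#define BLOCK_SIZE 64

/* Erasing sets every byte to 0xFF (all bits 1). */
void flash_erase_block(uint8_t *block)
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        block[i] = 0xFF;
}

/* Programming can only clear bits. Returns false if the requested value
   would need a 0 -> 1 transition, i.e. the block must be erased first. */
bool flash_program_byte(uint8_t *cell, uint8_t value)
{
    if ((*cell & value) != value)
        return false;
    *cell &= value;
    return true;
}
```

This erase-before-write constraint is why an RTES groups user settings (ring tone, screen color) into blocks and rewrites a whole block when any setting changes.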
4. Software
Without a program, the RTES is just the physical body; it is like a human body without life. Whenever you switch on your mobile telephone you may have noticed some activity on the screen. Whenever you move from one city to another you may have noticed changes on your screen, or when you go for a picnic away from your city you may have noticed the no-signal sign. These activities are taken care of by the Real Time Operating System sitting in the non-volatile memory of the RTES. Besides the above, an RTES may have various other components and Application Specific Integrated Circuits (ASICs) for specialized functions such as motor control, modulation, demodulation and CODECs. The design of a Real Time Embedded System is subject to a number of constraints; the following section discusses these issues.
Design Issues
The constraints in the embedded systems design are imposed by external as well as internal specifications. Design metrics are introduced to measure the cost function taking into account the technical as well as economic considerations.
Design Metrics
A design metric is a measurable feature of the system's performance, cost, implementation time, safety etc. Most of these are conflicting requirements, i.e. optimizing one does not optimize another: e.g. a cheaper processor may have poor performance as far as speed and throughput are concerned. The following metrics are generally taken into account while designing embedded systems.
Unit cost
The monetary cost of manufacturing each copy of the system, excluding NRE (non-recurring engineering) cost.
Size
The physical space required by the system, often measured in bytes for software, and gates or transistors for hardware.
Performance
The execution time of the system
Power Consumption
It is the amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements of the IC, since more power means more heat.
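The link between this metric and battery lifetime can be shown with the basic capacity relation: run time is the battery's charge capacity divided by the average current drawn. The numbers in the sketch are illustrative.

```c
/* Estimated run time of a battery-powered RTES, ignoring secondary
   effects such as temperature and discharge-rate derating. */
double battery_life_hours(double capacity_mAh, double avg_current_mA)
{
    return capacity_mAh / avg_current_mA;
}
```

For example, a 1000 mAh battery supplying an average of 200 mA lasts about 5 hours, which is why halving average power consumption roughly doubles standby time.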
Flexibility
The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.
Time-to-prototype
The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but which can be used to verify the system's usefulness and correctness and to refine the system's functionality.
Time-to-market
The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time. This metric has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability.
Maintainability
It is the ability to modify the system after its initial release, especially by designers who did not originally design the system.
Correctness
This is the measure of the confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
Throughput
This is the number of tasks that can be processed per unit time. For example, a camera may be able to process 4 images per second.

These are some of the cost measures for developing an RTES. Optimization of the overall design cost includes each of these factors taken with some multiplying factor depending on its importance, and the importance of each factor depends on the type of application. For instance, in defense-related applications, while designing an anti-ballistic system the execution time is the deciding factor. On the other hand, for de-noising a photograph in the embedded camera of your mobile handset, the execution time may be a little relaxed if it can bring down the cost and complexity of the embedded Digital Signal Processor. The design flow of an RTES involves several steps: the cost and performance are tuned and fine-tuned in a recursive manner. An overall design methodology is enumerated below.
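The weighted combination of metrics described above can be written out explicitly. This is a schematic cost function: the metric values, normalization, and weights are illustrative assumptions chosen by the designer for each application, not fixed constants.

```c
/* Overall design cost as a weighted sum of normalized metrics
   (unit cost, power, execution time, size, ...). Larger weight means
   the application cares more about that metric. */
double design_cost(const double value[], const double weight[], int n)
{
    double cost = 0.0;
    for (int i = 0; i < n; i++)
        cost += weight[i] * value[i];
    return cost;
}
```

An anti-ballistic system would put most of its weight on execution time, while a consumer camera might weight unit cost most heavily, so the same candidate design scores very differently in the two applications.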
Conclusion
The scope of embedded systems has been encompassing more and more diverse disciplines of technology day by day. Obsolescence of technology occurs at a much faster pace here than in other areas. The development of ultra-low-power mixed-signal VLSI technology is the prime factor in the miniaturization and performance enhancement of existing systems. More and more systems are tending to be compact and portable with RTES technology. The future course of embedded systems depends on advancements in sensor technology, mechatronics and battery technology. The design of these RTES is by and large application specific. The time gap between the conception of the design problem and marketing has been the key factor for the industry. In most cases, for very specific applications, the system needs to be developed using available processors rather than going for a custom design.
Questions
Q1. Give one example of a typical embedded system other than those listed in this lecture. Draw the block diagram and discuss the function of the various blocks. What type of embedded processor does it use?
Ans:
For details please visit http://www.gpsworld.com/. A GPS receiver receives signals from a constellation of at least four out of a total of 24 satellites. Based on the timing and other information sent by these satellites, the digital signal processor calculates the position using triangulation.
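The timing idea behind this can be sketched in one line of arithmetic: the range to each satellite is the signal's travel time multiplied by the speed of light. With four such pseudoranges the receiver can solve for its three position coordinates plus its own clock bias; the function below shows only the range step, with illustrative names.

```c
/* Speed of light in vacuum, metres per second. */
#define SPEED_OF_LIGHT_M_S 299792458.0

/* Pseudorange to one satellite, from the satellite's broadcast transmit
   time and the receiver's (biased) receive time, both in seconds. */
double pseudorange_m(double t_transmit_s, double t_receive_s)
{
    return (t_receive_s - t_transmit_s) * SPEED_OF_LIGHT_M_S;
}
```

A travel time of about 70 ms corresponds to roughly 21,000 km, which matches the altitude of the GPS constellation; the DSP must extract these timings from a signal buried well below the noise floor, which is why a dedicated processor is used.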
The block diagram is divided into (1) the active antenna system, (2) the RF/IF front end and (3) the Digital Signal Processor (DSP). The active antenna system houses the antenna, a band-pass filter and a low-noise amplifier (LNA). The RF/IF front end houses another band-pass filter, the RF amplifier, the demodulator and the A/D converter. The DSP accepts the digital data and decodes the signal to retrieve the information sent by the GPS satellites.

Q2. Discuss the hard disk drive housed in your PC. Is it an RTES?
Ans: Hard drives have two kinds of components: internal and external. External components are located on a printed circuit board called the logic board, while internal components are located in a sealed chamber called the HDA, or Hard Drive Assembly. For details browse http://www.hardwaresecrets.com/article/177/3. The big circuit is the controller. It is in charge of everything: exchanging data between the hard drive and the computer, controlling the motors of the hard drive, commanding the heads to read or write data, etc. All these tasks are carried out as demanded by the processor sitting on the motherboard. The drive can be verified to be single-functioned and tightly constrained; therefore one can say that a hard disk drive is an RTES.
Q3. Elaborate on the time-to-market design metric.
Ans: Time-to-market is the time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time. This metric has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability.

Q4. What is Moore's Law? How was it conceived?
Ans: Moore's Law is the empirical observation that the complexity of integrated circuits, with respect to minimum component cost, doubles every 24 months. It is attributed to Gordon E. Moore, a co-founder of Intel.
Module 1
Introduction
Lesson 3
Embedded Systems Components Part I
Pre-Requisite
Digital Electronics, Microprocessors
Introduction
The various components of an embedded system can be hierarchically grouped, from system-level components down to transistor-level components. A system (subsystem) component is different from what is considered a "standard" electronic component. Standard components are the familiar active devices such as integrated circuits, microprocessors, memory, diodes and transistors, along with passives such as resistors, capacitors and inductors. These are the basic elements mounted on a circuit board for a customized, application-specific design. A system component, on the other hand, has active and passive components mounted on circuit boards configured for a specific task (Fig. 3.1). System components can be either single- or multi-function modules that serve as highly integrated building blocks of a system. A system component can be as simple as a digital I/O board or as complex as a computer with video, memory, networking, and I/O all on a single board. System components support industry standards and are available from multiple sources worldwide.
System
Subsystems (PCBs)
Processor Level Components (Integrated Circuits) (Microprocessors, Memory, I/O devices etc)
Gate Level Components (generally inside the integrated circuits, rarely outside)
Fig. 3.1 The Hierarchical Components
AD Converter: Analog-to-Digital Converter; UART: Universal Asynchronous Receiver and Transmitter
Fig. 3.2 The typical structure of an Embedded System
Fig. 3.3 The structural layout of a desktop computer (microprocessor, primary memory, keyboard, hard disk drive, network card, video display unit)
Typical Example
A Single Board Computer (SBC)
Since you are familiar with desktop computers, we should see how to make a desktop PC on a single printed circuit board. Such boards are called Single Board Computers, or SBCs. These SBCs are typical embedded systems, custom-made generally for industrial applications. In the introductory lectures you should have done some exercises on your PC; now try to compare this SBC with your desktop. Let us look at an example of a single board computer, the EBC-C3PLUS SBC from WinSystems(1).
Fig. 3.4 The Single Board Computer (SBC)
(1. Courtesy WinSystems, Inc., 715 Stadium Drive, Arlington, Texas 76011, http://sbc.winsystems.com/products/sbcs/ebcc3plus.html)

Let us discuss and try to understand the features of the above single-board embedded computer. This will pave the way to understanding more complex System-on-Chip (SoC) type systems. The various units and their specifications are as follows.

VIA 733 MHz or 1 GHz low-power C3 processor, EBX-compliant board (Fig. 3.5)
This is the processor on this SBC. VIA is the company which manufactures the processor (www.via.com.tw); 733 MHz or 1 GHz is the clock frequency of this processor. C3 is the brand name, as P3 and P4 are for Intel. (You must be familiar with Intel processors, as your PC has one.)
Fig. 3.5 The Processor

32 to 512 MB of PC133 SDRAM supported in a 168-pin DIMM socket
32 to 512 MB is the possible random access memory size on the SBC. SDRAM stands for Synchronous Dynamic RAM; we will learn more about this in the memory chapter. The 168-pin DIMM (Dual In-line Memory Module) holds the memory chips and fits into the board easily.
Fig. 3.6 DIMM

Socket for up to 1 GB bootable DiskOnChip, or 512 KB SRAM, or 1 MB EPROM
These are static RAM (SRAM) or EPROM devices which house the operating system, just like the hard disk in a desktop computer.

Type I and Type II Compact Flash (CF) cards supported
Compact Flash is otherwise known as a semiconductor hard disk or floppy disk. Flash memory is an advanced form of Electrically Erasable and Programmable Read Only Memory (EEPROM). Type I and Type II are just two different designs, Type II being more compact and a more recent version.
Fig. 3.7 Flash Memory

PC-compatible: supports Linux, Windows CE.NET and XP, plus other x86-compatible RTOS
This indicates the different types of operating systems supported on this SBC platform.

High-resolution video controller: color panels supported with up to 36 bits/pixel; resolutions up to 1920 x 1440
This is the video quality supported by the on-board video chips.

Simultaneous CRT and LCD operation; 4X AGP local bus for high-speed operation; LVDS supported
CRT stands for cathode ray tube terminal and LCD for liquid crystal display terminal. AGP (Accelerated Graphics Port) is an extremely fast expansion slot and bus (64-bit) designed for high-performance graphics cards; 4X represents the speed of the graphics port. LVDS (Low Voltage Differential Signaling) is a low-noise, low-power, low-amplitude method for high-speed (gigabits per second) data transmission over copper wire on printed circuit boards.

Dual 10/100 Mbps Intel PCI Ethernet controllers
The networking interface.

4 RS-232 serial ports with FIFO, COM1 & COM2 with RS-422/485 support
The serial interface. FIFO stands for First In First Out. RS-232/RS-422/RS-485 are serial communication standards which you will study in due course. COM1 and COM2 are names for RS-232 ports (your desktop has COM ports).

Bi-directional LPT port supports EPP/ECP
LPT stands for Line Printer Terminal; EPP and ECP stand for Enhanced Parallel Port and Extended Capabilities Port.

48 bi-directional TTL digital I/O lines with 24 pins capable of event-sense interrupt generation
These are extra digital input/output lines; 24 lines are capable of sensing interrupts.

Four USB ports onboard
USB (Universal Serial Bus) is an external bus standard that supports data transfer rates of 12 Mbps. A single USB port can be used to connect up to 127 peripheral devices, such as mice, modems and keyboards.
Two dual Ultra DMA 33/66/100 EIDE connectors: DMA stands for Direct Memory Access, a mode of transferring bulk data between memory and the hard drive and vice versa. EIDE is short for Enhanced Integrated Drive Electronics, a newer version of the IDE mass-storage interface. It supports data rates about three to four times faster than the old IDE standard, and mass-storage devices of up to 8.4 gigabytes, whereas the old standard was limited to 528 MB. The numbers 33/66/100 indicate the transfer rates in MB/s.

Floppy disk controller supports 1 or 2 drives.

AC97 audio: Audio Codec '97 (AC'97) is the specification for the 20-bit audio architecture used in many desktop PCs. The specification was developed in the old Intel Architecture Labs in 1997 to provide system developers with a standardized specification for integrated PC audio devices. AC'97 defines a high-quality audio architecture capable of delivering up to 96 kHz/20-bit playback in stereo and 48 kHz/20-bit playback in multi-channel modes.

PC/104 and PC/104-Plus expansion connectors: PC/104 gets its name from the popular desktop personal computers initially designed by IBM, called the PC, and from the number of pins used to connect the cards together (104). PC/104 cards are much smaller than the ISA-bus cards found in PCs and stack together, which eliminates the need for a motherboard, backplane, and/or card cage.

AT keyboard controller and PS/2 mouse support: the AT keyboard was an 84-key keyboard introduced with the PC/AT; it was later replaced by the 101-key Enhanced Keyboard.

Two interrupt controllers and 7 DMA channels; three 16-bit counter/timers; Real Time Clock; Watchdog Timer; Power-On Self Test: the interrupt controllers, DMA channels, counter/timers and Real Time Clock are used for real-time applications.
Specifications

+5 volt only operation
Mechanical dimensions: 5.75" x 8.0" (146 mm x 203 mm)
Jumpers: 0.025" square posts
Connectors:
Serial, Parallel, Keyboard: 50-pin on 0.100" grid
COM3 & COM4: 20-pin on 0.100" grid
Floppy disk interface: 34-pin on 0.100" grid
EIDE interface: 40-pin on 0.100" grid (primary); 44-pin on 2 mm grid (primary); 40-pin on 0.100" grid (secondary)
Flash: 50-pin 2 mm connector
Parallel I/O: two, 50-pin on 0.100" grid
CRT: 14-pin on 2 mm grid
FP-100 panel: two, 50-pin on 2 mm grid
LVDS: 20-pin on 0.100" grid
Ethernet: two RJ-45
PC/104 bus: 64-pin 0.100" socket, 40-pin 0.100" socket
PC/104-Plus: 120-pin (4 x 30; 2 mm) stackthrough with shrouded header
USB: four, 4-pin on 0.100" grid
Audio: three, 3.5 mm stereo phone jacks
Power: 9-pin in-line Molex
Environmental:
Operating temperature: -40 to +85 C (733 MHz); -40 to +60 C (1 GHz)
Non-condensing relative humidity: 5% to 95%
Conclusion
It is apparent from the above example that a typical embedded system consists, by and large, of the following units housed on a single board or chip.
1. Processor
2. Memory
3. Input/Output interface chips
4. I/O devices, including sensors and actuators
5. A-D and D-A converters
6. Software in the form of an operating system
7. Application software
One or more of the above units can be housed on a single PCB or a single chip. In a typical embedded system the microprocessor, a large part of the memory and the major I/O devices are housed on a single chip called a microcontroller. Being custom-made, embedded systems are required to function for specific purposes with little user programmability. The user interaction is converted into a series of commands which are executed by the RTOS by calling various subroutines. The RTOS is stored in a flash memory or read-only memory. There will be additional scratch-pad memory for temporary data storage. If the CPU sits on the same chip as the memory, then a part of that memory can be used for scratch-pad purposes; otherwise a number of CPU registers will be required for the same. The CPU communicates with the memory through the address and data buses. The timing and control of these data exchanges are handled by the control unit of the CPU via the control lines. The memory housed on the same chip as the CPU has the fastest transfer rate, also known as the memory bandwidth or bit rate. Memory outside the processor chip is slower and hence has a lower transfer rate. Input/Output devices, on the other hand, have widely varying bandwidths. These varying data transfer rates are handled in different ways by the processor: the slower devices need interface chips, and chips faster than the microprocessor are generally not used. The architecture of a typical embedded system is shown in Fig. 3.8. The hardware unit consists of the above units along with a digital as well as an analog subsystem. The software, in the form of an RTOS, resides in the memory.
[Fig. 3.8: an embedded system comprising hardware (digital subsystem, analog subsystem, sensors, actuators, mechanical and optical subsystems) and software]
Questions-Answers
Q1. What are the hierarchical components in an embedded system design?
Ans: The hierarchical components are:
System
Subsystems (PCBs)
Processor-level components (integrated circuits: microprocessors, memory, I/O devices etc.)
Gate-level components (generally inside the integrated circuits, rarely outside)

Q2. What is LVDS?
Ans: LVDS is Low Voltage Differential Signaling. The advantages of this standard are low noise and low interference, which allow the data transmission rate to be increased. Instead of 0 V and 5 V, a voltage level of about 1.5 V or 3.3 V is used for High and about 0 to 1 V for Low. The smaller low-to-high voltage swing reduces interference, and the differential mode rejects common-mode noise.

Q3. Is there any actuator in your mobile phone?
Ans: There is a vibrator in a mobile phone which can be activated to indicate an incoming call or message. Generally there is a coreless motor which is operated by the microcontroller to generate the vibration.
Module 1
Introduction
Lesson 4
Embedded Systems Components Part II
Pre-Requisite
Digital Electronics, Microprocessors

You are now almost familiar with the various components of an embedded system. In this chapter we shall discuss some of the general components, such as Processors, Memory and Input/Output Devices.
Processors
The central processing unit is the most important component in an embedded system. It exists in an integrated manner along with memory and other peripherals. Depending on the type of application, processors are broadly classified into 3 major categories:
1. General Purpose Microprocessors
2. Microcontrollers
3. Digital Signal Processors
For more specific applications customized processors can also be designed, but unless the demand is high the design and manufacturing cost of such processors will be high. Therefore, in most applications the design is carried out using processors already available in the market. However, Field Programmable Gate Arrays (FPGA) can be used to implement simple customized processors easily. An FPGA is a type of logic chip that can be programmed. FPGAs support thousands of gates which can be connected and disconnected, much as an EPROM (Erasable Programmable Read Only Memory) is programmed and erased. They are especially popular for prototyping integrated circuit designs. Once the design is finalized, hardwired chips are produced for faster performance.
General purpose processors are generally cheap because they are manufactured in large numbers: the NRE (Non-Recurring Engineering) cost (Lesson 1) is spread over a large number of units. Being cheaper, the manufacturer can invest more in improving the VLSI design with advanced, optimized architectural features; thus the performance, size and power consumption can all be improved. In most cases the design tools for such processors are provided by the manufacturer, and the supporting hardware is cheap and easily available. However, only a part of the processor's capability may be needed in a specific design, and hence the overall embedded system will not be as optimized as it could have been as far as space, power and reliability are concerned.
[Fig. 4.1 blocks: control unit (controller, PC, IR, control/status) and datapath (ALU, registers), connected to I/O and memory]
Fig. 4.1 The architecture of a General Purpose Processor

Pentium IV is such a general purpose processor, with the most advanced architectural features; compared to its overall performance its cost is also low. A general purpose processor consists of a datapath and a control unit, tightly linked with the memory (Fig. 4.1). The datapath consists of circuitry for transforming data and for storing temporary data. It contains an arithmetic logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inversion and shifting. The datapath also contains registers capable of storing temporary data generated by the ALU or related operations. The internal data bus carries data within the datapath, while the external data bus carries data to and from the data memory. The size of the datapath indicates the bit-size of the CPU: an 8-bit datapath means an 8-bit CPU, such as the 8085. The control unit consists of circuitry for retrieving program instructions and for moving data to, from, and through the datapath according to those instructions. It has a program counter (PC) to hold the address of the next program instruction to fetch and an instruction register (IR) to hold
the fetched instruction. It also has a timing unit in the form of state registers and control logic. The controller sequences through the states and generates the control signals necessary to read instructions into the IR and to control the flow of data in the datapath. Generally the address size is specified by the control unit, as it is responsible for communicating with the memory. For each instruction the controller typically sequences through several stages, such as fetching the instruction from memory, decoding it, fetching the operands, executing the instruction in the datapath, and storing the results. Each stage takes a few clock cycles.
Microcontroller
Just as the major components of a desktop PC can be put onto a Single Board Computer (SBC), if you put all the major components of a single board computer onto a single chip, it is called a microcontroller. Because of the limitations of VLSI design, most of the input/output functions exist in a simplified manner. The typical architecture of such a microcontroller is shown in Fig. 4.2.
[Fig. 4.2 blocks: C500 core, IRAM, XRAM, ROM, interrupt controller, access control, housekeeper and A/D converter, linked by the address bus, data bus and peripheral bus]
Fig. 4.2 The architecture of a typical microcontroller, the C500 from Infineon Technologies, Germany. The double-lined blocks are the processor core; the other blocks are on-chip.

The various units of the processor (Fig. 4.2) are as follows. The C500 core contains the CPU, which consists of the instruction decoder, the arithmetic logic unit (ALU) and the program control section. The housekeeper unit generates internal signals for controlling the functions of the individual internal units within the microcontroller. Port 0 and Port 2 are required for accessing external code and data memory and for emulation purposes.
The external control block handles the external control signals and clock generation. The access control unit is responsible for the selection of the on-chip memory resources. The IRAM provides the internal RAM, which includes the general purpose registers. The XRAM is an additional internal RAM that is sometimes provided. Interrupt requests from the peripheral units are handled by an interrupt controller unit. Serial interfaces, timers, capture/compare units, A/D converters, watchdog units (WDU) and multiply/divide units (MDU) are typical examples of on-chip peripheral units. The external signals of these peripheral units are available at multifunctional parallel I/O ports or at dedicated pins.
Digital Signal Processors

[Fig. 4.3 blocks: a processing unit exchanging results/operands with a separate data memory, and a control unit (receiving status and opcode) fetching instructions from a separate program memory over its own address lines]
Fig. 4.3 The modified Harvard architecture

MACD-type instructions (multiply, accumulate and data move) can be executed faster by parallel implementation. This is possible by accessing the program and data memory separately, in parallel, as accomplished in the modified Harvard architecture shown in Fig. 4.3. DSP units generally use multiple-access and multi-ported memory units. A multiple-access memory allows more than one access in one clock period, while a multi-ported memory provides multiple address as well as data ports; this too increases the number of accesses per clock cycle.
[Fig. 4.4 blocks: a memory with two independent ports, each with its own address bus and data bus]
Fig. 4.4 Dual-Ported Memory

The Very Long Instruction Word (VLIW) architecture is also suitable for signal processing applications. It has a number of functional units and datapaths, as seen in Fig. 4.5. The long instruction words are fetched from the memory; the operands and the operations to be performed by the various units are specified in the instruction itself. The multiple functional units share a common multi-ported register file for fetching the operands and storing the results. Parallel random access to the register file is possible through the read/write crossbar. Execution in the functional units is carried out concurrently with the load/store operation of data between the RAM and the register file.
[Fig. 4.5: the VLIW architecture, with functional units 1 to n sharing a multi-ported register file and fed from an instruction cache]
Microprocessors vs Microcontrollers
A microprocessor is the central processing unit of a general-purpose digital computer. To make a complete microcomputer, you add memory (ROM and RAM), memory decoders, an oscillator, and a number of I/O devices. The prime use of a microprocessor is to read data, perform extensive calculations on that data, and store the results in a mass storage device or display them. These processors have complex architectures with multiple stages of pipelining and parallel processing, and the memory is divided into stages such as multi-level cache and RAM. The development time of general purpose microprocessors is high because of the very complex VLSI design.
[Fig. 4.6 blocks: microprocessor with external ROM, EEPROM and RAM, serial I/O, parallel I/O, A/D and D/A converters for analog I/O, timer and PWM]
Fig. 4.6 A Microprocessor-based System

The design of the microprocessor is driven by the desire to make it as expandable and flexible as possible. Microcontrollers, in contrast, usually have on-chip RAM and ROM (or EPROM) in addition to on-chip I/O hardware, to minimize the chip count in single-chip solutions. As a result of using on-chip hardware for I/O, RAM and ROM, they usually have a relatively low-performance CPU. Microcontrollers also often have timers that generate interrupts and can thus be used with the CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of a microcontroller is to control the operations of a machine using a fixed program that is stored in ROM and does not change over the lifetime of the system. The microcontroller is concerned with getting data from and to its own pins; the architecture and instruction set are optimized to handle data of bit and byte size.
[Fig. 4.7 blocks: microcontroller with on-chip ROM, EEPROM, RAM and timer; A/D converter for analog input; PWM with filter for analog output; digital PWM]

Fig. 4.7 A Microcontroller

The contrast between a microcontroller and a microprocessor is best exemplified by the fact that most microprocessors have many operation codes (opcodes) for moving data from external memory to the CPU, while microcontrollers may have one or two. Conversely, microprocessors may have one or two types of bit-handling instructions; microcontrollers will have many.

A basic Microprocessor vs a basic DSP
Fig. 4.8 The memory organization in a DSP

DSP characterization:
1. Microprocessors specialized for signal processing applications
2. Harvard architecture
3. Two to four memory accesses per cycle
4. Dedicated hardware performs all key arithmetic operations in 1 cycle
5. Very limited SIMD (Single Instruction Multiple Data) features; specialized, complex instructions
6. Multiple operations per instruction
7. Dedicated address generation units
8. Specialized addressing: auto-increment, modulo (circular), bit-reversed
9. Hardware looping
10. Interrupts disabled during certain operations
11. Limited or no register shadowing
12. Rarely have dynamic features
13. Relatively narrow range of DSP-oriented on-chip peripherals and I/O interfaces
14. Synchronous serial port
Fig. 4.9 Memory organization in a General Purpose Processor

Characterization of a General Purpose Processor:
1. CPUs for PCs and workstations, e.g. Intel Pentium IV
2. Von Neumann architecture
3. Typically 1 memory access per cycle
4. Most operations take more than 1 cycle
5. General-purpose instructions, typically only one operation per instruction
6. Often no separate address generation units
7. General-purpose addressing modes
8. Software loops only
9. Interrupts rarely disabled
10. Register shadowing common
11. Dynamic caches are common
12. Wide range of on-chip and off-chip peripherals and I/O interfaces
13. Asynchronous serial port
Memory
Memory serves the processor's short- and long-term information storage requirements, while registers serve the processor's short-term storage requirements. Both the program and the data are stored in the memory. When data and program occupy the same memory, this is known as the Princeton architecture; in the Harvard architecture the program and the data occupy separate memory blocks. The former leads to a simpler architecture; the latter needs two separate sets of connections, so program and data accesses can proceed in parallel, leading to parallel processing. General purpose processors use the Princeton architecture. Memory may be Read Only Memory (ROM) or Random Access Memory (RAM). It may exist on the same chip as the processor or outside the chip; on-chip memory is faster than off-chip memory. To reduce the access (read/write) time, a local copy of a portion of memory can be kept in a small but fast memory called the cache memory. Memory can also be categorized as dynamic or static. Dynamic memories dissipate less power and hence can be compact and cheaper, but their access times are longer than those of their static counterparts. In a dynamic RAM (DRAM) the data is retained by a periodic refresh operation, while in a static RAM (SRAM) the data is retained continuously. SRAMs are much faster than DRAMs but consume more power; the intermediate cache memory is an SRAM. In a typical processor, when the CPU needs data it first looks in its own data registers. If the data isn't there, the CPU looks in the nearby Level 1 cache; if that fails, it's off to the Level 2 cache. If the data is nowhere in cache, the CPU looks in main memory, and failing that, gets it from disk. All the while the clock is ticking and the CPU is sitting there waiting.
Conclusion
Besides the above units, some real-time embedded systems may have specific circuits included on the same chip or circuit board. These are known as Application Specific Integrated Circuits (ASIC). Some examples are:
3. Filters
Filters are used to condition the incoming signal by eliminating out-of-band noise and other unwanted signals. A specific class of filters, called anti-aliasing filters, is used before the A-D converters to prevent aliasing while acquiring a broad-band signal (a signal with a very wide frequency spectrum).
4. Controllers
These are specific circuits for controlling motors, actuators, light intensities, etc.
Questions-Answers
Q1. Enumerate the similarities and differences between the Microcontroller and the Digital Signal Processor.
Ans: Microcontrollers usually have on-chip RAM and ROM (or EPROM) in addition to on-chip I/O hardware, to minimize chip count in single-chip solutions. As a result of using on-chip hardware for I/O, RAM and ROM, they usually have a relatively low-performance CPU. Microcontrollers also often have timers that generate interrupts and can thus be used with the CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of a microcontroller is to control the operations of a machine using a fixed program that is stored in ROM and does not change over the lifetime of the system. The microcontroller is concerned with getting data from and to its own pins; the architecture and instruction set are optimized to handle data in bit and byte sizes.
Digital Signal Processors have been designed on the basis of the modified Harvard architecture to handle real-time signals. The features of these processors are suitable for implementing signal processing algorithms. One of the common operations required in such applications is array multiplication; for example, convolution and correlation require it. This is accomplished by multiplication followed by accumulation and addition, generally carried out by Multiplier and Accumulator (MAC) units. Sometimes this is known as MACD, where D stands for Data move. Generally all the instructions are executed in a single cycle. These DSP units generally use multiple-access and multi-ported memory units: a multiple-access memory allows more than one access in one clock period, and a multi-ported memory allows multiple address as well as data ports, which also increases the number of accesses per clock cycle.
Q2.
Name a few chips in each of the following processor families: Microcontroller, Digital Signal Processor, General Purpose Processor.
Ans:
Microcontroller: Intel 8051, Intel 80196, Motorola 68705
Digital Signal Processor: TI TMS320C6711, TI TMS320C5000 series
General Purpose Processor: Intel Pentium IV, PowerPC
Q3. List the following in increasing order of access speed: Flash Memory, Dynamic Memory, Cache Memory, CDROM, Hard Disk, Magnetic Tape, Processor Memory.
Ans: Magnetic Tape, CDROM, Hard Disk, Dynamic Memory, Flash Memory, Cache Memory, Processor Memory
Q4. Draw the circuit of an anti-aliasing Filter using Operational amplifiers Ans:
Low Pass Sallen-Key Butterworth Filter

Q5. Is it possible to implement an anti-aliasing filter in digital form?
Ans: No, it is not possible to implement an anti-aliasing filter in digital form, because aliasing is an error introduced at the sampling phase of analog-to-digital conversion. If the sampling frequency is less than twice the highest frequency present, the higher signal frequencies fold back into the lower frequency band and hence cannot be distinguished in the digital/discrete domain.

Q6. Download any free emulator of a simple microcontroller such as the 8051 or 68705 and learn about it.
Home work.

Q7. Draw the internal architecture of the 8051 and explain the functions of its various units.
See http://www.atmel.com/products/8051/

Q8. State, with justification, whether the following statements are right or wrong.
Cache memory can be a static RAM
Dynamic RAMs occupy more space per word of storage
The full form of SDRAM is static-dynamic RAM
The BIOS in your PC is not a Random Access Memory (RAM)
Ans:
Cache memory can be a static RAM: Right. The cache memory needs a very fast access time, which is possible with static RAM.
Dynamic RAMs occupy more space per word of storage: Wrong. DRAM cells are basically simple MOS capacitors and therefore occupy much less space than static RAM cells.
The full form of SDRAM is static-dynamic RAM: Wrong. SDRAM is Synchronous Dynamic RAM (covered in later chapters).
The BIOS in your PC is not a Random Access Memory (RAM): Wrong. The BIOS is a CMOS-based memory which can be accessed uniformly.
Q9. Explain the function of the following units in a general purpose processor: Instruction Register, Program Counter, Instruction Queue, Control Unit.
Ans:
Instruction Register: a register inside the CPU which holds the instruction code temporarily before sending it to the decoding unit.
Program Counter: a register inside the CPU which holds the address of the next instruction code in a program. It is updated automatically by the address generation unit.
Instruction Queue: a set of memory locations inside the CPU that hold instructions in a pipeline before sending them to the instruction decoding unit.
Control Unit: responsible for generating the timing and control signals for the various operations inside the CPU. It is very closely associated with the instruction decoding unit.
Module 2
Embedded Processors and Memory
Lesson 5
Memory-I
Instructional Objectives
After going through this lesson the student would know about:
Different kinds of memory
Processor memory
Primary memory
Memory interfacing
Pre-Requisite
Digital Electronics, Microprocessors
5.1 Introduction
This chapter describes memory. Most modern computer systems have been designed on the basis of an architecture called the von Neumann architecture.1
Memory
The memory stores the instructions as well as the data. Nothing in the memory itself distinguishes an instruction from data; the CPU has to be directed to the address of the instruction codes. The memory is connected to the CPU through the following lines:
1. Address
2. Data
3. Control
http://en.wikipedia.org/wiki/John_von_Neumann. The so-called von Neumann architecture is a model for a computing machine that uses a single storage structure to hold both the set of instructions on how to perform the computation and the data required or generated by the computation. Such machines are also known as stored-program computers. The separation of storage from the processing unit is implicit in this model. By treating the instructions in the same way as the data, a stored-program machine can easily change its instructions; in other words, the machine is reprogrammable. One important motivation for such a facility was the need for a program to increment or otherwise modify the address portion of instructions. This became less important when index registers and indirect addressing became customary features of machine architecture.
Fig. 5.2 The Memory Interface (the CPU connected to the memory through address lines, data lines and control lines)
In a memory read operation the CPU loads the address onto the address bus. In most cases these lines are fed to a decoder which selects the proper memory location. The CPU then sends a read control signal, and the data stored in that location is transferred to the processor via the data lines. In a memory write operation, after the address is loaded the CPU sends a write control signal followed by the data to the requested memory location. Memory can be classified in various ways, e.g. based on location, power consumption, or the way data is stored. At the basic level memory can be classified as:
1. Processor memory (register array)
2. Internal on-chip memory
3. Primary memory
4. Cache memory
5. Secondary memory
Primary Memory
This is the memory which sits just outside the CPU, though it can also reside on the same chip as the CPU. These memories can be static or dynamic.
Cache Memory
This is situated between the processor and the primary memory. It serves as a buffer for the immediate instructions or data which the processor anticipates needing. There can be more than one level of cache memory.
Secondary Memory
These are generally treated as input/output devices. They are much cheaper mass-storage devices, slower, and connected through input/output interface circuits. They are generally magnetic or optical memories, such as hard disk and CDROM devices. Memory can also be divided into volatile and non-volatile memory.
Volatile Memory
The contents are erased when the power is switched off. Semiconductor Random Access Memories fall into this category.
Non-volatile Memory
The contents remain intact even if the power is switched off. Magnetic memories (hard disks), optical disks (CDROMs) and Read Only Memories (ROM) fall under this category.
[Fig. 5.3 The Internal Registers: CPU (control unit, ALU, registers) connected to input, output and memory]

Fig. 5.4 Data Array (m words of n bits per word)
Memory access
A memory location is accessed by placing its address on the address lines; the read/write control line then selects a read or a write. Some memory devices are multi-port, i.e. they allow multiple simultaneous accesses to different locations.

[Figure: memory external view, with r/w and enable controls, address lines A0 to Ak-1, and data lines from Q0 upwards]
Memory Specifications
The specifications of a typical memory are as follows.
The storage capacity: the number of bits/bytes or words it can store.
The memory access time (read access and write access): how long the memory takes to place the data on its data lines after it has been addressed, or how fast it can store data supplied through its data lines. The reciprocal of the memory access time is known as the memory bandwidth.
The power consumption and voltage levels: power consumption is a major factor in embedded systems; the lower the power consumption, the higher the possible packing density.
Size: size is directly related to the power consumption and the data storage capacity.
Fig. 5.6 Four generations of RAM chips

There are two important specifications of memory as far as real-time embedded systems are concerned:
Write ability
Storage permanence
Write ability
It is the manner and speed with which a particular memory can be written.
Ranges of write ability:
High end: the processor writes to the memory simply and quickly, e.g. RAM.
Middle range: the processor writes to the memory, but more slowly, e.g. FLASH, EEPROM (Electrically Erasable and Programmable Read Only Memory).
Lower range: special equipment, a programmer, must be used to write to the memory, e.g. EPROM, OTP ROM (One Time Programmable Read Only Memory).
Low end: bits are stored only during fabrication, e.g. mask-programmed ROM.
In-system programmable memory can be written to by a processor in the embedded system that uses the memory; this covers memories in the high end and middle range of write ability.
Storage permanence
It is the ability to hold the stored bits. Range of storage permanence:
High end: essentially never loses bits, e.g. mask-programmed ROM.
Middle range: holds bits for days, months, or years after the memory's power source is turned off, e.g. NVRAM.
Lower range: holds bits as long as power is supplied to the memory, e.g. SRAM.
Low end: begins to lose bits almost immediately after they are written, e.g. DRAM.
Nonvolatile memory holds bits after power is no longer supplied; this covers the high end and middle range of storage permanence.
ROM (Read Only Memory)

ROMs store constant data needed by the system and can also implement combinational circuits. Externally, a 2^k x n ROM presents an enable input, address lines A0 to Ak-1, and data lines Qn-1 to Q0.

Example
The figure shows the structure of a ROM. The horizontal lines represent the words; the vertical lines give out the data, and the two are connected only at the circles. If the address input is 010, the decoder sets word line 2 to 1. Data lines Q3 and Q1 are set to 1 because there is a programmed connection with word 2's line; word 2 is not connected to data lines Q2 and Q0. Thus the output is 1010.

Internal view: an 8 x 4 ROM.
Fig. 5.8 The example of a ROM with decoder and data storage (a 3-to-8 decoder on inputs A0, A1, A2 with enable selects one of the word lines; outputs Q3, Q2, Q1, Q0)
[Figure: an 8 x 2 ROM implementing a combinational circuit; enable and inputs c, b, a select one of words 0 to 7, and the two data columns store the outputs y = 00011111 and z = 01100111 for inputs 000 to 111]
Mask-programmed ROM
The connections are programmed at fabrication, using a set of masks. Such a ROM can be written only once (in the factory), but it stores its data for ever: it has the highest storage permanence, and the bits never change unless damaged. Mask-programmed ROMs are typically used for the final design of high-volume systems.
[Figure: EPROM floating-gate cell operation, showing the 0 V/floating-gate state, programming at +15 V, and erasure taking 5-30 min]
EEPROM
EEPROM stands for Electrically Erasable and Programmable Read Only Memory. It is typically erased by using a higher-than-normal voltage, and it can program and erase individual words, unlike the EPROM, where exposure to UV light erases everything. An EEPROM:
can be in-system programmable, with a built-in circuit to provide the higher-than-normal voltage;
commonly has a built-in memory controller to hide the write details from the memory user, with a busy pin to indicate to the processor that the EEPROM is still writing;
can be erased and programmed tens of thousands of times;
has similar storage permanence to EPROM (about 10 years);
is far more convenient than EPROM, but more expensive.
Flash Memory
Flash memory is an extension of EEPROM: it uses the same floating-gate principle and has the same write ability and storage permanence. It can, however, be erased at a faster rate, because large blocks of memory are erased at once rather than one word at a time. The blocks are typically several thousand bytes large, so writes to single words may be slower: the entire block must be read, the word updated, and then the entire block written back. Flash is used in embedded systems that store large data items in nonvolatile memory, e.g., digital cameras, TV set-top boxes, and cell phones.
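The block-level read-modify-write just described can be sketched as follows. The 4096-byte block size and the list-based "flash" are illustrative assumptions, not a real device driver.

```python
BLOCK_SIZE = 4096  # bytes per erase block (an assumed, typical value)

# Simulate a flash device as a list of erase blocks.
flash = [bytearray(BLOCK_SIZE) for _ in range(4)]

def flash_write_word(block_no, offset, word_bytes):
    """Update one word inside a block using read-modify-write."""
    block = bytearray(flash[block_no])                   # 1. read the entire block
    block[offset:offset + len(word_bytes)] = word_bytes  # 2. update the word
    flash[block_no] = bytearray(b"\xff") * BLOCK_SIZE    # 3. simulate the block erase
    flash[block_no][:] = block                           # 4. write the block back

flash_write_word(1, 100, b"\x12\x34")
```

Updating two bytes still costs a whole-block erase and rewrite, which is why single-word writes to flash are slow.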
RAM

Bits are not held without a power supply, and RAM is read and written easily by the embedded system during execution. Its internal structure is more complex than that of a ROM:
- a word consists of several memory cells, each storing 1 bit
- each input and output data line connects to each cell in its column
- rd/wr is connected to every cell
- when a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates a write, or outputs the stored bit when rd/wr indicates a read
[Figure: external view of a RAM (address lines A0, A1, ..., data lines Q0..Qn-1, rd/wr control) and internal view showing the memory cells, with rd/wr routed to every cell and data lines Q3-Q0.]
SRAM: Static RAM — holds data as long as power is supplied.
DRAM: Dynamic RAM — the memory cell uses a MOS transistor and a capacitor to store a bit, so it is more compact than SRAM, but a refresh is required because the capacitor leaks. The typical refresh rate is once every 15.625 microseconds, and DRAM is slower to access than SRAM.
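The quoted 15.625-microsecond figure follows from a common retention spec (an assumption here, not stated in the text): if every row must be refreshed within 64 ms and the array has 4096 rows, distributed refresh issues one row refresh every 64 ms / 4096.

```python
RETENTION_MS = 64  # assumed: every row must be refreshed within 64 ms
NUM_ROWS = 4096    # assumed: a 4096-row DRAM array

# One row refresh every retention-window / number-of-rows.
refresh_interval_us = RETENTION_MS * 1000 / NUM_ROWS
print(refresh_interval_us)  # -> 15.625
```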
[Figure: memory-cell internals — an SRAM cell with complementary Data'/Data lines, and a DRAM cell with a single Data line, one transistor, and a capacitor, both selected by word line W.]
RAM variations
- PSRAM: Pseudo-static RAM — a popular low-cost, high-density alternative to SRAM.
- NVRAM: Nonvolatile RAM — holds data after external power is removed. Two forms:
  - Battery-backed RAM: an SRAM with its own permanently connected battery; writes are as fast as reads, and there is no limit on the number of writes, unlike nonvolatile ROM-based memory.
  - SRAM with EEPROM or flash: stores the complete RAM contents on the EEPROM or flash before power is turned off.
[Figure: a RAM device (HM6264) — pin assignments, device characteristics, and read/write timing diagrams showing the data, addr, /OE, /WE, /CS1, and CS2 signals.]
[Figure: the TC55V2325FF-100 synchronous SRAM — device characteristics, block diagram, and timing diagrams with signals /WE, /OE, /ADSP, /ADSC, /ADV, MODE, CLK, /CS1, /CS2, CS3, addr<15...0>, and data<31...0>.]
[Figure: composing memory from smaller parts — several 2^m x n ROM devices with a common enable share address lines A0-Am-1; connecting three in parallel widens the word to form a 2^m x 3n ROM with outputs Q0-Q3n-1, while an extra address line Am can select between devices to increase the number of words.]
5.7 Conclusion
In this chapter you have learnt about the following:
1. Basic Memory Types
2. Basic Memory Organization
3. Definitions of RAM, ROM and Cache Memory
4. Difference between Static and Dynamic RAM
5. Various Memory Control Signals
6. Memory Specifications
7. Basics of Memory Interfacing
5.8 Questions
Q1. Discuss the various control signals in a typical RAM device (say the HM6264).

[Figure: HM6264 RAM pin assignment — data<7:0>, addr<15...0>, /OE, /WE, /CS1, CS2.]

Ans:
/OE (output enable, active low): the output is enabled when this line is low; it serves as the read line.
/WE (write enable, active low): this line has to be made low while writing to the device.
/CS1 (chip select 1, active low): this line has to be made low, along with CS2, to enable the chip.

Q2. Download the datasheet of the TC55V2325FF chip and indicate the various signals.
Module 2
Embedded Processors and Memory
Lesson 6
Memory-II
Instructional Objectives
After going through this lesson the student would learn the following:
- Memory Hierarchy
- Cache Memory
  - Different types of Cache Mappings
  - Cache Impact on System Performance
- Dynamic Memory
  - Different types of Dynamic RAMs
- Memory Management Unit
Pre-Requisite
Digital Electronics, Microprocessors
[Fig. 6.1 The memory hierarchy: processor registers, cache, main memory, disk, tape]
6.2 Cache
Cache is usually designed with SRAM, which is faster but more expensive than DRAM, and it usually sits on the same chip as the processor; space is limited, so it is much smaller than off-chip main memory, but access is faster (1 cycle vs. several cycles for main memory).
Cache operation: on a request for main memory access (read or write), first check the cache for a copy. A cache hit means the copy is in the cache, giving quick access; a cache miss means the copy is not in the cache, so the address (and possibly its neighbors) is read into the cache.
There are several cache design choices: cache mapping, replacement policies, and write techniques.
Direct Mapping
The main memory address is divided into fields:
- Index: contains the cache address; the number of bits is determined by the cache size.
- Tag: compared with the tag stored in the cache at the address indicated by the index; if the tags match, check the valid bit.
- Valid bit: indicates whether the data in the slot has been loaded from memory.
- Offset: used to find the particular word in the cache line.
[Figure: direct-mapped cache — the address is split into Tag, Index, and Offset; the index selects a line holding a valid bit (V), tag (T), and data; the stored tag is compared (=) with the address tag and qualified by the valid bit.]
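Splitting an address into tag, index, and offset can be sketched directly; the cache geometry below (1024 lines of 32 bytes, 32-bit addresses) is an assumption for illustration only.

```python
OFFSET_BITS = 5   # 32-byte cache line  -> 5 offset bits
INDEX_BITS = 10   # 1024 cache lines    -> 10 index bits

def split_address(addr):
    """Split a 32-bit address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(tag, index, offset)  # -> 9320 691 24
```

Reassembling the fields (tag << 15 | index << 5 | offset) gives back the original address, which is a quick sanity check on the widths.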
Set-Associative Mapping
This is a compromise between direct mapping and fully associative mapping: the index is the same as in direct mapping, but each cache address contains the content and tags of two or more memory address locations. The tags of that set are compared simultaneously, as in fully associative mapping. A cache with set size N is called N-way set-associative; 2-way, 4-way, and 8-way are common.
[Figure: two-way set-associative cache — the address is split into Tag, Index, and Offset; the index selects a set containing two (V, T, data) entries, and both stored tags are compared (=) with the address tag in parallel.]
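A 2-way set-associative lookup can be sketched as follows; this is a toy model with assumed sizes, not any particular processor's cache.

```python
# Sketch of a 2-way set-associative lookup: each set holds two
# (valid, tag) entries whose tags are compared in parallel.
NUM_SETS = 4  # illustrative, tiny cache

cache = [[{"valid": False, "tag": None} for _ in range(2)]
         for _ in range(NUM_SETS)]

def lookup(tag, index):
    """Return True on a hit: any valid way in the set with a matching tag."""
    return any(w["valid"] and w["tag"] == tag for w in cache[index])

def fill(tag, index, way):
    """Load a line into the given way of the given set."""
    cache[index][way] = {"valid": True, "tag": tag}

fill(tag=0x1A, index=2, way=0)
fill(tag=0x2B, index=2, way=1)  # two different tags can share set 2
print(lookup(0x2B, 2), lookup(0x3C, 2))  # -> True False
```

In a direct-mapped cache the two fills above would have evicted each other; the second way is what removes that conflict.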
Cache choices affecting performance include the total size of the cache (the total number of data bytes the cache can hold; tag, valid, and other housekeeping bits are not included in this total), the degree of associativity, and the data block size. Larger caches achieve lower miss rates but higher access cost. For example:
- 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles; avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
- 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost unchanged; avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (an improvement)
- 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost unchanged; avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse than the 4 Kbyte cache)
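The three averages above can be reproduced directly from the model avg = (1 - miss rate) * hit cost + miss rate * miss cost:

```python
def avg_access_cycles(miss_rate, hit_cost, miss_cost=20):
    """Average memory access cost in cycles under the hit/miss model above."""
    return (1 - miss_rate) * hit_cost + miss_rate * miss_cost

# (cache size in Kbytes, miss rate, hit cost) from the worked example
for kb, miss_rate, hit_cost in [(2, 0.15, 2), (4, 0.065, 3), (8, 0.05565, 4)]:
    print(kb, "Kbyte:", round(avg_access_cycles(miss_rate, hit_cost), 4))
```

The 8 Kbyte case shows the trade-off: the miss rate keeps falling, but the rising hit cost makes the average worse than the 4 Kbyte cache.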
[Figure: internal structure of a DRAM — row and column decoders select a cell from the multiplexed address under the ras and cas strobes and clock, with rd/wr control and the data lines. Timing: one row address (ras) followed by several column addresses (cas), each returning data.]
6.12 Questions
Q1. Discuss different types of cache mappings.
Ans: Direct, Fully Associative, and Set Associative.
Q2. Discuss the effect of the size of the cache memory on the system performance.
Ans:
[Figure: % cache miss (0 to 0.16) plotted against cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way, and 8-way associativity.]
EDO RAM
[Figure: EDO RAM timing diagram — signals ras, cas, address, and data; after one row address, successive column addresses each return data.]
SDRAM
[Figure: SDRAM timing diagram — signals clock, ras, cas, address, and data; after a single row and column address, data words are delivered on successive clock edges.]
Module 2
Embedded Processors and Memory
Lesson 7
Digital Signal Processors
Instructional Objectives
After going through this lesson the student would learn:
o Architecture of a Real Time Signal Processing Platform
o Different errors introduced during the A-D and D-A converter stages
o Digital Signal Processor architecture
o Difference in the complexity of programs between a General Purpose Processor and a Digital Signal Processor
Pre-Requisite
Digital Electronics, Microprocessors
This lesson also covers the evolution of digital signal processors and their comparative performance with general purpose processors.
7.1 Introduction
Digital signal processing deals with algorithms for handling large chunks of data. The branch identified itself as a separate subject in the 1970s, when engineers thought about processing the signals arising from nature in discrete form. The development of sampling theory followed, and the design of analog-to-digital converters gave an impetus in this direction. The early applications of digital signal processing were mainly in speech, followed by communication, seismology, biomedical applications, etc. Later the field of image processing emerged as another important area in signal processing.
The following broadly defines the different processor classes:
- General purpose, high performance (Pentium, Alpha, SPARC): used for general purpose software under a heavyweight OS (UNIX, NT); found in workstations and PCs.
- Embedded processors and processor cores (ARM, 486SX, Hitachi SH7000, NEC V800): run a single program under a lightweight real-time OS, often with DSP support; found in cellular phones and consumer electronics (e.g., CD players).
- Microcontrollers: extremely cost sensitive, with a small word size (8 bit is common); the highest volume processors by far, found in automobiles, toasters, thermostats, ...
A digital signal processor is required to do the following digital signal processing tasks in real time:
- Signal modeling: difference equations, convolution, transfer functions, frequency response
- Signal processing: data manipulation, algorithms, filtering, estimation
What is digital signal processing? It is the application of mathematical operations to digitally represented signals. Signals are represented digitally as sequences of samples, obtained from physical signals via transducers (e.g., microphones) and analog-to-digital converters (ADC), and converted back to physical signals via digital-to-analog converters (DAC). A digital signal processor (DSP) is an electronic system that processes digital signals.

[Figure: the signal processing chain — measurand, sensor, conditioner, analog processor (LPF), ADC, digital signal processor, DAC.]
Fig. 7.1 The basic signal processing platform

The above figure represents a real-time digital signal processing system. The measurand can be temperature, pressure, or a speech signal, which is picked up by a sensor (a thermocouple, microphone, load cell, etc.). The conditioner is required to filter, demodulate, and amplify the signal. The analog processor is generally a low-pass filter used for anti-aliasing. The ADC block converts the analog signal into digital form. The DSP block represents the signal processor. The DAC (digital-to-analog converter) converts the digital signal back into analog form, and the analog low-pass filter eliminates the noise introduced by the interpolation in the DAC.
[Figure: the A-D conversion chain — x(t) passes through a sampler (xs(t), sampling function p(t)), a quantizer (xq(t), xq(n)), and a coder producing b-bit samples xb(n); the D-A chain passes xb(n) through a decoder and sample/hold to give y(n).]
Fig. 7.2 The D-A and A-D conversion process

The performance of the signal processing system depends to a large extent on the ADC. The ADC is specified by its number of bits, which defines the resolution, and its conversion time, which decides the sampling time. The errors in the ADC are due to the finite number of bits and the finite conversion time; sometimes noise may also be introduced by the switching circuits. Similarly, the DAC is characterized by its number of bits and the settling time at its output.
A DSP task requires:
- Repetitive numeric computations
- Attention to numeric fidelity
- High memory bandwidth, mostly via array accesses
- Real-time processing
And the DSP design should minimize:
- Cost
- Power
- Memory use
- Development time
Take the example of FIR filtering, both by a general purpose processor and by a DSP.
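The resolution set by the number of bits can be made concrete: a b-bit ADC spanning a full-scale range V_FS resolves voltage steps of V_FS / 2^b. The 5 V range and bit counts below are assumed examples.

```python
def quantization_step(v_full_scale, bits):
    """Smallest voltage difference a b-bit ADC can resolve."""
    return v_full_scale / (2 ** bits)

print(quantization_step(5.0, 8))   # -> 0.01953125     (about 19.5 mV)
print(quantization_step(5.0, 12))  # -> 0.001220703125 (about 1.2 mV)
```

Every extra bit halves the step size, which is why the finite number of bits appears first in the list of ADC error sources.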
An FIR (finite impulse response) filter is represented as shown in the following figure. For input x(k) and output y(k):

y(k) = (h0 + h1 z^-1 + h2 z^-2 + ... + h(N-1) z^-(N-1)) x(k)
     = h0 x(k) + h1 x(k-1) + h2 x(k-2) + ... + h(N-1) x(k-N+1)
     = sum_{i=0}^{N-1} h_i x(k-i)
     = h(k) * x(k)

The output of the filter is a linear combination of the present and past values of the input. The FIR filter has several advantages, such as linear phase, stability, and improved computational time.
[Figure: tapped-delay-line realization of the FIR filter — the input x(k) passes through a chain of z^-1 delays; the taps are weighted by h0, h1, h2, ..., h(N-1) and summed to form y(k).]
The FIR filter can be implemented on a general purpose processor as follows. The program assumes that the finite window of the input signal is stored at the memory location starting from the address specified by r1, and that the equal number of filter coefficients is stored at the memory location starting from the address specified by r0. The result will be stored at the memory location specified by r2. The program assumes the content of register b is 0 before the start of the loop.

loop: lw x0,(r0)    ; load x0 from the address in r0
      lw y0,(r1)    ; load y0 from the address in r1
      mul a,x0,y0   ; multiply x0 with y0, result in a
      add b,a,b     ; accumulate: b = a + b
      inc r0        ; point to the next coefficient
      inc r1        ; point to the next data value
      dec ctr       ; decrement the loop counter
      tst ctr       ; test whether the filter order has been reached
      jnz loop      ; jump to the start of the loop if not zero
      sw b,(r2)     ; store the final result
      inc r2        ; point r2 to the next location

The two load instructions fill x0 and y0 with values from the memory locations specified by r0 and r1. The multiply and add instructions accumulate the products in b, which already contains the result of the previous iterations. The increments, decrement, and test advance the pointers and check the loop count, and the final store writes the accumulated result and advances r2 to the next location.
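The same multiply-accumulate loop can be written in a high-level language. This Python sketch mirrors the program above, producing one output sample from a coefficient array and an equally long input window:

```python
def fir_output(h, x):
    """One FIR output sample: the multiply-accumulate loop of the program above.

    h -- filter coefficients h0..h(N-1)
    x -- input window x(k), x(k-1), ..., x(k-N+1)
    """
    assert len(h) == len(x)
    b = 0                      # the accumulator, cleared before the loop
    for hi, xi in zip(h, x):   # one iteration per tap
        b += hi * xi           # the mul + add of the assembly loop
    return b

print(fir_output([1, 2, 3], [4, 5, 6]))  # -> 32  (1*4 + 2*5 + 3*6)
```

Each tap costs a separate multiply, add, and pointer update on a general purpose processor, which is exactly the overhead a DSP removes.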
Let us see the program for an early DSP, the TMS32010, developed by Texas Instruments in the 1980s. It has the following features:
- 16-bit fixed point
- Harvard architecture: separate instruction and data memories
- Accumulator
[Figure: instruction memory and data memory connected to the processor; the datapath contains a T-register, multiplier, P-register, ALU, and accumulator.]
Fig. 7.4 Basic TMS32010 architecture

The program for the FIR filter (for a 3rd order filter) is given as follows. Here X4, H4, ... are direct (absolute) memory addresses, and ";" introduces a comment:

LT X4   ;Load T with x(n-4)
MPY H4  ;P = H4*X4; Acc = Acc + P
LTD X3  ;Load T with x(n-3); x(n-4) = x(n-3)
MPY H3  ;P = H3*X3; Acc = Acc + P
LTD X2
MPY H2
...

This takes two instructions per tap, but requires unrolling the loop.
LT X4: loading from the direct address X4. MPY H4: multiply and accumulate. LTD X3: loading and shifting the data points in memory.
The advantages of the DSP over the general purpose processor can be stated as follows: multiplication and accumulation take place at the same time, so the architecture directly supports filtering tasks, and the loading and subsequent shifting of data also take place at the same time.

II. Questions
1. Discuss the different errors introduced in a typical real-time signal processing system.
Answers
The various errors are:
In the ADC:
i. Sampling error
ii. Quantization error
iii. Coding error
In the algorithm:
iv. Inaccurate modeling
v. Finite word length
vi. Round-off errors
vii. Delay due to the finite execution time of the processor
In the DAC:
viii. Decoding error
ix. Transients in sampling time
Module 2
Embedded Processors and Memory
Lesson 8
General Purpose Processors - I
In this lesson the student will learn the following:
- Architecture of a General Purpose Processor
- Various Labels of Pipelines
- Basic Ideas on Different Execution Units
- Branch Prediction
Pre-requisite
Digital Electronics
8.1 Introduction
The first single-chip microprocessor came from Intel Corporation in 1971. It was called the Intel 4004, and it was the first single-chip CPU ever built; we can say it was the first general purpose processor, and now the terms microprocessor and processor are synonymous. The 4004 was a 4-bit processor capable of addressing 1K of data memory and 4K of program memory, and it was meant to be used in a simple calculator. The 4004 had 46 instructions, used only 2,300 transistors in a 16-pin DIP, and ran at a clock rate of 740 kHz (eight clock cycles per CPU cycle of 10.8 microseconds). In 1974, Motorola introduced the 6800, a chip with 78 instructions and probably the first microprocessor with an index register. In 1979, Motorola introduced the 68000: it had internal 32-bit registers and a 32-bit address space, but its bus was still 16 bits due to hardware costs. On the other hand, in 1976 Intel designed the 8085, with more instructions to enable/disable three added interrupt pins (and the serial I/O pins); Intel also simplified the hardware so that it used only a +5V supply, and added clock-generator and bus-controller circuits on the chip. In 1978, Intel introduced the 8086, a 16-bit processor which gave rise to the x86 architecture; it did not contain floating-point instructions. In 1980 the company released the 8087, the first math coprocessor it had developed. Next came the 8088, the processor for the first IBM PC. Even though IBM engineers at the time wanted to use the Motorola 68000 in the PC, the company already had the rights to produce the 8086 line (by trading rights to Intel for its bubble memory) and it could use modified 8085-type components (68000-style components were much scarcer).
Table 1 Development History of Intel Microprocessors

Intel Processor  Year of Introduction  Initial Clock Speed  Number of Transistors  Circuit Line Width
4004             1971                  108 kHz              2,300                  10 micron
8008             1972                  500-800 kHz          3,500                  10 micron
8080             1974                  2 MHz                4,500                  6 micron
8086             1978                  5 MHz                29,000                 3 micron
8088             1979                  5 MHz                29,000                 3 micron
Intel286         1982                  6 MHz                134,000                1.5 micron
Intel386         1985                  16 MHz               275,000                1.5 micron
Intel486         1989                  25 MHz               1.2 million            1 micron
Pentium          1993                  66 MHz               3.1 million            0.8 micron
Pentium Pro      1995                  200 MHz              5.5 million            0.35 micron
Pentium II       1997                  300 MHz              7.5 million            0.25 micron
                                       266 MHz              7.5 million            0.25 micron
                                       500 MHz              9.5 million            0.25 micron
                                       1.5 GHz              42 million             0.18 micron
                                       800 MHz              25 million             0.18 micron
                                       1.7 GHz              42 million             0.18 micron
                                       1 GHz                220 million            0.18 micron
                                       1.5 GHz              140 million            90 nm
The development history of Intel family of processors is shown in Table 1. The Very Large Scale Integration (VLSI) technology has been the main driving force behind the development.
Fig. 8.2 The photograph

The photograph and architecture of a modern general purpose processor from VIA (the C3) are shown in Fig. 8.2 and Fig. 8.3 respectively (please refer to the lesson on embedded components).
[Fig. 8.3: VIA C3 architecture — a 64 KB 4-way I-cache with I-TLB feeds the I-fetch stage (I); a return stack, 3 BHTs, and a BTB perform branch prediction; the bus unit connects a 64 KB 4-way L2 cache; instructions pass through the decode buffer, decode (4-entry instruction queue), translate (4-entry instruction queue), register file and address calculation with ROM (stages R, A, D, G), execute (integer ALU, store-branch), and write back (stages E, S, W), with an MMX/3D unit, an FP queue and FP unit (stages X, F), store buffers, and write buffers.]
Specification
Name: VIA C3TM in EBGA. VIA is the name of the company and C3 the processor; EBGA stands for Enhanced Ball Grid Array, and the clock speed is 1 GHz.
Ball Grid Array (sometimes abbreviated BGA): a type of microchip connection methodology. Ball grid array chips typically use a group of solder dots, or balls, arranged in concentric rectangles to connect to a circuit board. BGA chips are often used in mobile applications where Pin Grid Array (PGA) chips would take up too much space due to the length of the pins used to connect the chips to the circuit board.
[Figure: common package types — SIMM, DIP, and PGA.]
The Architecture
The processor has a 12-stage integer pipelined structure.
Pipeline: this is a very important characteristic of a modern general purpose processor. A program is a set of instructions stored in memory. During execution the processor has to fetch these instructions from memory, decode them, and execute them; this process takes a few clock cycles. To increase the speed of such processes the processor is divided into different units: while one unit gets the instructions from memory, another unit decodes them, and some other unit executes them. This is called pipelining. It can be described as segmenting a functional unit such that it can accept new operands every cycle even though the total execution of an instruction may take many cycles. The pipeline works like a conveyor belt, accepting units until the pipeline is filled and then producing results every cycle. The above processor has such a pipeline divided into 12 stages.
There are four major functional groups: I-fetch, decode and translate, execution, and data cache. The I-fetch components deliver instruction bytes from the large I-cache or the external bus. The decode and translate components convert these instruction bytes into internal execution forms; if there is any branching operation in the program it is identified here, and the processor starts fetching new instructions from a different location. The execution components issue, execute, and retire internal instructions.
The data cache components manage the efficient loading and storing of execution data to and from the caches, bus, and internal components
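The conveyor-belt behaviour of a pipeline can be quantified with an idealised model (an assumption that ignores stalls and branches, not the C3's measured timing): an unpipelined processor needs about N*S cycles for N instructions through S stages, while a filled pipeline needs S + (N - 1) cycles.

```python
def unpipelined_cycles(n_instr, stages):
    """Each instruction occupies the whole machine for S cycles."""
    return n_instr * stages

def pipelined_cycles(n_instr, stages):
    """The first instruction fills the pipe; then one result per cycle."""
    return stages + (n_instr - 1)

n, s = 100, 12  # 100 instructions through a 12-stage pipeline
print(unpipelined_cycles(n, s), pipelined_cycles(n, s))  # -> 1200 111
```

For long instruction streams the speedup approaches the stage count S, which is why deep pipelines are attractive despite the branch-prediction machinery they require.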
[Figure: the I-fetch stages — the 64 KB 4-way I-cache and the I-TLB (128-entry 8-way, with an 8-entry PDC) feed pipeline stages I, B, V.]
The first three pipeline stages (I, B, V) deliver aligned instruction data from the I-cache (instruction cache) or the external bus into the instruction decode buffers. The primary I-cache contains 64 KB organized as four-way set associative with 32-byte lines. The associated large I-TLB (Instruction Translation Look-aside Buffer) contains 128 entries organized as 8-way set associative.
TLB: a translation look-aside buffer is a table in the processor's memory that contains information about the pages in memory the processor has accessed recently. The table cross-references a program's virtual addresses with the corresponding absolute addresses in physical memory that the program has most recently used. The TLB enables faster computing because it allows address processing to take place independently of the normal address-translation pipeline. The instruction data is predecoded as it comes out of the cache; this predecode is overlapped with other required operations and thus effectively takes no time. The fetched instruction data is placed sequentially into multiple buffers. Starting with a branch, the first branch-target byte is left-adjusted into the instruction decode buffer.
[Fig. 8.10: the decode, translate, and execution stages — branch prediction (BTB, return stack, 3 BHTs) and predecode feed the decode buffer; decode and translate lead to the integer unit; register and address calculation occupy stages R, A, D, G and access the D-cache and D-TLB (64 KB 4-way, 128-entry 8-way, 8-entry PDC); execution and write back occupy stages E, S, W, supported by write buffers; the bus unit connects the 64 KB 4-way L2 cache.]
Decode stage (R): micro-instructions are decoded, integer register files are accessed, and resource dependencies are evaluated.
Addressing stage (A): memory addresses are calculated and sent to the D-cache (data cache).
Cache access stages (D, G): the D-cache and D-TLB (Data Translation Look-aside Buffer) are accessed, and aligned load data is returned at the end of the G-stage.
Execute stage (E): integer ALU operations are performed. All basic ALU functions take one clock except multiply and divide.
Store stage (S): integer store data is grabbed in this stage and placed in a store buffer.
Write-back stage (W): the results of operations are committed to the register file.
[Figure: the cache-access and execute stages (R, A, D, G, E, S, W) — the 64 KB 4-way D-cache sits alongside the integer ALU.]
Fig. 8.11

The D-cache contains 64 KB organized as four-way set associative with 32-byte lines. The associated large D-TLB contains 128 entries organized as 8-way set associative. The cache, TLB, and page directory cache all use a pseudo-LRU (least recently used) replacement algorithm.
[Fig. 8.12: the 64 KB 4-way L2 cache]

The L2 cache holds lines that, at any point in time, are not contained in the two 64-KB L1 caches. As lines are displaced from the L1 caches (due to bringing in new lines from memory), the displaced lines are placed in the L2 cache. Thus, a future L1-cache miss on a displaced line can be satisfied by returning the line from the L2 cache instead of having to access external memory.
[Fig. 8.13: the floating-point, MMX, and 3D execution units alongside stages E, S, W]

FP: floating point processing unit. MMX: Multimedia Extension or Matrix Math Extension unit.
3D: a special set of instructions for 3D graphics capabilities. In addition to the integer execution unit, there is a separate 80-bit floating-point execution unit that can execute floating-point instructions in parallel with integer instructions. Floating-point instructions proceed through the integer R, A, D, and G stages, and are passed from the integer pipeline to the FP unit through a FIFO queue. This queue, which runs at the processor clock speed, decouples the slower running FP unit from the integer pipeline so that the integer pipeline can continue to process instructions overlapped with FP instructions. Basic arithmetic floating-point instructions (add, multiply, divide, square root, compare, etc.) are represented by a single internal floating-point instruction. Certain little-used and complex floating-point instructions (sin, tan, etc.), however, are implemented in microcode and are represented by a long stream of instructions coming from the ROM; these instructions tie up the integer instruction pipeline such that integer execution cannot proceed until they complete. This processor also contains a separate execution unit for the MMX-compatible instructions. MMX instructions proceed through the integer R, A, D, and G stages, and one MMX instruction can issue into the MMX unit every clock. The MMX multiplier is fully pipelined and can start one non-dependent MMX multiply[-add] instruction (which consists of up to four separate multiplies) every clock; other MMX instructions execute in one clock, and multiplies followed by a dependent MMX instruction require two clocks. Architecturally, the MMX registers are the same as the floating-point registers; however, there are actually two different register files (one in the FP unit and one in the MMX unit) that are kept synchronized by hardware. Finally, there is a separate execution unit for some specific 3D instructions.
These instructions provide assistance for graphics transformations via new SIMD (Single Instruction Multiple Data) single-precision floating-point capabilities. These instruction codes proceed through the integer R, A, D, and G stages, and one 3D instruction can issue into the 3D unit every clock. The 3D unit has two single-precision floating-point multipliers and two single-precision floating-point adders; other functions such as conversions, reciprocal, and reciprocal square root are also provided. The multiplier and adder are fully pipelined and can start any non-dependent 3D instruction every clock.
8.3 Conclusion
This lesson discussed the architecture of a typical modern general purpose processor (the VIA C3), which is similar to the x86 family of microprocessors from Intel; in fact this processor uses the same x86 instruction set as the Intel processors. It is a pipelined architecture. The general purpose processor architecture has the following characteristics:
- Multiple stages of pipeline
- More than one level of cache memory
- A branch prediction mechanism at the early stage of the pipeline
- Separate and independent processing units (integer, floating point, MMX, 3D, etc.)
- Because of the uncertainties associated with branching, the overall instruction execution time is not fixed (it is therefore not suitable for some real-time applications which need accurate execution speed)
- It handles a very complex instruction set
- The overall power consumption, because of the complexity of the processor, is higher
In the next lesson we shall discuss the signals associated with such a processor.
Answers
Q1.
[Figure: Intel P4 NetBurst architecture — the system bus connects a bus unit with an optional 3rd-level cache and an 8-way 2nd-level cache; the front end (fetch/decode, trace cache, microcode ROM) feeds an out-of-order execution core and retirement logic with a 4-way 1st-level cache, supported by BTBs/branch prediction; frequently used and less frequently used paths are distinguished.]
Q2. Superscalar architecture refers to the use of multiple execution units, to allow the processing of more than one instruction at a time. This can be thought of as a form of "internal multiprocessing", since there really are multiple parallel processors inside the CPU. Most modern processors are superscalar; some have more parallel execution units than others. A superscalar processor can be said to consist of multiple pipelines.
Q3. Some MMX instructions from the x86 family:
MOVQ: move quadword
PUNPCKHWD: unpack high-order words
PADDUSW: add packed unsigned word integers with unsigned saturation
These can also be classed as SIMD instructions.
Q4. (a) Look-up table (b) Taylor series (c) From the complex exponential
Q5. This is done by averaging the instruction execution times over various programming models, including latency and overhead; it is a statistical measure.
Q6. All x86 family instructions will work.
Q7. Around 7.5 watts.
Q8.
Parameter                          Min          Max      Units  Notes
VIL (Input Low Voltage)            -0.58        0.700    V      (2)
VIH1.5 (Input High Voltage)        VREF + 0.2   VTT      V      (3)
VIH2.5 (Input High Voltage)        2.0          3.18     V
VOL (Low Level Output Voltage)                  0.40     V      @IOL
VOH (High Level Output Voltage)                 VCMOS    V      (1)
IOL (Low Level Output Current)                           mA     @VCL
ILI (Input Leakage Current)                     100      uA
ILO (Output Leakage Current)                    100      uA
Q9. Refer to the text.
Q10. Refer to the text.
Module 2
Embedded Processors and Memory
Lesson 9
General Purpose Processors - II
Signals
In this lesson the student will learn the following:
- Signals of a General Purpose Processor
- Multiplexing
- Address Signals
- Data Signals
- Control Signals
- Bus Arbitration Signals
- Status Signal Indicators
- Sleep State Indicators
- Interrupts
Pre-requisite
Digital Electronics
9.1 Introduction
The input/output signals of a processor chip are the matter of discussion in this chapter. We shall take up the same VIA C3 processor as discussed in the last chapter. In the design flow of a processor the internal architecture is determined first and simulated for optimal performance.
[Fig. 9.1 The overall design flow for a typical processor — parallel ASIC hardware and software tools flows]
The basic architecture decides the signals. Broadly, the signals can be classified as:
1. Address Signals
2. Data Signals
3. Control Signals
4. Power Supply Signals
Some of these signals are multiplexed in time to make the VLSI design easier and more efficient without affecting the overall performance.
Inquire Cycles: These are bus cycles, initiated by external logic, that cause the processor to look up an address in its physical cache tags.
Internal Snooping: These are internal actions by the processor (rather than external logic) that are taken during certain types of cache accesses in order to detect self-modifying code.
Bus Watching: Some caching devices watch their address and data bus continuously while they are held off the bus, comparing every address driven by another bus master with their internal cache tags, and optionally updating their cached lines on the fly during write-backs by the other master.
A20M#: A20 Mask causes the CPU to mask (force to 0) the A20 address bit when driving the external address bus or performing an internal cache access. A20M# is provided to emulate the 1 MByte address wrap-around that occurs on the x86. Snoop addressing is not affected. It is an input signal; if it is not used, it is connected to the power supply. It is not synchronized with the bus clock.
ADS#: Address Strobe begins a memory/I/O cycle and indicates that the address bus (A31#-A3#) and the transaction request signals (REQ#) are valid. This is an output signal during the addressing cycle and an input/output signal during transaction request cycles. It is synchronized with the bus clock.
Memory/I/O cycle: the memory and input/output data transfers (read or write) are carried out in different clock cycles. The address is first loaded on the address bus; the processor, being faster, waits till the memory or input/output device is ready to send or receive the data through the data bus. Normally this takes more than one clock cycle.
Transaction request cycle: when an external device requests the CPU to transmit data, the request comes through this line.
BCLK: Bus Clock provides the fundamental timing for the CPU. The frequency of the input clock determines the operating frequency of the CPU's bus. External timing is defined with reference to the rising edge of BCLK. It is an input clock signal.
BNR#: Block Next Request signals a bus stall by a bus agent unable to accept new transactions. This is an input/output signal and is synchronized with the bus clock.
BPRI#: Priority Agent Bus Request arbitrates for ownership of the system bus. It is an input and is synchronized with the bus clock.
Bus arbitration: at times external devices signal the processor to release the system address/data/control bus from its control. This is achieved by an external request, which normally comes from external devices such as a DMA controller or a coprocessor.
BR[4:0]#: hardware strapping options for setting the processor's internal clock multiplier, set by strapping these wires to the supply or ground (sometimes they can be kept open to make them 1). This option divides the input clock.
BSEL[1:0]: the bus frequency select balls (BSEL0 and BSEL1) identify the appropriate bus speed (100 MHz or 133 MHz). These are output signals.
BR0#: drives the BREQ[0]# signal in the system to request access to the system bus.
D[63:0]#: the Data Bus signals are bidirectional signals which provide the data path between the CPU and external memory and I/O devices. The data bus driver must assert DRDY# to indicate a valid data transfer. These are both inputs and outputs.
DBSY#: Data Bus Busy is asserted by the data bus driver to indicate that the data bus is in use. Input/output.
DEFER#: Defer is asserted by the target agent to indicate that the transaction cannot be guaranteed in-order completion. Input.
DRDY#: Data Ready is asserted by the data driver to indicate that valid data is on the data bus. Input/output.
FERR#: FPU Error Status indicates that an unmasked floating-point error has occurred. FERR# is asserted during execution of the FPU instruction that caused the error. Output.
FLUSH#: Flush causes the CPU to flush its internal caches, writing back all data in the modified state. Input.
HIT#: Snoop Hit indicates that the current cache inquiry address has been found in the cache. Input/output.
HITM#: Snoop Hit Modified indicates that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). Input/output.
INIT#: Initialization resets the integer registers; it does not affect the internal cache or the floating-point registers. Input.
INTR: Maskable interrupt input to the CPU.
NMI: Non-maskable interrupt input.
LOCK#: Lock Status is used by the CPU to signal to the target that the operation is atomic. An atomic operation is any operation that a CPU can perform such that all results are made visible to every CPU at the same time and whose execution is safe from interference by other CPUs. For example, reading or writing a word of memory is an atomic operation.
NCHCTRL: The CPU uses this ball to control the integrated I/O pull-ups. A resistor is connected here to control the current on the input/output pins.
PWRGD: Power Good indicates that the processor's VCC is stable. Input.
REQ[4:0]#: Request Command is asserted by the bus driver to define the current transaction type.
RESET#: An input that resets the processor and invalidates the internal caches without writing them back.
RTTCTRL: The CPU uses this ball to control the output impedance.
RS[2:0]#: Response Status is an input that signals the completion status of the current transaction when the CPU is the response agent.
SLP#: Sleep, when asserted in the stop-grant state, causes the CPU to enter the sleep state.
9.3 Conclusion
In this chapter the various signals of a typical general-purpose processor have been discussed. Broadly, they can be classified into the following categories.
Address Signals: These are used to address memory as well as input/output devices. They are often multiplexed with other control signals; in such cases external bus controllers latch the address lines and hold them valid for the memory and I/O devices while the CPU changes their state. The bus controllers drive their CPU-side connections to high impedance so as not to interfere with the current state of these lines as driven by the CPU.
Data Signals: These lines carry data to and from the processor and the memory or I/O devices. Transceivers are connected on the data path to control the data flow. A data transfer may follow bus-transaction signals, which are necessary to negotiate the speed mismatch between the input/output devices and the processor.
Control Signals: These can generally be divided into the following groups.
Read/Write Control:
Memory Write: issued by the processor while sending data to the memory
Memory Read: issued by the processor while reading data from the memory
I/O Read: the input/output read signal, generally preceded by some bus-transaction signals
I/O Write: the input/output write signal, generally followed by some bus-transaction signals
These read/write signals are generally not directly available from the CPU; they are decoded from a set of status signals by an external bus controller.
A bus transaction includes two parts: sending the address and receiving or sending the data. The master is the one who starts the bus transaction by sending the address. The slave is the one who responds to the address: it sends data to the master if the master asks for data, and receives data from the master if the master wants to send data. These transfers are controlled by signals such as Ready, Defer, etc.
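The master/slave split can be sketched as a small Python model (illustrative only; real buses add the Ready/Defer signalling described above):

```python
class Slave:
    """A bus slave: responds to an address by sourcing or sinking data."""
    def __init__(self, size):
        self.mem = [0] * size

    def respond(self, addr, data=None):
        if data is None:
            return self.mem[addr]   # read: slave drives the data bus
        self.mem[addr] = data       # write: slave latches the data
        return None

def transaction(master_wants_data, slave, addr, data=None):
    """The master starts the transaction by sending the address;
    the slave responds with (read) or accepts (write) the data."""
    if master_wants_data:
        return slave.respond(addr)
    slave.respond(addr, data)

mem = Slave(16)
transaction(False, mem, 3, data=0xAB)   # write transaction
print(hex(transaction(True, mem, 3)))   # read transaction -> 0xab
```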
Bus Slave
Bus arbitration is the process of requesting and obtaining access to the bus. It is achieved with the following lines.
Bus Request: the requesting device asks for the access grant
Bus Grant: the CPU signals that the bus has been granted
Lock: for specific (atomic) operations the bus requests are not granted, as the CPU might be performing some critical operation.
Interrupt Control
In a multitasking environment, interrupts are external signals to the CPU requesting urgent service. The CPU acknowledges an interrupt and executes the corresponding interrupt service routine. Interrupts are processed according to their priority. More discussion is available in subsequent lessons.
Processor Control
These lines are activated at power-on or when the processor wakes from a power-saving mode such as sleep. They include the Reset and Test lines, etc. Some of the above signals will be discussed in the subsequent lessons.
Ans: It is called the Power-On Self-Test (POST). This routine is executed when the computer is powered on to check the proper functioning of the hard disk, CD-ROM drive, floppy disk, and many other on-board and off-board components.
Q3. Describe the various power-saving modes in a general-purpose CPU.
Ans: Refer to the discussion of Sleep mode in the text.
Q4. What could be the differences in the design of a processor to be used in the following applications: laptop, desktop, motor control?
Ans:
Laptop: a complex general-purpose processor with low power consumption and various power-saving modes.
Desktop: a high-performance processor with far less stringent limits on power consumption.
Motor control: a simple, low-power, specialized processor with on-chip peripherals, running a real-time operating system.
Q5. What is the advantage of reducing the high-state voltage from 5 V to 3.5 V? What are the disadvantages?
Ans: It reduces the interference, but it also decreases the noise margin.
Q6. What is the use of the Power-Good signal?
Ans: It indicates the quality of the supply inside the CPU. If the supply is not good, there may be mal-operation and data loss.
Module 2
Embedded Processors and Memory
Lesson 10
Embedded Processors - I
In this lesson the student will learn the following:
Architecture of an Embedded Processor
The Architectural Overview of the Intel MCS-96 family of Microcontrollers
Pre-requisite
Digital Electronics
10.1 Introduction
It is generally difficult to draw a clear-cut boundary between the class of microcontrollers and general-purpose microprocessors. Distinctions can be made or assumed on the following grounds.
Microcontrollers are generally associated with embedded applications.
Microprocessors are associated with desktop computers.
Microcontrollers have a simpler memory hierarchy: the RAM and ROM may exist on the same chip, and cache memory is generally absent.
The power consumption and temperature rise of a microcontroller are restricted because of the constraints on its physical dimensions.
8-bit and 16-bit microcontrollers are very popular, with a simpler design as compared to the large word-length (32-bit, 64-bit) complex general-purpose processors.
However, recently, the market for 32-bit embedded processors has been growing. Further the issues such as power consumption, cost, and integrated peripherals differentiate a desktop CPU from an embedded processor. Other important features include the interrupt response time, the amount of on-chip RAM or ROM, and the number of parallel ports. The desktop world values processing power, whereas an embedded microprocessor must do the job for a particular application at the lowest possible cost.
(Fig. 10.2 block diagrams: a microprocessor-based system with separate ROM, EEPROM, RAM, A/D and D/A converters and I/O ports, versus a microcontroller with on-chip ROM, EEPROM, RAM, serial and parallel I/O, PWM output and analog input.)
Fig. 10.2 Microprocessor versus microcontroller
Fig. 10.1 shows the performance-cost plot of the available microprocessors. Naturally, the higher the performance, the higher the cost. The embedded controllers occupy the lower left-hand corner of the plot. Fig. 10.2 shows the architectural difference between two systems, one with a general-purpose microprocessor and one with a microcontroller. The hardware requirement of the former system is greater than that of the latter: separate chips or circuits for the serial interface, parallel interface, memory, and A/D-D/A converters are necessary. On the other hand, the functionality, flexibility, and complexity of information handling are greater in the former.
Fig. 10.3 The Architectural Block diagram of Intel 8XC196 Microcontroller PTS: Peripheral Transaction Server; I/O: Input/Output Interface; EPA: Event Processor Array; PWM: Pulse Width Modulated Outputs; WG: Waveform Generator; A/D- Analog to Digital Converter; FG: Frequency Generator; SIO: Serial Input/Output Port Fig. 10.3 shows the functional block diagram of the microcontroller. The core of the microcontroller consists of the central processing unit (CPU) and memory controller. The CPU contains the register file and the register arithmetic-logic unit (RALU). A 16-bit internal bus connects the CPU to both the memory controller and the interrupt controller. An extension of this bus connects the CPU to the internal peripheral modules. An 8-bit internal bus transfers instruction bytes from the memory controller to the instruction register in the RALU.
Fig. 10.4 The Architectural Block diagram of the core CPU: Central Processing Unit; RALU: Register Arithmetic Logic Unit; ALU: Arithmetic Logic Unit; Master PC: Master Program Counter; PSW: Processor Status Word; SFR: Special Function Registers
CPU Control
The CPU is controlled by the microcode engine, which instructs the RALU to perform operations using bytes, words, or double-words from either the 256-byte lower register file or through a window that directly accesses the upper register file. Windowing is a technique that maps blocks of the upper register file into a window in the lower register file. CPU instructions move from the 4-byte prefetch queue in the memory controller into the RALU's instruction register. The microcode engine decodes the instructions and then generates the sequence of events that cause the desired functions to occur.
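Windowing can be illustrated with a toy address-translation model. The window base, window size, and select encoding below are invented for illustration and do not match the actual MCS-96 window-selection register; only the idea (a slice of the lower file redirected into the upper file) is faithful:

```python
WINDOW_BASE = 0xC0   # assumed window position in the 256-byte lower file
WINDOW_SIZE = 64     # assumed window size

def effective_address(direct_addr, window_select):
    """Translate a direct address into the register file.
    Addresses inside the window are redirected to the block of the
    upper register file chosen by window_select; everything else
    accesses the lower file directly. Illustrative only."""
    if WINDOW_BASE <= direct_addr < WINDOW_BASE + WINDOW_SIZE:
        upper_base = window_select * WINDOW_SIZE
        return upper_base + (direct_addr - WINDOW_BASE)
    return direct_addr

print(hex(effective_address(0x10, window_select=8)))  # below window: unchanged
print(hex(effective_address(0xC4, window_select=8)))  # inside window: remapped
```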
Register File
The register file is divided into an upper and a lower file. In the lower register file, the lowest 24 bytes are allocated to the CPU's special-function registers (SFRs) and the stack pointer, while the remainder is available as general-purpose register RAM. The upper register file contains only general-purpose register RAM. The register RAM can be accessed as bytes, words, or double-words. The RALU accesses the upper and lower register files differently. The lower register file is always accessible with direct addressing. The upper register file is accessible with direct addressing only when windowing is enabled.
Code Execution
The RALU performs most calculations for the microcontroller, but it does not use an accumulator. Instead it operates directly on the lower register file, which essentially provides 256 accumulators. Because data does not flow through a single accumulator, the microcontrollers code executes faster and more efficiently.
Instruction Format
These microcontrollers combine general-purpose registers with a three-operand instruction format. This format allows a single instruction to specify two source registers and a separate destination register. For example, the following instruction multiplies two 16-bit variables and stores the 32-bit result in a third variable.
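The semantics of such a three-operand multiply can be modelled as follows. The mnemonic MULU shown in the comment is an assumption about the MCS-96 syntax and should be checked against the manual; the model only shows the two-16-bit-sources, 32-bit-destination behaviour:

```python
def mulu_16(dst_regs, src1, src2):
    """Model of a three-operand 16-bit unsigned multiply, i.e. an
    instruction of the form  MULU result, var1, var2  (assumed form).
    Two 16-bit sources produce a 32-bit result stored in a separate
    destination, here represented as a low word and a high word."""
    product = (src1 & 0xFFFF) * (src2 & 0xFFFF)
    dst_regs["lo"] = product & 0xFFFF
    dst_regs["hi"] = (product >> 16) & 0xFFFF
    return dst_regs

regs = mulu_16({}, 0x1234, 0x0010)
print(hex(regs["hi"]), hex(regs["lo"]))   # 0x1234 * 0x10 = 0x00012340
```

Because the destination is named explicitly, neither source register is overwritten, which is the point of the three-operand format.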
When the bus controller receives a request from the queue, it fetches the code from the address contained in the slave PC. The slave PC increases execution speed because the next instruction byte is available immediately and the processor need not wait for the master PC to send the address to the memory controller. If a jump, interrupt, call, or return changes the address sequence, the master PC loads the new address into the slave PC, then the CPU flushes the queue and continues processing.
Interrupt Service
The interrupt-handling system has two main components: the programmable interrupt controller and the peripheral transaction server (PTS). The programmable interrupt controller has a hardware priority scheme that can be modified by software. Interrupts that go through the interrupt controller are serviced by interrupt service routines that you provide. The peripheral transaction server (PTS), a microcoded hardware interrupt processor, provides efficient interrupt handling.
(Fig. 10.5 Clock circuitry: the XTAL1/XTAL2 oscillator input feeds a divide-by-two circuit; the clock generators derive the CPU clocks and peripheral clocks (PH1, PH2) and CLKOUT, with disable paths for the idle and powerdown modes.)
Internal Timing
The clock circuitry (Fig. 10.5) receives an input clock signal on XTAL1 provided by an external crystal or oscillator and divides the frequency by two. The clock generators accept the divided input frequency from the divide-by-two circuit and produce two non-overlapping internal timing signals, Phase 1(PH1) and Phase 2 (PH2). These signals are active when high.
Fig. 10.6 The internal clock phases
The rising edges of PH1 and PH2 generate the internal CLKOUT signal (Fig. 10.6). The clock circuitry routes separate internal clock signals to the CPU and the peripherals to provide flexibility in power management. Because of the complex logic in the clock circuitry, the signal on the CLKOUT pin is a delayed version of the internal CLKOUT signal. This delay varies with temperature and voltage.
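The relationship between the XTAL1 input, the divide-by-two circuit, and one state time works out as below (a simple calculation; the 16 MHz crystal is just an example value):

```python
def clock_frequencies(f_xtal_hz):
    """The divide-by-two circuit halves the XTAL1 input frequency;
    one state time is one period of the divided internal clock."""
    f_internal = f_xtal_hz / 2
    state_time_s = 1.0 / f_internal
    return f_internal, state_time_s

f, t = clock_frequencies(16_000_000)   # 16 MHz crystal (example)
print(f)        # 8000000.0 -> 8 MHz internal clock
print(t * 1e9)  # 125.0 -> 125 ns state time
```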
I/O Ports
Individual I/O port pins are multiplexed to serve as standard I/O or to carry special-function signals associated with an on-chip peripheral or an off-chip component. If a particular special-function signal is not used in an application, the associated pin can be individually configured to serve as a standard I/O pin. Ports 3 and 4 are exceptions; they are controlled at the port level. When the bus controller needs to use the address/data bus, it takes control of the ports. When the address/data bus is idle, you can use the ports for I/O. Port 0 is an input-only port that is also the analog input for the A/D converter. For more details the reader is referred to the data manual at www.intel.com/design/mcs96/manuals/27218103.pdf.
Frequency Generator
Some microcontrollers of this class have a frequency generator. This peripheral produces a waveform with a fixed duty cycle (50%) and a programmable frequency (ranging from 4 kHz to 1 MHz with a 16 MHz input clock).
Waveform Generator
A waveform generator simplifies the task of generating synchronized, pulse-width modulated (PWM) outputs. This waveform generator is optimized for motion control applications such as driving 3-phase AC induction motors, 3-phase DC brushless motors, or 4-phase stepping motors. The waveform generator can produce three independent pairs of complementary PWM outputs, which share a common carrier period, dead time, and operating mode. Once it is initialized, the waveform generator operates without CPU intervention unless you need to change a duty cycle.
Analog-to-digital Converter
The analog-to-digital (A/D) converter converts an analog input voltage to a digital equivalent. Resolution is either 8 or 10 bits; sample and convert times are programmable. Conversions can be performed on the analog ground and reference voltage, and the results can be used to calculate gain and zero-offset errors. The internal zero-offset compensation circuit enables automatic zero offset adjustment. The A/D also has a threshold-detection mode, which can be used to generate an interrupt when a programmable threshold voltage is crossed in either direction. The A/D scan mode of the PTS facilitates automated A/D conversions and result storage.
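The gain and zero-offset calculation mentioned above can be sketched as follows. The exact formulas used by the hardware compensation circuit are not given in the text, so the ones below are a plausible illustration: ideally, analog ground converts to code 0 and the reference voltage converts to full scale.

```python
def adc_errors(code_at_gnd, code_at_ref, full_scale=1023):
    """Estimate zero-offset and gain errors of a 10-bit A/D from
    conversions of analog ground and the reference voltage.
    (Illustrative formulas, not the manual's exact method.)"""
    zero_offset = code_at_gnd              # counts read when input is 0 V
    span = code_at_ref - code_at_gnd       # measured span in counts
    gain = span / full_scale               # 1.0 would be an ideal gain
    return zero_offset, gain

offset, gain = adc_errors(code_at_gnd=3, code_at_ref=1020)
print(offset, gain)   # a 3-count zero offset and a slight gain error
```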
Watchdog Timer
The watchdog timer is a 16-bit internal timer that resets the microcontroller if the software fails to operate properly.
10.3 Conclusion
This lesson discussed the architecture of a typical high-performance microcontroller. The next lesson shall discuss the signals of a typical microcontroller from the Intel MCS-96 family.
much harder to test and debug the code. As a result, the microcode that shipped with machines was often buggy and had to be patched numerous times out in the field. It was the difficulties involved in using microcode for control that spurred Patterson and others to question whether implementing all of these complex, elaborate instructions in microcode was really the best use of limited transistor resources.
2. What is the function of the watchdog timer?
Ans: It is a fail-safe mechanism that intervenes if a system stops functioning: a hardware timer that is periodically reset by software. If the software crashes or hangs, the watchdog timer expires and the entire system is reset automatically. The watchdog unit contains a watchdog timer.
A watchdog timer (WDT) is a device or electronic card that performs a specific operation after a certain period of time if something goes wrong with an electronic system and the system does not recover on its own. A common problem is for a machine or operating system to lock up when two parts or programs conflict, or, in an operating system, when memory-management trouble occurs. In some cases the system will eventually recover on its own, but this may take an unknown and perhaps extended length of time. A watchdog timer can be programmed to perform a warm boot (restarting the system) after a certain number of seconds during which a program or computer fails to respond to the most recent mouse click or keyboard action. The timer can also be used for other purposes, for example to actuate the refresh (or reload) button in a Web browser if a Web site does not fully load within a certain time after the entry of a Uniform Resource Locator (URL).
A WDT contains a digital counter that counts down to zero at a constant speed from a preset number. The counter speed is kept constant by a clock circuit.
If the counter reaches zero before the computer recovers, a signal is sent to designated circuits to perform the desired action.
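The countdown-and-service behaviour described in this answer can be modelled directly:

```python
class WatchdogTimer:
    """Down-counter: software must service ('kick') it periodically;
    if it ever reaches zero, a system reset is triggered."""
    def __init__(self, preset):
        self.preset = preset
        self.count = preset
        self.reset_occurred = False

    def service(self):            # done periodically by healthy software
        self.count = self.preset

    def tick(self):               # driven at constant speed by the clock
        self.count -= 1
        if self.count == 0:
            self.reset_occurred = True
            self.count = self.preset

wdt = WatchdogTimer(preset=5)
for _ in range(4):
    wdt.tick()
wdt.service()                     # software is alive: no reset
for _ in range(5):                # software hangs: counter expires
    wdt.tick()
print(wdt.reset_occurred)         # True
```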
Module 2
Embedded Processors and Memory
Lesson 11
Embedded Processors - II
Pre-requisite
Digital Electronics
11.1 Introduction
Microcontrollers are required to operate in the real world without much interface circuitry. The input-output signals of such a processor are both analog and digital. Digital data transmission can be both parallel and serial, and the voltage levels can also differ. The architecture of a basic microcontroller is shown in Fig. 11.1, which illustrates the various modules inside a microcontroller. Common processors have digital input/output, timer, and serial input/output lines. Some microcontrollers also support multi-channel analog-to-digital converter (ADC) as well as digital-to-analog converter (DAC) units; thus analog signal input and output pins are also present in typical microcontroller units. Address and data lines are also provided for external memory and I/O chips.
(Fig. 11.1/11.2 block diagrams: CPU with microcode engine and ALU, I/O ports, EPORT, watchdog timer, A/D converter, pulse-width modulators, serial I/O units SIO0/SIO1 and SSIO0/SSIO1 with baud-rate generators, chip-select unit, and the AD15:0 address/data lines.)
The external signals fall into the following groups: address/data lines, bus control signals, interrupt signals, timer/event-manager signals, digital input/output ports, and analog input/output ports.
EPORT: The extended-port lines support extended addressing. The EPORT is an 8-bit port which can operate either as a general-purpose I/O signal (I/O mode) or as a special-function signal (special-function mode).
AD15:0 Address/Data Lines: These lines serve as input as well as output pins. The function of these pins depends on the bus width and mode. When a bus access is not occurring, these pins revert to their I/O port function. AD15:0 drive address bits 0-15 during the first half of the bus cycle and drive or receive data during the second half of the bus cycle.
WRH Write High: Output Signal: During 16-bit bus cycles, this active-low output signal is asserted for high-byte writes and word writes to external memory. WRL Write Low: Output Signal: During 16-bit bus cycles, this active-low output signal is asserted for low-byte writes and word writes to external memory.
XTAL2: Output: Inverted output for the crystal/resonator; the output of the on-chip oscillator inverter. Leave XTAL2 floating when the design uses an external clock source instead of the on-chip oscillator.
Analog Inputs
ACH15:0: Input Analog Channels: These signals are analog inputs to the A/D converter. The ANGND and VREF pins are also required for the standard A/D converter to function. Other important signals of a typical microcontroller include: power supply and ground pins at multiple points; signals from the internal programmable timer; and debug pins. The reader is requested to follow the link www.intel.com/design/mcs96/manuals/272804.htm or www.intel.com/design/mcs96/manuals/27280403.pdf for more details.
11.3 Conclusions
This chapter discussed the important signals of a typical microcontroller. The detailed electrical and timing specifications are available in the respective manuals.
11.4 Questions
1. Which ports of the 80C196EA can generate PWM pulses? What is the voltage level of such pulses? Ans:
2. Why is the power supply given to multiple points on a chip?
Ans: Multiple power-supply points ensure the following.
The voltages at the devices (transistors and cells) stay better than a set target under a specified set of varying load conditions in the design. This ensures correct operation of circuits at the expected level of performance.
The current supplied by any pad, pin, or voltage regulator remains within a specified limit under any of the specified loading conditions. This is required: (a) so as not to exceed the design capacity of the regulators and pads; and (b) to distribute currents more uniformly among the pads, so that the L di/dt voltage variations due to parasitic inductance in the package's substrate, ball-grid array, and bond wires are minimized.
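The L di/dt term in point (b) is just V = L * di/dt: the voltage developed across a parasitic inductance when the chip's current draw changes. A quick calculation with assumed, purely illustrative package values shows why it matters:

```python
def supply_bounce_v(inductance_h, delta_i_a, delta_t_s):
    """V = L * di/dt: voltage developed across the parasitic
    inductance of a bond wire or package trace when the current
    drawn by the chip changes."""
    return inductance_h * (delta_i_a / delta_t_s)

# Assumed example values: 5 nH of bond-wire inductance,
# a 100 mA current swing in 1 ns
v = supply_bounce_v(5e-9, 0.1, 1e-9)
print(v)   # 0.5 V of supply bounce -- why multiple supply pins help
```

Splitting the same current swing across several supply pads divides di/dt per pad, and with it the bounce.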
Module 2
Embedded Processors and Memory
Lesson 12
Memory-Interfacing
Instructional Objectives
After going through this lesson the student would learn:
Requirement of External Memory
Different modes of a typical Embedded Controller
Standard Control Signals for the Memory Interface
A typical Example
Pre-Requisite
Digital Electronics, Microprocessors
12.1 Introduction
A Single Chip Microcontroller
CPU: The processing module of the microcontroller
Fig. 12.1 The basic architecture of a microcontroller
Fig. 12.1 shows the internal architecture of a single-chip microcontroller with internal RAM as well as ROM. Most of these microcontrollers do not require external memory for simpler tasks: the programs, being small, can easily fit into the internal memory, so the device often provides a single-chip solution. However, the amount of internal memory cannot be increased beyond a certain limit, for two reasons: power consumption and size.
Extra memory consumes more power and hence causes a higher temperature rise, and the die size has to be increased to house the additional memory. The need for extra memory space arises in some specific applications. Fig. 12.2 shows the basic block diagram of the memory interface to a processor.
(Fig. 12.2 The memory interface: the CPU connects to the memory through data lines, address lines, and control lines.)
Microcontroller Mode
The processor accesses only the on-chip FLASH memory. The External Memory Interface functions are disabled. Attempts to read above the physical limit of the on-chip FLASH cause a read of all 0s (a NOP instruction).
Microprocessor Mode
The processor permits execution and access only through external program memory; the contents of the on-chip FLASH memory are ignored.
A16-A19: the four most significant bits of the address. BA0: Byte Address 0. (Fig. 12.5 also shows the OE, WRL, WRH, UB, and LB control lines.)
Fig. 12.5 The address, data and control lines of the PIC18F8XXX microcontroller required for external memory interfacing
The address, data and control lines of this PIC family of microcontrollers are shown in Fig. 12.5 and are explained below.
AD0-AD15: 16 bits of data and 16 bits of address, multiplexed
ALE: Address Latch Enable signal to latch the multiplexed address in the first clock cycle
WRL Write Low Control Pin to make the memory write the lower byte of the data when it is low WRH Write High Control Pin to make the memory write the higher byte of the data when it is low OE Output Enable is made low when valid data is made available to the external memory CE Chip enable line is made low to access the external memory chip
LB Lower Byte Enable Control is kept low when the lower byte is available for the memory.
UB Upper Byte Enable Control is kept low when the upper byte is available for the memory.
The microcontroller has a 16-bit wide bus for data transfer. These data lines are shared with address lines and are labeled AD<15:0>. Because of this, 16 bits of latching are necessary to demultiplex the address and data. There are four additional address lines labeled A<19:16>. The PIC18 architecture provides an internal program counter of 21 bits, offering a capability of 2 Mbytes of addressing. Seven control lines are used in the External Memory Interface: ALE, WRL, WRH, OE, CE, LB, and UB. All of these lines except OE may be used during data writes. All of these lines except WRL and WRH may be used during fetches and reads. The application determines which control lines are necessary. The basic connection diagram is shown in Fig. 12.6; the 16-bit byte-select mode is shown here.
Fig. 12.6 The connection diagram for the external memory interface in 16-bit byte-select mode
The PIC18 family runs from a clock that is four times faster than its instruction cycle. The four clock pulses are a quarter of the instruction cycle in length and are referred to as Q1, Q2, Q3, and Q4. During Q1, ALE is enabled while address information A<15:0> is placed on pins AD<15:0>. At the same time, the upper address information A<19:16> is available on the upper address bus. On the negative edge of ALE, the address is latched in the external latch. At the beginning of Q3, the OE output-enable (active-low) signal is generated. Also at the beginning of Q3, BA0 is generated; this signal is active high only during Q3, indicating the state of the program counter's least significant bit. At the end of Q4, OE goes high and the data (16-bit word) is fetched from memory on the low-to-high transition edge of OE. The timing diagram for all signals during external memory code execution and table reads is shown in Fig. 12.7.
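The Q1-Q4 address/data multiplexing can be sketched as a Python model of one bus cycle (the example address and data values below are arbitrary):

```python
def bus_cycle(address, data_word):
    """Model of one PIC18 external fetch on the multiplexed bus.
    Q1: ALE is high and AD<15:0> carry A<15:0>; the external latch
        captures the address on ALE's falling edge.
    Q3/Q4: OE is asserted and AD<15:0> carry the 16-bit data word."""
    ad_bus = address & 0xFFFF        # Q1: address phase on AD<15:0>
    latched = ad_bus                 # falling edge of ALE: latch address
    a_high = (address >> 16) & 0xF   # A<19:16> on dedicated lines
    ad_bus = data_word & 0xFFFF      # Q3/Q4: data phase on the same pins
    full_address = (a_high << 16) | latched
    return full_address, ad_bus

full_addr, data = bus_cycle(0x5ABCD, 0x9256)
print(hex(full_addr), hex(data))    # 0x5abcd 0x9256
```

The latch is what makes the full 20-bit address and the 16-bit data available simultaneously to the memory, even though they share the AD pins in time.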
12.3 Conclusion
This lesson discussed a typical external memory interface example for PIC family of microcontrollers. A typical timing diagram for memory read operation is presented.
12.4 Questions
Q1. Draw the read timing diagram for a typical memory operation.
Ans: Refer to the text.
Q2. Draw the write timing diagram for a typical memory operation.
Module 3
Embedded Systems I/O
Lesson 13
Interfacing bus, Protocols, ISA bus etc.
Instructional Objectives
After going through this lesson the student would learn:
Bus, Wires and Ports
Basic Protocols of data transfer
Bus arbitration
ISA bus signals and handshaking
Memory-mapped I/O and simple I/O
Parallel I/O and Port-based I/O
Example of interfacing memory to the ports of the 8051
Pre-Requisite
Digital Electronics, Microprocessors
13.1 Introduction
The traditional definition of input-output covers the devices that create a medium of interaction with human users. They fall into categories such as: 1. Printers 2. Visual display units 3. Keyboards 4. Cameras 5. Plotters 6. Scanners. In real-time embedded systems, however, the definition of I/O devices is very different. An embedded controller needs to communicate with a wide range of devices, namely: 1. Analog-to-digital (A-D) and digital-to-analog (D-A) converters 2. CODECs 3. Small-screen displays such as TFT, LCD, etc. 4. Antennas 5. Cameras 6. Microphones 7. Touch screens, etc. A typical embedded system is a digital camera, as shown in Fig. 13.1. As can be seen, it possesses a broad range of input-output devices such as a lens, microphone, speakers, serial interface standards, and a TFT screen.
(Fig. 13.1 A digital camera as an embedded system: buttons, motors, lens, CCD module, RS-232C/USB/1394 interfaces, SDRAM, removable storage, and power-management circuitry — buck and boost converters, charge pump, inverter, Li-ion protector, battery monitor, battery charger, and wall/USB power inputs.)
The functionality of an embedded system can be broadly classified as:
Processing — transformation of data, implemented using processors;
Storage — retention of data, implemented using memory;
Communication (also called interfacing) — transfer of data between processors and memories, implemented using buses.
Interfacing
Interfacing is a way to communicate and transfer information in either direction without ending in deadlock. In our context it is a means of effective communication in real time. This involves addressing, arbitration, and protocols.
Fig. 13.2(a) The bus structure: a master and a slave connected by control lines, address lines, and data lines
Addressing: The master sends the address over a specified set of lines, which enables just the device for which the transfer is meant.
Protocols: The literal meaning of protocol is a set of rules. Here it is a set of formal rules describing how to transfer data, especially between two devices. A simple example is the memory read and write protocol. For a read (Fig. 13.2(b)), the rules are:
The CPU must send the memory address
The read line must be enabled
The processor must wait till the memory is ready
Then accept the bits on the data lines
Fig. 13.2(b) Read protocol timing (rd'/wr, enable, addr, data; setup time tsetup, read time tread)
For a write (Fig. 13.2(c)):
The CPU must send the memory address
The write line must be enabled
The processor sends the data over the data lines
The processor must wait till the memory is ready
Fig. 13.2(c) Write protocol timing
Arbitration: When the same set of address/data/control lines is shared by different units, bus arbitration logic comes into play. Access to a bus is arbitrated by a bus master. Each node on a bus has a bus master which requests access to the bus (a bus request) when the node needs to use the bus. This is a global request sent to all nodes on the bus. The node that currently has access to the bus responds with either a bus grant or a bus busy signal, which is also globally known to all bus masters. (Fig. 13.3)
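One simple way to resolve simultaneous bus requests is a fixed-priority arbiter (illustrative only; it is not the only arbitration scheme, and real arbiters also handle fairness and bus-busy signalling):

```python
def arbitrate(requests):
    """Fixed-priority bus arbiter: among the agents currently
    asserting a bus request, grant the bus to the highest-priority
    one (lowest index); all others must wait."""
    for agent, req in enumerate(requests):
        if req:
            return agent        # bus grant goes to this agent
    return None                 # no requests: bus idle

# agents: index 0 = CPU (highest priority), index 1 = DMA controller
print(arbitrate([True, True]))    # both request -> CPU wins (0)
print(arbitrate([False, True]))   # CPU releases the bus -> DMA (1)
print(arbitrate([False, False]))  # nobody requests -> None
```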
Fig. 13.3 Bus arbitration with a DMA (direct memory access) controller, which is responsible for transferring data between an I/O device and memory without involving the CPU. It starts with a bus request to the CPU, and after the request is granted it takes over the address, data, and control buses to carry out the data transfer. After the transfer is complete, it passes control back to the CPU.
Before learning more details about each of these concepts, a concrete definition of the following terms is necessary.
Wire: A passive physical connection with the least resistance.
Bus: A group of signals (such as data, address, etc.), possibly augmented with buffers, latches, etc. A bus has a standard specification, such as the number of bits, the clock speed, etc.
Port: The set of physical wires made available so that any device which meets the specified standard can be directly plugged in. Examples are the serial, parallel, and USB ports of a PC.
Time multiplexing: Sharing a single set of wires for multiple pieces of data. It saves wires at the expense of time.
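Time multiplexing as in Fig. 13.4 — 16 data bits sent over an 8-bit line, MSB after the LSB — can be modelled as a split on the sending side and a reassembly on the receiving side:

```python
def send_16_over_8(word):
    """Time-multiplex a 16-bit word over an 8-bit line:
    two bus cycles on the same wires, MSB after the LSB."""
    lsb = word & 0xFF
    msb = (word >> 8) & 0xFF
    return [lsb, msb]

def receive_16_over_8(transfers):
    """Demultiplex: reassemble the word from the two transfers."""
    lsb, msb = transfers
    return (msb << 8) | lsb

wire = send_16_over_8(0xBEEF)
print([hex(b) for b in wire])         # ['0xef', '0xbe']
print(hex(receive_16_over_8(wire)))   # 0xbeef
```

The same idea applies to the address/data muxing on the right of the figure: the address occupies the shared wires in one cycle and the data in the next, synchronized by req.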
[Fig. 13.4: data serializing and address/data muxing through mux/demux pairs]

Fig. 13.4 Time-multiplexed data transfer. The left-hand side transmits 16 bits of data over an 8-bit line, MSB after the LSB. The transfer is synchronized with the req signal. In the example shown on the right-hand side the same set of wires carries the address followed by the data, in synchronism with the req signal. (mux stands for multiplexer.)
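The 16-over-8 multiplexing on the left of Fig. 13.4 can be sketched as a toy C model. The shared 8-bit line is just a variable here, and all names (`send16`, `recv16`, `bus8`) are illustrative, not part of any real bus API:

```c
#include <stdint.h>

/* Toy model of 16-bit data time-multiplexed over an 8-bit line:
   the sender drives the line twice, LSB first, then MSB, as in the
   figure. out[] records what appears on the line in each cycle. */
static uint8_t bus8;                 /* the shared 8-bit data line */

static void send16(uint16_t word, uint8_t out[2]) {
    bus8 = (uint8_t)(word & 0xFF);   /* first cycle: LSB on the line */
    out[0] = bus8;
    bus8 = (uint8_t)(word >> 8);     /* second cycle: MSB on the line */
    out[1] = bus8;
}

static uint16_t recv16(const uint8_t in[2]) {
    /* reassemble the word on the receiving side */
    return (uint16_t)((uint16_t)in[0] | ((uint16_t)in[1] << 8));
}
```

The same idea applies to address/data muxing on the right of the figure: the first cycle carries the address, the second the data.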
[Fig. 13.5(a): strobe protocol timing — req and data lines, events 1 to 4]

Strobe protocol:
1. Master asserts req to receive data
2. Servant puts data on bus within time taccess
3. Master receives data and deasserts req
4. Servant ready for next request
Handshake Protocol
[Fig. 13.5(b): handshake timing — master and servant connected by req, ack and data lines, events 1 to 4]

Fig. 13.5(b) Handshake Protocol
1. Master asserts req to receive data
2. Servant puts data on bus and asserts ack
3. Master receives data and deasserts req
4. Servant ready for next request
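The four handshake steps can be modelled in a few lines of C. This is a single-threaded toy, not a real bus driver: `servant_step()` stands in for the servant's hardware, and the wire names mirror the figure:

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of the req/ack handshake of Fig. 13.5(b). */
static bool req, ack;
static uint8_t data_bus;

static void servant_step(void) {       /* servant's side of the wires */
    if (req && !ack) {                 /* 2. on req: drive data, assert ack */
        data_bus = 0x5A;               /*    0x5A is an arbitrary payload   */
        ack = true;
    } else if (!req) {
        ack = false;                   /* 4. ready for the next request */
    }
}

static uint8_t master_read(void) {
    req = true;                        /* 1. master asserts req            */
    servant_step();                    /* 2. servant puts data, asserts ack */
    uint8_t d = data_bus;              /* 3. master receives data ...      */
    req = false;                       /*    ... and deasserts req         */
    servant_step();                    /* 4. servant deasserts ack         */
    return d;
}
```

Unlike the strobe protocol, the master here waits for ack rather than assuming the data is valid after a fixed taccess.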
[Fig. 13.5(c): combined strobe/handshake timing — req, wait and data lines for the fast and slow cases]

Fast-response case:
1. Master asserts req to receive data
2. Servant puts data on bus within time taccess (wait line is unused)
3. Master receives data and deasserts req
4. Servant ready for next request

Slow-response case:
1. Master asserts req to receive data
2. Servant can't put data within taccess, asserts wait
3. Servant puts data on bus and deasserts wait
4. Master receives data and deasserts req
5. Servant ready for next request

Fig. 13.5(c) Strobe and Handshake Combined
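The master's side of the combined protocol can be sketched as follows, assuming the master can sample the wait line each cycle. The timeout constant and every name here are invented for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the combined strobe/handshake of Fig. 13.5(c): the master
   samples the data after taccess unless the servant has raised wait,
   in which case it keeps sampling until wait is deasserted (or a
   cycle budget runs out). */
enum { WAIT_LIMIT = 100 };           /* give up after this many cycles */

static bool wait_line;               /* driven by the servant */
static uint8_t data_bus2;            /* the shared data lines */

/* Returns true and stores the data if the transfer completed. */
static bool strobe_read(uint8_t *out) {
    for (int cycle = 0; cycle < WAIT_LIMIT; cycle++) {
        if (!wait_line) {            /* fast case: data valid by taccess */
            *out = data_bus2;
            return true;
        }
        /* slow case: servant asserted wait, keep sampling */
    }
    return false;                    /* timeout / bus error */
}
```

In the fast case this behaves exactly like the plain strobe protocol; the wait line only costs time when the servant is actually slow.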
LA23 to LA17
Unlatched Address bits 23:17 are used to address memory within the system. They are used along with SA19 to SA0 to address up to 16 megabytes of memory. These signals are valid when BALE is high. They are "unlatched" and do not stay valid for the entire bus cycle. Decodes of these signals should be latched on the falling edge of BALE.
AEN
Address Enable is used to degate the system microprocessor and other devices from the bus during DMA transfers. When this signal is active the system DMA controller has control of the address, data, and read/write signals. This signal should be included as part of ISA board select decodes to prevent incorrect board selects during DMA cycles.
Version 2 EE IIT, Kharagpur 11
BALE
Buffered Address Latch Enable is used to latch the LA23 to LA17 signals or decodes of these signals. Addresses are latched on the falling edge of BALE. It is forced high during DMA cycles. When used with AEN, it indicates a valid microprocessor or DMA address.
CLK
System Clock is a free running clock typically in the 8MHz to 10MHz range, although its exact frequency is not guaranteed. It is used in some ISA board applications to allow synchronization with the system microprocessor.
SD15 to SD0
System Data serves as the data bus for devices on the ISA bus. SD15 is the most significant bit and SD0 the least significant bit. SD7 to SD0 are used for transfers with 8-bit devices; SD15 to SD0 are used for transfers with 16-bit devices. A 16-bit device transferring data with an 8-bit device shall convert the transfer into two 8-bit cycles using SD7 to SD0.
I/O CH CK
I/O Channel Check may be activated by ISA boards to request that a non-maskable interrupt (NMI) be generated to the system microprocessor. It is driven active to indicate that an uncorrectable error has been detected.
I/O CH RDY
I/O Channel Ready allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.
IOR
I/O Read is driven by the owner of the bus and instructs the selected I/O device to drive read data onto the data bus.
IOW
I/O Write is driven by the owner of the bus and instructs the selected I/O device to capture the write data on the data bus.
SMEMR
System Memory Read instructs a selected memory device to drive data onto the data bus. It is active only when the memory decode is within the low 1 megabyte of memory space. SMEMR is derived from MEMR and a decode of the low 1 megabyte of memory.
SMEMW
System Memory Write instructs a selected memory device to store the data currently on the data bus. It is active only when the memory decode is within the low 1 megabyte of memory space. SMEMW is derived from MEMW and a decode of the low 1 megabyte of memory.
MEMR
Memory Read instructs a selected memory device to drive data onto the data bus. It is active on all memory read cycles.
MEMW
Memory Write instructs a selected memory device to store the data currently on the data bus. It is active on all memory write cycles.
REFRESH
Memory Refresh is driven low to indicate a memory refresh operation is in progress.
OSC
Oscillator is a clock with a 70ns period (14.31818 MHz). This signal is not synchronous with the system clock (CLK).
RESET DRV
Reset Drive is driven high to reset or initialize system logic upon power up or subsequent system reset.
TC
Terminal Count provides a pulse to signal a terminal count has been reached on a DMA channel operation.
MASTER
Master is used by an ISA board along with a DRQ line to gain ownership of the ISA bus. Upon receiving a -DACK a device can pull -MASTER low which will allow it to control the system address, data, and control lines. After MASTER is low, the device should wait one CLK period before driving the address and data lines, and two clock periods before issuing a read or write command.
MEM CS16
Memory Chip Select 16 is driven low by a memory slave device to indicate it is capable of performing a 16-bit memory data transfer. This signal is driven from a decode of the LA23 to LA17 address lines.
I/O CS16
I/O Chip Select 16 is driven low by an I/O slave device to indicate it is capable of performing a 16-bit I/O data transfer. This signal is driven from a decode of the SA15 to SA0 address lines.
0WS
Zero Wait State is driven low by a bus slave device to indicate it is capable of performing a bus cycle without inserting any additional wait states. To perform a 16-bit memory cycle without wait states, -0WS is derived from an address decode.
SBHE
System Byte High Enable is driven low to indicate a transfer of data on the high half of the data bus (D15 to D8).
Port-based I/O (parallel I/O)
- Processor has one or more N-bit ports
- Processor's software reads and writes a port just like a register

Bus-based I/O
- Processor has address, data and control ports that form a single bus
- Communication protocol is built into the processor
- A single instruction carries out the read or write protocol on the bus

Parallel I/O peripheral
- Used when the processor only supports bus-based I/O but parallel I/O is needed
- Each port on the peripheral is connected to a register within the peripheral that is read/written by the processor

[Diagram: processor and memory on a system bus, with a parallel I/O peripheral providing Ports 0 to 3]
Fig. 13.8 Parallel I/O and extended parallel I/O

Extended parallel I/O
- Used when the processor supports port-based I/O but more ports are needed
- One or more processor ports interface with a parallel I/O peripheral, extending the total number of ports available for I/O (e.g., extending 4 ports to 6 ports in the figure)

Types of bus-based I/O: memory-mapped I/O and standard I/O. The processor talks to both memory and peripherals using the same bus, and there are two ways to talk to peripherals.

Memory-mapped I/O
- Peripheral registers occupy addresses in the same address space as memory
- e.g., with a 16-bit address bus, the lower 32K addresses may correspond to memory and the upper 32K addresses to peripherals

Standard I/O (I/O-mapped I/O)
- An additional pin (M/IO) on the bus indicates whether it is a memory or a peripheral access
- e.g., with a 16-bit address bus, all 64K addresses correspond to memory when M/IO is set to 0
and all 64K addresses correspond to peripherals when M/IO is set to 1.

Memory-mapped I/O vs. standard I/O

Memory-mapped I/O
- Requires no special instructions: assembly instructions involving memory, like MOV and ADD, work with peripherals as well
- Standard I/O, by contrast, requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory

Standard I/O
- No loss of memory addresses to peripherals
- Simpler address decoding logic in peripherals is possible: when the number of peripherals is much smaller than the address space, high-order address bits can be ignored, allowing smaller and/or faster comparators

A basic memory protocol
[Fig. 13.9(a): 8051 interfaced to external memory through a 74373 address latch (D, /CS, G pins)]
Fig. 13.9(b) The timing diagram

The timing of the various signals is shown in Fig. 13.9(b). The lower byte of the address is placed on P0 and the address latch enable (ALE) signal is asserted; the higher byte of the address is placed on P2. ALE causes the 74373 to latch the lower address byte, because the P0 bus will next be used for data: P0 goes into tri-state (high-impedance) and switches internally to the data path. The RD (read) line is then enabled; the bar over the read line indicates that it is active low. The data is received from the memory on the P0 bus. A memory write cycle can be explained similarly.
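The memory-mapped style of peripheral access described earlier can be sketched in C: a peripheral register is just an address, so ordinary loads and stores (MOV-class instructions) reach it. Here a plain variable stands in for the hardware register; on real hardware the pointer would be initialised with the device's documented register address (a fixed constant such as `(volatile uint8_t *)0x8000`, which is purely illustrative):

```c
#include <stdint.h>

/* Memory-mapped I/O sketch: reading a (simulated) status register
   through a volatile pointer, exactly as ordinary memory is read. */
static volatile uint8_t fake_status_reg;        /* stands in for hardware */

static volatile uint8_t *const STATUS = &fake_status_reg;

/* Test the (assumed) READY bit in bit position 0. */
static int device_ready(void) {
    return (*STATUS & 0x01) != 0;
}
```

`volatile` matters here: it tells the compiler the location can change outside the program's control, so every access really goes to the bus.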
13.3 Conclusion
In this lesson you learnt the basics of input/output interfacing. In the previous chapter you also studied some input/output concepts, but most of those I/O units, such as the timer, watchdog circuit, PWM generator, and serial and parallel ports, were part of the microcontroller. In this lesson the basics of interfacing with external devices have been discussed. The difference between a bus and a port should be kept in mind. The ISA bus is discussed to give an idea of the various bus architectures which will be discussed in the later part of this course. You may browse the websites listed below for further knowledge.
http://esd.cs.ucr.edu/slide_index.html
http://esd.cs.ucr.edu/wres.html
www.techfest.com/hardware/bus/isa.htm
You should now be in a position to learn any microcontroller and its interfacing protocols.
13.4 Questions
1. List at least 4 differences between the I/O devices for a Real Time Embedded System (RTES) and a desktop PC.
Ans:

RTES I/O:
- Has to operate in real time; the timing requirements have to be met.
- The I/O devices need not be meant for a human user and may consist of analog interfaces, digital controllers and mixed-signal circuits.
- The power consumption of the I/O devices should be limited.
- The size of the I/O devices should be small, so they can coexist with the processor and other devices.

PC I/O:
- May take a little longer and need not satisfy the stringent timing requirements of the user.
- Encompasses a broad range. Generally the keyboard, monitor, mouse etc., which are meant for the human user, are termed I/O, but a PC can also have I/O similar to that of an RTES.
- There is virtually no strict limit on the power consumed by such I/O.
- Size is generally not a problem, as the system is not meant to be portable.

2. Draw the timing diagram of a memory read protocol for a slower memory. What additional handshaking signals are necessary?

Ans: An additional handshaking signal from the memory, namely /ready, is necessary. The microcontroller inserts wait states until the memory asserts the /ready line. The ready line in this case is sampled at the rising edge of the third clock phase. Fig. Q2 shows the timing of such an operation.
[Fig. Q2: timing — Clock (T1, T2, Twait, T4, T5), Address, /RD, /Ready and Data]

Fig. Q2 The timing diagram of a memory read from a slower memory

3. List the handshaking signals in the ISA bus for dealing with slower I/O devices.
Ans: I/O CH RDY (I/O Channel Ready) allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.

4. What additional handshaking signals are necessary for bidirectional data transfer over the same set of data lines?

Ans: For an 8-bit data transfer we need at least 4 additional lines for handshaking. As shown in Fig. Q4 there are two ports: Port A acts as the 8-bit bidirectional data bus, and Port C carries the handshaking signals.

Write operation: When the data is ready, the /OBFA signal (PC7, output buffer full, active low) is made 0. The connected device acknowledges through /ACKA (PC6, active low) that it is ready to accept data. The data transfer takes place over PA0-PA7.

Read operation: When the data is ready, the external device drives the /STBA line (PC4, strobe, active low) low. The acknowledgement is returned through IBFA (input buffer full, active high). The data transfer then takes place.
Fig. Q4 The bidirectional handshaking interface

5. List the various bus standards used in industry.

Ans:
ISA Bus
The Industry Standard Architecture (ISA) bus is an open, 8-bit (PC and XT) or 16-bit (AT) asymmetrical I/O channel with numerous compatible hardware implementations.
EISA Bus
The Extended Industry Standard Architecture (EISA) bus is an open, 32-bit, asymmetrical I/O channel with numerous compatible hardware implementations. It extends the system bus, allows data transfer rates at a bandwidth of up to 33 MB per second, supports a 4 GB address space and 8 DMA channels, and is backward compatible with the Industry Standard Architecture (ISA) bus.
PCI Bus
The Peripheral Component Interconnect Local Bus (PCI) is an open, high-performance 32-bit or 64-bit synchronous bus with multiplexed address and data lines, and numerous compatible hardware implementations. The PCI bus supports a frequency of 33 MHz and a transfer rate of 132 MB per second.
Futurebus+
Futurebus+ is an open bus, designed by the IEEE 896 committee, whose architecture and interfaces are publicly documented, and which is independent of any underlying architecture. It has broad-based, cross-industry support and very high throughput (the maximum rate for the 64-bit bandwidth is 160 MB per second; for the 128-bit bandwidth, 180 MB per second). Futurebus+ supports a 64-bit address space and a set of control and status registers (CSRs) that provides all the necessary ability to enable or disable features, thus supporting multivendor interoperability.
SCSI Bus
The Small Computer Systems Interface (SCSI) bus is an ANSI standard for the interconnection of computers with each other and with disks, floppies, tapes, printers, optical disks, and scanners. The SCSI standard covers the mechanical and electrical interfaces. Data transfer rates are individually negotiated with each device attached to a given SCSI bus. For example, a 4 MB per second device and a 10 MB per second device may share a fast narrow bus. When the 4 MB per second device is using the bus, the transfer rate is 4 MB per second; when the 10 MB per second device is using the bus, the transfer rate is 10 MB per second. However, when faster devices are placed on a slower bus, their transfer rate is reduced to allow for proper operation in that slower environment. Note that the speed of the SCSI bus is a function of cable length, with slow, single-ended SCSI buses supporting a maximum cable length of 6 meters, and fast, single-ended SCSI buses supporting a maximum cable length of 3 meters.
TURBOchannel Bus
The TURBOchannel bus is a synchronous, 32-bit, asymmetrical I/O channel that can be operated at any fixed frequency in the range 12.5 MHz to 25 MHz. It is also an open bus, developed by Digital, whose architecture and interfaces are publicly documented. At 12.5 MHz, the peak data rate is 50 MB per second; at 25 MHz, 100 MB per second. The TURBOchannel is asymmetrical in that the base system processor and system memory are defined separately from the TURBOchannel architecture. The I/O operations do not directly address each other: all data is entered into system memory before being transferred to another I/O option. The design facilitates a concise and compact protocol with very high performance.
XMI Bus
The XMI bus is a 64-bit wide parallel bus that can sustain a 100 MB per second bandwidth in a single processor configuration. The bandwidth is exclusive of addressing overhead; the XMI bus can transmit 100 MB per second of data. The XMI bus implements a "pended protocol" design so that the bus does not stall between requests and transmissions of data. Several transactions can be in progress at a given time. Bus cycles not used by the requesting device are available to other devices on the bus. Arbitration and data transfers occur simultaneously, with multiplexed data and address lines. These design features are particularly significant when a combination of multiple devices has a wider bandwidth than the bus itself.
VME Bus
Digital UNIX includes a generic VME interface layer that provides customers with a consistent interface to VME devices across Alpha AXP workstation and server platforms. Currently, VME adapters are only supported on the TURBOchannel bus. To use the VME interface layer to write VMEbus device drivers, you must have the Digital UNIX TURBOchannel/VME Adapter Driver Version 2.0 software (Software Product Description 48.50.00) and its required processor and/or hardware configurations (Software Support Addendum 48.50.00-A).
Module 3
Embedded Systems I/O
Lesson 14
Timers
Instructional Objectives
After going through this lesson the student would learn about the standard peripheral devices most commonly used as single-purpose processors:
- Timer and counter basics
- Various modes of timer operation
- The internal timers of the 8051
- A programmable interval timer, the 8253
- Watchdog timer and watchdog circuit
Pre-Requisite
Digital Electronics, Microprocessors
14.1 Introduction
The Peripherals of an embedded processor can either be on the same chip as the processor or can be connected externally.
[Fig. 14.1: external interrupts into the interrupt control block; on-chip flash and RAM; CPU; oscillator; bus control; four I/O ports (P0-P3); serial port (TXD/RXD)]
Fig. 14.1 Block diagram of the basic 8051 architecture

For example, in a typical embedded processor as shown in Fig. 14.1, the timer, interrupt controller, serial port and parallel ports reside on a single chip. These dedicated units are otherwise termed single-purpose processors. They can be part of the microcontroller or can reside outside the chip, in which case they must be properly interfaced with the processor. The tasks generally carried out by such units are: timers, counters, watchdog timers, serial transmission,
analog/digital conversions
Timer
A timer is a very common and useful peripheral. It is used to generate events at specific times or to measure the duration of specific events external to the processor. It is a programmable device, i.e. the time period can be adjusted by writing specific bit patterns to registers called timer-control registers.
Counter
A counter is a more general version of the timer. It counts events presented to it in the form of pulses. Fig. 14.2(a) shows the block diagram of a simple timer. It has a 16-bit up counter which increments with each input clock pulse, so the output value Cnt represents the number of pulses since the counter was last reset to zero. An additional output Top indicates when the terminal count has been reached; it may go high for a predetermined time as set by the programmable control word inside the timer unit. The count can be loaded by the external program. Fig. 14.2(b) shows the structure of another timer, where a 2x1 multiplexer chooses between an internal clock (Clk) and an external count input (Cnt_in); the Mode bit decides the selection. With the internal clock it behaves like the timer in Fig. 14.2(a); with the external input it simply counts the occurrences of the external event. Fig. 14.2(c) shows a timer with a terminal count. This can generate an event when a particular interval of time has elapsed; the counter restarts after every terminal count.

[Fig. 14.2(a): basic timer — Clk into a 16-bit up counter with Reset, 16-bit Cnt output and Top output]
[Fig. 14.2(b): timer/counter — Clk and Cnt_in through a 2x1 mux selected by Mode]
[Fig. 14.2(c): timer with terminal count — Top fires when Cnt equals the terminal count]
[Fig. 14.3: clock waveform; counter value counting down from 5, reset and reloaded with a new count each time; output pulses]
Fig. 14.3 The timer count and output. The timer is in count-down mode: on every clock pulse the count is decremented by 1. When the count reaches zero, the output of the counter (Top) goes high for a predetermined time. The counter is then loaded with a new or the previous count value by the external program, or it can be reloaded automatically every time the count reaches zero.
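The count-down-with-auto-reload behaviour of Fig. 14.3 can be captured in a small behavioural model (the type and function names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* Behavioural model of a count-down timer: each clock decrements the
   count; at zero the Top output pulses and the counter automatically
   reloads itself. */
typedef struct {
    uint16_t count;    /* current value */
    uint16_t reload;   /* value restored when the count hits zero */
} DownTimer;

/* Apply one clock pulse; returns true when Top fires (count hit zero). */
static bool timer_tick(DownTimer *t) {
    if (--t->count == 0) {
        t->count = t->reload;   /* automatic reload */
        return true;            /* Top goes high for this tick */
    }
    return false;
}
```

With a reload value of n, Top fires once every n clock pulses, which is exactly how a timer derives a periodic event from the system clock.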
MODE 0
Either timer in Mode 0 is an 8-bit counter with a divide-by-32 prescaler; in this mode the timer register is configured as a 13-bit register. As the count rolls over from all 1s to all 0s, it sets the timer interrupt flag TF1. The counted input is enabled to the timer when TR1 = 1 and either GATE = 0 or INT1 = 1. (Setting GATE = 1 allows the timer to be controlled by the external input INT1, to facilitate pulse-width measurements.)
[Timer 1 in Mode 0: OSC divided by 12, or the T1 pin, selected by C/T = 0/1, clocks the 13-bit counter; overflow sets TF1 and raises an interrupt]
TMOD (Timer Mode) register:
(MSB) GATE C/T M1 M0 | GATE C/T M1 M0 (LSB)
        Timer 1      |      Timer 0

GATE: Gating control. When set, Timer/Counter x is enabled only while the INTx pin is high and the TRx control bit is set. When cleared, Timer x is enabled whenever the TRx control bit is set.
C/T: Timer or Counter selector. Cleared for timer operation (input from the internal system clock); set for counter operation (input from the Tx input pin).

M1 M0  Operating mode
0  0   Mode 0: THx operates as an 8-bit Timer/Counter with TLx as a 5-bit prescaler.
0  1   Mode 1: 16-bit Timer/Counter; THx and TLx are cascaded; there is no prescaler.
1  0   Mode 2: 8-bit auto-reload Timer/Counter; THx holds a value which is reloaded into TLx each time it overflows.
1  1   Mode 3: (Timer 0) TL0 is an 8-bit Timer/Counter controlled by the standard Timer 0 control bits; TH0 is an 8-bit timer only, controlled by the Timer 1 control bits. (Timer 1) Timer/Counter 1 is stopped.
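Assuming the TMOD layout above (Timer 1 fields in the high nibble, Timer 0 fields in the low nibble), a control byte can be assembled with simple shifts; the helper name is invented for illustration:

```c
#include <stdint.h>

/* Pack an 8051 TMOD value: each nibble is GATE | C/T | M1 | M0,
   Timer 1 in the high nibble, Timer 0 in the low nibble. */
static uint8_t tmod_value(uint8_t gate1, uint8_t ct1, uint8_t mode1,
                          uint8_t gate0, uint8_t ct0, uint8_t mode0) {
    uint8_t hi = (uint8_t)(((gate1 & 1) << 3) | ((ct1 & 1) << 2) | (mode1 & 0x3));
    uint8_t lo = (uint8_t)(((gate0 & 1) << 3) | ((ct0 & 1) << 2) | (mode0 & 0x3));
    return (uint8_t)((hi << 4) | lo);
}
```

For example, Timer 1 in 8-bit auto-reload mode (Mode 2, a common baud-rate setup) and Timer 0 as a 16-bit timer (Mode 1), both gated off and clocked internally, gives the familiar value 0x21.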
TCON (Timer/Counter Control) register:
(MSB) TF1 TR1 TF0 TR0 IE1 IT1 IE0 IT0 (LSB)

Bit     Symbol  Name and significance
TCON.7  TF1     Timer 1 overflow flag. Set by hardware on Timer/Counter overflow; cleared by hardware when the processor vectors to the interrupt routine.
TCON.6  TR1     Timer 1 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
TCON.5  TF0     Timer 0 overflow flag. Set by hardware on Timer/Counter overflow; cleared by hardware when the processor vectors to the interrupt routine.
TCON.4  TR0     Timer 0 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
TCON.3  IE1     Interrupt 1 edge flag. Set by hardware when an external interrupt edge is detected; cleared when the interrupt is processed.
TCON.2  IT1     Interrupt 1 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
TCON.1  IE0     Interrupt 0 edge flag. Set by hardware when an external interrupt edge is detected; cleared when the interrupt is processed.
TCON.0  IT0     Interrupt 0 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
Timer/Counter Control Register (TCON)

MODE 1: Mode 1 is the same as Mode 0, except that the timer register runs with all 16 bits.
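In Mode 1 the timer counts up and overflows at 65536, so to get an overflow interrupt after a given number of counts the timer is preloaded with 65536 minus that number. A small helper sketches the split into the high and low timer bytes (the names are illustrative; the 1 µs-per-count figure assumes the classic 12-clocks-per-machine-cycle 8051 with a 12 MHz crystal):

```c
#include <stdint.h>

/* Compute the Mode 1 preload for a delay of `ticks` machine cycles:
   preload = 65536 - ticks, split into THx (high) and TLx (low). */
static void mode1_reload(uint16_t ticks, uint8_t *th, uint8_t *tl) {
    uint16_t preload = (uint16_t)(65536UL - ticks);
    *th = (uint8_t)(preload >> 8);     /* high byte -> THx */
    *tl = (uint8_t)(preload & 0xFF);   /* low byte  -> TLx */
}
```

At 1 µs per count, a 50 ms tick needs 50000 counts, i.e. a preload of 15536 = 0x3CB0 (TH = 0x3C, TL = 0xB0).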
[Timer 1 in Mode 1: OSC divided by 12, or the T1 pin, selected by C/T = 0/1, clocks the 16-bit counter; overflow sets TF1 and raises an interrupt]
Fig. 14.5

MODE 2 configures the timer register as an 8-bit counter with automatic reload.
[Timer block diagrams for Modes 2 and 3: OSC/12 or the Tx pin, selected by C/T, clocks the counter(s); in Mode 3 TL0 sets TF0 while TH0, gated by TR1, sets TF1, each raising its own interrupt]
Fig. 14.6

MODE 3: Timer 1 in Mode 3 simply holds its count. Timer 0 in Mode 3 establishes TL0 and TH0 as two separate counters.
[Fig. 14.7: 8253 pinout — D7-D0 microprocessor interface; RD, WR, A0, A1 and CS control lines; CLK, GATE and OUT for each of the three counters]
Fig. 14.7 The pin configuration of the 8253 timer

Fig. 14.8 shows the internal block diagram. There are three separate counter units controlled by a configuration register (Fig. 14.9). Each counter has two inputs, clock and gate, and one output. The clock signal drives the counting by decrementing a value preloaded into the respective counter register. The gate serves as an enable input: if the gate is held low, counting is disabled. The timing diagrams explain in detail the various modes of operation of the timer.
[Fig. 14.8: data bus buffer (D7-D0), read/write logic (RD, WR, A1, A0, CS), internal bus, power supplies (Vcc, GND), and counters #0, #1 and #2, each with its own CLK, GATE and OUT]
Fig. 14.8 The internal block diagram of the 8253

Table: the address map and the control word format
Address map:
CS  A1  A0  Selects
0   0   0   Counter 0
0   0   1   Counter 1
0   1   0   Counter 2
0   1   1   Control word register

Control word format (D7 to D0): SC1 SC0 RL1 RL0 M2 M1 M0 BCD

SC1 SC0 (select counter): 00 = counter 0, 01 = counter 1, 10 = counter 2, 11 = illegal
RL1 RL0 (read/load): 00 = counter latching operation, 01 = read/load LSB only, 10 = read/load MSB only, 11 = read/load LSB first, then MSB
M2 M1 M0 (mode): 000 = Mode 0, 001 = Mode 1, x10 = Mode 2, x11 = Mode 3, 100 = Mode 4, 101 = Mode 5
BCD: 0 = binary counter (16-bit), 1 = BCD counter (4 decades)
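Given the control-word layout above, the byte can be packed mechanically; the function name is illustrative:

```c
#include <stdint.h>

/* Pack an 8253 control word:
   D7 D6 = SC1 SC0 (counter select), D5 D4 = RL1 RL0 (read/load),
   D3 D2 D1 = mode, D0 = BCD. */
static uint8_t ctl8253(uint8_t counter, uint8_t rl,
                       uint8_t mode, uint8_t bcd) {
    return (uint8_t)(((counter & 3) << 6) |
                     ((rl & 3) << 4) |
                     ((mode & 7) << 1) |
                     (bcd & 1));
}
```

For example, `ctl8253(0, 3, 3, 0)` — counter 0, load LSB then MSB, Mode 3, binary counting — yields 0x36, the familiar square-wave initialisation value.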
Mode 0: The output goes high after the terminal count is reached. The counter stops if the gate is low (Fig. 14.10(a) and (b)). The timer count register is loaded with a count (say 6) when the WR line is driven low by the processor. The counter unit counts down with each clock pulse, and the output goes high when the register value reaches zero. If in the meantime the GATE is made low (Fig. 14.10(b)), the count is suspended at its current value (3) till the GATE is enabled again.

[Fig. 14.10(a): CLK, WR, OUT and GATE waveforms; count 6-5-4-3-2-1]
Fig. 14.10(a) Mode 0 count when GATE is high (enabled)

[Fig. 14.10(b): CLK, WR, OUT and GATE waveforms; count 6-5-4-3-3-3-2-1]
Fig. 14.10(b) Mode 0 count when GATE is temporarily low (disabled)
Mode 1: If the GATE goes low before the count is completed, the counter is suspended in that state as long as GATE remains low (Fig. 14.11(b)). Thus it works as a mono-shot.

[Fig. 14.11(a): CLK, WR, GATE (trigger) and OUT waveforms]
Fig. 14.11(a) Mode 1: the GATE goes high; the output goes low for a period depending on the count

[Fig. 14.11(b): CLK, WR, GATE (trigger) and OUT waveforms]
Fig. 14.11(b) Mode 1: the GATE pulse is disabled momentarily, causing the counter to stop
[Fig. 14.12(a): CLK, WR, GATE and OUT waveforms; count 3-2-1 repeating]
Fig. 14.12(a) Mode 2 operation when the GATE is kept high

[Fig. 14.12(b): CLK, WR, GATE and OUT waveforms; count 3-2-1, with the count held at 3 while GATE is low]

[Mode 3 waveforms: CLK and the OUT square wave for n = 5]

[Mode 4 waveforms: CLK, WR and OUT; count 4-3-2-1]

[Fig. 14.14(b): CLK, WR, GATE and OUT waveforms; count 4-3-3-2-1]
Fig. 14.14(b) Mode 4 software-triggered strobe when GATE is momentarily low

[Mode 5 waveforms: WR, GATE and OUT; count 5-4-3-2-1]
Watchdog timer
A Watchdog Timer is a circuit that automatically invokes a reset unless the system being watched sends regular hold-off signals to the Watchdog.
Watchdog Circuit
To make sure that a particular program is executing properly, a watchdog circuit is used. For instance, the program may reset a particular flip-flop periodically while the flip-flop is set by an external circuit. If the flip-flop is not reset for a long time, external hardware can detect this; it indicates that the program is not executing properly, and an exception or interrupt can be generated. A watchdog timer (WDT) has its own clock, independent of any external clock. When the WDT is enabled, a counter starts at 00 and increments by 1 until it reaches FF. When it rolls over from FF to 00, the processor is reset or an exception is generated. The only way to stop the WDT from resetting the processor or generating an exception or interrupt is to periodically reset the WDT back to 00 throughout the program. If the program gets stuck for some reason, the WDT will not be reset in time, and the WDT will then reset or interrupt the processor. An interrupt service routine can be invoked to take account of the erroneous operation of the program (getting stuck or going into an infinite loop).
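The counter behaviour described above can be modelled in a few lines of C (the names are illustrative; a real WDT is a hardware peripheral, not code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Behavioural model of a watchdog: a free-running 8-bit counter
   increments from 0x00; when it rolls over past 0xFF the system is
   reset. The program must call wdt_kick() periodically to prevent it. */
static uint8_t wdt_count;

static void wdt_kick(void) { wdt_count = 0; }   /* the hold-off signal */

/* One WDT clock tick; returns true when the rollover forces a reset. */
static bool wdt_tick(void) {
    if (wdt_count == 0xFF) {
        wdt_count = 0;       /* FF -> 00 rollover */
        return true;         /* processor reset / exception here */
    }
    wdt_count++;
    return false;
}
```

A well-behaved program calls `wdt_kick()` more often than every 255 ticks; a program stuck in an infinite loop without the kick eventually triggers the reset.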
Conclusion
In this chapter you have learnt about the programmable timer/counter. In most embedded processors the timer is internal and exists on the same chip as the processor. The 8051 microcontroller has internal timers which can be programmed in various modes through the mode and control registers. An external timer chip, the 8253, has also been discussed. It has 8 data lines, 2 address lines, 1 chip-select line, and one read and one write control line. The 16-bit counts of the corresponding registers can be loaded with two consecutive write operations. Counters and timers are used for triggering, trapping and managing various real-time events. The least count of the timer depends on the clock, and the stability of the clock decides the accuracy of the timings. Timers can be used to generate specific baud-rate clocks for asynchronous serial communication, and to measure speed, frequency and analog voltages after voltage-to-frequency conversion. One important application of a timer is to generate pulse-width-modulated (PWM) waveforms: in the 8253 the GATE and clock can be used together to generate pulses of different widths. Such modulated pulses are used in electronic power control to reduce harmonics and hence distortion. You also learnt about the watchdog circuit and watchdog timer, which are used to monitor the activity of a program and the processor.
Questions
Q1. Design a circuit using the 8253 to measure the speed of a motor by counting the number of pulses in a definite period. Q2. Write pseudo code (in any assembly language) to generate a sinusoidal pulse-width-modulated waveform from the 8253 timer.
Q3. Design a scheme to read temperature from a thermister circuit using a V/F converter and Timer. Q4. What are the differences in Mode 4 and Mode 5 operation of 8253 Timer? Q5. Explain the circuit given in Fig.14.5.
Module 3
Embedded Systems I/O
Lesson 15
Interrupts
Instructional Objectives
After going through this lesson the student would learn:
- Interrupts
- Interrupt service subroutines
- Polling
- Priority resolving
- Daisy-chained interrupts
- The interrupt structure of the 8051 microcontroller
- The programmable interrupt controller
Pre-Requisite
Digital Electronics, Microprocessors
15.1 Introduction
Real-time embedded system design requires that I/O devices receive servicing in an efficient manner, so that a large share of the total system tasks can be assumed by the processor with little or no effect on throughput. The most common method of servicing such devices is the polled approach, where the processor tests each device in sequence and in effect asks each one whether it needs servicing. It is easy to see that a large portion of the main program then loops through this continuous polling cycle, and that such a method has a serious, detrimental effect on system throughput, limiting the tasks that can be assumed by the microcomputer and reducing the cost effectiveness of using such devices. A more desirable method is one that allows the microprocessor to execute its main program and stop to service peripheral devices only when it is told to do so by the device itself. In effect, the method provides an external asynchronous input that informs the processor that it should complete whatever instruction is currently being executed and fetch a new routine that will service the requesting device. Once this servicing is complete, the processor resumes exactly where it left off. This can be handled effectively by interrupts. An interrupt is a signal informing a program, or a device connected to the processor, that an event has occurred. When a processor receives an interrupt signal, it takes a specified action depending on the priority and importance of the entity generating the signal. An interrupt signal can cause a program to suspend itself temporarily and service the interrupt by branching into another program, called an interrupt service subroutine (ISR), for the device which caused the interrupt.
Types of Interrupts
Interrupts can be broadly classified as:
- Hardware interrupts: interrupts caused by the connected devices.
- Software interrupts: interrupts deliberately introduced by software instructions to generate user-defined exceptions.
- Trap:
These are interrupts used by the processor alone to detect any exception such as divide by zero Depending on the service the interrupts also can be classified as - Fixed interrupt Address of the ISR built into microprocessor, cannot be changed Either ISR stored at address or a jump to actual ISR stored if not enough bytes available - Vectored interrupt Peripheral must provide the address of the ISR Common when microprocessor has multiple peripherals connected by a system bus Compromise between fixed and vectored interrupts One interrupt pin Table in memory holding ISR addresses (maybe 256 words) Peripheral doesnt provide ISR address, but rather index into table Fewer bits are sent by the peripheral Can move ISR location without changing peripheral Maskable vs. Non-maskable interrupts Maskable: programmer can set bit that causes processor to ignore interrupt This is important when the processor is executing a time-critical code Non-maskable: a separate interrupt pin that cant be masked Typically reserved for drastic situations, like power failure requiring immediate backup of data to non-volatile memory Example: Interrupt Driven Data Transfer (Fixed Interrupt) Fig.15.1(a) shows the block diagram of a system where it is required to read data from a input port P1, modify (according to some given algorithm) and send to port P2. The input port generates data at a very slow pace. There are two ways to transfer data (a) The processor waits till the input is ready with the data and performs a read operation from P1 followed by a write operation to P2. This is called Programmed Data Transfer (b) The other option is when the input/output device is slow then the device whenever is ready interrupts the microprocessor through an Int pin as shown in Fig.15.1. The processor which may be otherwise busy in executing another program (main program here) after receiving the interrupts calls an Interrupt Service Subroutine (ISR) to accomplish the required data transfer. 
This is known as Interrupt Driven Data Transfer.
[Fig. 15.1(a): block diagram — the microcontroller (with its PC) and data memory sit on the system bus, with input port P1 at address 0x8000, output port P2 at address 0x8001, and an Int line from P1 to the processor; the main program occupies instructions at addresses 100, 101, ...]
After completing the instruction at 100, the microcontroller sees Int asserted, saves the PC's value of 100, and sets the PC to the ISR's fixed location of 16. The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001.
The ISR returns, thus restoring the PC to 100+1=101, where the processor resumes executing.

Fig. 15.1(b) Flow chart for Interrupt Service

Fig. 15.1(b) describes the sequence of actions taking place after port P1 is ready with the data.

Example: Interrupt-Driven Data Transfer (Vectored Interrupt)
[Fig. 15.2(a): vectored-interrupt version of the same system — the microcontroller (PC = 100), data memory, and ports P1/P2 share the system bus, with Int and Inta handshake lines; the timeline of events runs as follows.]
P1 receives input data in a register with address 0x8000 and asserts Int to request servicing by the microprocessor. After completing the instruction at 100, the microcontroller sees Int asserted, saves the PC's value of 100, and asserts Inta. P1 detects Inta and puts the interrupt address vector 16 on the data bus. The microcontroller jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. The ISR returns, thus restoring the PC to 100+1=101, where the processor resumes executing.
[Figure: CPU, oscillator, bus control, ports P0–P3, and the multiplexed address/data bus.]
Fig. 15.3 The 8051 Architecture

The 8051 has 5 interrupt sources: 2 external interrupts, 2 timer interrupts, and the serial port interrupt. These interrupts occur because of:
1. timers overflowing
2. receiving a character via the serial port
3. transmitting a character via the serial port
4. two external events
Interrupt Enables
Each interrupt source can be individually enabled or disabled by setting or clearing a bit in a Special Function Register (SFR) named IE (Interrupt Enable). This register also contains a global disable bit, which can be cleared to disable all interrupts at once.
Interrupt Priorities
Each interrupt source can also be individually programmed to one of two priority levels by setting or clearing a bit in the SFR named IP (Interrupt Priority). A low-priority interrupt can be interrupted by a high-priority interrupt, but not by another low-priority interrupt. A high-priority interrupt cannot be interrupted by any other interrupt source. If two interrupt requests of different priority levels are received simultaneously, the request of higher priority is serviced. If interrupt requests of the same priority level are received simultaneously, an internal polling sequence determines which request is serviced; thus, within each priority level there is a second priority structure determined by the polling sequence. In operation, all the interrupt flags are latched into the interrupt control system during State 5 of every machine cycle. The samples are polled during the following machine cycle. If the flag for an enabled interrupt is found to be set (1), the interrupt system generates a CALL to the appropriate location in Program Memory, unless some other condition blocks the interrupt. Several conditions can block an interrupt, among them that an interrupt of equal or higher priority level is already in progress. The hardware-generated CALL causes the contents of the Program Counter to be pushed onto the stack, and reloads the PC with the beginning address of the service routine.

Interrupt Enable (IE) and Interrupt Priority (IP) Registers
[Figure: bit layouts of the IE and IP registers — the global disable (EA) bit and individual enable/priority bits for the external interrupts, timers, and serial port.]
INT0: External Interrupt 0
INT1: External Interrupt 1
TF0: Timer 0 Interrupt
TF1: Timer 1 Interrupt
RI, TI: Serial Port Receive/Transmit Interrupt
The service routine for each interrupt begins at a fixed location (fixed-address interrupts). Only the Program Counter (PC) is automatically pushed onto the stack; the Processor Status Word (which includes the contents of the accumulator and flag register) and the other registers are not. Having only the PC automatically saved allows the programmer to decide how much time should be spent saving other registers. This enhances the interrupt response time, albeit at the expense of increasing the programmer's burden of responsibility. As a result, many interrupt functions that are typical in control applications (toggling a port pin, for example, or reloading a timer, or unloading a serial buffer) can often be completed in less time than it takes other architectures to complete them.

Interrupt Number   Vector Address   Description
0                  0003h            External 0
1                  000Bh            Timer/Counter 0
2                  0013h            External 1
3                  001Bh            Timer/Counter 1
4                  0023h            Serial Port
Simultaneously occurring interrupts are serviced in the following order:
1. External 0 Interrupt
2. Timer 0 Interrupt
3. External 1 Interrupt
4. Timer 1 Interrupt
5. Serial Interrupt
Priority Arbiter

[Fig. 15.5: priority arbitration — the microcontroller's Int/Inta lines connect to a priority arbiter, which exchanges Ireq1/Iack1 and Ireq2/Iack2 handshakes with Peripheral 1 and Peripheral 2 over the system bus.]

Fig. 15.5 The Priority Arbitration

Let us assume that the priority of the devices is Device 1 > Device 2.
1. The processor is executing its program.
2. Peripheral 1 needs servicing, so it asserts Ireq1. Peripheral 2 also needs servicing, so it asserts Ireq2.
3. The priority arbiter sees at least one Ireq input asserted, so it asserts Int.
4. The processor stops executing its program and stores its state.
5. The processor asserts Inta.
6. The priority arbiter asserts Iack1 to acknowledge Peripheral 1.
7. Peripheral 1 puts its interrupt address vector on the system bus.
8. The processor jumps to the address of the ISR read from the data bus; the ISR executes and returns (and completes the handshake with the arbiter).
Thus, in the case of simultaneous interrupts, the device with the highest priority is served first.
8. The processor jumps to the address of the ISR read from the data bus; the ISR executes and returns.
9. The flag is reset. The processor now checks for the next device that has interrupted simultaneously.

[Fig. 15.6: daisy chain arbitration — the peripherals are connected in a chain, each passing Ack_in/Ack_out and Req_out/Req_in signals to its neighbour, with the Int/Inta lines at the processor end.]

Fig. 15.6 The Daisy Chain Arbitration

In this case:
- The device nearest to the processor has the highest priority.
- Service to the subsequent stages is interrupted if the chain is broken at one place.
[Figure: a PIC sitting between the CPU and the system's RAM, ROM, and I/O devices I/O(1) through I/O(N), funnelling their interrupt requests to the processor.]
Each peripheral device or structure usually has a special program or routine that is associated with its specific functional or operational requirements; this is referred to as a service routine. The PIC, after issuing an interrupt to the CPU, must somehow input information into the CPU that can point (vector) the Program Counter to the service routine associated with the requesting device. The PIC manages eight levels of requests and has built-in features for expandability to other PICs (up to 64 levels). It is programmed by system software as an I/O peripheral. The priority modes can be changed or reconfigured dynamically at any time during main program operation.
Priority Resolver
This logic block determines the priorities of the bits set in the IRR (Interrupt Request Register). The highest priority is selected and strobed into the corresponding bit of the ISR (In-Service Register) during the INTA sequence.
Intel 8259
[Fig. 15.9: functional block diagram of the Intel 8259 — control logic, priority resolver, and the INT output.]
Fig. 15.9 The Functional Block Diagram

Table of Signals of the PIC

D[7..0]: These wires are connected to the system bus and are used by the microprocessor to write or read the internal registers of the 8259.
A0: This pin acts in conjunction with the WR/RD signals. It is used by the 8259 to decipher the various command words the microprocessor writes and the status the microprocessor wishes to read.
WR: When this write signal is asserted, the 8259 accepts the command on the data lines, i.e., the microprocessor writes to the 8259 by placing a command on the data lines and asserting this signal.
RD: When this read signal is asserted, the 8259 provides its status on the data lines, i.e., the microprocessor reads the status of the 8259 by asserting this signal and reading the data lines.
INT: This signal is asserted whenever a valid interrupt request is received by the 8259, i.e., it is used to interrupt the microprocessor.
INTA: This signal is used to enable 8259 interrupt-vector data onto the data bus by a sequence of interrupt acknowledge pulses issued by the microprocessor.
IR0–IR7: An interrupt request is raised by a peripheral device when one of these signals is asserted.
CAS[2..0]: These are cascade signals that enable multiple 8259 chips to be chained together.
SP/EN: This signal is used in conjunction with the CAS signals for cascading purposes.
Fig.15.10 shows the daisy chain connection of a number of PICs. The extreme right PIC interrupts the processor. In this figure the processor can entertain up to 24 different interrupt requests. The SP/EN signal has been connected to Vcc for the master and grounded for the slaves.
[Fig. 15.10: daisy-chained 8259 PICs sharing the 16-bit address bus, control bus, and 8-bit data bus, collecting the interrupt requests from the devices and driving the processor's INT REQ line.]
Software Interrupts
These are initiated by the program through specific instructions. On encountering such an instruction, the CPU executes an Interrupt Service Subroutine.
Conclusion
In this chapter you have learnt about interrupts and the Programmable Interrupt Controller. Different methods of interrupt servicing, such as priority arbitration and daisy chain arbitration, have been discussed. In real-time systems, interrupts are used for specific cases, and the execution times of these Interrupt Service Subroutines are almost fixed. Too many interrupts are discouraged in real time, as they may severely disrupt the services; please look at problem no. 1 in the exercise. Most embedded processors are equipped with an interrupt structure, so there is rarely a need to use a PIC. Some entry-level microcontrollers do not have an inbuilt exception handler called a trap. The trap is also an interrupt, used to handle some extreme processor conditions such as divide by zero, overflow, etc.
Questions and Answers
Q1. A computer system has three devices whose characteristics are summarized in the following table:

Device   Service Time   Interrupt Frequency   Allowable Latency
D1       150 μs         1/(800 μs)            50 μs
D2       50 μs          1/(1000 μs)           50 μs
D3       100 μs         1/(800 μs)            100 μs
Service time indicates how long it takes to run the interrupt handler for each device. The maximum time allowed to elapse between an interrupt request and the start of the interrupt handler is indicated by the allowable latency. If a program P takes 100 seconds to execute when interrupts are disabled, how long will P take to run when interrupts are enabled?

Ans: The CPU time taken to service the interrupts must be found. Let us consider Device 1: servicing it takes 150 μs (plus up to 50 μs of latency), and its interrupts occur at a frequency of 1/(800 μs), i.e., 1250 times a second. Consider a time quantum of 1 unit.
Device 1 takes (150+50)/800 = 1/4 unit.
Device 2 takes (50+50)/1000 = 1/10 unit.
Device 3 takes (100+100)/800 = 1/4 unit.
In one unit of real time, the CPU time taken by all these devices is (1/4 + 1/10 + 1/4) = 0.6 units. The CPU idle time is 0.4 units, which can be used by the program P. For 100 seconds of CPU time, the real time required will be 100/0.4 = 250 seconds.

Q.2 What is a TRAP?
Ans: The term trap denotes a programmer-initiated and expected transfer of control to a special handler routine. In many respects, a trap is nothing more than a specialized subroutine call. Many texts refer to traps as software interrupts. Traps are usually unconditional; that is, when you execute an interrupt instruction, control always transfers to the procedure associated with the trap. Since traps execute via an explicit instruction, it is easy to determine exactly which instructions in a program will invoke a trap handling routine.
Q.3 Discuss the Interrupt Acknowledge machine cycle.
Ans: For vectored interrupts the processor expects the address from the external device. Once it receives the interrupt, it starts an interrupt acknowledge cycle as shown in the figure. In the figure, TN is the last clock state of the previous instruction, immediately after which the processor checks the status of the Intr pin, which has already been driven high by the external device. The processor then starts an INTA cycle in which it brings in the interrupt vector through the data lines. If the data lines are 8 bits wide and the address required is 16 bits, there will be two I/O reads. If the interrupt vector is a number that serves as an index into a look-up table, then only 8 bits are required, and hence there will be one I/O read.
[Timing diagram: clock states TN, T1, T2, T3 of the CLK signal, with INTREQ asserted before TN, INTACK asserted during the INTA cycle, and the vector/address code driven on the data lines.]
Module 3
Embedded Systems I/O
Lesson 16
DMA
Instructional Objectives
After going through this lesson the student would learn:
- The concept of Direct Memory Access
- When and where to use DMA
- How to initiate a DMA cycle
- The different steps of DMA
- What a typical DMA controller looks like
Pre-Requisite
Digital Electronics, Microprocessors
16(I)
Introduction
Direct Memory Access (DMA) allows devices to transfer data without subjecting the processor to a heavy overhead. Otherwise, the processor would have to copy each piece of data from the source to the destination. This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is generally slower than access to normal system RAM. During such a copy the processor would be unavailable for other tasks involving processor bus access, although it could continue working on anything that does not require the bus. DMA transfers are essential for high-performance embedded systems where large chunks of data need to be transferred between the input/output devices and the primary memory.
16(II)
DMA Controller
A DMA controller is a device, usually peripheral to a CPU that is programmed to perform a sequence of data transfers on behalf of the CPU. A DMA controller can directly access memory and is used to transfer data from one memory location to another, or from an I/O device to memory and vice versa. A DMA controller manages several DMA channels, each of which can be programmed to perform a sequence of these DMA transfers. Devices, usually I/O peripherals, that acquire data that must be read (or devices that must output data and be written to) signal the DMA controller to perform a DMA transfer by asserting a hardware DMA request (DRQ) signal. A DMA request signal for each channel is routed to the DMA controller. This signal is monitored and responded to in much the same way that a processor handles interrupts. When the DMA controller sees a DMA request, it responds by performing one or many data transfers from that I/O device into system memory or vice versa. Channels must be enabled by the processor for the DMA controller to respond to DMA requests. The number of transfers performed, transfer modes used, and memory locations accessed depends on how the DMA channel is programmed. A DMA controller typically shares the system memory and I/O bus with the CPU and has both bus master and slave capability. Fig.16.1 shows the DMA controller architecture and how the DMA controller interacts with the CPU. In bus master mode, the DMA controller acquires the system bus (address, data, and control lines) from the CPU to perform the
DMA transfers. Because the CPU releases the system bus for the duration of the transfer, the process is sometimes referred to as cycle stealing. In bus slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's internal registers to set up DMA transfers. The internal registers consist of source and destination address registers and transfer count registers for each DMA channel, as well as control and status registers for initiating, monitoring, and sustaining the operation of the DMA controller.
[Fig. 16.1: DMA controller architecture — the CPU programs the controller's status and mask registers; each DMA channel X holds base/current address and base/current word count registers; DMA arbitration logic exchanges bus request/grant signals with the CPU and DRQX/DACKX signals with the devices, and asserts TC (terminal count) when a transfer completes; data moves over the PC bus.]
Fig. 16.2 Flyby DMA transfer

The second type of DMA transfer is referred to as a dual-cycle, dual-address, flow-through, or fetch-and-deposit DMA transfer. As these names imply, this type of transfer involves two memory or I/O cycles. The data being transferred is first read from the I/O device or memory into a temporary data register internal to the DMA controller. The data is then written to the memory or I/O device in the next cycle. Fig. 16.3 shows the fetch-and-deposit DMA transfer signal protocol. Although inefficient, because the DMA controller performs two cycles and thus retains the system bus longer, this type of transfer is useful for interfacing devices with different data bus sizes. For example, a DMA controller can perform two 16-bit read operations from one location followed by a 32-bit write operation to another location. A DMA controller supporting this type of transfer has two address registers per channel (source address and destination address) and bus-size registers, in addition to the usual transfer count and control registers.
Unlike the flyby operation, this type of DMA transfer is suitable for both memory-to-memory and I/O transfers.
[Fig. 16.3: fetch-and-deposit DMA transfer — the I/O device raises DMA Request; the DMA controller drives I/O Read* and Memory Write* in separate cycles, staging the data internally while driving the address and data buses.]
Fig. 16.3 Fetch-and-Deposit DMA Transfer

Single, block, and demand are the most common transfer modes. Single transfer mode transfers one data value for each DMA request assertion. This mode is the slowest method of transfer because it requires the DMA controller to arbitrate for the system bus with each transfer. This arbitration is not a major problem on a lightly loaded bus, but it can lead to latency problems when multiple devices are using the bus. Block and demand transfer modes increase system throughput by allowing the DMA controller to perform multiple DMA transfers once it has gained the bus. For block mode transfers, the DMA controller performs the entire DMA sequence, as specified by the transfer count register, at the fastest possible rate in response to a single DMA request from the I/O device. For demand mode transfers, the DMA controller performs DMA transfers at the fastest possible rate as long as the I/O device asserts its DMA request. When the I/O device deasserts this DMA request, transfers are held off.
processor whenever a channel terminates. DMA controllers also have mechanisms for automatically reprogramming a DMA channel when the DMA transfer sequence completes. These mechanisms include auto-initialization and buffer chaining. The auto-initialization feature repeats the DMA transfer sequence by reloading the DMA channel's current registers from the base registers at the end of a DMA sequence and re-enabling the channel. Buffer chaining is useful for transferring blocks of data into noncontiguous buffer areas or for handling double-buffered data acquisition. With buffer chaining, a channel interrupts the CPU and is programmed with the next address and count parameters while DMA transfers are being performed on the current buffer. Some DMA controllers minimize CPU intervention further by having a chain address register that points to a chain control table in memory. The DMA controller then loads its own channel parameters from memory. Generally, the more sophisticated the DMA controller, the less servicing the CPU has to perform. A DMA controller has one or more status registers that are read by the CPU to determine the state of each DMA channel. The status register typically indicates whether a DMA request is asserted on a channel and whether a channel has reached TC. Reading the status register often clears the terminal count information in the register, which leads to problems when multiple programs are trying to use different DMA channels.

Steps in a Typical DMA Cycle
1. The device wishing to perform DMA asserts the processor's bus request signal.
2. The processor completes the current bus cycle and then asserts the bus grant signal to the device.
3. The device then asserts the bus grant ack signal.
4. The processor senses the change in the state of the bus grant ack signal and starts listening to the data and address buses for DMA activity.
5. The DMA device performs the transfer from the source to the destination address.
During these transfers, the processor monitors the addresses on the bus and checks whether any location modified during DMA operations is cached in the processor. If the processor detects a cached address on the bus, it can take one of two actions:
- invalidate the internal cache entry for the address involved in the DMA write operation, or
- update the internal cache when a DMA write is detected.
6. Once the DMA operations have been completed, the device releases the bus by asserting the bus release signal.
7. The processor acknowledges the bus release and resumes its bus cycles from the point it left off.
16(III)
[Fig. 16.4 and Fig. 16.5: 82C37A pin-out and internal block diagram — address lines A0–A7 and A8–A15 (via the output buffer), data bus DB0–DB7 (I/O buffer), DREQ0–DREQ3 and DACK0–DACK3 device handshakes, the HRQ/HLDA bus handshake, EOP, AEN, CS, CLK, and RESET pins, the priority encoder with rotating priority logic, and the command (8), mask (4), request (4), mode (4 x 6), status (8), and temporary (8) registers, plus the 16-bit current address and current word count registers.]
Signal Description (Fig. 16.4 and Fig. 16.5)

VCC: The +5V power supply pin.
GND: Ground.
CLK: CLOCK INPUT: The clock input is used to generate the timing signals which control 82C37A operations.
CS: CHIP SELECT: Chip Select is an active low input used to enable the controller onto the data bus for CPU communications.
RESET: This is an active high input which clears the Command, Status, Request, and Temporary registers, the First/Last Flip-Flop, and the mode register counter. The Mask register is set to ignore requests. Following a Reset, the controller is in an idle cycle.
READY: This signal can be used to extend the memory read and write pulses from the 82C37A to accommodate slow memories or I/O devices.
HLDA: HOLD ACKNOWLEDGE: The active high Hold Acknowledge from the CPU indicates that it has relinquished control of the system buses.
DREQ0-DREQ3: DMA REQUEST: The DMA Request (DREQ) lines are individual asynchronous channel request inputs used by peripheral circuits to obtain DMA service. In Fixed Priority, DREQ0 has the highest priority and DREQ3 has the lowest priority. A request is generated by activating the DREQ line of a channel. DACK will acknowledge the recognition of a DREQ signal. The polarity of DREQ is programmable; RESET initializes these lines to active high. DREQ must be maintained until the corresponding DACK goes active. DREQ will not be recognized while the clock is stopped. Unused DREQ inputs should be pulled High or Low (inactive) and the corresponding mask bit set.
DB0-DB7: DATA BUS: The Data Bus lines are bidirectional three-state signals connected to the system data bus. The outputs are enabled in the Program condition during the I/O Read to output the contents of a register to the CPU. The outputs are disabled and the inputs are read during an I/O Write cycle when the CPU is programming the 82C37A control registers.
During DMA cycles, the most significant 8 bits of the address are output onto the data bus to be strobed into an external latch by ADSTB. In memory-to-memory operations, data from the memory enters the 82C37A on the data bus during the read-from-memory transfer; then, during the write-to-memory transfer, the data bus outputs write the data into the new memory location.
IOR: READ: I/O Read is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to read the control registers. In the Active cycle, it is an output control signal used by the 82C37A to access data from the peripheral during a DMA Write transfer.
IOW: WRITE: I/O Write is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to load information into the 82C37A. In the Active cycle, it is an output control signal used by the 82C37A to load data to the peripheral during a DMA Read transfer.
EOP: END OF PROCESS: End of Process (EOP) is an active low bidirectional signal. Information concerning the completion of DMA services is available at the bidirectional EOP pin. The 82C37A allows an external signal to terminate an active DMA service by pulling the EOP pin low. A pulse is generated by the 82C37A when terminal count (TC) for any channel is reached, except for channel 0 in memory-to-memory mode. During memory-to-memory transfers, EOP will be output when the TC for channel 1 occurs. The EOP pin is driven by an open drain transistor on-chip, and requires an external pull-up resistor to VCC. When an EOP pulse occurs, whether internally or externally generated, the 82C37A will terminate the service, and if auto-initialize is enabled, the base registers will be written to the current registers of that channel. The mask bit and TC bit in the status word will be set for the currently active channel by EOP unless the channel is programmed for auto-initialize. In that case, the mask bit remains clear.
A0-A3: ADDRESS: The four least significant address lines are bidirectional three-state signals. In the Idle cycle, they are inputs and are used by the 82C37A to address the control register to be loaded or read. In the Active cycle, they are outputs and provide the lower 4 bits of the output address.
A4-A7: ADDRESS: The four most significant address lines are three-state outputs and provide 4 bits of address. These lines are enabled only during the DMA service.
HRQ: HOLD REQUEST: The Hold Request (HRQ) output is used to request control of the system bus. When a DREQ occurs and the corresponding mask bit is clear, or a software DMA request is made, the 82C37A issues HRQ. The HLDA signal then informs the controller when access to the system buses is permitted. For stand-alone operation where the 82C37A always controls the buses, HRQ may be tied to HLDA. This will result in one S0 state before the transfer.
DACK0-DACK3: DMA ACKNOWLEDGE: DMA Acknowledge is used to notify the individual peripherals when one has been granted a DMA cycle. The sense of these lines is programmable. RESET initializes them to active low.
AEN: ADDRESS ENABLE: Address Enable enables the 8-bit latch containing the upper 8 address bits onto the system address bus. AEN can also be used to disable other system bus drivers during DMA transfers. AEN is active high.
ADSTB: ADDRESS STROBE: This is an active high signal used to control latching of the upper address byte. It will directly drive the strobe input of external transparent octal latches, such as the 82C82. During block operations, ADSTB will only be issued when the upper address byte must be updated, thus speeding operation through elimination of S1 states. ADSTB timing is referenced to the falling edge of the 82C37A clock.
MEMR: MEMORY READ: The Memory Read signal is an active low three-state output used to access data from the selected memory location during a DMA Read or a memory-to-memory transfer.
MEMW: MEMORY WRITE: The Memory Write signal is an active low three-state output used to write data to the selected memory location during a DMA Write or a memory-to-memory transfer.
NC: NO CONNECT: Pin 5 is open and should not be tested for continuity.
Functional Description
The 82C37A direct memory access controller is designed to improve the data transfer rate in systems which must transfer data from an I/O device to memory, or move a block of memory to an I/O device. It will also perform memory-to-memory block moves, or fill a block of memory with data from a single location. Operating modes are provided to handle single-byte transfers as well as discontinuous data streams, which allows the 82C37A to control data movement with software transparency. The DMA controller is a state-driven address and control signal generator, which permits data to be transferred directly from an I/O device to memory, or vice versa, without ever being stored in a temporary register. This can greatly increase the data transfer rate for sequential operations, compared with processor move or repeated string instructions. Memory-to-memory operations require temporary internal storage of the data byte between generation of the source and destination addresses, so memory-to-memory transfers take place at less than half the rate of I/O operations, but still much faster than with central processor techniques. The block diagram of the 82C37A is shown in Fig. 16.6. The timing and control block, priority block, and internal registers are the main components. The timing and control block derives internal timing from the clock input and generates external control signals. The Priority Encoder block resolves priority contention between DMA channels requesting service simultaneously.
DMA Operation
In a system, the 82C37A address and control outputs and data bus pins are basically connected in parallel with the system buses. An external latch is required for the upper address byte. While inactive, the controller's outputs are in a high impedance state. When activated by a DMA request and bus control is relinquished by the host, the 82C37A drives the buses and generates the control signals to perform the data transfer. The operation performed by activating one of the four DMA request inputs has previously been programmed into the controller via the Command, Mode, Address, and Word Count registers. For example, if a block of data is to be transferred from RAM to an I/O device, the starting address of the data is loaded into the 82C37A Current and Base Address registers for a particular channel, and the length of the block is loaded into the channel's Word Count register. The corresponding Mode register is programmed for a memory-to-I/O operation (read transfer), and various options are selected by the Command register and the other Mode register bits. The channel's mask bit is cleared to enable recognition of a DMA request (DREQ). The DREQ can be either a hardware signal or a software command. Once initiated, the block DMA transfer will proceed as the controller outputs the data address, simultaneous MEMR and IOW pulses, and selects an I/O device via the DMA acknowledge (DACK) outputs. The data byte flows directly from the RAM to the I/O device. After each byte is transferred, the address is automatically incremented (or decremented) and the word count is decremented. The operation is then repeated for the next byte. The controller stops transferring data when the Word Count register underflows, or an external EOP is applied. To further understand 82C37A operation, the states generated by each clock cycle must be considered. The DMA controller operates in two major cycles, active and idle.
After being programmed, the controller is normally idle until a DMA request occurs on an unmasked channel, or a software request is given. The 82C37A will then request control of the system busses and enter the active cycle. The active cycle is composed of several internal states, depending on what options have been selected and what type of operation has been requested. The 82C37A can assume seven separate states, each composed of one full clock period. State I (SI) is the idle state. It is entered when the 82C37A has no valid DMA requests pending, at the end of a transfer sequence, or when a Reset or Master Clear has occurred. While in SI, the DMA controller is inactive but may be in the Program Condition (being programmed by the processor). State 0 (S0) is the first state of a DMA service. The 82C37A has requested a hold but the processor has not yet returned an acknowledge. The 82C37A may still be programmed until it
has received HLDA from the CPU. An acknowledge from the CPU signals that the DMA transfer may begin. S1, S2, S3, and S4 are the working states of the DMA service. If more time is needed to complete a transfer than is available with normal timing, wait states (SW) can be inserted between S3 and S4 in normal transfers by the use of the Ready line on the 82C37A. For compressed transfers, wait states can be inserted between S2 and S4. Note that the data is transferred directly from the I/O device to memory (or vice versa) with IOR and MEMW (or MEMR and IOW) being active at the same time. The data is not read into or driven out of the 82C37A in I/O-to-memory or memory-to-I/O DMA transfers. Memory-to-memory transfers require a read-from and a write-to memory to complete each transfer. The states, which resemble the normal working states, use two-digit numbers for identification. Eight states are required for a single transfer. The first four states (S11, S12, S13, S14) are used for the read-from-memory half and the last four states (S21, S22, S23, S24) for the write-to-memory half of the transfer.
16(IV)
Conclusion
This lesson has given an overview of DMA controllers. The controllers are normally used in high-performance embedded systems where large bulks of data need to be transferred from the input to the memory. One such system is an on-board Digital Signal Processor in a mobile telephone. Besides fast digital coding and decoding, at times this processor is required to process the voice signals to improve the quality. This has to take place in real time. While the voice message is streaming in through the AD converter it needs to be transferred and windowed for filtering. DMA offers a great help here. For simpler systems DMA is not normally used. The signals and functional architecture of a very familiar DMA controller (8237) used in personal computers have been discussed. For more detailed discussions the readers are requested to visit www.intel.com or any other manufacturer's site and read the datasheet.
16(V)
Q.1. Can you use the 82C37A in embedded systems? Justify your answer.
Ans: Only in high-performance systems where the power supply constraints are not stringent. The supply voltage is 5V and the current may reach up to 16 mA, resulting in 80 mW of power consumption.
Q.2. Highlight the different modes of DMA data transfer. Which mode consumes the least power and which mode is the fastest?
Ans: Refer to text.
Q.3. Draw the architecture of the 8237 and explain the various parts.
Ans: Refer to text.
Module 3
Embedded Systems I/O
Lesson 17
USB and IrDA
Instructional Objectives
After going through this lesson the student would be able to learn the basics of:
o The Universal Serial Bus signals
o The IrDA standard
Pre-Requisite
Digital Electronics, Microprocessors
17(I)
As personal computers and other microprocessor-based embedded systems began handling photographic images, audio, video and other bulky data, the traditional communications buses were no longer able to carry the data as fast as desired. So a group of leading computer and telecom firms including IBM, Intel, Microsoft, Compaq, Digital Equipment, NEC and Northern Telecom got together and developed USB. The USB is a medium-speed serial data bus designed to carry relatively large amounts of data over relatively short cables: up to about five meters long. It can support data rates of up to 12Mb/s (megabits per second). The USB is an addressable bus system with a seven-bit address code, so it can support up to 127 different devices or nodes at once (the all-zeroes code is not a valid address). However it can have only one host. The host with its peripherals connected via the USB forms a star network. On the other hand, any device connected to the USB can have a number of other nodes connected to it in daisy-chain fashion, so it can also form the hub for a mini-star sub-network. Similarly you can have a device which purely functions as a hub for other node devices, with no separate function of its own. This expansion via hubs is possible because the USB supports a tiered star topology, as shown in Fig.17.1. Each USB hub acts as a kind of traffic cop for its part of the network, routing data from the host to its correct address and preventing bus contention clashes between devices trying to send data at the same time. On a USB hub device, the single port used to connect to the host PC, either directly or via another hub, is known as the upstream port, while the ports used for connecting other devices to the USB are known as the downstream ports. This is illustrated in Fig.17.2. USB hubs work transparently as far as the host PC and its operating system are concerned. Most hubs provide either four or seven downstream ports, or fewer if they already include a USB device of their own.
Another important feature of the USB is that it is designed to allow hot swapping, i.e., devices can be plugged into and unplugged from the bus without having to turn the power off and on again, re-boot the PC or even manually start a driver program. A new device can simply be connected to the USB, and the PC's operating system should recognize it and automatically set up the necessary driver to service it.
Fig. 17.1 The USB is a medium-speed serial bus used to transfer data between a PC and its peripherals. It uses a tiered star configuration, with expansion via hubs (either separate, or in USB devices).
Fig. 17.2 The port on a USB device or hub which connects to the PC host (either directly or via another hub) is known as the upstream port, while hub ports which connect to additional USB devices are downstream ports. Downstream ports use Type A sockets, while upstream ports use Type B sockets.
port, so if they require less than this figure for operation they can be bus powered. If they need more, they have to use their own power supply such as a plug-pack adaptor. Hubs should be able to supply up to 500mA at 5V from each downstream port, if they are not themselves bus powered. Serial data is sent along the USB in differential or push-pull mode, with opposite polarities on the two signal lines. This improves the signal-to-noise ratio (SNR) by doubling the effective signal amplitude and also allowing the cancellation of any common-mode noise induced into the cable. The data is sent in non-return-to-zero inverted (NRZI) format, with signal levels of 3.3V peak (i.e., 6V peak differential). USB cables use two different types of connectors: Type A plugs for the upstream end, and Type B plugs for the downstream end. Hence the USB ports of PCs are provided with matching Type A sockets, as are the downstream ports of hubs, while the upstream ports of USB devices (including hubs) have Type B sockets. Type A plugs and sockets are flat in shape and have the four connections in line, while Type B plugs and sockets are much squarer in shape and have two connections on either side of the centre spigot (Fig.17.3). Both types of connector are polarized so they cannot be inserted the wrong way around. Fig.17.3 shows the pin connections for both types of connector, with sockets shown and viewed from the front. Note that although USB cables having a Type A plug at each end are available, they should never be used to connect two PCs together via their USB ports. This is because a USB network can only have one host, and both would try to claim that role. In any case, the cable would also short their 5V power rails together, which could cause a damaging current to flow. USB is not designed for direct data transfer between PCs.
All normal USB connections should be made using cables with a Type A plug at one end and a Type B plug at the other, although extension cables with a Type A plug at one end and a Type A socket at the other can also be used, providing the total extended length of a cable doesn't exceed 5m. By the way, USB cables are usually easy to identify as the plugs have a distinctive symbol molded into them (Fig.17.4).
Data on the bus is sent least-significant-bit (LSB) first. Luckily all of the fine details of USB handshaking and data transfer are looked after by the driver software in the host and the firmware built into the USB controller inside each USB peripheral device and hub.
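The low-level line coding can be illustrated with a short sketch. This assumes the NRZI rule used on the USB wire (a '1' leaves the line level unchanged, a '0' toggles it); real USB additionally applies bit stuffing and drives the two data lines differentially, which the sketch omits.

```c
#include <stdint.h>
#include <stddef.h>

/* NRZI-encode a bit sequence: a '1' keeps the current line level,
   a '0' toggles it. The line is taken to idle at level 1 here.
   Illustrative sketch of the coding rule only. */
void nrzi_encode(const uint8_t *bits, size_t n, uint8_t *levels)
{
    uint8_t level = 1;               /* idle level before the packet */
    for (size_t i = 0; i < n; i++) {
        if (bits[i] == 0)
            level ^= 1;              /* 0 -> transition    */
        levels[i] = level;           /* 1 -> no transition */
    }
}
```

Encoding the SYNC pattern 00000001 this way yields an alternating level sequence ending with two equal levels, which is what gives the receiver its clock reference at the start of every packet.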
Pin connections:
Pin 1: +5V Power
Pin 2: − Data
Pin 3: + Data
Pin 4: Ground
Fig. 17.3 Pin connections for the two different types of USB socket, as viewed from the front.
Fig. 17.4 Most USB plugs have this distinctive marking symbol.
Token packets: SYNC (00000001) | PID (xxxx,xxxx) | CRC (xxxxx)
Data packets: SYNC (00000001) | PID (xxxx,xxxx) | CRC (xxxxx)
Handshake packets: SYNC (00000001) | PID (xxxx,xxxx)

Packet Identifier nibble codes:
Tokens: OUTPUT = 0001, INPUT = 1001, SET UP = 1101
Data: DATA0 = 0011, DATA1 = 1011
Handshake: ACK = 0010, NAK = 1010, STALL = 1110

Fig. 17.5 Examples of the various kinds of USB signaling and data packets.
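The CRC field of a token packet protects the 11 bits of address and endpoint with a 5-bit CRC (generator polynomial x^5 + x^2 + 1). A hedged sketch of that computation, following the algorithm described in the USB specification (register seeded with ones, input taken LSB first, result complemented before transmission):

```c
#include <stdint.h>

/* CRC5 over an 11-bit token field (ADDR + ENDP), polynomial
   x^5 + x^2 + 1. Illustrative sketch of the USB algorithm:
   the CRC register starts at all ones, bits are processed LSB
   first as they appear on the wire, and the final value is
   complemented before being appended to the packet. */
uint8_t usb_crc5(uint16_t data11)
{
    uint8_t crc = 0x1F;                    /* seed: all ones        */
    for (int i = 0; i < 11; i++) {
        uint8_t bit = (data11 >> i) & 1;   /* LSB first             */
        uint8_t top = (crc >> 4) & 1;
        crc = (uint8_t)((crc << 1) & 0x1F);
        if (bit ^ top)
            crc ^= 0x05;                   /* taps: x^2 + 1         */
    }
    return (uint8_t)((~crc) & 0x1F);       /* complement on output  */
}
```

Any single-bit error in the protected field changes the CRC, which is the property the check relies on.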
17(II)
IrDA Standard
IrDA is the abbreviation for the Infrared Data Association, a nonprofit organization for setting standards in IR serial computer connections. The transmission in an IrDA-compatible mode (sometimes called SIR, for serial IR) uses, in the simplest case, the RS232 port, a built-in standard of all compatible PCs. With a simple interface,
shortening the bit length to a maximum of 3/16 of its original length for power-saving requirements, an infrared emitting diode is driven to transmit an optical signal to the receiver. This type of transmission covers the data range up to 115.2 kbit/s, which is the maximum data rate supported by standard UARTs (Fig.17.7). The minimum demand for transmission speed for IrDA is only 9600 bit/s. All transmissions must be started at this rate to enable compatibility. Higher speeds are a matter of negotiation between the ports after establishing the link.
Fig. 17.7 One end of the overall serial link: a UART (16550/RS232) drives a pulse-shaping stage (TOIM3000 or TOIM3232) feeding the IR-output transmitter, with a matching pulse-recovery path from the IR input through the receiver (4000 series transceiver). Please browse www.irda.org for details.
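The 3/16 shortening rule translates directly into pulse timings. A small sketch (the sir_timing helper is illustrative, not part of any IrDA API) computing the bit period and the maximum pulse width for a given data rate:

```c
/* IrDA SIR pulse timing: each '0' bit is sent as an IR pulse of at
   most 3/16 of the bit time, which saves power in the emitting diode.
   Illustrative helper computing both durations in nanoseconds. */
typedef struct {
    unsigned bit_ns;    /* one bit period            */
    unsigned pulse_ns;  /* maximum IR pulse duration */
} sir_timing_t;

sir_timing_t sir_timing(unsigned bit_rate)
{
    sir_timing_t t;
    t.bit_ns   = 1000000000u / bit_rate;   /* 1 s / rate        */
    t.pulse_ns = (t.bit_ns * 3u) / 16u;    /* 3/16 of bit time  */
    return t;
}
```

At 115.2 kbit/s the bit period is about 8.68 µs, so the IR pulse lasts at most about 1.63 µs; at the mandatory 9600 bit/s starting rate it is about 19.5 µs.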
Fig. 17.8(a) A simple circuit for an infrared interface to the RS232 port (9-pin connector and MAX232 shown). The 7805 is a voltage regulator which supplies 5V to the MAX232 level converter; the MAX232 converts the signal, which is at 5V and ground, to 12V levels compatible with the RS232 standard.
Question
Q.1. From the internet, find out a microcontroller with an in-built USB port and draw its architecture.
Ans:
[Block diagram: the architecture of a typical microcontroller from Atmel with an on-chip USB controller — clock generator (XTAL1), USB hub repeater with DP/DM port pairs, AVR core with GPIO (PA[0:7], PD[0:6]), ADC (ADC[0:11]), timer/counters, reset/test pins, and on-chip voltage regulators.]

Q.2. Draw the circuit diagram for interfacing an IrDA receiver with a typical microcontroller.
Further Reference
1. www.usb.org
2. www.irda.org
Module 3
Embedded Systems I/O
Lesson 18
AD and DA Converters
Instructional Objectives
After going through this lesson the student would be able to learn about:
o Real Time Signal Processing
o The Sampling Theorem
o DA Conversion
o Different Methods of AD Conversion: Successive Approximation, Flash, Sigma-Delta
Pre-Requisite
Digital Electronics, Microprocessors
18
Introduction
The real time embedded controller is expected to process real world signals within a specified time. Most real world signals are analog in nature. Take the example of your mobile phone, whose overall architecture is shown in Fig.18.1. The Digital Signal Processor (DSP) is fed with the analog data from the microphone. It also receives the digital signals after demodulation from the RF receiver, and generates the filtered and noise-free analog signal through the speaker. All the processing is done in real time. The processing of signals in real time is termed Real Time Signal Processing, a term coined in the signal processing industry.
Fig. 18.1 The block diagram
The detailed steps of such a processing task are outlined in Fig.18.2.
Fig. 18.2 Real Time Processing of Analog Signals (Measurand → Sensor → Conditioner → Analog Processor/LPF → ADC)
Measurand is the quantity which is measured. In this case it is the analog speech signal. The sensor is a microphone; in the case of your mobile set, it is the microphone embedded in it. The conditioner can be a preamplifier or a demodulator. The Analog Processor is mostly a Low Pass Filter (LPF), primarily used to prevent aliasing, a term explained later in this chapter. Then follows the Analog to Digital Converter, which has a number of stages to convert an analog signal into digital form. The Digital Signal Processing is carried out by a system with a processor. The processed signal is then converted back into an analog signal by the Digital to Analog Converter, which finally sends the output to the real world through another Low Pass Filter. The functional layout of the ADC and DAC is depicted in Fig.18.3.
Fig. 18.3 Functional layout of the ADC (x(t) → Sampler → xs(t) → Quantizer → xq(n) → Coder → b-bit code [xb(n)]) and the DAC (b-bit code [yb(n)] → Decoder → Sample/hold → y(n))
The DA Converter
In theory, the simplest method for digital-to-analog conversion is to pull the samples from memory and convert them into an impulse train.

Fig. 18.4(a) The analog equivalent of digital words (impulse train)
Fig. 18.4(b) The analog voltage after zero-order hold
Fig. 18.4(c) The reconstructed analog signal after filtering

A digital word (8-bit or 16-bit) can be converted to its analog equivalent by weighted averaging. Fig. 18.5(a) shows the weighted averaging method for a 3-bit converter. A switch connects an input either to a common voltage V or to a common ground. Only switches currently connected to the voltage source contribute current to the inverting input summing node. The output voltage is given by the expression drawn below the circuit diagram; SX = 1 if switch X connects to V, SX = 0 if it connects to ground. There are eight possible combinations of connections for the three switches, and these are indicated in the columns of the table to the right of the diagram. Each combination is associated with a decimal integer as shown. The inputs are weighted in a 4:2:1 relationship, so that the sequence of values of 4S3 + 2S2 + S1 forms a binary number representation. The magnitude of Vo varies in units (steps) of (Rf/4R)V from 0 to 7 steps. This circuit provides a simplified Digital to Analog Converter (DAC). The digital input controls the switches, and the amplifier provides the analog output.
Fig. 18.5(a) The binary-weighted resistor method. The three switches S3, S2 and S1 feed the summing amplifier through resistors R, 2R and 4R respectively, with Rf in the feedback path; the table alongside lists the eight switch combinations (S3 S2 S1 = 000 … 111) against the decimal values 0 … 7. The output is:

Vo = −Rf (S3·V/R + S2·V/2R + S1·V/4R) = −(Rf/4R)·V·(4S3 + 2S2 + S1)

[Circuit detail: an alternative arrangement in which each switch feeds the summing node through a 2R branch of a ladder network.]
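The output expression of the 3-bit weighted DAC can be checked numerically. This is a sketch assuming ideal components; dac3_output is an illustrative helper evaluating Vo = −(Rf/4R)·V·(4S3 + 2S2 + S1):

```c
/* Output of the 3-bit binary-weighted DAC: the switch states form a
   binary code 4*S3 + 2*S2 + S1, and the op-amp sums the weighted
   currents, giving steps of (Rf/4R)*V. Illustrative model only;
   component values are parameters. */
double dac3_output(int s3, int s2, int s1, double rf, double r, double v)
{
    int code = 4 * s3 + 2 * s2 + s1;        /* 0 .. 7                */
    return -(rf / (4.0 * r)) * v * code;    /* inverting summing amp */
}
```

With Rf = R and V = 1 V the full-scale code 111 gives −7/4 V, confirming the 4:2:1 weighting.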
The AD Converter
The ADC consists of a sampler, quantizer and a coder. Each of them is explained below.
Sampler
The sampler in the simplest form is a semiconductor switch, as shown below. It is followed by a hold circuit, which is a capacitor with a very low leakage path.
[Figure: the sampler. The analog signal passes through the switch under a control signal, producing the sampled signal; the plots show the analog signal and the sampled signal after the capacitor, over a few milliseconds.]
Quantizer
The hold circuit tries to maintain a constant voltage till the next switching. The quantizer is responsible for converting this voltage to a binary number. The number of bits in the binary number decides the approximation and accuracy. The sample-and-hold output can assume any real number in a given range. However, because of the finite number of bits (say N), the possible levels in the digital domain are 0 to 2^N − 1, which corresponds to a voltage range of 0 to V volts.
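The mapping from a held voltage to one of the 2^N levels can be sketched as follows. This is an illustrative model of an ideal quantizer (rounding to the nearest code and clamping at the range ends), not the circuit's actual implementation:

```c
#include <stdint.h>

/* Ideal N-bit quantizer: map a sample-and-hold voltage in [0, vmax]
   to the nearest of the codes 0 .. 2^N - 1. Rounding to the nearest
   code keeps the quantization error within half a step. */
uint32_t quantize(double v, double vmax, int nbits)
{
    uint32_t levels = (1u << nbits) - 1u;     /* top code, 2^N - 1  */
    double step = vmax / (double)levels;      /* volts per code     */
    if (v <= 0.0)  return 0;                  /* clamp below range  */
    if (v >= vmax) return levels;             /* clamp above range  */
    return (uint32_t)(v / step + 0.5);        /* round to nearest   */
}
```

Doubling the number of bits squares the number of levels, which is why each extra bit roughly halves the quantization error.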
Coder
This is an optional device which is used after the conversion is complete. In microprocessor-based systems the Coder is responsible for packing several samples and transmitting them onwards, either in a synchronous or in an asynchronous manner. For example, in TI DSK kits you will find AD converters with CODECs interfaced to McBSP ports (short for Multichannel Buffered Serial Ports). Several 16-bit sampled values are packed into a frame and transmitted to the processor or to the memory by Direct Memory Access (DMA). The Coder is responsible for controlling the ADC and transferring the data quickly for processing. Sometimes the Codec is responsible for compressing several samples together and transmitting them. In your desktop computer you will find audio interfaces which can digitize and record your voice and store it in .wav format. Basically this is AD conversion followed by coding. The wav format is the Pulse-Code-Modulated (PCM) format of the original digital voice samples.
another way, there are 11.1 samples taken over each complete cycle of the sinusoid. This situation is more complicated than the previous case, because the analog signal cannot be reconstructed by simply drawing straight lines between the data points. Do these samples properly represent the analog signal? The answer is yes, because no other sinusoid, or combination of sinusoids, will produce this pattern of samples (within the reasonable constraints listed below). These samples correspond to only one analog signal, and therefore the analog signal can be exactly reconstructed. Again, an instance of proper sampling. In (c), the situation is made more difficult by increasing the sine wave's frequency to 0.31 of the sampling rate. This results in only 3.2 samples per sine wave cycle. Here the samples are so sparse that they don't even appear to follow the general trend of the analog signal. Do these samples properly represent the analog waveform? Again, the answer is yes, and for exactly the same reason. The samples are a unique representation of the analog signal. All of the information needed to reconstruct the continuous waveform is contained in the digital data. Obviously, it must be more sophisticated than just drawing straight lines between the data points. As strange as it seems, this is proper sampling according to our definition. In (d), the analog frequency is pushed even higher to 0.95 of the sampling rate, with a mere 1.05 samples per sine wave cycle. Do these samples properly represent the data? No, they don't! The samples represent a different sine wave from the one contained in the analog signal. In particular, the original sine wave of 0.95 frequency misrepresents itself as a sine wave of 0.05 frequency in the digital signal. This phenomenon of sinusoids changing frequency during sampling is called aliasing. Just as a criminal might take on an assumed name or identity (an alias), the sinusoid assumes another frequency that is not its own. 
Since the digital data is no longer uniquely related to a particular analog signal, an unambiguous reconstruction is impossible. There is nothing in the sampled data to suggest that the original analog signal had a frequency of 0.95 rather than 0.05. The sine wave has hidden its true identity completely; the perfect crime has been committed! According to our definition, this is an example of improper sampling. This line of reasoning leads to a milestone in DSP, the sampling theorem. Frequently this is called the Shannon sampling theorem, or the Nyquist Sampling theorem, after the authors of 1940s papers on the topic. The sampling theorem indicates that a continuous signal can be properly sampled, only if it does not contain frequency components above one-half of the sampling rate. For instance, a sampling rate of 2,000 samples/second requires the analog signal to be composed of frequencies below 1000 cycles/second. If frequencies above this limit are present in the signal, they will be aliased to frequencies between 0 and 1000 cycles/second, combining with whatever information that was legitimately there.
[Figure: a sinusoid sampled at four different rates — (a) analog frequency = 0.0 (i.e., DC), (b) analog frequency = 0.09 of the sampling rate, (c) analog frequency = 0.31 of the sampling rate, (d) analog frequency = 0.95 of the sampling rate. Each panel plots amplitude against time (or sample number).]
Methods of AD Conversion
The analog voltage samples are converted to digital equivalent at the quantizer. There are various ways to convert the analog values to the nearest finite length digital word. Some of these methods are explained below.
[Figure: a comparator-based conversion stage built around uA741 op-amps with a DAC in the feedback path, together with the eight 3-bit output codes 000 … 111.]
Flash Converter
Making all the comparisons between the digital states and the analog signal concurrently makes for a fast conversion cycle. A resistive voltage divider (see figure) can provide all the digital reference states required. There are eight reference values (including zero) for the three-bit converter illustrated. Note that the voltage reference states are offset so that they are midway between reference step values. The analog signal is compared concurrently with each reference state; therefore a separate comparator is required for each comparison. Digital logic then combines the several comparator outputs to determine the appropriate binary code to present.
[Figure: the flash-converter resistive divider. The ladder uses 3R/2 at the top and R/2 at the bottom with R elements in between, so the tap voltages are offset to mid-step values (… 2.5Vo/8, 1.5Vo/8, 0.5Vo/8); each tap feeds one comparator against the analog input, and the comparator outputs are decoded to the 3-bit codes 111 … 000.]
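The behavior of the comparator bank can be modeled in a few lines. This is an illustrative behavioral sketch of a 3-bit flash converter with mid-step references, not a circuit description: counting the comparators whose reference lies below the input yields the thermometer code, and that count is the binary result.

```c
/* Behavioral model of a 3-bit flash ADC. Seven comparators compare
   the input against references offset to mid-step values
   ((k + 0.5) * vref / 8); the number of comparators reporting
   "input above reference" is the thermometer count, which equals
   the binary output code. */
int flash3_convert(double vin, double vref)
{
    int ones = 0;
    for (int k = 0; k < 7; k++) {                 /* 2^3 - 1 comparators */
        double level = (k + 0.5) * vref / 8.0;    /* mid-step reference  */
        if (vin > level)
            ones++;                               /* thermometer code    */
    }
    return ones;                                  /* binary code 0 .. 7  */
}
```

All comparisons happen concurrently in hardware, which is what makes the flash architecture fast but expensive: an n-bit converter needs 2^n − 1 comparators.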
Sigma-Delta () AD converters
The analog side of a sigma-delta converter (a 1-bit ADC) is very simple. The digital side, which is what makes the sigma-delta ADC inexpensive to produce, is more complex. It performs filtering and decimation. The concepts of over-sampling, noise shaping, digital filtering, and decimation are used to make a sigma-delta ADC.
Over-sampling
First, consider the frequency-domain transfer function of a traditional multi-bit ADC with a sinewave input signal. This input is sampled at a frequency Fs. According to Nyquist theory, Fs must be at least twice the bandwidth of the input signal. When observing the result of an FFT analysis on the digital output, we see a single tone and lots of random noise extending from DC to Fs/2 (Fig.18.13). Known as quantization noise, this effect results from the following consideration: the ADC input is a continuous signal with an infinite number of possible states, but the digital output is a discrete function, whose number of different states is determined by the converter's
resolution. So, the conversion from analog to digital loses some information and introduces some distortion into the signal. The magnitude of this error is random, with values up to ±1/2 LSB.
Fig. 18.14 FFT diagram of a multi-bit ADC with a sampling frequency kFS and effect of Digital Filter on Noise Bandwidth
Noise Shaping
The sigma-delta modulator includes a difference amplifier, an integrator, and a comparator, with a feedback loop that contains a 1-bit DAC. (This DAC is simply a switch that connects the negative input of the difference amplifier to a positive or a negative reference voltage.) The purpose of the feedback DAC is to maintain the average output of the integrator near the comparator's reference level. The density of "ones" at the modulator output is proportional to the input signal. For an increasing input the comparator generates a greater number of "ones," and vice versa for a decreasing input. By summing the error voltage, the integrator acts as a lowpass filter to the input signal and a highpass filter to the quantization noise. Thus, most of the quantization noise is pushed into higher frequencies. Oversampling has changed not the total noise power, but its distribution. If we apply a digital filter to the noise-shaped delta-sigma modulator output, it removes more noise than does simple oversampling (Fig.18.16).
Fig. 18.15 Block diagram of the 1-bit Sigma-Delta converter: the signal input X1 and the 1-bit DAC feedback meet at the difference amplifier, whose output feeds the integrator and then the comparator (1-bit ADC); the comparator's bit stream goes to the digital filter.
Fig. 18.16 The Effect of Integrator and Digital Filter on the Spectrum
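The modulator loop of Fig. 18.15 can be simulated behaviorally. This is a sketch under idealized assumptions (perfect integrator, feedback levels of ±vref); it returns the density of ones, which should track the input level as the text describes:

```c
#include <stddef.h>

/* Behavioral first-order sigma-delta modulator with a DC input:
   difference amplifier, integrator, comparator, and a 1-bit DAC
   feeding back +vref or -vref. Returns the fraction of ones in the
   output stream, which tracks the input level. Idealized sketch. */
double sdm_ones_density(double vin, double vref, size_t nsamples)
{
    double integ = 0.0;      /* integrator state      */
    double fb = -1.0;        /* last 1-bit DAC output */
    size_t ones = 0;
    for (size_t i = 0; i < nsamples; i++) {
        integ += vin - fb * vref;          /* difference + integrate */
        int bit = (integ >= 0.0) ? 1 : 0;  /* comparator             */
        fb = bit ? 1.0 : -1.0;             /* 1-bit DAC feedback     */
        ones += bit;
    }
    return (double)ones / (double)nsamples;
}
```

For a DC input at half of the positive reference, the ones density settles near 0.75, i.e. (vin/vref + 1)/2, exactly the proportionality the text states.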
[Fig. 18.17: Analog input → Delta-Sigma modulator → 1-bit data stream → digital low-pass filter → output data.]
Digital Filtering
The output of the sigma-delta modulator is a 1-bit data stream at the sampling rate, which can be in the megahertz range. The purpose of the digital-and-decimation filter (Fig.18.17) is to extract information from this data stream and reduce the data rate to a more useful value. In a sigma-delta ADC, the digital filter averages the 1-bit data stream, improves the ADC resolution, and removes quantization noise that is outside the band of interest. It determines the signal bandwidth, settling time, and stop-band rejection.
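The filter-and-decimate step can be illustrated with the simplest possible filter, a block average. Practical sigma-delta parts use sinc/CIC filters instead, so this is only a sketch of the idea:

```c
#include <stddef.h>

/* Averaging decimator: reduce a 1-bit stream (values 0/1) to
   multi-bit samples by averaging blocks of 'osr' bits, where osr
   is the oversampling ratio. Each output sample carries more
   resolution than any single input bit, at 1/osr the data rate. */
size_t decimate_average(const int *bits, size_t n, size_t osr, double *out)
{
    size_t m = n / osr;                    /* output sample count  */
    for (size_t j = 0; j < m; j++) {
        int sum = 0;
        for (size_t k = 0; k < osr; k++)
            sum += bits[j * osr + k];      /* accumulate one block */
        out[j] = (double)sum / (double)osr;
    }
    return m;
}
```

Averaging osr bits trades rate for resolution, which is exactly what lets a 1-bit modulator deliver a high-resolution result.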
Conclusion
In this chapter you have learnt about the basics of Real Time Signal Processing and of DA and AD conversion methods. Some microcontrollers are already equipped with DA and AD converters on the same chip. Generally the real world signals are broadband. For instance, a triangular wave, though periodic, has frequency components extending to infinity. Therefore an anti-aliasing filter is always desirable before AD conversion; it limits the signal bandwidth and hence permits a finite sampling frequency. The question-answer session shall discuss the quantization error, the specifications of the AD and DA converters, and the errors at the various stages of real time signal processing. The details of interfacing shall be discussed in the next lesson. The AD and DA converters fall under mixed-signal VLSI circuits: the digital and analog circuits coexist on the same chip. This poses design difficulties for VLSI engineers when embedding fast and high-resolution AD converters along with the processors. Sigma-Delta ADCs are the most complex and hence rarely found embedded on microcontrollers.
Question Answers
Q1. What are the errors at different stages in a Real Time Signal Processing system? Elaborate on the quantization error.
Q2. What are the various specifications of a D-A converter?
Ans: No. of bits (8-bits, 16-bits etc.), Settling Time, Power Supply range, Power Consumption, Various Temperature ratings, Packaging
Q3. What are the various specifications of an A-D converter?
Ans: No. of bits (8-bits, 16-bits etc), No. of channels, Conversion Time, Power Supply range, Power Consumption, Various Temperature ratings, Packaging
Q4. How would you construct a second-order Delta-Sigma AD Converter?
Module 3
Embedded Systems I/O
Lesson 19
Analog Interfacing
Instructional Objectives
After going through this lesson the student would be able to:
o Know the interfacing of analog signals to microcontrollers/microprocessors
o Generate analog signals
o Design AD and DA interfaces
o Learn various methods of acquiring and generating analog data
Pre-Requisite
Digital Electronics, Microprocessors
19(I)
Introduction
Fig.19.1 shows a typical sensor network. You will find a number of sensors and actuators connected to a common bus to share information and derive a collective decision. This is a complex embedded system; a digital camera falls under such a system. Only the analog signals are shown here. The last lesson discussed the AD and DA conversion methods in detail. This chapter shall discuss built-in and standalone AD-DA converters and their interfacing.
Fig. 19.2 The Analog-Digital-Analog signal path with real time processing
19(II)
Fig.19.3 shows the block diagram of the AD converter built into the 80196 embedded processor. The details of the subsystems are given as follows:
Fig. 19.3 The block diagram of the internal AD converter (analog inputs, VREF, ANGND, control logic with status, the AD_COMMAND, AD_TIME and AD_TEST registers, and the EPA or PTS command trigger)
Analog Inputs: There are 12 input channels which are multiplexed with Port P0 and Port P1 of the processor.
ANGND: It is the analog ground, which is connected separately to the circuit from where the analog voltage is brought into the processor.
Vref: It is the reference voltage, which decides the range of the input voltage. By making it negative, bipolar inputs can be used.
AD_RESULT
For an A/D conversion, the high byte contains the eight MSBs from the conversion, while the low byte contains the two LSBs from a 10-bit conversion (undefined for an 8-bit conversion), indicates which A/D channel was used, and indicates whether the channel is idle. For a
threshold-detection, calculate the value for the successive approximation register and write that value to the high byte of AD_RESULT. Clear the low byte or leave it in its default state.
AD_TEST (A/D Conversion Test): This register specifies adjustments for zero-offset errors.
AD_TIME (A/D Conversion Time): This register defines the sample window time and the conversion time for each bit.
INT_MASK (Interrupt Mask): The AD bit in this register enables or disables the A/D interrupt. Set the AD bit to enable the interrupt request.
INT_PEND (Interrupt Pending): The AD bit in this register, when set, indicates that an A/D interrupt request is pending.
the analog input, performing a binary search for the reference voltage that most closely matches the input. One-half of the full-scale reference voltage is tested first. This corresponds to a 10-bit result where the most-significant bit is zero and all other bits are ones (0111111111). If the analog input was less than the test voltage, bit 10 of the SAR is left at zero, and a new test voltage of one-quarter of full scale (0011111111) is tried. If the analog input was greater than the test voltage, bit 9 of the SAR is set. Bit 8 is then cleared for the next test (0101111111). This binary search continues until 10 (or 8) tests have occurred, at which time the valid conversion result resides in the AD_RESULT register, where it can be read by software. The result is equal to the ratio of the input voltage divided by the analog supply voltage. If the ratio is 1.00, the result will be all ones.
The following A/D converter parameters are programmable:
o conversion input: input channel
o zero-offset adjustment: no adjustment, plus 2.5 mV, minus 2.5 mV, or minus 5.0 mV
o conversion times: sample window time and conversion time for each bit
o operating mode: 8- or 10-bit conversion, or 8-bit high or low threshold detection
o conversion trigger: immediate or EPA starts
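The binary search described above is easy to state in code. This is a behavioral sketch with an ideal internal DAC, not the 80196's actual hardware: each of the n tests keeps a bit only if the trial DAC voltage does not exceed the input, so the final code is the truncated ratio of input to reference.

```c
#include <stdint.h>

/* Behavioral successive-approximation conversion: test each bit from
   the MSB down against an ideal internal DAC. 'nbits' tests give an
   n-bit result, and the result equals the ratio of the input voltage
   to the reference voltage, truncated to n bits. */
uint16_t sar_convert(double vin, double vref, int nbits)
{
    uint16_t code = 0;
    for (int bit = nbits - 1; bit >= 0; bit--) {
        uint16_t trial = code | (uint16_t)(1u << bit);   /* try this bit */
        double vdac = vref * (double)trial / (double)(1u << nbits);
        if (vin >= vdac)
            code = trial;        /* keep the bit: DAC does not exceed vin */
    }
    return code;
}
```

Ten iterations thus pin the input down to one part in 1024, matching the 10-bit mode of the on-chip converter.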
19(III)
[Figure: a standalone 8-bit, 8-channel A/D converter in a 28-pin package. Eight analog inputs (IN0 … IN7) pass through multiplexing analog switches selected by a 3-bit address (ADD A, ADD B, ADD C) via a switch tree; an address latch enable (ALE) strobes the address latch and decoder. The 8-bit A/D control and timing block produces the 8-bit outputs (2^-1 MSB … 2^-8 LSB), an end-of-conversion interrupt, and uses VREF(+) and VREF(−). A truth table maps the address-line levels (L/H) to the selected channel.]
The Converter
This 8-bit converter is partitioned into 3 major sections: the 256R ladder network, the successive approximation register, and the comparator. The converter's digital outputs are positive true. The
256R ladder network approach (Fig.19.6) was chosen over the conventional R/2R ladder because of its inherent monotonicity, which guarantees no missing digital codes. Monotonicity is particularly important in closed-loop feedback control systems. A non-monotonic relationship can cause oscillations that would be catastrophic for the system. Additionally, the 256R network does not cause load variations on the reference voltage.
Fig. 19.6 The 256R ladder network

The bottom and top resistors of the ladder network in Fig. 19.6 are not the same value as the remainder of the network. The difference in these resistors makes the output characteristic symmetrical about the zero and full-scale points of the transfer curve. The first output transition occurs when the analog signal has reached +1/2 LSB, and succeeding output transitions occur every 1 LSB later, up to full scale. The successive approximation register (SAR) performs 8 iterations to approximate the input voltage; for any SAR-type converter, n iterations are required for an n-bit converter. Fig. 19.7 shows a typical example for a 3-bit converter. The A/D converter's SAR is reset on the positive edge of the start-conversion (SC) pulse, and the conversion begins on the falling edge of that pulse. A conversion in progress will be interrupted by receipt of a new start-conversion pulse. Continuous conversion may be accomplished by tying the end-of-conversion (EOC) output to the SC input; if used in this mode, an external start-conversion pulse should be applied after power-up. End-of-conversion will go low between 0 and 8 clock pulses after the rising edge of start conversion. The most important section of the A/D converter is the comparator: it is this section which is responsible for the ultimate accuracy of the entire converter.
(Fig. 19.7: transfer characteristics of a 3-bit A/D converter: the ideal curve with its +/- 1/2 LSB quantization error, and a curve showing the total unadjusted error (zero error = -1/4 LSB), with VIN plotted as a fraction of full scale (0/8 to 7/8) against the output code 000 to 111.)
19(IV)
The DAC0808 is an 8-bit monolithic digital-to-analog converter (DAC). Fig.19.9 shows the architecture and pin diagram of such a chip.
(Figure: DAC0808 block diagram and 16-pin package: digital inputs A1 (MSB) through A8 (LSB) drive current switches feeding an R-2R ladder; a bias circuit, an NPN current-source pair and a reference current amplifier set the output current I0; the remaining pins are VCC, VEE, VREF(+), VREF(-), COMPEN and GND.)
Fig. 19.9 The DAC0808 Signals

The pins are labeled A1 through A8, but note that A1 is the most significant bit and A8 the least significant bit (the opposite of the normal convention). The D/A converter produces an output current rather than an output voltage; an op-amp converts the current to a voltage. The output current from pin 4 ranges from 0 (when all the inputs are 0) to Imax*255/256 (when all the inputs are 1). The current Imax is determined by the current into pin 14 (which is at 0 volts). Since 8 bits are used, the maximum value is Imax*255/256. The output of the D/A converter takes some time to settle, so there should be a small delay before sending the next data to the DAC. However, this delay is very small compared with the conversion time of an A/D converter and therefore does not matter in most real-time signal-processing platforms. Fig. 19.10 shows a typical interface.
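The current-to-code relation above can be checked numerically (an illustrative sketch only; the 2 mA value for Imax is an assumed example, roughly a 10 V reference through a 5 kOhm resistor, and `dac0808_iout_ma` is a made-up helper name):

```c
/* Illustrative model of the DAC0808 transfer function:
 * Iout = Imax * code / 256, so the full-scale code 255
 * gives Imax * 255/256, never quite Imax itself. */
double dac0808_iout_ma(unsigned char code, double imax_ma)
{
    return imax_ma * code / 256.0;
}
```

With an assumed Imax of 2 mA, code 128 gives 1.0 mA (half scale) and code 255 gives about 1.992 mA, one LSB-current short of Imax.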
(In Fig. 19.10 the supplies are VCC = 5 V and VEE = -15 V.)
Fig. 19.10 Typical connection of DAC0808

The LF351 is an operational amplifier used as a current-to-proportional-voltage converter. The 8 digital inputs at A8-A1 are converted into a proportional current at pin 4 of the DAC. The reference voltage (10 V) is supplied at pin 14, with pin 15 grounded through a resistance. A capacitor is connected across the compensation pin 16 and the negative supply to bypass high-frequency noise.

Important specifications:
- Relative accuracy: 0.19% error
- Settling time: 150 ns
- Output current slew rate: 8 mA/µs
- Power supply voltage range: 4.5 V to 18 V
- Power consumption: 33 mW @ 5 V
19(V)
Conclusion
In this lesson you learnt about the following:
- The internal A/D converters of the 80196 family of processors
- The external microprocessor-compatible ADC0809 converter
- A typical 8-bit D/A converter

Both ADCs use the successive approximation technique. Flash ADCs are complex and lead to large VLSI circuits that are unsuitable for coexistence with a processor on the same chip; sigma-delta converters need a very high sampling rate.
Questions and Answers
Q.1. What are the possible errors in a system as shown in Fig. 19.2?
Ans:
Stage 1, Signal amplification and conditioning: this can also amplify the noise.
Stage 2, Anti-aliasing filter: some useful information, such as transients in the real system, cannot be captured.
Stage 3, Sample and hold: leakage, and electromagnetic interference due to switching.
Stage 4, Analog-to-digital converter: quantization error due to finite bit length.
Stage 5, Digital processing and data manipulation in a processor: numerical round-off errors due to finite word length, and the delay caused by the algorithm.
Stage 6, Processed digital values temporarily stored in a latch before D-A conversion: error in reconstruction due to the zero-order approximation.

Q.2. Why is it necessary to separate the digital ground from the analog ground in a typical ADC?
Ans: Digital circuit noise can get into the analog signal path if separate grounding systems are not used for the digital and analog parts. Digital grounds are invariably noisier than analog grounds because of the switching noise generated in digital chips when they change state. For large current transients, PCB trace inductances cause voltage drops between various ground points on the board (ground bounce). Ground bounce translates into varying voltage levels on signal lines. For digital lines this is not a problem unless it crosses a logic threshold; for analog lines it is simply noise added to the signals.
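The quantization error named in Stage 4 can be made concrete with a short sketch (assuming an ideal n-bit converter; the 5 V reference in the note below is only an example, and the function names are made up):

```c
/* Quantization step and worst-case error of an ideal n-bit ADC:
 * one LSB spans Vref / 2^n, and an ideal converter's error is
 * bounded by +/- 1/2 LSB. */
double adc_lsb_volts(double vref, int bits)
{
    return vref / (double)(1u << bits);
}

double adc_max_quant_error(double vref, int bits)
{
    return adc_lsb_volts(vref, bits) / 2.0;   /* +/- 1/2 LSB bound */
}
```

For a 5 V reference, an 8-bit converter has an LSB of about 19.5 mV (error bound about +/- 9.8 mV), while a 10-bit converter tightens this to about 4.88 mV and +/- 2.44 mV.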
Module 4
Design of Embedded Processors
Lesson 20
Field Programmable Gate Arrays and Applications
Instructional Objectives
After going through this lesson the student will be able to:
- Define a field programmable gate array (FPGA)
- Distinguish between an FPGA and a stored-memory processor
- List and explain the principle of operation of the various functional units within an FPGA
- Compare the architecture and performance specifications of various commercially available FPGAs
- Describe the steps in using an FPGA in an embedded system
Introduction
An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry. When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application. Unlike processors, FPGAs use dedicated hardware for processing logic and do not have an operating system. FPGAs are truly parallel in nature, so different processing operations do not have to compete for the same resources. As a result, the performance of one part of the application is not affected when additional processing is added. Also, multiple control loops can run on a single FPGA device at different rates. FPGA-based control systems can enforce critical interlock logic and can be designed to prevent I/O forcing by an operator. However, unlike hard-wired printed circuit board (PCB) designs, which have fixed hardware resources, FPGA-based systems can literally rewire their internal circuitry to allow reconfiguration after the control system is deployed to the field. FPGA devices deliver the performance and reliability of dedicated hardware circuitry. A single FPGA can replace thousands of discrete components by incorporating millions of logic gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks, as shown in Fig. 20.1. Signals are routed within the FPGA matrix by programmable interconnect switches and wire routes.
PROGRAMMABLE INTERCONNECT
I/O BLOCKS
LOGIC BLOCKS
Fig. 20.1 Internal Structure of FPGA

In an FPGA, logic blocks are implemented using multiple levels of low fan-in gates, which gives a more compact design compared with an implementation in two-level AND-OR logic. An FPGA provides its user a way to configure: 1. the interconnection between the logic blocks, and 2. the function of each logic block. A logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:
1. Transistor pairs
2. Combinational gates like basic NAND gates or XOR gates
3. n-input lookup tables
4. Multiplexers
5. Wide fan-in AND-OR structures
Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. The density of logic blocks in an FPGA depends on the length and number of wire segments used for routing. The number of segments used for interconnection is typically a tradeoff between the density of logic blocks and the amount of area used up for routing. A simplified version of the FPGA internal architecture with routing is shown in Fig. 20.2.
Evolution of FPGA
In the world of digital electronic systems, there are three basic kinds of devices: memory, microprocessors, and logic. Memory devices store random information such as the contents of a
spreadsheet or database. Microprocessors execute software instructions to perform a wide variety of tasks, such as running a word-processing program or a video game. Logic devices provide specific functions, including device-to-device interfacing, data communication, signal processing, data display, timing and control operations, and almost every other function a system must perform. The first type of user-programmable chip that could implement logic circuits was the Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit inputs and data lines as outputs. Logic functions, however, rarely require more than a few product terms, and a PROM contains a full decoder for its address inputs. PROMs are thus an inefficient architecture for realizing logic circuits, and so are rarely used in practice for that purpose. The device that came as a replacement for the PROM is the Programmable Logic Array (PLA). Logically, a PLA is a circuit that allows Boolean functions to be implemented in sum-of-products form. The typical implementation consists of input buffers for all inputs, a programmable AND-matrix followed by a programmable OR-matrix, and output buffers. The input buffers provide both the original and the inverted value of each PLA input. The input lines run horizontally into the AND matrix, while the so-called product-term lines run vertically; the size of the AND matrix is therefore twice the number of inputs times the number of product terms. When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that they were expensive to manufacture and offered somewhat poor speed-performance. Both disadvantages were due to the two levels of configurable logic: programmable logic planes were difficult to manufacture and introduced significant propagation delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were developed.
PALs provide only a single level of programmability, consisting of a programmable wired AND plane that feeds fixed OR-gates. PALs usually contain flip-flops connected to the OR-gate outputs so that sequential circuits can be realized. These are often referred to as Simple Programmable Logic Devices (SPLDs). Fig. 20.3 shows a simplified structure of PLA and PAL.
(Fig. 20.3: simplified structures of a PLA and a PAL, each taking its inputs through an AND plane into an OR plane that produces the outputs.)
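The PLA's two programmable planes described above can be modeled as bit-masks (a minimal sketch; the plane contents in `example_f` implement an arbitrary made-up function, not any particular device):

```c
#include <stdint.h>

/* Illustrative sum-of-products evaluation in the PLA style: each
 * product term ANDs a chosen set of true and complemented inputs,
 * and each output ORs a chosen set of product terms. */
typedef struct {
    uint8_t use_true;   /* mask of inputs ANDed in true form         */
    uint8_t use_comp;   /* mask of inputs ANDed in complemented form */
} product_term;

int eval_term(product_term t, uint8_t inputs)
{
    /* Every selected true input must be 1 and every selected
     * complemented input must be 0 for the AND term to be 1. */
    return ((inputs & t.use_true) == t.use_true) &&
           ((~inputs & t.use_comp) == t.use_comp);
}

int eval_output(const product_term *terms, int nterms,
                uint8_t or_mask, uint8_t inputs)
{
    for (int i = 0; i < nterms; i++)
        if ((or_mask & (1u << i)) && eval_term(terms[i], inputs))
            return 1;   /* OR plane: any selected term being 1 suffices */
    return 0;
}

/* Example function f(a,b,c) = a.b' + c, with a in bit 0, b in bit 1
 * and c in bit 2 of the packed input byte (a made-up example). */
int example_f(uint8_t inputs)
{
    product_term terms[2] = {
        { 0x01, 0x02 },   /* a AND (NOT b) */
        { 0x04, 0x00 }    /* c             */
    };
    return eval_output(terms, 2, 0x03, inputs);
}
```

Programming a PLA amounts to choosing the `use_true`/`use_comp` masks (the AND plane) and the `or_mask` (the OR plane), exactly the two configurable levels the text blames for the PLA's speed penalty.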
With the advancement of technology, it has become possible to produce devices with higher capacities than SPLDs. As chip densities increased, it was natural for the PLD manufacturers to evolve their products into larger (logically, though not necessarily physically) parts called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The larger size of a CPLD allows the designer to implement either more logic equations or a more complicated design.
Fig. 20.4 Internal structure of a CPLD

Fig. 20.4 contains a block diagram of a hypothetical CPLD. Each of the four logic blocks shown there is the equivalent of one PLD. However, in an actual CPLD there may be more (or fewer) than four logic blocks. These logic blocks are themselves comprised of macrocells and interconnect wiring, just like an ordinary PLD. Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD may or may not be fully connected. In other words, some of the theoretically possible connections between logic block outputs and inputs may not actually be supported within a given CPLD. The effect of this is most often to make 100% utilization of the macrocells very difficult to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are sufficient logic gates and flip-flops available. Because CPLDs can hold larger designs than PLDs, their potential uses are more varied. They are still sometimes used for simple applications like address decoding, but more often contain high-performance control logic or complex finite state machines. At the high end (in terms of numbers of gates), there is also a lot of overlap in potential applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever high-performance logic is required: because of its less flexible internal architecture, the delay through a CPLD (measured in nanoseconds) is more predictable and usually shorter. The development of the FPGA was distinct from the SPLD/CPLD evolution just described. This is apparent from the architecture of the FPGA shown in Fig. 20.1. FPGAs offer the highest amount of logic density, the most features, and the highest performance. The largest FPGA now shipping, part of the Xilinx Virtex line of devices, provides eight million "system gates" (the relative density of logic).
These advanced devices also offer features such as built-in hardwired processors (such as the IBM Power PC), substantial amounts of memory, clock management systems, and support for many of the latest, very fast device-to-device signaling technologies. FPGAs are used in a wide variety of applications ranging from data processing and storage, to instrumentation, telecommunications, and digital signal processing. The value of programmable logic has always been its ability to shorten development cycles for electronic equipment manufacturers and help them get their product to market faster. As PLD (Programmable Logic Device) suppliers continue to integrate more functions inside their devices, reduce costs, and increase the availability of time-saving IP cores, programmable logic is certain to expand its popularity with digital designers.
Symmetrical arrays
This architecture consists of logic elements (called CLBs) arranged in the rows and columns of a matrix, with interconnect laid out between them, as shown in Fig. 20.2. This symmetrical matrix is surrounded by I/O blocks which connect it to the outside world. Each CLB consists of an n-input lookup table and a pair of programmable flip-flops. I/O blocks also control functions such as tri-state control and output transition speed. Interconnects provide routing paths; direct interconnects between adjacent logic elements have smaller delay than the general-purpose interconnect.
Hierarchical PLDs
This architecture is designed in a hierarchical manner, with the top level containing only logic blocks and interconnects. Each logic block contains a number of logic modules, and each logic module has combinatorial as well as sequential functional elements. Each of these functional elements is controlled by the programmed memory. Communication between logic blocks is achieved by programmable interconnect arrays. Input/output blocks surround this scheme of logic blocks and interconnects. This type of architecture is shown in Fig. 20.6.
FPGAs are classified by programming technology as antifuse-programmed, SRAM-programmed, or EEPROM-programmed.
SRAM Based
The major advantage of SRAM-based devices is that they are infinitely re-programmable: they can be soldered into the system and have their function changed quickly by merely changing the contents of a PROM. They therefore have simple development mechanics. They can also be changed in the field by uploading new application code, a feature attractive to designers. This does, however, come at a price: the interconnect element has high impedance and capacitance, and consumes much more area than other technologies. Hence wires are very expensive and slow, and the FPGA architect is therefore forced to make large, inefficient logic modules (typically a look-up table, or LUT). The other disadvantages are that they need to be reprogrammed each time power is applied, need an external memory to store the program, and require a large area. Fig. 20.8 shows two applications of SRAM cells: controlling the gate nodes of pass-transistor switches, and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND gate in the upper left corner) to another through two pass-transistor switches and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.
Antifuse Based
The antifuse-based cell gives the highest-density interconnect by being a true cross point. The designer thus has a much larger number of interconnects, so logic modules can be smaller and more efficient, and place-and-route software has a much easier time. These devices, however, are only one-time programmable and therefore have to be thrown away every time a change is made in the design. The antifuse has inherently low capacitance and resistance, such that the fastest parts are all antifuse based. The disadvantage of the antifuse is the requirement to integrate the fabrication of the antifuses into the IC process, which means the process will always lag the SRAM process in scaling. Antifuses are suitable for FPGAs because they can be built using modified CMOS technology. As an example, Actel's antifuse structure is depicted in Fig. 20.9. The figure shows that an antifuse is positioned between two interconnect wires and physically consists of three sandwiched layers: the top and bottom layers are conductors, and the middle layer is an insulator. When unprogrammed, the insulator isolates the top and bottom layers; when programmed, the insulator changes to become a low-resistance link. It uses Poly-Si and n+ diffusion as conductors and ONO as an insulator, but other antifuses rely on metal for conductors, with amorphous silicon as the middle layer.
EEPROM Based
The EEPROM/FLASH cell in FPGAs can be used in two ways: as a control device, as in an SRAM cell, or as a directly programmable switch. When used as a switch it can be very efficient as interconnect and can be reprogrammed at the same time. It is also non-volatile, so it does not require an extra PROM for loading. It does, however, have its drawbacks: the EEPROM process is complicated and therefore also lags SRAM technology.
Fig. 20.10 Transistor-pair tiles in the Crosspoint FPGA

The second type of logic block is RAM logic, which can be used to implement random access memory. Plessey FPGA: the basic building block here is the 2-input NAND gate; these gates are connected to one another to implement the desired function.
(Figure: Plessey logic block: a latch and an 8-to-2 multiplexer fed by 8 interconnect lines, with the configuration held in RAM.)
Both Crosspoint and Plessey use fine-grain logic blocks. Fine-grain logic blocks have the advantage of a high percentage usage of the logic blocks, but they require a large number of wire segments and programmable switches, which occupy a lot of area. Actel logic block: if the inputs of a multiplexer are connected to a constant or to a signal, it can be used to implement different logic functions. For example, a 2-input multiplexer with inputs a and b and select c will implement the function ac' + bc. If b = 0 then it will implement ac', and if a = 0 it will implement bc.
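The idea of a multiplexer as a universal building block can be sketched as follows (an illustrative model; the helper names are made up):

```c
/* 2-to-1 multiplexer: out = a when sel = 0, b when sel = 1,
 * i.e. out = a.sel' + b.sel. Tying its inputs to constants or
 * signals turns the same mux into different logic functions. */
int mux2(int a, int b, int sel)
{
    return sel ? b : a;
}

int not_gate(int sel)        { return mux2(1, 0, sel); } /* sel'        */
int and_gate(int b, int sel) { return mux2(0, b, sel); } /* b AND sel   */
int or_gate(int b, int sel)  { return mux2(b, 1, sel); } /* b OR sel    */
```

This is exactly the text's observation in miniature: with b tied to 0 the mux computes a.sel', and with a tied to 0 it computes b.sel.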
Fig. 20.12 Actel logic block

Typically an Actel logic block consists of a number of multiplexers and logic gates.
(Fig. 20.13: a logic block built around a 5-input look-up table (inputs A to E) whose output feeds multiplexers and a set/reset element to produce the outputs X and Y.)
A k-input logic function is implemented using a 2^k x 1 SRAM. The number of different possible functions for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports the implementation of very many logic functions; the disadvantage is the unusually large number of memory cells required to implement such a logic block when the number of inputs is large. Fig. 20.13 shows a 5-input LUT-based implementation of a logic block. LUT-based design provides better logic-block utilization. A k-input LUT-based logic block can be implemented in a number of different ways, with tradeoffs between performance and logic density.
An n-LUT can be viewed as a direct implementation of a function truth table: each latch holds the value of the function for one input combination. For example, the 2-LUT shown below implements the 2-input AND and OR functions.
Example: 2-LUT

Inputs   AND   OR
  00      0     0
  01      0     1
  10      0     1
  11      1     1
a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4000 gates to more than 15,000 for the 8000 series. The overall architecture of FLEX 8000 is illustrated in Fig. 20.14.
Fig. 20.14 Architecture of Altera FLEX 8000 FPGAs

The basic logic block, called a Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits. The LE also includes cascade circuitry that allows for efficient implementation of wide AND functions. Details of the LE are illustrated in Fig. 20.15.
Fig. 20.15 Altera FLEX 8000 Logic Element (LE).
In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera's CPLDs). As shown in Fig. 20.16, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB. The local interconnect also connects to the FLEX 8000's global interconnect, called FastTrack. All FastTrack horizontal wires are identical, so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many smaller length segments, because there are fewer programmable switches in the longer path.
System Design
At this stage the designer has to decide what portion of the functionality has to be implemented on the FPGA and how to integrate that functionality with the rest of the system.
Design Description
The designer describes the design functionality either by using schematic editors or by using one of the various Hardware Description Languages (HDLs), such as Verilog or VHDL.
Synthesis
Once the design has been defined, CAD tools are used to implement it on a given FPGA. Synthesis includes generic optimization, slack optimization and power optimization, followed by placement and routing. Implementation includes partition, place and route. The output of the design-implementation phase is a bit-stream file.
Design Verification
The bit-stream file is fed to a simulator, which simulates the design functionality and reports errors in the desired behavior of the design. Timing tools are used to determine the maximum clock frequency of the design. The design is then loaded onto the target FPGA device and testing is done in the real environment.
Fig. 20.17 Programmable logic design process

Typically, the design entry step is followed or interspersed with periods of functional simulation. That's where a simulator is used to execute the design and confirm that the correct outputs are produced for a given set of test inputs. Although problems with the size or timing of the hardware may still crop up later, the designer can at least be sure that his logic is functionally correct before going on to the next stage of development. Compilation only begins after a functionally correct representation of the hardware exists. This hardware compilation consists of two distinct steps. First, an intermediate representation of the hardware design is produced. This step is called synthesis, and the result is a representation called a netlist. The netlist is device independent, so its contents do not depend on the particulars of the FPGA or CPLD; it is usually stored in a standard format called the Electronic Design Interchange Format (EDIF). The second step in the translation process is called place & route. This step involves mapping the logical structures described in the netlist onto actual macrocells, interconnections, and input and output pins. This process is similar to the equivalent step in the development of a printed circuit board, and it may likewise allow for either automatic or manual layout optimizations. The result of the place & route process is a bitstream. This name is used generically, despite the fact that each CPLD or FPGA (or family) has its own, usually proprietary, bitstream format. Suffice it to say that the bitstream is the binary data that must be loaded into the FPGA or CPLD to cause that chip to execute a particular hardware design. Increasingly there are also debuggers available that at least allow for single-stepping the hardware design as it executes in the programmable logic device.
But those only complement a simulation environment that is able to use some of the information generated during the place & route step to provide gate-level simulation. Obviously, this type of integration of device-specific information into a generic simulator requires a good working relationship between the chip and simulation tool vendors.
Things to Ponder
Q.1. Define the following acronyms as they apply to digital logic circuits: ASIC, PAL, PLA, PLD, CPLD, FPGA.
Q.2. How does the granularity of the logic block influence the performance of an FPGA?
Q.3. Why would anyone use programmable logic devices (PLD, PAL, PLA, CPLD, FPGA, etc.) in place of traditional "hard-wired" logic such as NAND, NOR, AND, and OR gates? Are there any applications where hard-wired logic would do a better job than a programmable device?
Q.4. Some programmable logic devices (and PROM memory devices as well) use tiny fuses which are intentionally "blown" in specific patterns to represent the desired program. Programming a device by blowing tiny fuses inside it carries certain advantages and disadvantages; describe what some of these are.
Q.5. Use one 4 x 8 x 4 PLA to implement the functions:
F1(w, x, y, z) = wx'y'z + wx'yz' + wxy'
F2(w, x, y, z) = wx'y + x'y'z
Module 4
Design of Embedded Processors
Lesson 21
Introduction to Hardware Description Languages - I
Instructional Objectives
At the end of the lesson the student should be able to:
- Describe a digital IC design flow and explain its various abstraction levels
- Explain the need for a hardware description language in the IC design flow
- Model simple hardware devices at various levels of abstraction using Verilog (gate/switch/behavioral)
- Write Verilog code meeting the prescribed requirement at a specified level
HDL is an abbreviation of Hardware Description Language. Any digital system can be represented at the register transfer level (RTL), and HDLs are used to describe this RTL. Verilog is one such HDL; it is a general-purpose language that is easy to learn and use, and its syntax is similar to that of C. The idea is to specify how the data flows between registers and how the design processes the data. To define RTL, hierarchical design concepts play a very significant role. Hierarchical design methodology facilitates the digital design flow with several levels of abstraction. Verilog HDL can utilize these levels of abstraction to produce a simplified and efficient representation of the RTL description of any digital design. For example, an HDL might describe the layout of the wires, resistors and transistors on an Integrated Circuit (IC) chip, i.e., the switch level; or it may describe the design at a higher level in terms of logical gates and flip-flops in a digital system, i.e., the gate level. Verilog supports all of these levels.
Bottom-Up Design
The traditional method of electronic design is bottom-up (designing from transistors and moving to a higher level of gates and, finally, the system). But with the increase in design complexity traditional bottom-up designs have to give way to new structural, hierarchical design methods.
Top-Down Design
For HDL representation it is convenient and efficient to adopt the top-down design style. A true top-down design allows early testing, fabrication-technology independence and a structured system design, and offers many other advantages. But it is very difficult to follow a pure top-down design; because of this, most designs are a mix of both methods, implementing some key elements of both design styles.
To follow the hierarchical design concepts briefly mentioned above one has to describe the design in terms of entities called MODULES.
Modules
A module is the basic building block in Verilog. It can be an element or a collection of lower-level design blocks. Typically, elements are grouped into modules to provide common functionality that is used in many places of the design through its port interface, while hiding the internal implementation.
1.1.4 Abstraction Levels
- Behavioral level
- Register-transfer level
- Gate level
- Switch level
Register-Transfer Level
Designs using the register-transfer level specify the characteristics of a circuit by operations and the transfer of data between the registers. An explicit clock is used. RTL design contains exact timing: operations are scheduled to occur at certain times. The modern definition of RTL code is "any code that is synthesizable is called RTL code".
Gate Level
At the gate level the characteristics of a system are described by logical links and their timing properties. All signals are discrete and can only take definite logical values ('0', '1', 'X', 'Z'). The usable operations are predefined logic primitives (AND, OR, NOT, etc. gates). It must be noted here that using gate-level modeling may not be a good idea in logic design: gate-level code is generated by tools such as synthesis tools in the form of netlists, which are used for gate-level simulation and for the backend.
Switch Level
This is the lowest level of abstraction. A module can be implemented in terms of switches, storage nodes and interconnection between them. However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a design. RTL is frequently used for Verilog description that is a combination of behavioral and dataflow while being acceptable for synthesis.
Instances
A module provides a template from which one can create objects. When a module is invoked, Verilog creates a unique object from the template, each with its own name, variables, parameters and I/O interface. These are known as instances.
The following describes a typical design flow for the description of a digital design, for both ASIC and FPGA realizations.
Design stages and the tools used at each stage:
- Specification: word processor like Word, KWriter, AbiWord, Open Office
- High-level design: word processor like Word, KWriter, AbiWord; for drawing waveforms, tools like Waveformer or Testbencher, or Word, Open Office
- Micro design: word processor like Word, KWriter, AbiWord; for drawing waveforms, tools like Waveformer or Testbencher or Word; for FSMs, StateCAD or some similar tool, Open Office
- RTL coding: Vim, Emacs, conTEXT, HDL TurboWriter
- Simulation: ModelSim, VCS, Verilog-XL, VeriWell, Finsim, iVerilog, VeriDOS
- Synthesis: Design Compiler, FPGA Compiler, Synplify, Leonardo Spectrum (downloadable for free from FPGA vendors like Altera and Xilinx)
- Place and route: for FPGAs, use the FPGA vendor's P&R tool; ASIC flows require expensive P&R tools like Apollo; students can use LASI or Magic
- Post-silicon validation: for both ASIC and FPGA, the chip needs to be tested in the real environment; board design and device drivers need to be in place
Specification
This is the stage at which we define the important parameters of the system to be designed. For example, when designing a counter one has to decide its bit-size, whether it should have a synchronous reset, whether the enable should be active high, etc.
RTL Coding
In RTL coding, the micro design is converted into Verilog/VHDL code using the synthesizable constructs of the language. Normally the Vim editor is used; ConTEXT, Nedit and Emacs are other choices.
Simulation
Simulation is the process of verifying the functional characteristics of models at any level of abstraction. We use simulators to simulate hardware models. To check that the RTL code meets the functional requirements of the specification, we must see whether all the RTL blocks are functionally correct. To achieve this we need to write a testbench, which generates the clock, the reset and the required test vectors. A sample testbench for a counter is shown below. Normally, 60-70% of design time is spent in verification of the design.
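As an illustration, a minimal testbench for a 4-bit up counter might look like the following sketch. The module and signal names here are illustrative assumptions, not taken verbatim from the original text; the DUT port order (clk, reset, enable, count) is assumed.

```verilog
// Hypothetical testbench sketch for a 4-bit up counter with
// synchronous, active-high reset and an enable input.
module counter_tb;
  reg        clk, reset, enable;
  wire [3:0] count;

  // Device Under Test (assumed port order: clk, reset, enable, count)
  counter dut (clk, reset, enable, count);

  // Clock generator: 10 time-unit period
  initial clk = 0;
  always #5 clk = ~clk;

  // Stimulus: apply reset, then let the counter run
  initial begin
    reset = 1; enable = 0;
    #20 reset = 0;      // release reset
    #10 enable = 1;     // start counting
    #100 $finish;
  end

  // Monitor the outputs
  initial
    $monitor($time, " reset=%b enable=%b count=%d", reset, enable, count);
endmodule
```

A self-checking version would replace the $monitor with compare logic that computes the expected count and flags mismatches.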
We use the waveform output from the simulator to check that the DUT (Device Under Test) is functionally correct. Most simulators come with a waveform viewer. As designs become complex, we write self-checking testbenches, where the testbench applies the test vectors and compares the output of the DUT with the expected value. There is another kind of simulation, called timing simulation, which is done after synthesis or after P&R (Place and Route). Here we include the gate delays and wire delays and see whether the DUT works at the rated clock speed. This is also called SDF simulation or gate-level simulation.
Synthesis
Synthesis is the process in which a synthesis tool like Design Compiler takes the RTL (in Verilog or VHDL), the target technology, and constraints as inputs and maps the RTL to the target technology primitives. After mapping the RTL to gates, the synthesis tool also does a minimal amount of timing analysis to see whether the mapped design meets the timing requirements. (An important point to note is that synthesis tools are not aware of wire delays; they know only gate delays.) After synthesis there are a couple of things that are normally done before passing the netlist to the backend (Place and Route):
Verification: Check if the RTL to gate mapping is correct. Scan insertion: Insert the scan chain in the case of ASIC.
Place and Route (P&R): the gate-level netlist is placed and routed, and the resulting layout is sent to the foundry for fabricating the ASIC. Normally the P&R tools are used to output the SDF file, which is back-annotated, along with the gate-level netlist from P&R, into a static timing analysis tool like PrimeTime to do timing analysis.
1.2
1.2.1 Lexical Conventions
The basic lexical conventions used by Verilog HDL are similar to those in the C programming language. Verilog HDL is a case-sensitive language, and all keywords are in lowercase.
1.2.2
Data Types
The Verilog language has two primary data types:
- Nets - represent structural connections between components.
- Registers - represent variables used to store data.
Every signal has a data type associated with it. Data types are either explicitly declared with a declaration in the Verilog code, or implicitly declared with no declaration but used to connect structural building blocks in the code. Implicit declarations are always of net type "wire" and only one bit wide.
Types of Net
Each net type has functionality that is used to model different types of hardware (such as PMOS, NMOS, CMOS, etc.). This is tabulated as follows:

Net Data Type     Functionality
wire, tri         Interconnecting wire - no special resolution function
wor, trior        Wired outputs OR together (models ECL)
wand, triand      Wired outputs AND together (models open-collector)
tri0, tri1        Net pulls down or pulls up when not driven
supply0, supply1  Net has a constant logic 0 or logic 1 (supply strength)
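To illustrate the resolution behavior, here is a small sketch (not from the original text; the module name is assumed) in which two drivers on a wand net are resolved by ANDing them, modeling open-collector outputs tied together:

```verilog
// Hypothetical sketch: two drivers on a wand net resolve by AND.
module wand_demo;
  wand w;          // wired-AND net
  reg  a, b;

  assign w = a;    // first driver
  assign w = b;    // second driver; w resolves to a & b

  initial begin
    a = 1; b = 0;
    #1 $display("a=%b b=%b w=%b", a, b, w);  // w resolves to 0 here
    b = 1;
    #1 $display("a=%b b=%b w=%b", a, b, w);  // w resolves to 1 here
  end
endmodule
```

With a wor net in place of the wand, the same two drivers would instead resolve to a | b.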
Registers store the last value assigned to them until another assignment statement changes their value. Registers represent data storage constructs. Register arrays are called memories.
Register data types are used as variables in procedural blocks. A register data type is required if a signal is assigned a value within a procedural block. Procedural blocks begin with the keywords initial or always.
Some common data types are listed in the following table:

Data Type   Functionality
reg         Unsigned variable
integer     Signed variable - 32 bits
time        Unsigned integer - 64 bits
real        Double precision floating point variable
1.2.3
Apart from these there are vectors, integer, real & time register data types.
Some examples are as follows:

Integer
integer counter;      // general purpose variable used as a counter
initial counter = -1; // a negative one is stored in the counter

Real
real delta;           // define a real variable called delta
initial begin
  delta = 4e10;       // delta is assigned in scientific notation
  delta = 2.13;       // delta is assigned the value 2.13
end
integer i;            // define an integer i
initial i = delta;    // i gets the value 2 (rounded value of 2.13)

Time
time save_sim_time;   // define a time variable save_sim_time
initial save_sim_time = $time; // save the current simulation time
// n.b. $time is invoked to get the current simulation time

Arrays
integer count[0:7];          // an array of 8 count variables
reg [4:0] port_id[0:7];      // array of 8 port_ids, each 5 bits wide
integer matrix[4:0][0:255];  // two dimensional array of integers
1.2.4
Memories
Memories are modeled simply as a one-dimensional array of registers. Each element of the array is known as a word and is addressed by a single array index.
reg membit[0:1023];        // memory membit with 1K 1-bit words
reg [7:0] membyte[0:1023]; // memory membyte with 1K 8-bit words
membyte[511]               // fetches the 1-byte word whose address is 511
Strings
A string is a sequence of characters enclosed by double quotes and all contained on a single line. Strings used as operands in expressions and assignments are treated as a sequence of eight-bit ASCII values, with one eight-bit ASCII value representing one character. To declare a variable to store a string, declare a register large enough to hold the maximum number of characters the variable will hold. Note that no extra bits are required to hold a termination character; Verilog does not store a string termination character. Strings can be manipulated using the standard operators. When a variable is larger than required to hold a value being assigned, Verilog pads the contents on the left with zeros after the assignment. This is consistent with the padding that occurs during assignment of non-string values. Certain characters can be used in strings only when preceded by an introductory character called an escape character. The following table lists these characters with the escape sequence that represents them:

Escape sequence   Character
\n                Newline
\t                Tab
\\                Backslash (\)
\"                Double quote (")
\ddd              Character specified by 1-3 octal digits
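For instance, a register that holds a string of up to 12 characters needs 8*12 = 96 bits. The following sketch is illustrative (names assumed, not from the original text):

```verilog
// Hypothetical sketch: storing and printing strings in a register.
module string_demo;
  reg [8*12:1] message;   // room for 12 eight-bit ASCII characters

  initial begin
    message = "Hello";          // shorter value is zero-padded on the left
    $display("%s", message);    // print the stored string
    message = "Hello\tWorld";   // escape sequence \t embeds a tab
    $display("%s", message);
  end
endmodule
```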
Modules
Modules are the building blocks of Verilog designs. You create design hierarchy by instantiating modules in other modules. An instance of a module can be called in another, higher-level module.
Ports
Ports allow communication between a module and its environment. All but the top-level modules in a hierarchy have ports. Ports can be associated by order or by name. You declare ports to be input, output or inout. The port declaration syntax is:
input [range_val:range_var] list_of_identifiers;
output [range_val:range_var] list_of_identifiers;
inout [range_val:range_var] list_of_identifiers;
Schematic
1.2.5
Width matching: it is legal to connect internal and external ports of different sizes, but beware - synthesis tools could report problems.
Unconnected ports: unconnected ports are allowed; an unconnected port is indicated by a ",".
The net data types are used to connect structure. A net data type is required if a signal can be driven by a structural connection.
Example Implicit
dff u0 ( q,,clk,d,rst,pre); // Here second port is not connected
Example Explicit
dff u0 (.q (q_out), .q_bar (), .clk (clk_in), .d (d_in), .rst (rst_in), .pre (pre_in)); // Here second port is not connected
1.3
At this level of abstraction the system modeling is done at the gate level, i.e., the properties of the gates etc. to be used in the behavioral description of the system are defined. These definitions are known as primitives. Verilog has built-in primitives for gates, transmission gates, switches, buffers etc. These primitives are instantiated like modules, except that they are predefined in Verilog and do not need a module definition. The two basic families of gates are the and/or gates and the buf/not gates.
1.3.1
Gate Primitives
And/Or Gates: these have one scalar output and multiple scalar inputs. The output of the gate is evaluated as soon as an input changes.

wire OUT, IN1, IN2;
// basic gate instantiations
and  a1(OUT, IN1, IN2);
nand na1(OUT, IN1, IN2);
or   or1(OUT, IN1, IN2);
nor  nor1(OUT, IN1, IN2);
xor  x1(OUT, IN1, IN2);
xnor nx1(OUT, IN1, IN2);
// more than two inputs: 3-input nand gate
nand na1_3inp(OUT, IN1, IN2, IN3);
// gate instantiation without an instance name
and (OUT, IN1, IN2); // legal gate instantiation

Buf/Not Gates: these gates, however, have one scalar input and one or more scalar outputs.
// basic gate instantiations for bufif
bufif1 b1(out, in, ctrl);
bufif0 b0(out, in, ctrl);
// basic gate instantiations for notif
notif1 n1(out, in, ctrl);
notif0 n0(out, in, ctrl);
Array of instantiations
wire [7:0] OUT, IN1, IN2; // basic gate instantiations nand n_gate[7:0](OUT, IN1, IN2);
Gate-level multiplexer
A multiplexer is a very efficient basic logic design element.

// 4-to-1 multiplexer
module mux4_to_1(out, i0, i1, i2, i3, s1, s0);
// port declarations
output out;
input i0, i1, i2, i3;
input s1, s0;
// internal wire declarations
wire s1n, s0n;
wire y0, y1, y2, y3;
// gate instantiations
// create s1n and s0n signals
not (s1n, s1);
not (s0n, s0);
// 3-input and gates instantiated
and (y0, i0, s1n, s0n);
and (y1, i1, s1n, s0);
and (y2, i2, s1, s0n);
and (y3, i3, s1, s0);
// 4-input or gate instantiated
or (out, y0, y1, y2, y3);
endmodule
1.3.2
In real circuits, logic gates have delays associated with them. Verilog provides the mechanism to associate delays with gates:
- Rise, Fall and Turn-off delays
- Min, Typical and Max delays
Rise Delay
The rise delay is associated with a gate output transition to 1 from another value (0,x,z).
Fall Delay
The fall delay is associated with a gate output transition to 0 from another value (1,x,z).
Turn-off Delay: the turn-off delay is associated with a gate output transition to z from another value (0, 1, x).
Min Value: the min value is the minimum delay value that the gate is expected to have.
Typ Value: the typ value is the typical delay value that the gate is expected to have.
Max Value: the max value is the maximum delay value that the gate is expected to have.
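These delays can be attached to a gate instantiation as in the following sketch (the delay values and instance names are illustrative only; this fragment assumes an enclosing module with the nets declared):

```verilog
// Hypothetical sketch of gate delay specification.
// One value: all transitions delayed by 5.
and #(5) a1(out1, i1, i2);
// Two values: rise delay 4, fall delay 6.
and #(4, 6) a2(out2, i1, i2);
// Three values: rise 3, fall 4, turn-off 5 (for tri-state gates).
bufif0 #(3, 4, 5) b1(out3, in, ctrl);
// Min:typ:max values for each of rise and fall.
nand #(2:3:4, 3:4:5) n1(out4, i1, i2);
```

Which of the min/typ/max values is used during simulation is selected by a simulator option.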
1.4 1.4.1
Verilog behavioral code resides inside procedural blocks; there is an exception, however - some behavioral code can also exist outside procedural blocks. We shall see this in detail as we make progress. There are two types of procedural blocks in Verilog:
initial : initial blocks execute only once, starting at time zero.
always : always blocks loop to execute over and over again; in other words, as the name suggests, they execute always.
Example - initial
module initial_example();
reg clk, reset, enable, data;
initial begin
  clk = 0;
  reset = 0;
  enable = 0;
  data = 0;
end
endmodule

In the above example, the initial block execution starts at time 0; without waiting for any event, it simply executes all the statements between begin and end.

Example - always
module always_example();
reg clk, reset, enable, q_in, data;
always @ (posedge clk)
  if (reset) begin
    data <= 0;
  end else if (enable) begin
    data <= q_in;
  end
endmodule

In an always block, when the trigger event (here, the positive edge of clock) occurs, the code between begin and end is executed; the always block then waits for the next posedge of clock. This process of waiting and executing on an event is repeated until the simulation stops.
1.4.2
1.4.3
If a procedural block contains more than one statement, those statements must be enclosed within
- a sequential begin - end block, or
- a parallel fork - join block.

Example - "begin-end"
module initial_begin_end();
reg clk, reset, enable, data;
initial begin
  #1  clk = 0;
  #10 reset = 0;
  #5  enable = 0;
  #3  data = 0;
end
endmodule

Begin-end: clk gets 0 after 1 time unit, reset after 11 time units, enable after 16 time units, data after 19 time units. All the statements are executed sequentially, so the delays accumulate.

Example - "fork-join"
module initial_fork_join();
reg clk, reset, enable, data;
initial fork
  #1  clk = 0;
  #10 reset = 0;
  #5  enable = 0;
  #3  data = 0;
join
endmodule

Fork-join: all the statements start together at time 0, so clk gets 0 at time 1, data at time 3, enable at time 5 and reset at time 10.
1.4.4
The begin - end keywords:
- Group several statements together.
- Cause the statements to be evaluated sequentially (one at a time).
  - Any timing within the sequential group is relative to the previous statement.
  - Delays in the sequence accumulate (each delay is added to the previous delay).
  - The block finishes after the last statement in the block.
1.4.5
The fork - join keywords:
- Group several statements together.
- Cause the statements to be evaluated in parallel (all at the same time).
  - Timing within the parallel group is absolute to the beginning of the group.
  - The block finishes after the last statement completes (the statement with the highest delay; it can even be the first statement in the block).

Example - Parallel
module parallel();
reg a;
initial fork
  #10 a = 0;
  #11 a = 1;
  #12 a = 0;
  #13 a = 1;
  #14 $finish;
join
endmodule

Example - Mixing "begin-end" and "fork-join"
module fork_join();
reg clk, reset, enable, data;
initial begin
  $display("Starting simulation");
  fork : FORK_VAL
    #1 clk = 0;
    #5 reset = 0;
    #5 enable = 0;
    #2 data = 0;
  join
  $display("Terminating simulation");
  #10 $finish;
end
endmodule
1.4.6
Blocking assignments are executed in the order they are coded; hence they are sequential. Since they block the execution of the next statement until the current statement is executed, they are called blocking assignments. Assignments are made with the "=" symbol. Example: a = b;
Nonblocking assignments are executed in parallel. Since the execution of the next statement is not blocked by the execution of the current statement, they are called nonblocking assignments. Assignments are made with the "<=" symbol. Example: a <= b;

Example - blocking and nonblocking
module blocking_nonblocking();
reg a, b, c, d;
// blocking assignments
initial begin
  #10 a = 0;
  #11 a = 1;
  #12 a = 0;
  #13 a = 1;
end
// nonblocking assignments
initial begin
  #10 b <= 0;
  #11 b <= 1;
  #12 b <= 0;
  #13 b <= 1;
end
initial begin
  c = #10 0;
  c = #11 1;
  c = #12 0;
  c = #13 1;
end
initial begin
  d <= #10 0;
  d <= #11 1;
  d <= #12 0;
  d <= #13 1;
end
initial begin
  $monitor("TIME = %t A = %b B = %b C = %b D = %b", $time, a, b, c, d);
  #50 $finish(1);
end
endmodule
1.4.7
The if - else statement controls the execution of other statements. In a programming language like C, if - else controls the flow of the program. When more than one statement needs to be executed under an if condition, we need to use begin and end, as seen in earlier examples.
Syntax: if
if (condition)
  statements;
Syntax: if-else
if (condition)
  statements;
else
  statements;
1.4.8
Syntax: nested if-else-if
if (condition)
  statements;
else if (condition)
  statements;
................
else
  statements;

Example - simple if
module simple_if();
reg latch;
wire enable, din;
always @ (enable or din)
  if (enable) begin
    latch <= din;
  end
endmodule

Example - if-else
module if_else();
reg dff;
wire clk, din, reset;
always @ (posedge clk)
  if (reset) begin
    dff <= 0;
  end else begin
    dff <= din;
  end
endmodule

Example - nested if-else-if
module nested_if();
reg [3:0] counter;
wire clk, reset, enable, up_en, down_en;
always @ (posedge clk)
  // if reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  // if the counter is enabled and up-count is the mode
  end else if (enable == 1'b1 && up_en == 1'b1) begin
    counter <= counter + 1'b1;
  // if the counter is enabled and down-count is the mode
  end else if (enable == 1'b1 && down_en == 1'b1) begin
    counter <= counter - 1'b1;
  // if counting is disabled
  end else begin
    counter <= counter; // redundant code
  end
endmodule
Parallel if-else
In the above example, the condition (enable == 1'b1 && up_en == 1'b1) is given the highest priority and the condition (enable == 1'b1 && down_en == 1'b1) is given the lowest priority. We normally don't include reset checking in the priority, as it does not fall in the combinational logic at the input of the flip-flop, as shown in the figure below.
So when we need priority logic, we use nested if-else statements. On the other hand, if we don't want to implement priority logic, and we know that only one input is active at a time (i.e. all inputs are mutually exclusive), then we can write the code as shown below. It is a known fact that a priority implementation takes more logic than a parallel implementation. So if you know the inputs are mutually exclusive, you can code the logic with parallel ifs.

module parallel_if();
reg [3:0] counter;
wire clk, reset, enable, up_en, down_en;
always @ (posedge clk)
  // if reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  end else begin
    // if the counter is enabled and up-count is the mode
    if (enable == 1'b1 && up_en == 1'b1) begin
      counter <= counter + 1'b1;
    end
    // if the counter is enabled and down-count is the mode
    if (enable == 1'b1 && down_en == 1'b1) begin
      counter <= counter - 1'b1;
    end
  end
endmodule
1.4.9
The case statement compares an expression with a series of cases and executes the statement or statement group associated with the first matching case. The case statement supports single or multiple statements; group multiple statements using the begin and end keywords. The syntax of a case statement looks as shown below.
case (<expression>)
  <case1> : <statement>
  <case2> : <statement>
  default : <statement>
endcase
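As an illustrative sketch (the module name is assumed, not from the original text), a 4-to-1 multiplexer described behaviorally with a case statement:

```verilog
// Hypothetical sketch: behavioral 4:1 mux using a case statement.
module mux4_case(out, i0, i1, i2, i3, sel);
output out;
reg out;
input i0, i1, i2, i3;
input [1:0] sel;

always @ (i0 or i1 or i2 or i3 or sel)
  case (sel)
    2'b00   : out = i0;
    2'b01   : out = i1;
    2'b10   : out = i2;
    2'b11   : out = i3;
    default : out = 1'bx; // catches x or z values on sel
  endcase
endmodule
```

Compare this with the gate-level mux4_to_1 of the previous section: the case statement expresses the same selection logic far more compactly.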
The forever statement
The forever loop executes continually; the loop never ends. Normally we use the forever statement in initial blocks.
syntax: forever <statement>
One should be very careful in using a forever statement: if no timing construct is present inside it, simulation could hang.
The repeat statement
The repeat loop executes a statement a fixed <number> of times.
syntax: repeat (<number>) <statement>
The while loop statement
The while loop executes as long as an <expression> evaluates as true. This is the same as in any other programming language.
syntax: while (<expression>) <statement>
The for loop statement
The for loop is the same as the for loop used in any other programming language.
- Executes an <initial assignment> once at the start of the loop.
- Executes the loop as long as an <expression> evaluates as true.
- Executes a <step assignment> at the end of each pass through the loop.
syntax: for (<initial assignment>; <expression>; <step assignment>) <statement>
Note: Verilog does not have the ++ operator as in the case of the C language.
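The four loop constructs described above can be sketched together as follows (an illustrative fragment; the module and signal names are assumptions, not from the original text):

```verilog
// Hypothetical sketch exercising Verilog's loop statements.
module loops_demo;
  reg clk;
  integer i;
  reg [7:0] mem [0:7];

  // forever: a free-running clock. Note the timing control (#5),
  // without which the simulation would hang.
  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  initial begin
    // repeat: execute a statement a fixed number of times
    repeat (4) @ (posedge clk);   // wait for 4 clock edges

    // for: note i = i + 1, since Verilog has no ++ operator
    for (i = 0; i < 8; i = i + 1)
      mem[i] = i;

    // while: loop as long as the expression is true
    i = 0;
    while (i < 8) begin
      $display("mem[%0d] = %d", i, mem[i]);
      i = i + 1;
    end
    $finish;
  end
endmodule
```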
1.5
1.5.1 Verilog provides the ability to design at the MOS-transistor level; however, with the increase in complexity of circuits, design at this level is becoming tough. Moreover, Verilog provides only digital design capability, with drive strengths associated with the signals; there is still no analog capability. As a matter of fact, in Verilog transistors are only used as switches.
MOS switches
// MOS switch keywords
nmos
pmos
The keyword nmos is used to model an NMOS transistor, while pmos is used for PMOS transistors.
Instantiation of NMOS and PMOS switches
nmos n1(out, data, control); // instantiate an NMOS switch
pmos p1(out, data, control); // instantiate a PMOS switch
CMOS switches
Instantiation of a CMOS switch.
cmos c1(out, data, ncontrol, pcontrol); // instantiate a cmos switch
The ncontrol and pcontrol signals are normally complements of each other.
Bidirectional switches
These switches allow signal flow in both directions and are defined by the keywords tran, tranif0, and tranif1.
Instantiation
tran t1(inout1, inout2);          // instance name t1 is optional
tranif0(inout1, inout2, control); // instance name is not specified
tranif1(inout1, inout2, control); // instance name is not specified
1.5.2
1.5.3
// define a nor gate, my_nor
module my_nor(out, a, b);
output out;
input a, b;
// internal wires
wire c;
// set up power and ground lines
supply1 pwr; // power is connected to Vdd
supply0 gnd; // connected to Vss
// instantiate pmos switches
pmos (c, pwr, b);
pmos (out, c, a);
// instantiate nmos switches
nmos (out, gnd, a);
nmos (out, gnd, b);
endmodule

Stimulus to test the NOR gate
// stimulus to test the gate
module stimulus;
reg A, B;
wire OUT;
// instantiate the my_nor module
my_nor n1(OUT, A, B);
// apply stimulus
initial begin
  // test all possible combinations
  A = 1'b0; B = 1'b0;
  #5 A = 1'b0; B = 1'b1;
  #5 A = 1'b1; B = 1'b0;
  #5 A = 1'b1; B = 1'b1;
end
// check results
initial
  $monitor($time, " OUT = %b, A = %b, B = %b", OUT, A, B);
endmodule
1.6 1.6.1
i) A 2-input xor gate can be built from my_and, my_or and my_not gates. Construct an xor module in Verilog that realises the logic function z = xy' + x'y. Inputs are x and y, and z is the output. Write a stimulus module that exercises all four combinations of x and y.
ii) The logic diagram for an RS latch with delay is shown.
Write the verilog description for the RS latch, including delays of 1 unit when instantiating the nor gates. Write the stimulus module for the RS latch using the following table and verify the outputs.
Set  Reset  Qn+1
0    0      qn
0    1      0
1    0      1
1    1      ?
iii) Design a 2-input multiplexer using bufif0 and bufif1 gates as shown below
The delay specifications for gates b1 and b2 are as follows:

Min  Typ  Max
1    2    3
3    4    5
5    6    7
Module 4
Design of Embedded Processors
Lesson 22
Introduction to Hardware Description Languages - II
Instructional Objectives
At the end of the lesson the student should be able to:
- Call a task and a function in Verilog code and distinguish between them
- Plan and write test benches for Verilog code such that it can be simulated to check the desired results and also test the source code
- Explain what User Defined Primitives are, classify them and use them in code
2.1 2.1.1
Tasks are used in all programming languages, where they are generally known as procedures or subroutines. Many lines of code are enclosed within task ... endtask brackets. Data is passed to the task, the processing is done, and the result is returned to the main program. Tasks have to be specifically called, with data in and out, rather than just wired in to the general netlist. Included in the main body of code, they can be called many times, reducing code repetition.
Tasks are defined in the module in which they are used. It is also possible to define a task in a separate file and use the compile directive `include to include the task in the file which instantiates it. Tasks can include timing delays such as posedge, negedge, # delay and wait. Tasks can have any number of inputs and outputs. The variables declared within a task are local to that task. The order of declaration within the task defines how the variables passed to the task by the caller are used. A task can take, drive and source global variables when no local variables are used. When local variables are used, it assigns the outputs only at the end of task execution. One task can call another task or function. A task can be used for modeling both combinational and sequential logic. A task must be specifically called with a statement; it cannot be used within an expression as a function can.
Syntax
A task begins with the keyword task and ends with the keyword endtask. Inputs and outputs are declared after the keyword task. Local variables are declared after the input and output declarations.
Example - Simple Task
module simple_task();
task convert;
input [7:0] temp_in;
output [7:0] temp_out;
begin
  temp_out = (9/5) * (temp_in + 32);
end
endtask
endmodule

Example - Task using Global Variables
module task_global();
reg [7:0] temp_in;
reg [7:0] temp_out;
task convert;
begin
  temp_out = (9/5) * (temp_in + 32);
end
endtask
always @ (temp_in)
begin
  convert;
end
endmodule
Calling a task
Let us assume that the task in the first example is stored in a file called mytask.v. The advantage of coding the task in a separate file is that it can then be used in multiple modules.

module task_calling(temp_a, temp_b, temp_c, temp_d);
input [7:0] temp_a, temp_c;
output [7:0] temp_b, temp_d;
reg [7:0] temp_b, temp_d;
`include "mytask.v"
always @ (temp_a)
begin
  convert(temp_a, temp_b);
end
always @ (temp_c)
begin
  convert(temp_c, temp_d);
end
endmodule
Example - Automatic (re-entrant) task
// Module that contains an automatic re-entrant task.
// There are two clocks; clk2 runs at twice the frequency of clk
// and is synchronous with it.
module top;
reg clk, clk2;
reg [15:0] cd_xor, ef_xor; // variables in module top
reg [15:0] c, d, e, f;     // variables in module top
parameter delay = 5;

task automatic bitwise_xor;
output [15:0] ab_xor; // output from the task
input  [15:0] a, b;   // inputs to the task
begin
  #delay ab_xor = a ^ b;
end
endtask

// These two always blocks call the bitwise_xor task concurrently
// at each positive edge of the clocks; since the task is re-entrant,
// the concurrent calls will work correctly.
always @ (posedge clk)
  bitwise_xor(ef_xor, e, f);
always @ (posedge clk2) // twice the frequency of the previous clock
  bitwise_xor(cd_xor, c, d);
endmodule
2.1.2
Function
A function is very much similar to a task, with a few differences: e.g., a function cannot drive more than one output, nor can it contain delays.
Functions are defined in the module in which they are used. It is possible to define a function in a separate file and use the compile directive `include to include the function in the file which instantiates it. A function cannot include timing delays such as posedge, negedge or # delay; this means that a function executes in "zero" time. A function can have any number of inputs but only one output. The variables declared within a function are local to that function. The order of declaration within the function defines how the variables passed to it by the caller are used. A function can take, drive and source global variables when no local variables are used. When local variables are used, it basically assigns the output only at the end of function execution. Functions can be used for modeling combinational logic. A function can call other functions, but cannot call a task.
Syntax
A function begins with the keyword function and ends with the keyword endfunction.
Example - Simple Function
module simple_function();
function myfunction;
input a, b, c, d;
begin
  myfunction = ((a + b) + (c - d));
end
endfunction
endmodule

Example - Calling a Function
module function_calling(a, b, c, d, e, f);
input a, b, c, d, e;
output f;
wire f;
`include "myfunction.v"
assign f = (myfunction(a, b, c, d)) ? e : 0;
endmodule
Constant function
A constant function is a regular Verilog function that is used to reference complex values; it can be used instead of constants.
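A common use, shown here as an illustrative sketch (the module, parameter and function names are assumptions, not from the original text), is computing a derived constant such as an address width at elaboration time:

```verilog
// Hypothetical sketch: a constant function used to size a port.
module ram (addr, data);
  parameter RAM_DEPTH = 256;
  // constant function call: the width is computed at elaboration time
  input [clogb2(RAM_DEPTH) - 1 : 0] addr;
  output [7:0] data;

  // ceiling of log2, written as a regular Verilog function
  function integer clogb2;
    input integer depth;
    begin
      clogb2 = 0;
      depth = depth - 1;
      while (depth > 0) begin
        clogb2 = clogb2 + 1;
        depth = depth >> 1;
      end
    end
  endfunction

  assign data = 8'h00; // placeholder body
endmodule
```

Here clogb2(256) evaluates to 8, so addr is declared as an 8-bit port without hard-coding the width.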
Signed function
These functions allow the use of signed operations on function return values.
module top;
// signed function declaration
// returns a 64-bit signed value
function signed [63:0] compute_signed;
input [63:0] vector;
// ... function body ...
endfunction
// call to the signed function from a higher module
initial
  if (compute_signed(vector) < -3) begin
    // ...
  end
endmodule
2.1.3
Introduction
System tasks and functions are used to generate inputs and to check outputs during simulation. Their names begin with a dollar sign ($). The synthesis tools parse and ignore system functions, and hence they can be included even in synthesizable models.
Syntax
$strobe ("format_string", par_1, par_2, ...);
$monitor ("format_string", par_1, par_2, ...);
$displayb (as above, but defaults to binary);
$strobeh (as above, but defaults to hex);
$monitoro (as above, but defaults to octal);
$scope, $showscopes
$scope(hierarchy_name) sets the current hierarchical scope to hierarchy_name. $showscopes(n) lists all modules, tasks and block names in (and below, if n is set to 1) the current scope.
$random
$random generates a random integer every time it is called. If the sequence is to be repeatable, give $random a numerical argument (a seed) the first time it is invoked; otherwise the seed is derived from the computer clock.
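For example (an illustrative sketch; the module name and seed value are assumptions), $random is often masked down to the required width when generating test vectors:

```verilog
// Hypothetical sketch: generating repeatable random stimulus.
module random_demo;
  reg [3:0] vec;
  integer seed;

  initial begin
    seed = 10;                     // fixed seed => repeatable sequence
    repeat (5) begin
      vec = $random(seed) & 4'hF;  // keep only the low 4 bits
      $display("vec = %d", vec);
      #1;
    end
  end
endmodule
```

Running the simulation twice with the same seed produces the same sequence of values.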
Syntax
$dumpfile("filename.dmp") specifies the file that will hold the dump.
$dumpvars dumps all variables in the design.
$dumpvars(1, top) dumps all the variables in module top, but not in the modules instantiated in top.
$dumpvars(2, top) dumps all the variables in module top and 1 level below.
$dumpvars(n, top) dumps all the variables in module top and n-1 levels below.
$dumpvars(0, top) dumps all the variables in module top and all levels below.
$dumpon initiates the dump.
$dumpoff stops the dump.
$fopen opens an output file and gives the opened file a handle for use by the other commands. $fclose closes the file and lets other programs access it. $fdisplay and $fwrite write formatted data to a file whenever they are executed; they are the same except that $fdisplay inserts a new line after every execution and $fwrite does not. $fstrobe also writes to a file when executed, but it waits until all other operations in the time step are complete before writing. Thus
initial begin
  #1 a = 1; b = 0;
  $fstrobe(hand1, a, b);
  b = 1;
end
will write 1 1 for a and b. $fmonitor writes to a file whenever any one of its arguments changes.
Syntax
handle1 = $fopen("filenam1.suffix");
handle2 = $fopen("filenam2.suffix");
$fstrobe(handle1, format, variable list);  // strobe data into filenam1.suffix
$fdisplay(handle2, format, variable list); // write data into filenam2.suffix, followed by a new line
$fwrite(handle2, format, variable list);   // write data into filenam2.suffix, all on one line;
                                           // put \n in the format string where a new line is desired
2.2 2.2.1
Test benches are codes written in HDL to test the design blocks. A testbench is also known as a stimulus, because the coding is such that a stimulus is applied to the designed block and its functionality is tested by checking the results. For writing a testbench it is important to have the design specifications of the "design under test" (DUT). The specifications need to be understood clearly and a test plan made accordingly. The test plan basically documents the testbench architecture and the test scenarios (test cases) in detail.
Example - Counter
Consider a simple 4-bit up counter, which increments its count whenever enable is high and resets to zero when reset is asserted high. Reset is synchronous with the clock.
Code for the Counter
// Function: 4-bit up counter
module counter (clk, reset, enable, count);
input clk, reset, enable;
output [3:0] count;
reg [3:0] count;
always @ (posedge clk)
  if (reset == 1'b1) begin
    count <= 0;
  end else if (enable == 1'b1) begin
    count <= count + 1;
  end
endmodule
2.2.2
Test Plan
We will write a self-checking test bench, but we will do this in steps to help you understand the concept of writing automated test benches. Our testbench environment will look like the one shown in the figure.
The DUT is instantiated in the testbench, which contains a clock generator, a reset generator, an enable logic generator and compare logic. The compare logic calculates the expected count value of the counter and compares the counter output with the calculated value.
2.2.3
Test Cases
Reset Test: we can start with reset deasserted, then assert reset for a few clock ticks and deassert it, and see whether the counter sets its output to zero.
Enable Test: assert/deassert enable after reset is applied.
Random Test: random assertion/deassertion of enable and reset.
The first way is to simply instantiate the design block (DUT) and write the code such that it directly drives the signals in the design block. In this case the stimulus block itself is the top-level block. In the second style, a dummy module acts as the top-level module and both the design (DUT) and the stimulus blocks are instantiated within it. Generally, in the stimulus block the inputs to the DUT are declared as reg and the outputs from the DUT are declared as wire. An important point is that there is no port list for the test bench. An example of the stimulus block is given below. Note that the initial block below is used to set the various inputs of the DUT to a predefined logic state.
  reset = 0;
  enable = 0;
end

always #5 clk = !clk;

initial begin
  $dumpfile("counter.vcd");
  $dumpvars;
end

initial begin
  $display("\t\ttime,\tclk,\treset,\tenable,\tcount");
  $monitor("%d,\t%b,\t%b,\t%b,\t%d", $time, clk, reset, enable, count);
end

initial #100 $finish;

//Rest of testbench code after this line
endmodule

$dumpfile specifies the file that the simulator will use to store the waveform, which can later be viewed with a waveform viewer. (Please refer to the tools section for freeware viewers.) $dumpvars instructs the simulator to start dumping all the signals to "counter.vcd". $display prints text or variables to stdout (the screen); \t inserts a tab, and the syntax is the same as printf. $monitor is a bit different: it keeps track of changes to the variables in its list (clk, reset, enable, count) and, whenever any one of them changes, prints their values in the specified radix. $finish terminates the simulation after #100 time units (note that all the initial and always blocks start execution at time 0).
event reset_done_trigger;

initial begin
  forever begin
    @ (reset_trigger);
    @ (negedge clk);
    reset = 1;
    @ (negedge clk);
    reset = 0;
    -> reset_done_trigger;
  end
end
Syntax
A UDP begins with the keyword primitive and ends with the keyword endprimitive. UDPs must be defined outside the main module definition. This code shows how the input/output ports and the primitive are declared.

primitive udp_syntax (
a, // Port a
b, // Port b
c, // Port c
d  // Port d
);
output a;
input b, c, d;

// UDP function code here

endprimitive

Note:
A UDP can have only one output and up to 10 inputs. The output port must be the first port, followed by one or more input ports. All UDP ports are scalar, i.e. vector ports are not allowed. UDPs cannot have bidirectional ports.
Body
The functionality of a primitive (combinational or sequential) is described inside a table, which ends with the reserved word endtable (as shown in the code below). For sequential UDPs, an initial statement can be used to assign an initial value to the output.

// This code shows what a UDP body looks like
primitive udp_body (
a, // Port a
b, // Port b
c  // Port c
);
output a;
input b, c;
// UDP function code here
// A = B | C;
table
// B C : A
   ? 1 : 1;
   1 ? : 1;
   0 0 : 0;
endtable
endprimitive

Note: A UDP cannot use 'z' in the input table; it uses x instead.
2.3.2
Combinational UDPs
In combinational UDPs, the output is determined as a function of the current input. Whenever an input changes value, the UDP is evaluated and one of the state table rows is matched. The output state is set to the value indicated by that row. Let us consider the previously mentioned UDP.
Sequential UDPs
Sequential UDPs differ from combinational UDPs in the following ways:
The output of a sequential UDP is always defined as a reg.
An initial statement can be used to initialize the output of a sequential UDP.
The format of a state table entry is somewhat different. There are three sections in a state table entry: inputs, current state and next state, separated by a colon (:). The input specification of the state table can be in terms of input levels or edge transitions. The current state is the current value of the output register. The next state is computed from the inputs and the current state, and becomes the new value of the output register. All possible combinations of inputs must be specified to avoid unknown output.
Level-sensitive UDPs

// define a level sensitive latch by using a UDP
primitive latch (q, d, clock, clear);

// declarations
output q;
reg q; // q declared as reg to create internal storage
input d, clock, clear;

// sequential UDP initialization
// only one initial statement allowed
initial q = 0; // initialize output to value 0

// state table
table
// d clock clear : q : q+ ;   (q+ is the new output value)
   ?   ?     1   : ? : 0 ; // clear condition
   1   1     0   : ? : 1 ; // latch q = data = 1
   0   1     0   : ? : 0 ; // latch q = data = 0
   ?   0     0   : ? : - ; // no change when clock = 0
endtable
endprimitive
Edge-sensitive UDPs

// define an edge sensitive sequential UDP
primitive edge_dff (output reg q = 0,
                    input d, clock, clear);

// state table
table
// d  clock  clear : q : q+ ;
   ?   ?      1    : ? : 0 ; // output = 0 if clear = 1
   ?   ?     (10)  : ? : - ; // ignore negative transition of clear
   1  (10)    0    : ? : 1 ; // latch data on negative transition
   0  (10)    0    : ? : 0 ; //   of clock
   ?  (1x)    0    : ? : - ; // hold q if clock transitions to unknown state
   ?  (0?)    0    : ? : - ; // ignore positive transitions of clock
   ?  (x1)    0    : ? : - ; // ignore positive transitions of clock
  (??)  ?     0    : ? : - ; // ignore any change in d if clock is steady
endtable
endprimitive
2. Timing
i) a. Consider the negative edge-triggered D flip-flop with asynchronous reset shown below. Write the Verilog description for the module D-FF and describe the path delays using parallel connection.
b. Modify the above if all the path delays are 5.
ii) Assume that a six-delay specification is to be given for all the path delays, and that all path delays are equal. In the specify block define parameters t_01=4, t_10=5, t_0z=7, t_z1=2, t_z0=8. Using the previous D-FF, write the six delay specifications for all the paths.
3. UDP
i. Define a positive edge-triggered D flip-flop with clear as a UDP. Signal clear is active low.
ii. Define a level-sensitive latch with a preset signal. Inputs are d, clock and preset; output is q. If clock = 0, then q = d. If clock = 1 or x, then q is unchanged. If preset = 1, then q = 1. If preset = 0, then q is decided by the clock and d signals. If preset = x, then q = x.
iii. Define a negative edge-triggered JK flip-flop, jk_ff, with asynchronous preset and clear as a UDP. q = 1 when preset = 1 and q = 0 when clear = 1.
Module 4
Design of Embedded Processors
Lesson 23
Introduction to Hardware Description Languages-III
Instructional Objectives
At the end of the lesson the student should be able to:
Interface Verilog code to C and C++ using the Programming Language Interface
Synthesize a Verilog code and generate a netlist for layout
Verify the generated code, and carry out optimization and debugging
Classify the various types of flows in verification
3.1 3.1.1
PLI (Programming Language Interface) is a facility to invoke C or C++ functions from Verilog code. The function invoked in Verilog code is called a system call. Examples of built-in system calls are $display, $stop, $random. PLI allows the user to create custom system calls, something that Verilog syntax alone does not allow. Typical applications include:
Power analysis
Code coverage tools
Modifying the Verilog simulation data structure, for more accurate delays
Custom output displays
Co-simulation
Design debug utilities
Simulation analysis
C-model interface to accelerate simulation
Testbench modeling
To support the above applications of PLI, the C code must have access to the internal data structure of the Verilog simulator. To facilitate this, the Verilog PLI provides what are called acc routines or access routines.
How does it Work?
Write the functions in C/C++ code. Compile them to generate a shared library (*.dll on Windows, *.so on UNIX); simulators like VCS also allow static linking. Use these functions in the Verilog code (mostly in the Verilog testbench).
Depending on the simulator, pass the C/C++ function details to the simulator during compilation of the Verilog code (this is called linking; refer to the simulator user guide to understand how it is done). Once linked, just run the simulation like any other Verilog simulation.
During execution of the Verilog code by the simulator, whenever the simulator encounters a user-defined system task (one that starts with $), execution control is passed to the PLI routine (the C/C++ function).

Example - Hello World

Define a function hello( ) which, when called, prints "Hello World". This example does not use any of the PLI standard functions (ACC, TF and VPI). For exact linking details, the simulator manuals must be referred to; each simulator implements its own strategy for linking with C/C++ functions.
C Code
#include <stdio.h>

void hello ()
{
    printf ("\nHello World\n");
}
Verilog Code
module hello_pli ();
initial begin
  $hello;
  #10 $finish;
end
endmodule
3.1.2
Running a Simulation
Once linking is done, simulation is run as a normal simulation with a slight modification to the command-line options. These modifications tell the simulator that PLI routines are being used (e.g. ModelSim needs to be told on the command line which shared objects to load).

Writing a PLI Application (counter example)

Write the DUT reference model and checker in C and link them to the Verilog testbench. The requirements for writing a C model using PLI are:
A means of calling the C model whenever there is a change in the input signals (which could be of wire or reg types).
A means to get, from inside the C code, the value of the changed signals or of any other signals in the Verilog code.
A means to drive a value on any signal inside the Verilog code from the C code.
There is a set of routines (functions) provided by the Verilog PLI which satisfies the above requirements.
3.1.3
This can be well understood in the context of the above counter logic. The objective is to design the PLI function $counter_monitor and use it to check the response of the designed counter. The problem can be addressed in the following steps:
Implement the counter logic in C.
Implement the checker logic in C.
Terminate the simulation whenever the checker fails.
This is represented in the block diagram in figure 23.2.
The acc_vcl_add routine monitors a list of signals and, whenever any of the monitored signals changes, calls a user-defined function (called the consumer C routine). The VCL routine has four arguments:
Handle to the monitored object
Consumer C routine to call when the object value changes
String to be passed to the consumer C routine
Predefined VCL flags: vcl_verilog_logic for logic monitoring, vcl_verilog_strength for strength monitoring
C Code Basic
The desired C function is counter_monitor, which is called from the Verilog testbench. As in any other C code, header files specific to the application are included; here the include file provides the acc routines. The access routine acc_initialize initializes the environment for access routines and must be called from the C-language application before any other access routine is invoked. Before exiting a C-language application that calls access routines, it is necessary to close the access routine environment by calling acc_close at the end of the program.

#include <stdio.h>
#include "acc_user.h"

typedef char *string;

handle clk;
handle reset;
handle enable;
handle dut_count;
int count;

void counter_monitor()
{
    acc_initialize();
    clk       = acc_handle_tfarg(1);
    reset     = acc_handle_tfarg(2);
    enable    = acc_handle_tfarg(3);
    dut_count = acc_handle_tfarg(4);
    acc_vcl_add(clk, counter, null, vcl_verilog_logic);
    acc_close();
}

void counter()
{
    printf("Clock changed state\n");
}

Handles are used for accessing the Verilog objects. A handle is a predefined data type that points to a specific object in the design hierarchy. Each handle conveys to the access routines information about a unique instance of an accessible object: its type and how and where the data pertaining to it can be obtained. The information identifying a specific object
can be passed from the Verilog code as a parameter to the function $counter_monitor. These parameters can be accessed in the C program with the acc_handle_tfarg( ) routine. For instance, clk = acc_handle_tfarg(1) makes clk a handle to the first parameter passed. After all the other handles are similarly assigned, clk can be added to the list of signals to be monitored using the routine acc_vcl_add(clk, counter, null, vcl_verilog_logic). Here clk is the handle and counter is the user function to execute when clk changes.
Verilog Code
Below is the code of a simple testbench for the counter example. If the object being passed is an instance, it should be passed inside double quotes; since here all the objects are nets or regs, there is no need for double quotes.

module counter_tb();
reg enable;
reg reset;
reg clk_reg;
wire clk;
wire [3:0] count;

initial begin
  clk_reg = 0;
  reset = 0;
  enable = 0;
  $display("Asserting reset");
  #10 reset = 1;
  #10 reset = 0;
  $display("Asserting Enable");
  #10 enable = 1;
  #20 enable = 0;
  $display("Terminating Simulator");
  #10 $finish;
end

always #5 clk_reg = !clk_reg;

assign clk = clk_reg;

initial begin
  $counter_monitor(counter_tb.clk, counter_tb.reset, counter_tb.enable, counter_tb.count);
end

counter U (
  .clk (clk),
  .reset (reset),
  .enable (enable),
  .count (count)
);

endmodule
Access Routines
Access routines are C programming language routines that provide procedural access to information within Verilog. Access routines perform one of two operations: Extract information pertaining to an object from the internal data representation. Write information pertaining to an object into the internal data representation.
acc_user.h : all data structures related to access routines
acc_initialize( ) : initializes variables and sets up the environment
main body : the user-defined application
acc_close( ) : undoes the actions taken by acc_initialize( )
Utility Routines
Interaction between the Verilog tool and the user's routines is handled by a set of programs supplied with the Verilog toolset. The library functions defined in PLI 1.0 perform a wide variety of operations on the parameters passed to the system call and are used, for example, to synchronize with the simulation or to implement conditional program breakpoints.
3.2 3.2.1
Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level netlist representation. Logic synthesis uses standard cell libraries which consist of simple cells, such as basic logic gates like and, or and nor, or macro cells, such as adders, muxes, memories and flip-flops. The standard cells put together form the technology library. Normally, a technology library is identified by its minimum feature size (0.18u, 90nm). The circuit description is written in a hardware description language (HDL) such as Verilog. Design constraints such as timing, area, testability and power are considered during synthesis. A typical design flow with a large example is given in the last example of this lesson.
3.2.2
For large designs, manual conversion of the behavioral description to a gate-level representation is highly error-prone. Before the development of modern, sophisticated synthesis tools, designers could never be sure that the design constraints would be met after fabrication. Moreover, a significant part of the design cycle was consumed in converting the high-level design into its gate-level representation, and if the gate-level design did not meet the requirements, the turnaround time for redesigning the blocks was very high. Each designer implemented design blocks in his own way, and with little consistency between design cycles the overall design contained redundant logic even when the individual blocks were optimized. Finally, timing, area and power dissipation were fabrication-process specific, so with a change of process the entire design methodology had to change.

Automated logic synthesis has solved these problems. The high-level design is less prone to human error because designs are described at a higher level of abstraction. High-level design is done without much attention to the constraints; the tool takes care of the constraints and sees to it that they are met. The designer can very easily go back, redesign and synthesize once again if some aspect is found unaddressed, so the turnaround time has also fallen considerably. Automated logic synthesis tools synthesize the design as a whole, and thus an overall design optimization is achieved. Logic synthesis also allows technology-independent design: the tools convert the design into gates using cells from the standard cell library provided by the vendor. Design reuse is possible for technology-independent designs; if the technology changes, the tool is capable of mapping the design accordingly.
Constructs Not Supported in Synthesis

Construct : Notes
initial : Only in testbenches
events : Events make more sense for syncing testbench components
real : Real data type not supported
time : Time data type not supported
force and release : force and release of data types not supported
assign and deassign : assign and deassign of reg data types is not supported, but assign on wire data types is supported
3.2.3
Construct Type : Notes

ports (input, inout, output) : Use inout only at the IO level; this makes the design more generic
parameters : supported
module definition : supported
signals and variables : supported
instantiation : supported
function and tasks : supported
always, if, then, else, case, casex, casez : initial is not supported
begin, end, named blocks, disable : Disabling of named blocks is allowed
assign : Delay information is ignored
disable : Disabling of named blocks is supported
for, while, forever : While and forever loops must contain @(posedge clk) or @(negedge clk)
3.2.4
Operator Type : Operator : Description

Arithmetic : * : Multiply
Arithmetic : / : Divide
Arithmetic : + : Add
Arithmetic : - : Subtract
Arithmetic : % : Modulus
Arithmetic : + : Unary plus
Arithmetic : - : Unary minus
Logical : ! : Logical negation
Logical : && : Logical and
Logical : || : Logical or
Relational : > : Greater than
Relational : < : Less than
Relational : >= : Greater than or equal
Relational : <= : Less than or equal
Equality : == : Equality
Equality : != : Inequality
Reduction : ~ : Bitwise negation
Reduction : ~& : nand
Reduction : | : or
Reduction : ~| : nor
Reduction : ^ : xor
Reduction : ^~ : xnor
Shift : >> : Right shift
Shift : << : Left shift
Concatenation : {} : Concatenation
Conditional : ?: : Conditional
Keyword : Description

input, inout, output : Use inout only at the IO level; this makes the design more generic
signals and variables : Vectors are allowed
gate instantiation : e.g. nand (out, a, b); it is a bad idea to code RTL this way
function and tasks : Timing constructs are ignored
always, if, then, else, case, casex, casez : initial is not supported
begin, end, named blocks, disable : Disabling of named blocks is allowed
assign : Delay information is ignored
for, while, forever : While and forever loops must contain @(posedge clk) or @(negedge clk)
3.2.5
Translation
The RTL description is converted by the logic synthesis tool into an unoptimized, intermediate, internal representation. The tool understands the basic primitives and operators in the Verilog RTL description, but ignores the design constraints at this stage.
Logic optimization
The logic is optimized to remove redundant logic, generating an optimized internal representation.
Technology library
The technology library contains standard library cells which are used during synthesis to replace the behavioral description with actual circuit components. These are the basic building blocks. The physical layout of these cells is done first and the area is estimated; modeling techniques are then used to estimate the power and timing characteristics. The library includes the following:
Functionality of the cells
Area of the different cell layouts
Timing information about the various cells
Power information of the various cells
The synthesis tools use these cells to implement the design.

// Library cells for abc_100 technology
VNAND // 2-input nand gate
VAND  // 2-input and gate
VNOR  // 2-input nor gate
VOR   // 2-input or gate
VNOT  // not gate
VBUF  // buffer
Design constraints
Any circuit must satisfy at least three constraints, viz. area, power and timing. Optimization demands a compromise among these three constraints. Apart from these, operating conditions such as temperature also contribute to synthesis complexity.
Logic synthesis
The logic synthesis tool takes in the RTL design and, with the help of the technology library and in keeping with the design constraints, generates an optimized gate-level description.
Functional verification
Identical stimulus is run with the original RTL and with the synthesized gate-level description of the design, and the outputs are compared for matches.

module stimulus;
reg [3:0] A, B;
wire A_GT_B, A_LT_B, A_EQ_B;

// instantiate the magnitude comparator
magnitude_comparator MC (A_GT_B, A_LT_B, A_EQ_B, A, B);

initial
  $monitor($time, " A=%b, B=%b, A_GT_B=%b, A_LT_B=%b, A_EQ_B=%b",
           A, B, A_GT_B, A_LT_B, A_EQ_B);

// stimulate the magnitude comparator

endmodule
3.3 3.3.1
Traditional verification in general follows these steps:
1. To verify a design, a design specification must first be set. This requires analysis of architectural trade-offs and is usually done by simulating various architectural models of the design.
2. Based on this specification a functional test plan is created, which forms the framework for verification. Based on this plan, various test vectors are applied to the DUT (design under test), written in Verilog. Functional test environments are needed to apply these test vectors.
3. The DUT is then simulated using traditional software simulators.
4. The output is then analyzed and checked against the expected results. This can be done manually using waveform viewers and debugging tools, or automatically by verification tools. If the output matches the expected results, verification is complete.
5. Optionally, additional steps can be taken to decrease the risk of a future design respin. These include hardware acceleration, hardware emulation and assertion-based verification.
Functional verification
When the specifications for a design are ready, a functional test plan is created based on them. This is the fundamental framework of functional verification. Based on this test plan, test vectors are selected and given as input to the design under test (DUT). The DUT is simulated to compare its output with the desired results. If the observed results match the expected values, this part of verification is over.
3.3.2
Formal Verification
A formal verification tool proves properties of a design by exploring as much of its behaviour as possible. All input changes must, however, conform to the constraints for the behaviour to be valid; assertions on the interfaces act as constraints for the formal tool. The tool then tries to prove the assertions in the RTL code false. If the constraints are too tight, the tool will not explore all possible behaviours and may wrongly report the design as correct. Both the formal and the semi-formal methodologies have come into prominence with the increasing complexity of designs.
3.3.3
Semi-formal verification combines the traditional verification flow using test vectors with the power and thoroughness of formal verification:
Semi-formal methods supplement simulation with test vectors.
Embedded assertion checks define the properties targeted by the formal methods.
Embedded assertion checks define the input constraints.
Semi-formal methods exhaustively explore a limited space around the states reached by simulation, thus maximizing the effect of simulation; the exploration is limited to a certain region around the state reached by simulation.
3.3.4
Equivalence checking
After the logic synthesis and place-and-route tools create the gate-level netlist and the physical implementation of the RTL design, respectively, it is necessary to check that their functionality matches the original RTL design. This is where equivalence checking comes in. It is an application of formal verification: it ensures that the gate-level or physical netlist has the same functionality as the Verilog RTL that was simulated. A logical model of both the RTL and the gate-level representation is constructed, and it is mathematically proved that their functionalities are the same.
3.4 3.4.1
i) Write a user-defined system task, $count_and_gates, which counts the number of and-gate primitives in a module instance. The hierarchical module instance name is the input to the task. Use this task to count the number of and gates in a 4-to-1 multiplexer.
3.4.2
i) A 1-bit full subtractor has three inputs x, y, z (previous borrow) and two outputs D (difference) and B (borrow). The logic equations for D and B are as follows:
D = x'y'z + x'yz' + xy'z' + xyz
B = x'y + x'z + yz
Write the Verilog RTL description for the full subtractor. Synthesize it using any technology library available. Apply identical stimulus to the RTL and the gate-level netlist and compare the outputs.
ii) Design a 3-to-8 decoder using a Verilog RTL description. A 3-bit input a[2:0] is provided to the decoder, and its output is out[7:0]: the output bit indexed by a[2:0] gets the value 1, the other bits are 0. Synthesize the decoder using any technology library available to you, optimizing for smallest area. Apply identical stimulus to the RTL and the gate-level netlist and compare the outputs.
iii) Write the Verilog RTL description for a 4-bit binary counter with a synchronous, active-high reset. (Hint: use an always block with the @(posedge clock) statement.) Synthesize the counter using any technology library available to you, optimizing for smallest area. Apply identical stimulus to the RTL and the gate-level netlist and compare the outputs.
Module 5
Embedded Communications
Lesson 24
Parallel Data Communication
Instructional Objectives
After going through this lesson the student would be able to:
Explain why a parallel interface is needed in an embedded system
List the names of common parallel bus standards along with their important features
Distinguish between the GPIB and other parallel data communication standards
Describe how data communication takes place between the controller, talker and listener devices connected via a GPIB interface
Questions
1. Parallel data communication is preferred when the following conditions are satisfied (state true or false for each):
i) the distance between the devices is small
ii) the volume of traffic is small
iii) the required data rate is high
2. The IEEE 488 standard was originally developed by ______.
3. The devices connected in a GPIB system are classified into the following types of categories: ______.
4. Each device connected in a GPIB system has an n-bit address, where n = ______.
Ans. D; T T T; T F F; F T T; T F T C C
Thus GPIB has several versions and makes which reflect the same thing, courtesy of the various developments in its history.

GPIB Electrical and Mechanical Specifications:
The bus comprises a 24-wire cable with both male and female connectors at each end to facilitate connectivity in a daisy-chain network topology.
Standard TTL-level signals are assumed for the ACTIVE, INACTIVE and TRANSITION states, both for control and communication.
Specified transfer rate: 1 megabyte per second.
Cable length: twenty meters between the controller and one device, or two meters between two devices.
Device fanout: the number of instruments may range from eight to ten.

Classification of instruments or devices (as they are called in the standard) connected through this bus system:
TALKER: designated to send data to other instruments, e.g. tape readers, data recorders, digital voltmeters, digital oscilloscopes etc.
LISTENER: designated to receive data from other instruments or controllers, e.g. printers, display devices, programmable power supplies, programmable signal generators etc.
CONTROLLER: the decision maker for designating an instrument as either a TALKER or a LISTENER. Usually this role is carried out by a computer.
All the talkers, listeners and the controller are connected to each other via the following three system buses (also see A TYPICAL SEQUENCE of DATA FLOW):
Bidirectional data bus
Bus management lines
Handshake lines

The eight BI-DIRECTIONAL DATA LINES have the following functionalities. They are used to transfer data, addresses, commands and status information in the form of bytes.
DATA: transferred as bytes, with the reception of each data byte being duly acknowledged.
ADDRESSES: instruments intended for use on a GPIB usually have switches which allow selection of the 5-bit address the instrument will assume on the bus. Addresses are characterized as:
o TALK ADDRESSES
o LISTEN ADDRESSES
CONTROL and COMMAND: bytes containing information for orienting the devices to perform functions like listen, talk etc. These commands can be referred to as the CONTROL WORDs necessary for establishing efficient communication between the controller and the other classes of devices. The various commands are (also see the COMMAND TABLE):
o UNIVERSAL commands
o UNLISTEN commands
o UNTALK commands
o SECONDARY commands
Note: the commands are sent by the controller to the instruments.

The five BUS MANAGEMENT LINES are:
o ATN (Attention)
o IFC (Interface Clear)
o SRQ (Service Request)
o REN (Remote Enable)
o EOI (End or Identify)
The three HANDSHAKE LINES coordinate the transfer of data bytes on the data bus. Their functions are:
o DAV: Data Valid
o NRFD: Not Ready For Data
o NDAC: Not Data Accepted
Note: the handshake signals are necessary to facilitate transmission at different bandwidths (data rates).
The SEQUENCE of events pertaining to the actual communication is as follows:
o Power on: the controller takes control of the buses and sends out the IFC signal to set all instruments on the bus to a known state.
o The controller starts performing the desired series of measurements or tests.
o The controller asserts the ATN line low and starts sending the command address codes to the talkers and the listeners.

The CONTROL WORD Structure:
The control words are given in brief in the Command Table:

The Command Table
COMMAND             CONTROL WORD
Ignored             X1111111
Listen Command      X01 + 5 LSBs (actual address)
Talk Command        X10 + 5 LSBs (actual address)
Universal Command   X000 + 4 LSBs (16 commands)
Unlisten Command    X0111111
Untalk Command      X1011111
Secondary Commands  X11 + 5 LSBs (actual address)

Note: All the command control words are activated only if the ATN line is asserted low; otherwise they are in a disabled state. X here represents the don't-care condition. + here means that the indicated number of LSBs follows.

The following are the most important features:
The Universal commands go to all the listeners and talkers.
The Untalk and Unlisten commands turn the indicated device on or off.
In addition to all the above tasks, the controller checks the SRQ line in the context of a SERVICE REQUEST.
On finding the SRQ line LOW, the controller polls each device on the bus either serially (one by one) or in parallel.
o It then determines the source of the SRQ and asserts the ATN line low.
o It then sends the relevant information or command to all the listeners and the talkers, depending on the data utility.
The controller then asserts the ATN line high, and data is transferred directly from the TALKER to the LISTENERS using a double-handshake signal sequence.

Some information about DAV, NRFD and NDAC is given below. All are OPEN-COLLECTOR lines.
o A listener can hold NRFD low to indicate that it is not ready for data.
o A listener can hold NDAC low to indicate that it has not yet accepted a data byte.
An instance of the above two points can be cited as follows:
o All listeners release the NRFD line, indicating that they are ready to receive data.
o The talker asserts DAV low to indicate that valid data is on the bus.
o All the addressed listeners then pull NRFD low and start accepting the data, the NDAC line being asserted high.
o The talker, on sensing the NDAC line going high, unasserts the corresponding DAV signal. The listeners pull NDAC low again, and the sequence is repeated until the talker has sent all the data bytes it has to send.
o The data transfer rate depends on the rate at which the slowest listener can accept the data.
o On completion of the data transfer the talker pulls the EOI line of the management group low to indicate transfer completion.
o Finally, the controller takes control of the data bus and sends Untalk and Unlisten commands to all the talkers and the listeners, and continues executing its pre-specified internal instructions.
current standard PCI Super allows up to 800 Mbps on a 64-bit bus. It supports automatic detection of devices via a 64-byte configuration register, which makes it easy to interface plug-and-play devices in a system.
3. IEEE-796 (Multibus): Originally introduced by Intel as a means of connecting multiple processors on the system board, this bus is no longer very popular. It works with 16-bit data and 24-bit address buses.
4. VME Bus (Euro-standard): Introduced for the same purpose as the Intel Multibus, it works with a 24-bit address and 8/16/32-bit data buses.
5. SCSI Bus (Small Computer System Interface): This standard was originally designed for use with Apple Macintosh computers and was then popularized by the workstation vendors. Its main purpose is to interface peripherals like hard disks, CD-ROM drives and similar relatively slow peripherals which use a data rate of less than 100 Mbps. The following varieties of SCSI are currently implemented:
SCSI-1: Uses an 8-bit bus and supports data rates of 4 MBps.
SCSI-2: Same as SCSI-1, but uses a 50-pin connector instead of a 25-pin connector, and supports multiple devices. This is what most people mean when they refer to plain SCSI.
Wide SCSI: Uses a wider cable (168 cable lines to 68 pins) to support 16-bit transfers.
Fast SCSI: Uses an 8-bit bus, but doubles the clock rate to support data rates of 10 MBps.
Fast Wide SCSI: Uses a 16-bit bus and supports data rates of 20 MBps.
Ultra SCSI: Uses an 8-bit bus and supports data rates of 20 MBps.
SCSI-3: Uses a 16-bit bus and supports data rates of 40 MBps. Also called Ultra Wide SCSI.
Ultra2 SCSI: Uses an 8-bit bus and supports data rates of 40 MBps.
Wide Ultra2 SCSI: Uses a 16-bit bus and supports data rates of 80 MBps.
However, for the kind of applications targeted by GPIB, it now faces very strong competition from the recently introduced high-speed serial bus standards. Currently there are four major candidates for future bus systems in Test & Measurement: The Universal Serial Bus (USB) is now very popular. The current implementation provides transfer rates of up to 12 Mbit/s. From that viewpoint there is no speed enhancement over GPIB; in fact, it is a drawback. USB II is an enhanced USB bus capable of transferring up to 480 Mbit/s. It is backward compatible with USB. The IEC SC65C Working Group 3 (which also developed the IEC 625.1 and IEC 625.2 standards) is planning to work on this.
IEEE 1394 (FireWire) is now available with transfer rates of up to 400 Mbit/s. A specification to simulate GPIB was developed by a working group inside the IEEE 1394 Trade Association; it is called IICP (Industrial and Instrumentation Control Protocol). Ethernet and related networks use the TCP/IP protocol, and transfer rates of up to 1 Gbit/s are possible. For simulating GPIB, a specification called VXI-11, introduced by the VXI plug&play alliance, exists.
Module 5
Embedded Communications
Lesson 25
Serial Data Communication
Instructional Objectives
After going through this lesson the student would be able to Distinguish between serial and parallel data communication Explain why a communication protocol is needed Distinguish between the RS-232 and other serial communication standards Describe how serial communication can be used to interconnect two remote computers using the telephone line
Serial data communication strategies and standards are used in situations where the number of lines that can be spared for communication is limited. This is the primary mode of transfer in long-distance communication. It is also common in embedded systems, where various subsystems share the communication channel and speed is not a very critical issue. Standards incorporate both the software and hardware aspects of the system, while buses mainly define the cable characteristics for the same communication type. Serial data communication is the most common low-level protocol for communication between two or more devices. Normally, one device is a computer, while the other can be a modem, a printer, another computer, or a scientific instrument such as an oscilloscope or a function generator. As the name suggests, the serial port sends and receives bytes of information in a serial fashion - one bit at a time - rather than a character at a time as in the other modes of communication. These bytes are transmitted using either a binary (numerical) format or a text format.
All data communication systems follow some specific set of standards defined for their communication capabilities, so that the systems are not vendor-specific and, for each system, the user has the freedom of selecting a device and interface of his own choice of make and range. The most common serial communication protocols can be studied under the following categories: asynchronous, synchronous and bit-synchronous communication standards.
This protocol allows bits of information to be transmitted between two devices at an arbitrary point in time. The protocol defines that the data - more precisely, a character - is sent as a frame, which in turn is a collection of bits. The start of a frame is identified by a START bit, and a STOP bit identifies the end of the data frame; thus, the START and STOP bits are part of the frame being sent or received. The protocol assumes that both the transmitter and the receiver are configured in the same way, i.e., they follow the same definitions for the start, stop and actual data bits. Both devices need to communicate at an agreed-upon data rate (baud rate), such as 19,200 or 115,200 bits per second. This protocol has been in use for over 15 years; it is used to connect PC peripherals such as modems, and its applications include the classic Internet dial-up modem systems. Asynchronous systems allow a number of variations, including the number of bits in a character (5, 6, 7 or 8), the number of stop bits used (1, 1.5 or 2) and an optional parity bit. Today the most common configuration has 8-bit characters, with 1 stop bit and no parity; this is frequently abbreviated as 8-N-1. A single 8-bit character therefore consists of 10 bits on the line: one start bit, eight data bits and one stop bit (as shown in the figure below). The most important observation here is that the individual characters are framed (unlike in the other serial communication standards) and NO CLOCK signal is communicated between the two ends.
[Figure: frame format for asynchronous serial data - start bit, data bits, stop bit]
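The framing overhead just described can be made concrete with a small sketch (function names are illustrative, not part of any standard API): building the 10-bit line image of one 8-N-1 character and computing the effective character throughput at a given baud rate.

```python
def frame_8n1(byte):
    """Return the line bits for one 8-N-1 character: one start bit (0),
    eight data bits (LSB transmitted first), one stop bit (1)."""
    assert 0 <= byte <= 0xFF
    data = [(byte >> i) & 1 for i in range(8)]   # LSB first on the wire
    return [0] + data + [1]                      # 10 bits per character

def chars_per_second(baud, bits_per_frame=10):
    """Effective character rate: the 2 framing bits cost 20% of the line."""
    return baud // bits_per_frame

bits = frame_8n1(ord('A'))           # 'A' = 0x41
print(bits)                          # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chars_per_second(115200))      # 11520 characters per second
```

Note how a 115,200-baud link carries only 11,520 characters per second: the start and stop bits consume two of every ten line bits.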
The serial port interface for connecting two devices is specified by the TIA/EIA-232C standard published by the Telecommunications Industry Association (TIA) and the Electronic Industries Alliance (EIA); both the physical and electrical characteristics of the interface are detailed in these publications. RS-232, RS-422, RS-423 and RS-485 are each a Recommended Standard (RS-XXX) of the EIA for asynchronous serial communication, and have more recently been rebranded as EIA-232, EIA-422, EIA-423 and EIA-485. Although more advanced standards for serial communication, such as USB and FireWire, are being popularized these days to fill the gap for high-speed, relatively short-run, heavy-data-handling applications, the above four still satisfy the needs of the high-speed and longer-run applications found most often in industrial settings for plant-wide security and equipment networking. RS-232, 423, 422 and 485 specify the communication system characteristics of the hardware, such as voltage levels, terminating resistances and cable lengths. The standards, however, say nothing about the software protocol, or about how data is framed, addressed, checked for errors or interpreted.
THE RS-232
This is the original serial port interface standard; its name stands for Recommended Standard Number 232 (more formally, EIA Recommended Standard 232). It is the oldest and most popular serial communication standard, first introduced in 1962 to help ensure connectivity and compatibility across manufacturers for simple serial data communications.
Applications
Peripheral connectivity for PCs (the PC COM port hardware), which can range beyond modems and printers to many different handheld devices and modern scientific instruments.
All the various characteristics and definitions pertaining to this standard can be summarized according to:
o The maximum bit transfer rate and cable length.
o The communication technique: names, electrical characteristics and functions of signals.
o The mechanical connections and pin assignments.
The Standard Maximum Bit Transfer Rate, Signal Voltages and Cable Length
RS-232's capabilities range from the original slow data rate of up to 20 kbps to over 1 Mbps for some modern applications. RS-232 is mainly intended for short cable runs, or local data transfers, in a range of up to 50 feet maximum; the usable length also depends on the baud rate.
It is a robust interface with speeds of up to 115,200 baud. It can withstand a short circuit between any two pins, and can handle signal voltages as high (or low) as +/-15 volts.
Signals can be in either an active state or an inactive state. RS-232 is an active-LOW voltage-driven interface where:
ACTIVE STATE: An active state corresponds to the binary value 1. An active signal state can also be indicated as logic 1, on, true, or a mark.
INACTIVE STATE: An inactive signal state is indicated as logic 0, off, false, or a space.
For data signals, the "true" state occurs when the received signal voltage is more negative than -3 volts, while the "false" state occurs for voltages more positive than +3 volts. For control signals the polarity is reversed: the "true" state occurs when the received signal voltage is more positive than +3 volts, while the "false" state occurs for voltages more negative than -3 volts.
[Figure: RS-232 signal voltage levels - signal state 1 below -3 V, signal state 0 above +3 V, with the -3 V to +3 V band undefined]
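The two voltage conventions above (negative logic for data, positive logic for control) can be sketched as a pair of decode functions. This is an illustrative model only; the function names are not from any real driver API.

```python
def decode_data(voltage):
    """RS-232 data lines use negative logic: mark (1) below -3 V."""
    if voltage < -3:
        return 1          # mark / logic 1 / "true" state for data
    if voltage > 3:
        return 0          # space / logic 0
    return None           # -3 V..+3 V: undefined transition region

def decode_control(voltage):
    """Control lines (RTS, CTS, DTR, ...) use positive logic."""
    if voltage > 3:
        return True       # control signal asserted
    if voltage < -3:
        return False
    return None

print(decode_data(-12))    # 1: a -12 V data line carries a mark
print(decode_control(9))   # True: e.g. DTR asserted at +9 V
```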
A factor that limits the distance of reliable data transfer over RS-232 is the signaling technique it uses. The interface is single-ended, meaning that communication occurs over a SINGLE WIRE referenced to GROUND, the ground wire serving as the return path. Over that single wire, marks and spaces are created. While this is quite adequate for slower applications, it is not suitable for faster or longer ones.
[Figure: RS-232 single-ended signaling - data flows from the transmitter (Tx) to the receiver (Rx) over a single wire referenced to ground]
Disadvantage
Being a single-ended system, it is more susceptible to induced noise, ground loops and ground shifts (a ground at one end not at the same potential as at the other end of the cable), e.g. in applications in the proximity of heavy electrical installations and machinery. These vulnerabilities become serious at very high data rates, and for such applications a different standard, such as RS-422, is required; these standards are explained below.
RS-422

The Standard

Communication Technique
The strength of this standard lies in its ability to tolerate ground voltage differences between sender and receiver. Ground voltage differences can occur in electrically noisy environments where heavy electrical machinery is operating. The key here is the differential data communication technique, also referred to as balanced-differential signaling. The driver uses two wires over which the signal is transmitted. Each wire is driven while floating separate from ground, i.e., neither is grounded, and in this respect the system differs from single-ended systems. Correspondingly, the receiver has two inputs, each floating above ground and electrically balanced with the other when no data is being transmitted. Data on the line causes a deliberate electrical imbalance, which is recognized and amplified by the receiver. Common-mode signals, such as electrical noise induced on the lines by machinery or radio transmissions, are for the most part cancelled by the receiver. That is because the induced noise is identical on each wire: the receiver inverts the signal on one wire to place it out of phase with the other, causing a subtraction that results in zero difference. Thus, noise picked up by long data lines is eliminated at the receiver and does not interfere with the data transfer. Also, because the line is balanced and separate from ground, there are no problems associated with ground shifts or ground loops.
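The cancellation argument above can be checked numerically with a toy model (the +/-2 V drive level and all names are illustrative): identical common-mode noise on both wires drops out of the receiver's subtraction.

```python
def transmit_differential(bit, noise=0.0):
    """Drive the two wires with opposite polarities; the same induced
    noise voltage couples identically onto both wires."""
    level = 2.0 if bit else -2.0
    wire_a = +level + noise
    wire_b = -level + noise
    return wire_a, wire_b

def receive_differential(wire_a, wire_b):
    """The receiver only sees the difference, so common-mode noise cancels."""
    return 1 if (wire_a - wire_b) > 0 else 0

# Even with 10 V of common-mode noise the data is recovered intact:
a, b = transmit_differential(1, noise=10.0)
print(receive_differential(a, b))   # 1
a, b = transmit_differential(0, noise=10.0)
print(receive_differential(a, b))   # 0
```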
[Figure: RS-422 differential signaling - unidirectional, half duplex, multi-drop]

To avoid any ambiguity in understanding the RS-422 and RS-423 standards, it may be mentioned that RS-423 is a counterpart of RS-422 designed to tolerate ground voltage differences between the sender and the receiver for the more advanced version of RS-232, that is, RS-232C. Unlike RS-232, an RS-422 driver can service up to 10 receivers on the same line (bus). This is often referred to as a half-duplex, single-source, multi-drop network (not to be confused with the multi-point networks associated with RS-485); this will be explained further in conjunction with RS-485.
Like RS-232, however, RS-422 still provides only half-duplex, one-way data communication over a two-wire line. If bi-directional or full-duplex operation is desired, another set of driver, receiver(s) and two-wire line is needed, in which case RS-485 is worth considering.
Applications
This standard fits well in process control applications in which instructions are sent out to many actuators or responders, and in electrically noisy environments where heavy electrical machinery is operating and ground voltage differences can occur.
RS-485
This is an improved RS-422 with the capability of connecting a number of devices (transceivers) on one serial bus to form a network.
The Standard Maximum Bit Transfer Rate, Signal Voltages and Cable Length
Such a network can have a "daisy chain" topology, where each device is connected to two other devices except for the devices at the ends. Only one device may drive data onto the bus at a time; the standard does not specify the rules for deciding who transmits and when on such a network - that is left entirely to the system designer. Variable data rates are available for this standard; the standard maximum data rate is 10 Mbps, although some manufacturers offer up to double that, i.e., around 20 Mbps, at the expense of cable length. It can connect up to 32 drivers and receivers in fully differential mode, similar to RS-422.
Communication Technique
EIA Recommended Standard 485 is designed to provide bi-directional, half-duplex, multi-point data communication over a single two-wire bus. Like RS-232 and RS-422, full-duplex operation is possible using a four-wire, two-bus network, but the RS-485 transceiver ICs must have separate transmit and receive pins to accomplish this. RS-485 has the same distance and data rate specifications as RS-422 and uses differential signaling but, unlike RS-422, allows multiple drivers on the same bus. As depicted in the figure below, each node on the bus can include both a driver and a receiver, forming a multi-point network. The driver at each node remains in a disabled, high-impedance state until called upon to transmit; this is different from RS-422 drivers, where there is only one driver and it is always enabled and cannot be disabled. With automatic repeaters and tri-state drivers, the 32-node limit can be greatly exceeded. In fact, the ANSI-based SCSI-2 and SCSI-3 bus specifications use RS-485 for the physical (hardware) layer.
[Figure: RS-485 multi-point network - each node's driver (D) and receiver (R) share the two-wire bus; each driver has an Enable input and is enabled only when transmitting]
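The tri-state discipline described above can be sketched as a toy bus model (class and function names are illustrative): every node's driver is high-impedance (disabled) by default, so exactly one node drives the shared pair at a time, and simultaneous drivers constitute bus contention.

```python
class Node:
    """One RS-485 node: a driver that is tri-stated unless enabled."""
    def __init__(self, name):
        self.name = name
        self.enabled = False      # high-impedance by default
        self.level = None

    def drive(self, bit):
        self.enabled = True
        self.level = bit

    def release(self):
        self.enabled = False
        self.level = None

def bus_state(nodes):
    """Return the level on the shared pair, or None when the bus is idle.
    More than one enabled driver is a contention fault."""
    drivers = [n for n in nodes if n.enabled]
    if len(drivers) > 1:
        raise RuntimeError("bus contention: %s" % [n.name for n in drivers])
    return drivers[0].level if drivers else None

a, b, c = Node("A"), Node("B"), Node("C")
a.drive(1)
print(bus_state([a, b, c]))   # 1: every other receiver sees A's bit
a.release()
print(bus_state([a, b, c]))   # None: bus idle, any node may now drive
```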
Advantages
Among all the asynchronous standards mentioned above, this standard offers the maximum data rate. Apart from that, special hardware for avoiding bus contention, a higher receiver input impedance and lower driver load impedances are its other assets.
The various standards at a glance:

                                    RS-232           RS-422             RS-485
Signaling Technique                 Single-Ended     Differential       Differential
                                                     (Balanced)         (Balanced)
Drivers and Receivers on Bus        1 Driver         1 Driver           32 Drivers
                                    1 Receiver       10 Receivers       32 Receivers
Maximum Cable Length                50 feet          4000 feet          4000 feet
Original Standard Maximum
  Data Rate                         20 kbps          10 Mbps down to    10 Mbps down to
                                                     100 kbps           100 kbps
Minimum Loaded Driver
  Output Voltage Levels             +/-5.0 V         +/-2.0 V           +/-1.5 V
Driver Load Impedance (ohms)        3 k to 7 k       100                54
Receiver Input Resistance (ohms)    3 k to 7 k       4 k or greater     12 k or greater
Serial Port Pin and Signal Assignments (DB9 male connector)

Pin   Label   Signal Name           Signal Type
1     CD      Carrier Detect        Control
2     RD      Received Data         Data
3     TD      Transmitted Data      Data
4     DTR     Data Terminal Ready   Control
5     GND     Signal Ground         Ground
6     DSR     Data Set Ready        Control
7     RTS     Request to Send       Control
8     CTS     Clear to Send         Control
9     RI      Ring Indicator        Control
(Refer to the RS-232 standard for a description of the signals and pin assignments used with a 25-pin connector.) Because RS-232 mainly involves connecting a DTE to a DCE, the pin assignments are defined such that straight-through cabling is used: pin 1 is connected to pin 1, pin 2 to pin 2, and so on. A DTE-to-DCE serial connection using the Transmit Data (TD) and Receive Data (RD) pins is shown below.
[Figure: DTE-to-DCE straight-through connection - TD (pin 3) to pin 3, RD (pin 2) to pin 2]
Connecting two DTEs or two DCEs with a straight serial cable would connect the TD pins of the two devices to each other, and likewise the RD pins. Therefore, to connect two like devices, a null modem cable has to be used. As shown below, a null modem cable crosses the transmit and receive lines in the cable.
[Figure: DTE-to-DTE null modem connection - each device's TD (pin 3) is crossed over to the other's RD (pin 2)]
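The two cable types can be summarized as pin maps. This sketch covers only the three data/ground pins of the DB9 connector; real null modem cables also cross or loop back the control lines in ways that vary by cable, so those are deliberately omitted.

```python
# DB9 pin maps: near-end pin -> far-end pin (data and ground only).
STRAIGHT_THROUGH = {3: 3, 2: 2, 5: 5}   # DTE <-> DCE: TD-TD, RD-RD, GND-GND
NULL_MODEM       = {3: 2, 2: 3, 5: 5}   # DTE <-> DTE: TD and RD crossed

def far_end_pin(cable, near_pin):
    """Which far-end pin does a given near-end pin reach through the cable?"""
    return cable[near_pin]

print(far_end_pin(NULL_MODEM, 3))        # 2: this DTE's TD feeds the other's RD
print(far_end_pin(STRAIGHT_THROUGH, 3))  # 3: straight cable keeps pins aligned
```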
Serial ports carry two types of signals: data signals and control signals. To support these signal types, as well as the signal ground, the RS-232 standard defines a 25-pin connection; however, most PCs and UNIX platforms use a 9-pin connection. In fact, only three pins are required for serial port communication: one for receiving data, one for transmitting data, and one for the signal ground. Throughout this discussion the computer is considered a DTE, while peripheral devices such as modems and printers are considered DCEs. Note that many scientific instruments function as DTEs. The term "data set" is synonymous with "modem" or "device", while the term "data terminal" is synonymous with "computer".
PC-to-PC Communication in Detail
The schematic for a connection between the PC UART port and the modem serial port is shown below:
[Figure: PC UART to modem connection - TxD and RxD lines]
Note: The serial port pin and signal assignments are with respect to the DTE. For example, data is transmitted from the TD pin of the DTE to the RD pin of the DCE.
The control pins include RTS and CTS, DTR and DSR, CD, and RI.
PROBLEM: Suppose one PC needs to send data to another computer located far away. The actual data is in parallel form and needs to be converted into its serial counterpart; this is done by a parallel-in serial-out shift register at the transmitter and a serial-in parallel-out shift register at the receiver. It must also be ensured that the transmitter does not send data at a rate faster than the receiver can accept it; this is done by introducing handshaking signals or circuitry in conjunction with the actual system. For short distances, devices like the UART (Universal Asynchronous Receiver Transmitter, e.g. the INS8250 from National Semiconductor) and the USART (Universal Synchronous Asynchronous Receiver Transmitter, e.g. the 8251A from Intel) incorporate the essential circuitry for handling this serial communication with handshaking. For long distances, telephone (switched) lines are more practical because of their pre-availability. ONE COMPLICATION: the telephone line BANDWIDTH is only 300-3000 Hz.
REMEDY: Convert the digital signal to audio tones. The device, which is used to do this conversion and vice-versa, is known as a MODEM.
[Figure: two DTEs (terminals) connected through DCEs (modems) over a telephone line]
Both the main microcomputer and the end device (or time-shared device) can be referred to as terminals. Whenever a terminal is switched on, it first performs a self-diagnostic test; if it finds its integrity fully justified, it asserts the DTR (data terminal ready) signal. When the modem senses this, it understands that the terminal is ready, and replies to the terminal by asserting the DSR (data set ready) signal. The direction of each of these signals is of prime importance here and must be remembered to fully understand the procedure. If the terminal actually has data to convey to the end terminal, it asserts the RTS (request to send) signal back to the modem, and in turn the modem asserts the CD (carrier detect) signal to the terminal, indicating that it has established a connection. The modem may, however, not yet be ready to transmit the actual data over the telephone line, perhaps because its buffer is saturated, among other reasons. When the modem is fully ready to send the data along the telephone line, it asserts the CTS (clear to send) signal back to the terminal. The terminal then starts sending serial data to the modem. When the terminal runs out of data, it unasserts the RTS signal, indicating to the modem that it has no more data to send; the modem in turn unasserts its CTS signal and stops transmitting. Initialization and handshaking proceed the same way at the other end. It must therefore be noted that a very important aspect of data communication is the definition of the handshaking signals for transferring serial data to and from the modem.
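The handshake sequence described above can be reduced to the order in which the signals change, as in this simplified, hypothetical replay (real modems interleave these events with carrier negotiation, and the function and event names are illustrative):

```python
def dialup_session(data_bytes):
    """Replay the DTR/DSR/RTS/CD/CTS sequence as an ordered event log."""
    events = []
    events.append("DTR asserted")    # terminal passed its self-test
    events.append("DSR asserted")    # modem replies: it is ready
    events.append("RTS asserted")    # terminal has data to send
    events.append("CD asserted")     # modem reports an established connection
    events.append("CTS asserted")    # modem ready to accept data
    for b in data_bytes:
        events.append("send 0x%02X" % b)
    events.append("RTS dropped")     # terminal has no more data
    events.append("CTS dropped")     # modem stops transmitting
    return events

log = dialup_session([0x48, 0x69])
print(log[0], "...", log[-1])        # DTR asserted ... CTS dropped
```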
Current loops
Current loops are a standard widely used in process automation. The 20 mA current loop is widely used for transmitting serial communication data to programmable process-control devices. Another widely used standard is the 4-20 mA current loop, used for transmitting analogue measurement signals between a sensor and a measurement device.
In digital communications, the 20 mA current loop is a standard: the transmitter sources 20 mA and the receiver sinks 20 mA. Current loops often use opto-couplers. Here it is the current that matters, not the voltage. For measurement purposes, a small resistance, say of value 1 k, is connected in series with the receiver/transmitter and the current meter; the current flowing into the receiver indicates the scaled data actually entering it. The data transmitted through this kind of interface is usually a standard RS-232 signal, simply converted to current pulses. Current on and off the transmission line depends on how the RS-232 circuit distinguishes between the values of current and in what way it interprets the logic state thus obtained.
The 4-20 mA current loop interface is the standard for almost all process control instruments. It works as follows: the sensor is connected to process-control equipment, which supplies a voltage to the loop where the sensor is connected and reads the amount of current it draws. The typical supply voltage for this arrangement is around 12-24 volts, applied through a resistor; the measurement is the voltage drop across that resistor, converted to its current counterpart. The current loop is designed so that a sensor draws 4 mA of current at its minimum value and 20 mA at its maximum value. Because the sensor always passes at least 4 mA and there is usually a voltage drop of several volts across the sensor, many sensor types can be powered from the loop current alone.
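The 4-20 mA scaling above is a simple linear map: 4 mA corresponds to the sensor's minimum reading and 20 mA to its maximum, so the 16 mA span carries the measurement. A minimal sketch (the function name and the fault threshold are illustrative assumptions):

```python
def loop_to_value(current_ma, lo, hi):
    """Map a 4-20 mA loop current onto the sensor range [lo, hi].
    Currents well below 4 mA usually indicate a broken loop, since a
    healthy sensor always draws at least 4 mA."""
    if current_ma < 3.8:                      # illustrative fault threshold
        raise ValueError("loop fault: %.1f mA" % current_ma)
    span = hi - lo
    return lo + (current_ma - 4.0) / 16.0 * span

# A 0-100 degC temperature transmitter reading 12 mA is at mid-scale:
print(loop_to_value(12.0, 0.0, 100.0))   # 50.0
```

The live-zero at 4 mA is the reason a broken wire is distinguishable from a legitimate minimum reading: 0 mA can never be valid data.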
Lesson 26
Network Communication
Instructional Objectives
After going through this lesson the student would be able to Describe the need and importance of networking in an embedded system List the commonly adopted network communication standards and explain their basic features Distinguish between the CAN Bus, Field Bus and other network communication standards for embedded applications Choose a particular network standard to suit an application
Network Communication
The role of networking in present-day data communication hardly needs any elaboration. The situation is similar in the case of embedded systems, particularly those distributed over a larger geographical region - the so-called distributed embedded systems. Unfortunately, the most common network standard, namely Ethernet, is not suitable for such distributed systems, especially when there are real-time constraints to be satisfied. This is due to the lack of any service-time guarantee in the Ethernet standard. On the other hand, alternatives like Token Ring, which do provide a service-time guarantee, are not very suitable because they require a ring topology, which is not very convenient to implement in an industrial environment. Industry therefore proposed a standard called Token Bus (approved as the IEEE 802.4 specification) to cater to such requirements; however, the standard became too complex and inefficient as a result. Subsequently, different manufacturers have come up with their own standards, which are being implemented in specific applications. In this lesson we learn about three such standards, namely:
o I2C Bus
o Field Bus
o CAN Bus
We discuss about the last one in a little more detail because it is slowly emerging as one of the most popular networking standards for many embedded applications, like Home Appliances, Automobiles, Ships, Vending Machines, Medical Equipment, small-scale industries etc.
I2C Bus

[Figure: I2C bus data transfer - start bit, address with read/write bit, 8 data bits, acknowledge bit]

The acknowledge bit is used by the receiver of the data to indicate successful reception. The original specification for this standard was quite modest, namely 100 kbps with 7-bit addressing. Recent specifications have raised the data rate to 3.4 Mbps with 10-bit addressing.
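The transfer format just outlined (start condition, 7-bit address plus a read/write bit, data bytes each acknowledged by the receiver, stop condition) can be modeled as a frame list. This is an illustrative sketch of the on-bus sequence, not a driver for any real I2C controller; the function name and the 0x50 example address are assumptions.

```python
def i2c_write_frames(address, payload):
    """Model the sequence of an I2C master write with 7-bit addressing:
    START, address byte (R/W bit = 0 for write), then each data byte
    followed by the receiver's acknowledge, and finally STOP."""
    assert 0 <= address <= 0x7F              # 7-bit addressing
    frames = ["START"]
    frames.append((address << 1) | 0)        # address in upper 7 bits, W bit 0
    frames.append("ACK")                     # addressed slave acknowledges
    for byte in payload:
        frames.append(byte)
        frames.append("ACK")                 # receiver ACKs every data byte
    frames.append("STOP")
    return frames

# Writing one byte to a device at (hypothetical) address 0x50:
print(i2c_write_frames(0x50, [0xAB]))
# ['START', 160, 'ACK', 171, 'ACK', 'STOP']
```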
Field Bus

Initiatives such as the Interoperable Systems Project (ISP), from manufacturers under the leadership of Siemens, Fisher-Rosemount and Yokogawa, and its counterpart WorldFIP, mainly supported by Honeywell, sought to establish a de-facto fieldbus standard by introducing their products onto the market. Both organisations merged into the Fieldbus Foundation (FF), which strives to get a single world standard worked out. Industrial applications range from pulp and paper production and wastewater treatment right through to power station technology. PROFIBUS operations are processed by standard telegrams passing between master and slave using predefined channels called communication relations. Data is stored as objects, which can be addressed in the object directory via an index. PROFIBUS specifies an RS-485 interface with a baud rate of 9.6 kbit/s over a cable length of 1200 m and up to 500 kbit/s over a cable length of 200 m. Telegrams consist of the communication relations of the target device and the PROFIBUS partner address, as well as the indices of the object to be addressed, along with any data. With the exception of broadcasts, all telegrams are answered with a positive or negative acknowledgement; this ensures rapid recognition of faulty or non-existent stations. The transmission technology (physical layer) of PROFIBUS-PA can be characterized as follows:
o Digital, synchronous bit data transmission.
o Data rate 31.25 kbit/s.
o Manchester coding.
o Signal transmission and remote power supply over twisted two-wire cabling (screened/unscreened).
o Remote power supply DC voltage 9 V...32 V.
o Signal AC voltage 0.75 Vpp...1 Vpp (send voltage).
o Line and tree topology.
o Up to 1.9 km total cabling.
o Up to 32 members per cable segment.
o Expandable with a maximum of four repeaters.
The FOUNDATION fieldbus model is based on the IEC Open Systems Interconnect (OSI) layered communication model.
Fieldbus has additional advantages over 4-20 mA because many devices can connect to a single wire pair resulting in significant savings in wiring costs.
Communication stack
The communications stack comprises OSI Layers 2 and 7. The FOUNDATION fieldbus does not use OSI layers 3, 4, 5 and 6 because the functions of these layers are not needed; instead, the Fieldbus Access Sublayer (FAS) is used to map layer 7 directly onto layer 2. Layer 2, the Data Link Layer (DLL), controls transmission of messages onto the fieldbus. The DLL manages access to the fieldbus through a deterministic, centralised bus scheduler called the Link Active Scheduler (LAS). A fieldbus may have multiple Link Masters; if the current LAS fails, one of the Link Masters becomes the LAS and the operation of the FOUNDATION fieldbus continues. The FOUNDATION fieldbus is designed to "fail operational". The DLL is a subset of the emerging ISA/IEC DLL standards committee work. The Fieldbus Message Specification (FMS) is modeled after the OSI layer 7 Application Layer. FMS provides the communications services needed by the User Layer for remote access of data across the fieldbus network.
User Layer
The User Layer is not defined by the OSI model. However, for the first time, the FOUNDATION fieldbus specification defines a complete user layer based on function blocks. Function blocks provide the elements necessary for manufacturers to construct interoperable instruments and controllers.
Device descriptions
Each fieldbus device is described by a device description (DD) written in a special programming language known as Device Description Language (DDL). The DD can be thought of as a "driver" for the device. The DD provides all the information needed for a control system or host to interpret communications coming from the device, including configuration and diagnostic information. Any control system or host can communicate with a device if it "knows" the DD for the device. The host device uses an interpreter called Device Description Services (DDS) to read the DD for the device. New FOUNDATION fieldbus devices can be added to the fieldbus at any time by simply connecting the device to the fieldbus wire, provided the control system or host can read the identification of the fieldbus device, including the DD identifier, over the fieldbus. Once the DD identifier is known, the host reads the DD from a CD-ROM and supplies it to DDS for interpretation.
The completion of the technical specifications for an interoperable fieldbus system is a major milestone in the history of automation. The FOUNDATION fieldbus specification was developed by a consortium of instrument and control system manufacturers that represent over 90% of the instrumentation and control systems provided to end-users worldwide. The specifications will allow many manufacturers to deliver a wide range of interoperable fieldbus devices. These devices will usher in the next major technology transition in process and manufacturing automation.
CAN Bus

Main Features
CAN can link up to 2032 devices (assuming one node with one identifier) on a single network. However, owing to the practical limitations of the hardware (transceivers), it may only link up to 110 nodes (with the Philips 82C250) on a single network. It offers high-speed communication rates of up to 1 Mbit/s, thus facilitating real-time control. It embodies unique error confinement and error detection features, making it trustworthy and adaptable in noise-critical environments.
CAN Versions
Originally, Bosch provided the specification. The modern counterpart is Version 2.0 of this specification, which is divided into two parts: Version 2.0A, or Standard CAN, using 11-bit identifiers; and Version 2.0B, or Extended CAN, using 29-bit identifiers. The main aspect of these versions is the format of the MESSAGE FRAME, the main difference being the IDENTIFIER LENGTH.
CAN Standards
There are two ISO standards for CAN, which differ in their physical layer descriptions. ISO 11898 handles high-speed applications up to 1 Mbit/s; ISO 11519 has an upper limit of 125 kbit/s.
[Figure: CAN Version 2.0A (standard) message frame - Idle, SOF, Arbitration Field (11-bit Identifier, RTR), Control Field (r1, r0, DLC), Data Field, CRC Field, ACK, EOF, Intermission, Idle]

[Figure: CAN Version 2.0B (extended) message frame - Idle, SOF, Arbitration Field (11-bit Identifier, SRR, IDE, 18-bit Identifier, RTR), Control Field (r1, r0, DLC), Data Field, CRC Field, ACK, EOF, Intermission, Idle]
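One mechanism that keeps CAN receivers synchronized across these frame fields is bit stuffing: after any run of five identical bits, the transmitter inserts one bit of the opposite polarity, which receivers strip back out. A minimal sketch of the transmit side (the function name is illustrative):

```python
def bit_stuff(bits):
    """Insert a complementary stuff bit after every run of five equal
    bits, as a CAN transmitter does between SOF and the CRC field."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:                 # five identical bits in a row:
            out.append(1 - b)            # insert the complementary bit,
            run_bit, run_len = 1 - b, 1  # which starts a new run itself
    return out

print(bit_stuff([0, 0, 0, 0, 0, 0]))   # [0, 0, 0, 0, 0, 1, 0]
```

The guaranteed edge at least every six bit times is what lets each node resynchronize its bit clock without a separate clock line.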
Lesson 27
Wireless Communication
Instructional Objectives
After going through this lesson the student would be able to Describe the benefits and issues in wireless communication Distinguish between WLAN, WPAN and their different implementations like Ricochet, HiperLAN, HomeRF and Bluetooth Choose a particular wireless communication standard to suit an application
Wireless Communication
Third-generation wireless technologies are being developed to enable personal, high-speed interactive connectivity to wide area networks (WANs). The IEEE 802.11x wireless technologies find themselves with an increasing presence in corporate and academic office spaces, buildings and campuses, and are making slow but steady inroads into public areas such as airports and coffee bars. WAN, LAN and PAN technologies enable device connectivity to infrastructure-based services, either through a campus or a corporate backbone intranet. The other end of the coverage spectrum is occupied by short-range embedded wireless connectivity technologies that allow devices to communicate with each other directly, without the need for an established infrastructure. At this end of the coverage spectrum, wireless technologies like Ricochet and Bluetooth offer the benefits of RF-based connectivity: omni-directionality and the elimination of the line-of-sight requirement. The embedded connectivity space resembles a communication bubble that follows people around and empowers them to connect their personal devices with other devices that enter the bubble. Connectivity in this bubble is spontaneous and ephemeral, and can involve several devices of diverse computing capabilities, unlike wireless LAN solutions, which are designed for communication between devices with sufficient computing power and battery capacity. The table below shows a short comparison of the various technologies in the wireless arena.
In this lesson we look at the most commonly adopted of the different wireless technologies mentioned above.
WLANs-IEEE 802.11X
This is the most prominent technology standard for WLANs (Wireless Local Area Networks). It comprises a PHY (Physical Layer) and a MAC (Medium Access Control) layer. The original standard allows specific carrier frequencies in the 2.4 GHz range with data rates of 1 or 2 Mbps. Further enhancements to the same technology have led to the modern-day protocol known as 802.11b, which provides a basic data rate of 11 Mbps and a fall-back rate of 5.5 Mbps. All these technologies operate in the internationally available 2.4 GHz ISM band. Both the IEEE 802.11 and 802.11b standards are capable of providing communication between a number of terminals as an ad hoc network using peer-to-peer mode (see figures at the end), as a client/server wireless configuration (see figures at the end), or as a more complicated distributed network (see figures at the end). All these networks require wireless cards (PCMCIA, Personal Computer Memory Card International Association, cards) and wireless LAN access points. There are two transmission types for these technologies: Frequency Hopping Spread Spectrum (FHSS) and Direct Sequence Spread Spectrum (DSSS). Whereas FHSS is primarily used for low-power, low-range applications, DSSS is popular for Ethernet-like data rates. In the ad hoc network mode, as there is no central controller, the wireless access cards use the CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) protocol to resolve shared access to the channel. In the client/server configuration, many PCs and laptops, physically close to each other (20 to 500 meters), can be linked to a central hub (known as the access point) that serves as a bridge between them and the wired network. The wireless access cards provide the interface between the PCs and the antenna, while the access point serves as the wireless LAN hub.
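The contention-resolution idea behind CSMA/CA can be sketched in a few lines of Python. The function name, the retry limit and the initial contention window of 8 slots are illustrative assumptions, not values taken from the standard:

```python
import random

def csma_ca_attempt(channel_busy, backoff_slots, max_retries=5, cw_min=8):
    """Sketch of CSMA/CA: sense the channel before transmitting, and when it
    is busy, back off for a random number of slots drawn from a contention
    window that doubles after every failed attempt."""
    cw = cw_min
    for attempt in range(max_retries):
        if not channel_busy():
            return attempt              # channel idle: transmit now
        backoff_slots(random.randrange(cw))  # busy: wait a random backoff
        cw *= 2                         # widen the window for the next try
    return None                         # gave up after max_retries
```

The doubling window is what keeps many contending stations from colliding repeatedly: the more often a station loses, the more it spreads out its retries.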
The access point is typically mounted near the ceiling and can support 115-250 users, receiving, buffering and transmitting data between the WLAN and the wired network. Access points can be programmed to select one of the hopping sequences, and the PCMCIA cards tune in to the corresponding sequence. The WLAN bridge can also be implemented using line-of-sight directional antennas. Handover and roaming can be supported across the various access points. Encryption is also supported, using the optional shared-key RC4 (Ron's Code 4 or Rivest's Cipher 4) algorithm.
[Figure: a wireless LAN in which a Palm Pilot, a PDA and a station connect through an access point to the wired network]
WPANs-802.15X
WPANs (Wireless Personal Area Networks) are short-range wireless networks. The various WPAN protocols and their interfaces have been, and are being, standardized by the IEEE 802.15 WG (WPAN Working Group). There are four divisions of this standardization.
Ricochet
This provides secure mobile access to the desktop from outside an office. The service is provided by Metricom, a commercial Internet Service Provider (ISP), and was primarily available at airports and in selected areas. The Ricochet network is a wide-area wireless network using a spread-spectrum packet-switching technique and Metricom's patented frequency-hopping, checkerboard architecture. The network operates within the license-free 902-928 MHz ISM band. A Ricochet wireless micro-cellular data network (MCDN) is shown in the figure below.
[Figure: a Ricochet MCDN with microcell radios on streetlights or other utility poles, a modem radio attached to a computer, and routers, name servers and gateways linking the network to wired services]
Ricochet provides immediate, dependable, and secure connections without the cost and complexities of land-based phone lines, dial-up connections, or cellular modems. The Ricochet modem's features are its 28,800 bps data rate and 24-hour access. The Ricochet wireless network is based on frequency-hopping, spread-spectrum packet radio technology, with transmissions randomly hopping every two-fifths of a second over 162 channels.
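The hopping behaviour described above is easy to illustrate. The sketch below generates a pseudo-random hop sequence over 162 channels; a real Ricochet radio derives its sequence from Metricom's patented scheme, whereas here a seeded PRNG merely stands in for it:

```python
import random

def hop_sequence(n_hops, n_channels=162, seed=42):
    """Illustrative pseudo-random hop sequence over 162 channels.
    The seed stands in for whatever shared state the real network uses
    so that sender and receiver hop in lockstep."""
    rng = random.Random(seed)
    return [rng.randrange(n_channels) for _ in range(n_hops)]

# Each hop lasts 2/5 s, so a 10-second transmission spans 25 hops.
seq = hop_sequence(25)
```

Because both ends seed the generator identically, they visit the same channels in the same order, which is the essential property of any frequency-hopping scheme.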
HomeRF
This technology comes under ad hoc networking, spanning an area such as a home, an office building or a warehouse floor. A specification for wireless communications in the home, called the Shared Wireless Access Protocol (SWAP), has been developed. Some common targeted applications are: access to a public network telephone (isochronous multimedia) and the Internet (data); entertainment networks (cable television, digital audio and video with IEEE 1394); transfer and sharing of data and resources (printer, Internet connection, etc.); and home control and automation.
Advantages of HomeRF
In HomeRF, the same connection can be shared for both voice and data among the devices at the same time. The technology provides a platform for a broad range of interoperable consumer devices, enabling wireless digital communication between PCs and consumer electronic devices anywhere in and around the home.
[Figure: a HomeRF network in which a main PC with a phone connection links a cell phone, microwave oven, fridge, data pad, television, handheld communicator, clock, pager and other PCs]
Typical characteristics
Uses the 2.4 GHz ISM band
Data rates: 2 Mbps and 1 Mbps
Range: 50 m
Mobility: 10 m/s
Topology: packet-oriented
Supports both centralized (infrastructure) and ad hoc (infrastructure-less) communication
Supports simultaneous voice and data transmissions
Provides six audio connections at 32 kbps with 20 ms latency
Maximum data throughput: 1.2 Mbps
Supports a low-power paging mode
Provides QoS to voice-only devices and best effort to data-only devices
HiperLAN
"HiperLAN" or "High-performance LAN" has been designed specifically for an ad-hoc environment.
Can support both multimedia data and asynchronous data at rates as high as 23.5 Mbps
Employs the 5.15 GHz and 17.1 GHz frequency bands
Range: 50 m
Mobility: 10 m/s
Topology: packet-oriented
Supports both centralized and ad hoc communication
Supports 25 audio connections at 32 kbps with 10 ms latency, a video connection of 2 Mbps with 100 ms latency, and a data rate of 13.4 Mbps; it supports MPEG and other state-of-the-art real-time digital audio and video standards
HiperLANs are available in two types:
o TYPE 1: has a distributed MAC with QoS provisions and is based on GMSK (Gaussian minimum shift keying)
o TYPE 2: has a centralized, scheduled MAC and is based on OFDM
Objectives of HiperLAN
Provide QoS to build multiservice networks
Provide strong security
Support handoff when moving between local-area and wide-area coverage
Increased throughput
Ease of use, deployment, and maintenance
Affordability and scalability
A typical HiperLAN system is shown in the figure below:
[Figure: a HiperLAN system in which several access points (APs) connect wireless terminals to a fixed network]
A Bluetooth Connection
Bluetooth provides many options to the user. For instance, Bluetooth radio technology built into both a cellular telephone and a laptop replaces the cable used today to connect the two. Printers, desktops, FAX machines, keyboards, joysticks and virtually any other digital device can be networked by the Bluetooth system. Bluetooth also provides a universal bridge to existing data networks and a mechanism for forming small, private ad hoc groups of connected devices away from fixed network architectures. Bluetooth wireless communication operates in the 2.4 GHz range. There are certain regulations related to RF communication in the 2.4 GHz spectrum which device developers must follow. This is important for an organized use of the spectrum, because it is globally unlicensed and therefore bound by the specific regulations put forth by various countries in their respective territories. For Bluetooth communication, the RF spectrum has been divided into 79 channels, with bandwidth limited to 1 MHz per channel. Frequency-hopping spread-spectrum communication must be used, and proper mechanisms for interference anticipation and removal should be in place. This is essential because the 2.4 GHz spectrum is unlicensed and hence more vulnerable to signal congestion, as an increasing number of new users try to communicate within the band.
The two different communication topologies of Bluetooth PANs are the piconet and the scatternet. They are described briefly below.
The Piconet
[Figure: a piconet within a proximity sphere, showing the master linked to active slaves (AS1-AS4, including sniff and hold modes) by continuous lines, to parked slaves (PS1, PS2) by dashed lines, with standby devices (SB1, SB2) outside the piconet]
A piconet consists of a single master and all the slaves in its proximity that are communicating with it. A slave may be in active, sniff, hold or park mode at any instant of time. There can be up to seven active slaves, and any number of parked slaves and standby devices, in the vicinity of the master. The figure above shows a typical piconet as two spheres. The inner sphere comprises the piconet: the ellipses represent the slaves and the box represents the master. Thus, there is only one master and several slaves. Slave names starting with 'A' represent the active slaves; these are linked to the master with continuous lines, meaning ACTIVE. Slave names starting with 'P' represent the parked slaves; dashed lines connect them to the master, meaning that the connection is not continuous but the devices are still in the piconet, i.e., PARKED. The remaining slaves, with names starting with 'S', are in STANDBY; these are outside the piconet but inside the proximity sphere.
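The active/parked bookkeeping of a piconet can be modelled in a few lines. This toy Python class (names and API invented for illustration) enforces the limit of seven active slaves and parks any further device that tries to join:

```python
class Piconet:
    """Toy model of a Bluetooth piconet: one master, at most seven active
    slaves, and any number of parked slaves (sketch; mode names follow
    the text above)."""
    MAX_ACTIVE = 7

    def __init__(self, master):
        self.master = master
        self.active, self.parked = [], []

    def join(self, slave):
        """Admit a slave as active if a slot is free, otherwise park it."""
        if len(self.active) < self.MAX_ACTIVE:
            self.active.append(slave)
            return "active"
        self.parked.append(slave)
        return "parked"

    def park(self, slave):
        """Move an active slave to the parked list, freeing an active slot."""
        self.active.remove(slave)
        self.parked.append(slave)
```

Parking an active slave frees a slot, which is exactly how a real master rotates more than seven devices through a piconet.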
The Scatternet
[Figure: a scatternet formed by two overlapping piconets A and B, each with its own master, active slaves (AS A1-A3, AS B1-B4) and parked slaves (PS A1, PS B1, PS B2)]
A scatternet is formed when two or more piconets fall within each other's proximity; more precisely, when two or more piconets at least partially overlap in time and space. Within a scatternet, a slave can participate in multiple piconets by establishing connections with, and synchronizing to, different masters in its proximity. A single device may act as master in one piconet and at the same time as slave in another. A practical example of a scatternet arises in mobile communication, in which devices frequently move in and out of the proximity of other devices. The figure above shows a typical scatternet.
Bluetooth Specifications
Typical Bluetooth specifications have been characterized in the table below.
[Figure: the Bluetooth core protocol stack, with L2CAP and Audio above LMP, and the Baseband layer at the bottom]
Bluetooth Core Protocols
A brief description follows. The Service Discovery Protocol (SDP) provides the means for an application to discover which services are provided by, or available through, a Bluetooth device; it also allows applications to determine the characteristics of those services. The Logical Link Control and Adaptation Layer Protocol (L2CAP) supports higher-level protocol multiplexing, packet segmentation and reassembly, and the conveying of QoS (Quality of Service) information. The link managers on either side use the Link Manager Protocol (LMP) for link set-up and control. The baseband and link control layer enables the physical RF link between the Bluetooth units forming a piconet. It provides two different packet types, SCO and ACL, which can be transmitted in a multiplexed manner over the same RF link. Different master/slave pairs of the same piconet can use different link types, and the link type may change arbitrarily during a session. Each link type supports up to sixteen different packet types; four of these are control packets common to both SCO and ACL links. Both link types use a TDD scheme for full-duplex transmission. The SCO link is symmetric and typically supports time-bounded voice traffic; SCO packets are transmitted over reserved intervals. Once the connection is established, both master and slave units may send SCO packet types, allowing both voice and data transmission, with only the data portion being retransmitted when corrupted.
Operational States
[Figure: the operational states of a Bluetooth device: Standby, Inquiry, Inquiry Scan, Page and Page Scan, with Master Response, Slave Response and Inquiry Response sub-states leading to the Connection state]
State Descriptions
STANDBY: This is the default state, and the lowest power-consuming one too; only the Bluetooth clock operates, in low-power mode.
INQUIRY: In this state a device seeks out and learns the identities of other devices in its proximity. The other devices must have their Inquiry Scan state enabled if they want to entertain the query.
PAGE: In this state the master of a piconet invites other devices to join. To entertain this request, the invitee must have its Page Scan state enabled.
A device may bypass the Inquiry state if the identity of the device it wants to page is already known (see the figure above). The figure also indicates that any member of a piconet, not necessarily the master, may still perform INQUIRY and PAGE operations for additional devices, thus paving the way for a scatternet.
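The legal movements among these operational states can be captured as a small transition table. The Python sketch below is a simplified reading of the figure (the response sub-states are folded into the main states, and the exact transition set is an illustrative assumption), including the shortcut from STANDBY straight to PAGE for a device that already knows its peer:

```python
# Allowed transitions among the operational states (simplified sketch).
TRANSITIONS = {
    "STANDBY":      {"INQUIRY", "PAGE", "INQUIRY_SCAN", "PAGE_SCAN"},
    "INQUIRY":      {"STANDBY", "PAGE"},        # may page a discovered device
    "PAGE":         {"STANDBY", "CONNECTION"},  # paging succeeds or is abandoned
    "INQUIRY_SCAN": {"STANDBY", "PAGE_SCAN"},
    "PAGE_SCAN":    {"STANDBY", "CONNECTION"},
    "CONNECTION":   {"STANDBY"},
}

def step(state, target):
    """Move to `target` if the transition table allows it."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

# A device that already knows its peer's identity bypasses INQUIRY:
s = step("STANDBY", "PAGE")
s = step(s, "CONNECTION")
```

Encoding the diagram as a table makes the bypass rule explicit: PAGE is reachable directly from STANDBY, whereas CONNECTION is not.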
Module 6
Embedded System Software
Lesson 28
Introduction to Real-Time Systems
1. Introduction
Commercial usage of computers dates back a little more than fifty years. This brief period can roughly be divided into the mainframe, PC, and post-PC eras of computing. The mainframe era was marked by expensive computers that were quite unaffordable to individuals; each computer served a large number of users. The PC era saw the emergence of desktops that could easily be afforded and used by individual users. The post-PC era is seeing the emergence of small and portable computers, and of computers embedded in everyday applications, making an individual interact with several computers every day. Real-time and embedded computing applications in the first two eras were rather rare, restricted to a few specialized applications such as space and defense. In the post-PC era, the use of computer systems based on real-time and embedded technologies has already touched every facet of our life and is still growing at a pace never seen before. While embedded processing and Internet-enabled devices have now captured everyone's imagination, they are just a small fraction of the applications that have been made possible by real-time systems. If we casually look around us, we can discover many of them, often camouflaged inside simple-looking devices. If we observe carefully, we can notice that several gadgets and applications which have today become indispensable to our everyday life are in fact based on embedded real-time systems. For example, we have ubiquitous consumer products such as digital cameras, cell phones, microwave ovens, camcorders, and video game sets; telecommunication products and applications such as set-top boxes, cable modems, voice over IP (VoIP), and video conferencing applications; and office products such as fax machines, laser printers, and security systems. Besides, we encounter real-time systems in hospitals in the form of medical instrumentation equipment and imaging systems.
There are also a large number of devices and gadgets based on real-time systems which, though we normally do not use them directly, are nevertheless important to our daily life. A few examples are Internet routers, base stations in cellular systems, industrial plant automation systems, and industrial robots. It can easily be inferred from the above discussion that in recent times real-time computers have become ubiquitous and have permeated a large number of application areas. At present, the
computers used in real-time applications vastly outnumber the computers that are being used in conventional applications. According to an estimate [3], 70% of all processors manufactured world-wide are deployed in real-time embedded applications. While it is already true that an overwhelming majority of all processors being manufactured are getting deployed in real-time applications, what is more remarkable is the unmistakable trend of steady rise in the fraction of all processors manufactured world-wide finding their way to real-time applications. Some of the reasons attributable to the phenomenal growth in the use of real-time systems in the recent years are the manifold reductions in the size and the cost of the computers, coupled with the magical improvements to their performance. The availability of computers at rapidly falling prices, reduced weight, rapidly shrinking sizes, and their increasing processing power have together contributed to the present scenario. Applications which not too far back were considered prohibitively expensive to automate can now be affordably automated. For instance, when microprocessors cost several tens of thousands of rupees, they were considered to be too expensive to be put inside a washing machine; but when they cost only a few hundred rupees, their use makes commercial sense. The rapid growth of applications deploying real-time technologies has been matched by the evolutionary growth of the underlying technologies supporting the development of real-time systems. In this book, we discuss some of the core technologies used in developing real-time systems. However, we restrict ourselves to software issues only and keep hardware discussions to the bare minimum. The software issues that we address are quite expansive in the sense that besides the operating system and program development issues, we discuss the networking and database issues. In this chapter, we restrict ourselves to some introductory and fundamental issues. 
In the next three chapters, we discuss some core theories underlying the development of practical real-time and embedded systems. In the subsequent chapter, we discuss some important features of commercial real-time operating systems. After that, we shift our attention to realtime communication technologies and databases.
books are displayed by the software. In this example, the events "issue of query book command" and "display of results" are logically ordered in terms of which event follows the other, but no quantitative expression of time is required. Clearly, this example behavior is devoid of any real-time considerations. We are now in a position to define what a real-time system is: a system is called a real-time system when we need a quantitative expression of time (i.e., real time) to describe the behavior of the system. Remember that in this definition it is implicit that all quantitative time measurements are carried out using a physical clock. A chemical plant, part of whose behavior is described as "when the temperature of the reaction chamber attains a certain predetermined value, say 250°C, the system automatically switches off the heater within, say, 30 milliseconds", is clearly a real-time system. Our examples so far were restricted to descriptions of the partial behavior of systems. The complete behavior of a system can be described by listing its responses to various external stimuli. It may be noted that not all the clauses in the description of the behavior of a real-time system need involve quantitative measures of time. That is, large parts of the description of the behavior of a system may not have any quantitative expressions of time at all, and the system may still qualify as real-time. On the other hand, any system whose behavior can be completely described without using any quantitative expression of time is of course not a real-time system.
Each time the plant conditions are sampled, the automation system should decide on the exact instantaneous corrective actions required, such as changing the pressure, temperature, or chemical concentration, and carry out these actions within certain predefined time bounds. Typically, the time bounds in such a chemical plant control application range from a few microseconds to several milliseconds.
Example 2: Automated Car Assembly Plant
An automated car assembly plant is an example of a plant automation system. In an automated car assembly plant, the work product (a partially assembled car) moves on a conveyor belt (see Fig. 28.1). By the side of the conveyor belt, several workstations are placed. Each workstation performs some specific work on the work product, such as fitting the engine, fitting a door, fitting a wheel, or spray painting, as it moves along the conveyor belt. An empty chassis is introduced near the first workstation, and a fully assembled car comes out after the work product goes past all the workstations. At each workstation, a sensor senses the arrival of the next partially assembled product, and as soon as it is sensed, the workstation begins to perform its work on it. The time constraint imposed on the workstation computer is that the workstation must complete its work before the work product moves away to the next workstation. The time bounds involved here are typically of the order of a few hundreds of milliseconds.
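The workstation's timing constraint reduces to a simple inequality: the work time must not exceed the interval during which the product is within reach, which is the station spacing divided by the belt speed. A minimal sketch, with all parameter values purely illustrative:

```python
def workstation_ok(work_time_ms, belt_speed_m_s, station_spacing_m):
    """A workstation meets its deadline iff its work time is at most the
    dwell time of the product at the station: spacing / belt speed."""
    dwell_ms = 1000.0 * station_spacing_m / belt_speed_m_s
    return work_time_ms <= dwell_ms
```

For example, with stations 2 m apart and the belt moving at 0.1 m/s, each workstation has 20 s of dwell time, so a 300 ms fitting operation comfortably meets its deadline while a 25 s one does not.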
[Figure: a chassis moving along a conveyor belt past "fit engine", "fit door" and "fit wheel" workstations]
Fig. 28.1 Schematic Representation of an Automated Car Assembly Plant
Example 3: Supervisory Control And Data Acquisition (SCADA)
SCADA systems are a category of distributed control systems used in many industries. A SCADA system helps monitor and control a large number of distributed events of interest. In SCADA systems, sensors are scattered at various geographic locations to collect raw data (called events of interest). These data are then processed and stored in a real-time database. The database models (or reflects) the current state of the environment and is updated frequently, so as to remain a realistic model of the up-to-date state of the environment. An example of a SCADA application is an Energy Management System (EMS). An EMS helps carry out load balancing in an electrical energy distribution network: it senses the energy consumption at the distribution points, computes the load across the different phases of the power supply, and helps dynamically balance the load. Another example of a SCADA system is one that monitors and controls traffic in a computer network: depending on the sensed load in different segments of the network, the SCADA system makes the routers change their traffic routing policy dynamically. The time constraint in such a SCADA
application is that the sensors must sense the system state at regular intervals (say every few milliseconds) and the same must be processed before the next state is sensed.
1.2.2. Medical
A few examples of medical applications of real-time systems are: robots, MRI scanners, radiation therapy equipment, bedside monitors, and computerized axial tomography (CAT).
Example 4: Robot Used in Recovery of Displaced Radioactive Material
Robots have become very popular nowadays and are being used in a wide variety of medical applications. The application we discuss here is a robot used in retrieving displaced radioactive materials. Radioactive materials such as cobalt and radium are used for the treatment of cancer. At times during treatment, the radioactive cobalt (or radium) gets dislocated and falls down. Since human beings cannot come near radioactive material, a robot is used to restore it: the robot walks into the room containing the radioactive material, picks it up, and restores it to its proper position. The robot has to sense its environment frequently and, based on this information, plan its path. The real-time constraint on the path-planning task is that unless the robot plans the path fast enough after an obstacle is detected, it may collide with the obstacle. The time constraints involved here are of the order of a few milliseconds.
Example 6: Multi-Point Fuel Injection (MPFI) System
An MPFI system is an automotive engine control system. A conceptual diagram of a car embedding an MPFI system is shown in Fig. 28.2. An MPFI is a real-time system that controls the rate of fuel injection and allows the engine to operate at its optimal efficiency. In older models of cars, a mechanical device called the carburetor was used to control the fuel injection rate to the engine; it was the responsibility of the carburetor to vary the fuel injection rate depending on the current speed of the vehicle and the desired acceleration. Careful experiments have suggested that for optimal energy output, the required fuel injection rate is highly nonlinear with respect to the vehicle speed and acceleration. Experimental results also show that precise fuel injection through multiple points is more effective than single-point injection. In MPFI engines, the precise fuel injection rate at each injection point is determined by a computer. An MPFI system injects fuel into the individual cylinders, resulting in better power balance among the cylinders as well as higher output from each one, along with faster throttle response. The processor primarily controls the ignition timing and the quantity of fuel to be injected; the latter is achieved by controlling the duration for which the injector valve is open, popularly known as the pulse width. The actions of the processor are determined by data gleaned from sensors located all over the engine. These sensors constantly monitor the ambient temperature, the engine coolant temperature, the exhaust temperature, emission gas contents, engine rpm (speed), vehicle road speed, crankshaft position, camshaft position, etc. An MPFI engine with even an 8-bit computer does a much better job of determining an accurate fuel injection rate for given values of speed and acceleration than a carburetor-based system.
An MPFI system not only makes a vehicle more fuel efficient, it also minimizes pollution by reducing partial combustion.
[Figure: Multi-Point Fuel Injection (MPFI) System]
call details for billing purposes, and hand-off of calls as the mobile moves. Call hand-off is required when a mobile moves away from a base station. As a mobile moves away, its received signal strength (RSS) falls at the base station. The base station monitors this and as soon as the RSS falls below a certain threshold value, it hands-off the details of the on-going call of the mobile to the base station of the cell to which the mobile has moved. The hand-off must be completed within a sufficiently small predefined time interval so that the user does not feel any temporary disruption of service during the hand-off. Typically call hand-off is required to be achieved within a few milliseconds.
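The hand-off decision described above is essentially a threshold test on the monitored RSS. The sketch below uses an illustrative threshold of -95 dBm; actual thresholds are network-specific:

```python
def needs_handoff(rss_dbm, threshold_dbm=-95):
    """Hand-off trigger: the received signal strength at the serving base
    station has dropped below the threshold (value illustrative)."""
    return rss_dbm < threshold_dbm

def monitor(samples, threshold_dbm=-95):
    """Return the index of the first RSS sample that triggers a hand-off,
    or None if the signal stays acceptable throughout."""
    for i, rss in enumerate(samples):
        if needs_handoff(rss, threshold_dbm):
            return i
    return None
```

The real-time requirement is not in this test itself but in what follows it: once the trigger fires, the call details must be handed to the new base station within a few milliseconds.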
1.2.6. Aerospace
A few important uses of real-time systems in aerospace applications are: avionics, flight simulation, airline cabin management systems, satellite tracking systems, and computers on-board aircraft.
Example 8: Computer On-board an Aircraft
In many modern aircraft, the pilot can select an autopilot option. As soon as the pilot switches to autopilot mode, an on-board computer takes over all controls of the aircraft, including navigation, take-off, and landing. In autopilot mode, the computer periodically samples the velocity and acceleration of the aircraft. From the sampled data, the on-board computer computes the X, Y, and Z coordinates of the current aircraft position and compares them with the pre-specified track data. Before the next sample values are obtained, it computes the deviation from the specified track values and takes any corrective actions that may be necessary. In this case, the sampling of the various parameters and their processing need to be completed within a few microseconds.
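The track-following computation amounts to comparing the computed (X, Y, Z) position against the pre-specified track point and acting when the deviation exceeds a tolerance. A minimal sketch, with the tolerance value purely illustrative:

```python
import math

def deviation(current, track):
    """Euclidean deviation of the current (X, Y, Z) position from the
    pre-specified track point."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(current, track)))

def correction_needed(current, track, tolerance=10.0):
    """True when the aircraft has drifted beyond the allowed tolerance
    (same units as the coordinates; the value 10.0 is illustrative)."""
    return deviation(current, track) > tolerance
```

The hard part in the real system is not this arithmetic but the deadline: the comparison and any corrective action must finish before the next samples arrive.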
Example 10: Cell Phones
Cell phones are possibly the fastest growing segment of consumer electronics. A cell phone carries out a number of tasks simultaneously at any point of time. These include: converting the electrical signals generated by the microphone to digital form by deploying digital signal processing (DSP) techniques, converting received digital signals back to voice output, and sampling incoming base station signals in the control channel. A cell phone must respond to the communications received from the base station within certain specified time bounds. For example, a base station might command a cell phone to switch the on-going communication to a specific frequency; the cell phone must comply with such commands within a few milliseconds.
discussed applications. For example, even if the results are produced just after 20 seconds, nothing untoward is going to happen - this may not be the case with the other discussed applications.
before they can be used by the actuator. This is termed output conditioning. Similarly, input conditioning is required to be carried out on sensor signals before they can be accepted by the computer. For example, analog signals generated by a photo-voltaic cell are normally in the millivolt range and need to be conditioned before they can be processed by a computer. The following are some important types of conditioning carried out on raw signals generated by sensors and on digital signals generated by computers:
1. Voltage Amplification: Voltage amplification is normally required to match the full-scale sensor voltage output with the full-scale voltage input of the computer's interface. For example, a sensor might produce voltage in the millivolt range, whereas the input interface of a computer may require the input signal level to be of the order of a volt.
2. Voltage Level Shifting: Voltage level shifting is often required to align the voltage level generated by a sensor with that acceptable to the computer. For example, a sensor may produce voltage in the range -0.5 to +0.5 volt, whereas the input interface of the computer may accept voltage only in the range 0 to 1 volt. In this case, the sensor voltage must undergo level shifting before it can be used by the computer.
3. Frequency Range Shifting and Filtering: Frequency range shifting is often used to reduce the noise components in a signal. Many types of noise occur in narrow bands, and the signal must be shifted away from the noise bands so that the noise can be filtered out.
4. Signal Mode Conversion: A type of signal mode conversion frequently carried out during signal conditioning involves changing direct current into alternating current and vice versa. Another type of signal mode conversion that is frequently used is the conversion of analog signals to a constant-amplitude pulse train such that the pulse rate or pulse width is proportional to the voltage level.
Conversion of analog signals to a pulse train is often necessary for input to systems such as transformer coupled circuits that do not pass direct current.
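Voltage amplification and level shifting together form a single linear map from the sensor's range onto the computer's input range. Using the -0.5 to +0.5 V and 0 to 1 V figures from the example above (defaults chosen to match that example; the function itself is an illustrative sketch):

```python
def condition(v_sensor, in_lo=-0.5, in_hi=0.5, out_lo=0.0, out_hi=1.0):
    """Linear signal conditioning: map the sensor's full-scale range
    [in_lo, in_hi] onto the computer's input range [out_lo, out_hi]."""
    gain = (out_hi - out_lo) / (in_hi - in_lo)   # voltage amplification
    return out_lo + gain * (v_sensor - in_lo)    # voltage level shifting
```

With these ranges the gain is 1, so the map reduces to a pure level shift of +0.5 V; a millivolt-range sensor would instead get a large gain from the same formula.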
[Figure: an output interface consisting of a D/A register feeding a D/A converter]
Fig. 28.4 An Output Interface
Interface Unit: Normally, commands from the CPU are delivered to the actuator through an output interface. An output interface converts the stored voltage into analog form and then outputs it to the actuator circuitry. This, of course, requires the generated value to be written to a register (see Fig. 28.4). To produce an analog output, the CPU selects a data register of the output interface and writes the necessary data to it. The two main functional blocks of an output interface are shown in Fig. 28.4. The interface takes care of the buffering and handshake-control aspects. Analog-to-digital conversion is frequently deployed in an input interface; similarly, digital-to-analog conversion is frequently used in an output interface. In the following, we discuss the important steps of analog-to-digital conversion (ADC).
Analog to Digital Conversion: Digital computers cannot process analog signals; therefore, analog signals need to be converted to digital form. This can be done using circuitry whose block diagram is shown in Fig. 28.7, through the following two main steps:
1. Sample the analog signal (shown in Fig. 28.5) at regular intervals. This sampling can be done by a capacitor circuit that stores the voltage levels; the stored voltage levels can then be made discrete. After sampling the analog signal, a step waveform as shown in Fig. 28.6 is obtained.
2. Convert each stored value to a binary number using an analog to digital converter (ADC), as shown in Fig. 28.7, and store the digital value in a register.
Voltage
Fig. 28.7 Conversion of an Analog Signal to a 16-bit Binary Number

Digital to analog conversion can be carried out through a complementary set of operations. We leave it as an exercise for the reader to work out the details of the circuitry that can perform digital to analog conversion (DAC).
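The two steps above — sampling at regular intervals, then quantizing each sample to an n-bit code — can be sketched in software. This is only a behavioral model of what the hardware of Fig. 28.7 does; the signal, sampling period, and reference voltage below are illustrative assumptions.

```python
import math

def adc_convert(v, v_ref=1.0, bits=16):
    """Quantize a sampled voltage in 0..v_ref to an n-bit binary code
    (truncation to the level below, clamped to the valid code range)."""
    levels = 2 ** bits
    code = int(v / v_ref * (levels - 1))
    return max(0, min(levels - 1, code))

def sample(signal, period, n):
    """Sample an analog signal (modeled as a function of time) at
    regular intervals, producing the step-waveform values of Fig. 28.6."""
    return [signal(k * period) for k in range(n)]

# Illustrative: a 1 Hz sine offset into 0..1 V, sampled every 100 ms,
# then converted to 16-bit codes as in Fig. 28.7.
samples = sample(lambda t: 0.5 + 0.5 * math.sin(2 * math.pi * t), 0.1, 5)
codes = [adc_convert(v) for v in samples]
```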
[Fig. 28.8: Sensors and actuators connect the real-time computer to its environment]
Fig. 28.8 A Schematic Representation of an Embedded Real-Time System

3. Embedded: A vast majority of real-time systems are embedded in nature [3]. An embedded computer system is physically embedded in its environment and often controls it. Fig. 28.8 shows a schematic representation of an embedded system. As shown in Fig. 28.8, the sensors of the real-time computer collect data from the environment and pass them on to the real-time computer for processing. The computer, in turn, passes information (processed data) to the actuators to carry out the necessary work on the environment, thereby controlling some characteristics of the environment. Several examples of embedded systems were discussed in Section 1.2. An example of an embedded system that we shall often refer to is the Multi-Point Fuel Injection (MPFI) system discussed in Example 6 of Sec. 1.2.

4. Safety-Criticality: For traditional non-real-time systems, safety and reliability are independent issues. However, in many real-time systems these two issues are intricately bound together, making them safety-critical. Note that a safe system is one that does not cause any damage even when it fails. A reliable system, on the other hand, is one that can operate for long durations of time without exhibiting any failures. A safety-critical system is required to be highly reliable, since any failure of the system can cause extensive damage. We elaborate on this issue in Section 1.5.

5. Concurrency: A real-time system usually needs to respond to several independent events within very short and strict time bounds. For instance, consider a chemical plant automation system (see Example 1 of Sec. 1.2), which monitors the progress of a chemical reaction and controls the rate of reaction by changing different parameters of the reaction, such as pressure, temperature, and chemical concentration. These parameters are sensed using sensors fixed in the chemical reaction chamber.
These sensors may generate data asynchronously at different rates. Therefore, the real-time system must process data from all the sensors concurrently; otherwise signals may be lost and the system may malfunction. Such systems can be considered non-deterministic, since the behavior of the system depends on the exact timing of its inputs. A non-deterministic computation is one in which two runs using the same set of input data can produce two distinct sets of output data.

6. Distributed and Feedback Structure: In many real-time systems, the different components of the system are naturally distributed across widely spread geographic locations. In such systems, the different events of interest arise at geographically separate locations. These events may therefore have to be handled locally, and responses produced to them locally, to prevent overloading the underlying communication network. For this reason, the sensors and the actuators may be located at the places where the events are generated. An example of such a system is a petroleum refinery plant distributed over a large geographic area. At each data source, it makes good design sense to process the data locally before passing it on to a central processor.

Many distributed as well as centralized real-time systems have a feedback structure, as shown in Fig. 28.9. In these systems, the sensors usually sense the environment periodically. The sensed data about the environment is processed to determine the necessary corrective actions. The results of the processing are used to carry out the corrective actions on the environment through the actuators, which in turn again cause a change in the required characteristics of the controlled environment, and so on.
[Fig. 28.9: A feedback loop — the computation acts on the environment through actuators, and the environment's response is sensed back into the computation]

Fig. 28.9 Feedback Structure of Real-Time Systems

7. Task Criticality: Task criticality is a measure of the cost of failure of a task. Task criticality is determined by examining how critical the results produced by the task are to the proper functioning of the system. A real-time system may have tasks of very different criticalities. It is therefore natural to expect that the criticalities of the different tasks must be taken into consideration when designing for fault-tolerance. The higher the criticality of a task, the more reliable it should be made. Further, in the event of a failure of a highly critical task, immediate failure detection and recovery are important. However, it should be realized that task priority is a different concept: task criticality does not solely determine the task priority or the order in which the various tasks are to be executed (these issues shall be elaborated in later chapters).

8. Custom Hardware: A real-time system is often implemented on custom hardware that is specifically designed and developed for the purpose. For example, a cell phone does not use a traditional microprocessor. Cell phones use processors which are tiny, supporting only those processing capabilities that are really necessary for cell phone operation, and which are specifically designed to be power-efficient to conserve battery life. The capabilities of the processor used in a cell phone are substantially different from those of a general-purpose processor. Another example is the embedded processor in an MPFI car. In this case, the processor used need not be a powerful general-purpose processor such as a Pentium or an Athlon. Some of the most powerful computers used in MPFI engines are 16- or 32-bit processors running at approximately 40 MHz. However, unlike a conventional PC, a processor used in these car engines does not deal with processing frills such as screen-savers or a dozen different applications running at the same time. All that the processor in an MPFI system needs to do is compute the fuel injection rate that is most efficient for a given speed and acceleration.

9. Reactive: Real-time systems are often reactive. A reactive system is one in which an ongoing interaction between the computer and the environment is maintained. Ordinary systems compute functions on the input data to generate the output data (see Fig. 28.10(a)). In other words, traditional systems compute the output data as some function f of the input data: output data = f(input data). For example, if some data I1 is given as the input, the system computes the result O1 = f(I1). To elaborate this concept, consider a library automation software. When its query-book function is invoked and "Real-Time Systems" is entered as the input book name, the software displays: Author name: R. Mall, Rack Number: 001, Number of Copies: 1.
[Fig. 28.10: (a) A traditional system transforms input data into output data; (b) a reactive system starts from some initial parameters and maintains an ongoing interaction with its environment]
In contrast to the traditional computation of the output as a simple function of the input data, real-time systems do not produce any final output data, but instead enter into an ongoing interaction with their environment. In each interaction step, the results computed are used to carry out some actions on the environment. The reaction of the environment is sampled and fed back to the system. Therefore the computations in a real-time system can be considered non-terminating. This reactive nature of real-time systems is shown schematically in Fig. 28.10(b).

10. Stability: Under overload conditions, real-time systems need to continue to meet the deadlines of the most critical tasks, though the deadlines of non-critical tasks may not be met. This is in contrast to the requirement of fairness for traditional systems, even under overload conditions.

11. Exception Handling: Many real-time systems work round-the-clock and often operate without human operators. For example, consider a small automated chemical plant that is set up to work non-stop. When there are no human operators, taking corrective actions on a failure becomes difficult. Even if no corrective action can be taken immediately, it is desirable that a failure does not result in catastrophic situations. A failure should be detected, and the system should continue to operate in a gracefully degraded mode rather than shutting off abruptly.
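The non-terminating sense-compute-actuate cycle of a reactive system (point 9) can be pictured as a loop. The toy below is only a sketch: the setpoint, the proportional correction, and the bounded step count (a real controller loops forever) are all illustrative assumptions.

```python
def reactive_loop(sense, compute, actuate, steps):
    """A reactive system as a (conceptually non-terminating)
    sense -> compute -> actuate cycle. `steps` bounds the loop only so
    that this sketch terminates."""
    state = None
    for _ in range(steps):
        reading = sense()                 # sample the environment
        state = compute(reading, state)   # decide the corrective action
        actuate(state)                    # act on the environment
    return state

# Toy environment: a value driven toward a setpoint of 10 by
# proportional corrections, mimicking the feedback structure of Fig. 28.9.
value = [0.0]
sense = lambda: value[0]
compute = lambda reading, _state: 0.5 * (10.0 - reading)
def actuate(correction):
    value[0] += correction

reactive_loop(sense, compute, actuate, steps=20)
```

Note that the loop produces no "final output"; its effect is the trajectory of the environment, which converges toward the setpoint here.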
can only be ensured through increased reliability. It should now be clear why safety-critical systems need to be highly reliable. Just to give an example of the level of reliability required of safety-critical systems, consider the following. In a fly-by-wire aircraft, most of the vital parts are controlled by a computer, and any failure of the controlling computer is clearly not acceptable. The standard reliability requirement for such aircraft is at most 1 failure per 10^9 flying hours (that is, over a hundred thousand years of continuous flying!). We examine how a highly reliable system can be developed in the next section.
Error Avoidance: For achieving high reliability, every possibility of occurrence of errors should be minimized during product development. This can be achieved by a variety of means: using well-founded software engineering practices, using sound design methodologies, adopting suitable CASE tools, and so on.

Error Detection and Removal: In spite of using the best available error avoidance techniques, many errors still manage to creep into the code. These errors need to be detected and removed, which can be achieved to a large extent by conducting thorough reviews and testing. Once errors are detected, they can be easily fixed.

Fault-Tolerance: No matter how meticulously error avoidance and error detection techniques are used, it is virtually impossible to make a practical software system entirely error-free; a few errors persist even after thorough reviews and testing. Errors cause failures — that is, failures are manifestations of the errors latent in the system. Therefore, to achieve high reliability even when errors are present, the system should be able to tolerate the faults and still compute correct results. This is called fault-tolerance. Fault-tolerance can be achieved by carefully incorporating redundancy.

[Fig. 28.11: Three redundant copies C1, C2, C3 of the same component feed a voting unit, which outputs the majority result]
It is relatively simple to design hardware equipment to be fault-tolerant. The following are two methods that are popularly used to achieve hardware fault-tolerance:
Built-In Self Test (BIST): In BIST, the system periodically performs self-tests of its components. Upon detection of a failure, the system automatically reconfigures itself by switching out the faulty component and switching in one of the redundant good components.

Triple Modular Redundancy (TMR): In TMR, as the name suggests, three redundant copies of each critical component are made to run concurrently (see Fig. 28.11). Observe that in Fig. 28.11, C1, C2, and C3 are redundant copies of the same critical component. The system performs voting on the results produced by the redundant components to select the majority result. TMR can tolerate the occurrence of only a single failure at any time. (Can you answer why a TMR scheme can effectively tolerate only a single component failure?) An assumption implicit in the TMR technique is that, at any time, only one of the three redundant components can produce erroneous results. The majority result after voting would be erroneous if two or more components failed simultaneously (more precisely, before a repair could be carried out). In situations where two or more components are likely to fail (or produce erroneous results) simultaneously, greater amounts of redundancy are required: a little thought shows that at least 2n+1 redundant components are required to tolerate the simultaneous failure of n components.
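The voting step of TMR can be sketched as a majority vote over the redundant components' outputs. This is a software model of the voting unit of Fig. 28.11, written generally enough to show the 2n+1 rule; the error type raised on a tie is an illustrative choice.

```python
from collections import Counter

def majority_vote(results):
    """Select the majority result from redundant component outputs
    (TMR when len(results) == 3). With 2n+1 copies, up to n erroneous
    results are masked; with more failures than that, no strict
    majority may exist and the vote itself fails."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority - too many simultaneous failures")
    return value

# One faulty copy out of three is masked:
print(majority_vote([42, 42, 41]))      # -> 42
# Five copies (n = 2) mask two simultaneous failures:
print(majority_vote([7, 7, 7, 1, 2]))   # -> 7
```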
As compared to hardware, software fault-tolerance is much harder to achieve. To investigate the reason behind this, let us first discuss the techniques currently being used to achieve software fault-tolerance. We do this in the following subsection.
to statistical correlation of failures. Statistical correlation of failures means that even though individual teams worked in isolation to develop the different versions of a software component, the different versions still fail for identical reasons; in other words, the different versions of a component show similar failure patterns. This does not mean that the modules developed by independent programmers happen to contain identical errors by coincidence. The reason is not far to seek: programmers commit errors in those parts of a problem which they perceive to be difficult, and what is difficult to one team is usually difficult to all teams. So, identical errors remain in the most complex and least understood parts of a software component.

Recovery Blocks: In the recovery block scheme, the redundant components are called try blocks. Each try block computes the same end result as the others, but is intentionally written using a different algorithm compared to the other try blocks. In N-version programming, the different versions of a component are written by different teams of programmers, whereas in the recovery block approach different algorithms are used in the different try blocks. Also, in contrast to the N-version programming approach, where the redundant copies are run concurrently, in the recovery block approach they are run one after another (as shown in Fig. 28.12). The results produced by a try block are subjected to an acceptance test (see Fig. 28.12). If the test fails, then the next try block is tried. This is repeated in sequence until the result produced by a try block successfully passes the acceptance test. Note that in Fig. 28.12 we have shown separate acceptance tests for the different try blocks to emphasize that the tests are applied to the try blocks one after the other, though in practice the same test may be applied to each try block.

Legend: TB: try block
[Fig. 28.12: The input is given to try blocks TB1, TB2, TB3, TB4 in turn; each result is subjected to an acceptance test — on failure the next try block is run, on success its result is output, and if all try blocks fail an exception is raised]
Fig. 28.12 A Software Fault-Tolerance Scheme Using Recovery Blocks

As was the case with N-version programming, the recovery block approach also does not achieve much success in providing effective fault-tolerance. The reason behind this is again statistical correlation of failures: different try blocks fail for identical reasons, as was explained for the N-version programming approach. Besides, this approach suffers from a further limitation: it can only be used if the task deadlines are much larger than the task computation times (i.e. the tasks have large laxity), since the different try blocks are executed one after the other when failures occur. The recovery block approach poses special difficulty when used with real-time tasks with very short slack times (i.e. short deadlines and considerable execution times), since as the try blocks are tried out one after the other, deadlines may be missed. In such cases, the later try blocks usually contain only skeletal code that produces approximate results and therefore takes much less time to compute than the first try block.

[Fig. 28.13: As the computation progresses, the state is subjected to an acceptance test at each checkpoint; on success the state is saved, and on failure the computation rolls back to the last checkpoint]

Fig. 28.13 Checkpointing and Rollback Recovery

Checkpointing and Rollback Recovery: Checkpointing and rollback recovery is another popular technique to achieve fault-tolerance. In this technique, as the computation proceeds, the system state is tested each time some meaningful progress in the computation is made. Immediately after a state-check test succeeds, the state of the system is backed up on stable storage (see Fig. 28.13). If the next test does not succeed, the system can be rolled back to the last checkpointed state, and from that checkpointed state a fresh computation can be initiated. This technique is especially useful if there is a chance that the system state may be corrupted as the computation proceeds, for example through data corruption or processor failure.
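The checkpoint-test-rollback cycle can be sketched as follows. This is only a toy model under stated assumptions: in-memory deep copies stand in for stable storage, a failed step is simply discarded rather than retried, and the corrupting step simulates state corruption.

```python
import copy

def run_with_checkpoints(initial_state, steps, acceptance_test):
    """Checkpointing and rollback recovery, sketched: after each unit of
    progress the resulting state is subjected to an acceptance test; on
    success it is checkpointed (deep-copied, standing in for stable
    storage), and on failure the last good checkpoint is kept, which is
    an implicit rollback."""
    checkpoint = copy.deepcopy(initial_state)
    for step in steps:
        candidate = step(copy.deepcopy(checkpoint))
        if acceptance_test(candidate):
            checkpoint = candidate   # commit to "stable storage"
        # else: candidate discarded -> rolled back to `checkpoint`
    return checkpoint

# Toy computation: the second step corrupts the state (a None entry);
# rollback masks the corruption and the final result omits it.
steps = [
    lambda s: s + [1],
    lambda s: s + [None],   # simulated corruption
    lambda s: s + [2],
]
acceptable = lambda s: all(x is not None for x in s)
result = run_with_checkpoints([], steps, acceptable)  # -> [1, 2]
```

As the text notes, this scheme helps when the state may get corrupted in a detectable way; it cannot mask failures that the acceptance test itself does not catch.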
An example of a system having hard real-time tasks is a robot. The robot cyclically carries out a number of activities, including communication with the host system, logging all completed activities, sensing the environment to detect any obstacles present, tracking the objects of interest, path planning, and effecting the next move. Now suppose the robot suddenly encounters an obstacle. The robot must detect it and, as soon as possible, try to avoid colliding with it. If it fails to respond quickly (i.e. the concerned tasks are not completed before the required time bound), then it will collide with the obstacle, and the robot will be considered to have failed. Therefore, detecting an obstacle and reacting to it are hard real-time tasks.

Another application having hard real-time tasks is an anti-missile system, which consists of the following critical activities (tasks): it must first detect each incoming missile, properly position the anti-missile gun, and then fire to destroy the incoming missile before that missile can do any damage. All these tasks are hard real-time in nature, and the anti-missile system would be considered to have failed if any of its tasks fails to complete before the corresponding deadline.

Applications having hard real-time tasks are typically safety-critical (can you think of an example of a hard real-time system that is not safety-critical?1). This means that any failure of a real-time task, including its failure to meet the associated deadline, would result in severe consequences. This makes hard real-time tasks extremely critical. The criticality of a task can range from extremely critical to not so critical; task criticality is therefore a different dimension from the hard or soft characterization of a task. Criticality of a task is a measure of the cost of a failure: the higher the cost of failure, the more critical the task.
For hard real-time tasks in practical systems, the time bounds usually range from several microseconds to a few milliseconds. It may be noted that a hard real-time task need not be completed within the shortest time possible; it is merely required that the task complete within the specified time bound. In other words, there is no reward for completing a hard real-time task much ahead of its deadline. This is an important observation, and it will play a central role in our discussion of task scheduling in the next two chapters.
Some computer games have hard real-time tasks; these are not safety-critical though. Whenever a timing constraint is not met, the game may fail, but the failure may at best be a mild irritant to the user.
Fig. 28.14 Utility of the Result of a Firm Real-Time Task with Time

Firm real-time tasks typically abound in multimedia applications. The following are two examples of firm real-time tasks:
Video conferencing: In a video conferencing application, video frames and the accompanying audio are converted into packets and transmitted to the receiver over a network. Some packets may get delayed at different nodes during transit on a packet-switched network due to congestion, resulting in varying queuing delays for packets traveling along different routes. Even when packets traverse the same route, some packets can take much more time than others due to the specific transmission strategy used at the nodes. When a certain frame is being played, if some preceding frame arrives late at the receiver, that frame is of no use and is discarded. For this reason, when a frame is delayed by more than, say, one second, it is simply discarded at the receiver end without any processing being carried out on it.

Satellite-based tracking of enemy movements: Consider a satellite that takes pictures of enemy territory and beams them to a ground station computer frame by frame. The ground computer processes each frame to find the positional difference of the objects of interest with respect to their positions in the previous frame, in order to determine the movements of the enemy. When the ground computer is overloaded, a new image may be received even before an older image is taken up for processing. In this case, the older image is not of much use; it may be discarded and the recently received image processed instead.
For firm real-time tasks, the associated time bounds typically range from a few milliseconds to several hundred milliseconds.
[Fig. 28.15: The utility of a soft real-time task's result is maximal at response time 0 and decreases gradually as the response time grows]

Fig. 28.15 Utility of the Results Produced by a Soft Real-Time Task as a Function of Time
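The hard/firm/soft distinction can be summarized as three utility-versus-response-time shapes. The curve shapes below follow Figs. 28.14 and 28.15 qualitatively; the particular values (unit utility, a linear decay rate) are illustrative assumptions, not part of the definitions.

```python
def hard_utility(t, deadline):
    """Hard task: full value up to the deadline; missing it is a system
    failure, modeled here as negatively infinite utility."""
    return 1.0 if t <= deadline else float("-inf")

def firm_utility(t, deadline):
    """Firm task (Fig. 28.14): a late result is worthless but not
    harmful - its utility simply drops to zero."""
    return 1.0 if t <= deadline else 0.0

def soft_utility(t, deadline, decay=0.01):
    """Soft task (Fig. 28.15): utility falls off gradually past the
    deadline (an assumed linear decay, floored at zero)."""
    return 1.0 if t <= deadline else max(0.0, 1.0 - decay * (t - deadline))
```

For example, a result 50 time units past its deadline is fatal for a hard task, worthless for a firm task, and still worth half its value for a soft task with this decay rate.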
1.8. Exercises
1. State whether you consider the following statements to be TRUE or FALSE. Justify your answer in each case.
a. A hard real-time application is made up of only hard real-time tasks.
b. Every safety-critical real-time system has a fail-safe state.
c. A deadline constraint between two stimuli can be considered to be a behavioral constraint on the environment of the system.
d. Hardware fault-tolerance techniques can easily be adapted to provide software fault-tolerance.
e. A good algorithm for scheduling hard real-time tasks must try to complete each task in the shortest time possible.
f. All hard real-time systems are safety-critical in nature.
g. Performance constraints on a real-time system ensure that the environment of the system is well-behaved.
h. Soft real-time tasks are those which do not have any time bounds associated with them.
i. Minimization of average task response times is the objective of any good hard real-time task-scheduling algorithm.
j. It should be the goal of any good real-time operating system to complete every hard real-time task as far ahead of its deadline as possible.
2. What do you understand by the term real-time? How is the concept of real-time different from the traditional notion of time? Explain your answer using a suitable example.
3. Using a block diagram, show the important hardware components of a real-time system and their interactions. Explain the roles of the different components.
4. In a real-time system, raw sensor signals need to be preprocessed before they can be used by a computer. Why is it necessary to preprocess the raw sensor signals? Explain the different types of preprocessing that are normally carried out on sensor signals to make them suitable for direct use by a computer.
5. Identify the key differences between hard real-time, soft real-time, and firm real-time systems. Give at least one example of real-time tasks corresponding to each of these three categories.
Identify the timing constraints in your tasks and justify why the tasks should be categorized into the categories you have indicated.
6. Give an example of a soft real-time task and a non-real-time task. Explain the key differences between the characteristics of these two types of tasks.
7. Draw a schematic model showing the important components of a typical hard real-time system.
8. Explain the working of the input interface using a suitable schematic diagram.
9. Explain, using a suitable circuit diagram, how analog to digital conversion (ADC) is achieved in an input interface.
10. Explain the checkpointing and rollback recovery scheme to provide fault-tolerant real-time computing. Explain the types of faults it can help tolerate and the faults it cannot tolerate. Explain the situations in which this technique is useful.
11. Answer the following questions concerning fault-tolerance of real-time systems.
a. Explain why hardware fault-tolerance is easier to achieve compared to software fault-tolerance.
b. Explain the main techniques available to achieve hardware fault-tolerance.
12. What are the main techniques available to achieve software fault-tolerance? What are the shortcomings of these techniques?
13. What do you understand by the fail-safe state of a system? Safety-critical real-time systems do not have a fail-safe state. What is the implication of this?
14. Is it possible to have an extremely safe but unreliable system? If your answer is affirmative, then give an example of such a system. If you answer in the negative, then justify why it is not possible for such a system to exist.
15. What is a safety-critical system? Give a few practical examples of safety-critical hard real-time systems. Are all hard real-time systems safety-critical? If not, give at least one example of a hard real-time system that is not safety-critical.
16. Explain, with the help of a schematic diagram, how the recovery block scheme can be used to achieve fault-tolerance of real-time tasks. What are the shortcomings of this scheme? Explain situations where it can satisfactorily be used and situations where it cannot be used.
17. Identify and represent the timing constraints in the following air-defense system by means of an extended state machine diagram. Classify each constraint as either a performance or a behavioral constraint. Every incoming missile must be detected within 0.2 seconds of its entering the radar coverage area. The intercept missile should be engaged within 5 seconds of detection of the target missile. The intercept missile should be fired 0.1 seconds after its engagement, but no later than 1 second after it.
18. Represent a washing machine having the following specification by means of an extended state machine diagram. The washing machine waits for the start switch to be pressed. After the user presses the start switch, the machine fills the wash tub with either hot or cold water, depending upon the setting of the Hot Wash switch. The water filling continues until the high level is sensed.
The machine starts the agitation motor and continues agitating the wash tub until either the preset timer expires or the user presses the stop switch. After the agitation stops, the machine waits for the user to press the start Drying switch. After the user presses the start Drying switch, the machine starts the hot air blower and continues blowing hot air into the drying chamber until either the user presses the stop switch or the preset timer expires.
19. Represent the timing constraints in a collision avoidance task in an air surveillance system as an extended finite state machine (EFSM) diagram. The collision avoidance task consists of the following activities.
a. The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 mSec of receipt of the signal.
b. The track record is transmitted to the data processor within 1 mSec after the track record is determined.
c. A subtask on the data processor correlates the received track record with the track records of other targets that come close, to detect a potential collision that might occur within the next 500 mSec.
d. If a collision is anticipated, then the corrective action is determined within 10 mSec by another subtask running on the data processor.
e. The corrective action is transmitted to the track correction task within 25 mSec.
20. Consider the following (partial) specification of a real-time system: The velocity of a spacecraft must be sampled by a computer on board the spacecraft at least once every second (the sampling event is denoted by S). After sampling the velocity, the current position is computed (denoted by event C) within 100 mSec. Concurrently, the
expected position of the spacecraft is retrieved from the database within 200 mSec (denoted by event R). Using these data, the deviation from the normal course of the spacecraft must be determined within 100 mSec (denoted by event D), and corrective velocity adjustments must be carried out before a new velocity value is sampled (the velocity adjustment event is denoted by A). Calculated positions must be transmitted to the earth station at least once every minute (the position transmission event is denoted by T). Identify the different timing constraints in the system. Classify these into either performance or behavioral constraints. Construct an EFSM to model the system.
21. Construct the EFSM model of a telephone system whose (partial) behavior is described below. After the receiver handset is lifted, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialed within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialing of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.
Module 6
Embedded System Software
Lesson 29
Real-Time Task Scheduling Part 1
[Fig. 29.1: A timeline starting at 0 — the instance Ti(1) arrives at φ; its relative deadline is d, its absolute deadline is φ + d; the next instance Ti(2) arrives at φ + pi]
Fig. 29.1 Relative and Absolute Deadlines of a Task

Relative Deadline versus Absolute Deadline: The absolute deadline of a task is the absolute time value (counted from time 0) by which the results from the task are expected. Thus, the absolute deadline is the interval of time between time 0 and the actual instant at which the deadline occurs, as measured by some physical clock. The relative deadline, on the other hand, is the time interval between the start of the task and the instant at which the deadline occurs; in other words, the relative deadline is the time interval between the arrival of a task and the corresponding deadline. The difference between relative and absolute deadlines is illustrated in Fig. 29.1. It can be observed from Fig. 29.1 that the relative deadline of the task instance Ti(1) is d, whereas its absolute deadline is φ + d.

Response Time: The response time of a task is the time it takes (as measured from the task arrival time) for the task to produce its results. As already remarked, task instances get generated
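The relationship between the two deadlines reduces to one line of arithmetic; the numbers below are illustrative values, not drawn from Fig. 29.1.

```python
def absolute_deadline(arrival_time, relative_deadline):
    """Absolute deadline of a task instance, measured from time 0:
    the instance's arrival time plus its relative deadline."""
    return arrival_time + relative_deadline

# An instance arriving at phi = 2000 ms with relative deadline d = 50 ms
# has its absolute deadline at phi + d = 2050 ms.
print(absolute_deadline(2000, 50))  # -> 2050
```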
due to occurrence of events. These events may be internal to the system, such as clock interrupts, or external to the system such as a robot encountering an obstacle. The response time is the time duration from the occurrence of the event generating the task to the time the task produces its results. For hard real-time tasks, as long as all their deadlines are met, there is no special advantage of completing the tasks early. However, for soft real-time tasks, average response time of tasks is an important metric to measure the performance of a scheduler. A scheduler for soft realtime tasks should try to execute the tasks in an order that minimizes the average response time of tasks. Task Precedence: A task is said to precede another task, if the first task must complete before the second task can start. When a task Ti precedes another task Tj, then each instance of Ti precedes the corresponding instance of Tj. That is, if T1 precedes T2, then T1(1) precedes T2(1), T1(2) precedes T2(2), and so on. A precedence order defines a partial order among tasks. Recollect from a first course on discrete mathematics that a partial order relation is reflexive, antisymmetric, and transitive. An example partial ordering among tasks is shown in Fig. 29.2. Here T1 precedes T2, but we cannot relate T1 with either T3 or T4. We shall later use task precedence relation to develop appropriate task scheduling algorithms. T2 T1
Fig. 29.2 Precedence Relation among Tasks
Data Sharing: Tasks often need to share their results. When one task needs to use the results produced by another task, clearly the producing task must complete before the consuming task can start. In fact, a precedence relation between two tasks sometimes implies data sharing between them (e.g. the first task passing some results to the second task). However, this is not always true. A task may be required to precede another even when there is no data sharing. For example, in a chemical plant it may be required that the reaction chamber be filled with water before chemicals are introduced. In this case, the task handling the filling of the reaction chamber with water must complete before the task handling the introduction of the chemicals is activated. It is therefore not appropriate to represent data sharing using the precedence relation. Further, data sharing may occur not only when one task precedes the other, but also among truly concurrent tasks and overlapping tasks. In other words, data sharing among tasks does not necessarily impose any particular ordering among tasks. Therefore, the data sharing relation among tasks needs to be represented using a different symbol. We shall represent data sharing between two tasks using a dashed arrow. In the example of data sharing among tasks represented in Fig. 29.2, T2 uses the results of T3, but T2 and T3 may execute concurrently. T2 may even
start executing first; after some time it may receive some data from T3, continue its execution, and so on.
Fig. 29.3 Track Correction Task (2000 mSec; pi; ei; di) of a Rocket
To illustrate the above notation for representing real-time periodic tasks, let us consider the track correction task typically found in rocket control software. Assume the following characteristics of the track correction task. The track correction task starts 2000 milliseconds after the launch of the rocket, and recurs periodically every 50 milliseconds thereafter. Each instance of the task requires a processing time of 8 milliseconds and its relative deadline is 50 milliseconds. Recall that the phase of a task is defined by the occurrence time of the first instance of the task. Therefore, the phase of this task is 2000 milliseconds. This task can formally be represented as (2000 mSec, 50 mSec, 8 mSec, 50 mSec). This task is pictorially shown in Fig. 29.3. When the deadline of a task equals its period (i.e. pi = di), we can omit the fourth element of the tuple. In this case, we can represent the task as Ti = (2000 mSec, 50 mSec, 8 mSec). This would automatically mean pi = di = 50 mSec. Similarly, when φi = 0, it can be omitted when no confusion arises. So, Ti = (100 mSec, 20 mSec) would indicate a task with φi = 0, pi = 100 mSec, ei = 20 mSec, and di = 100 mSec. Whenever there is any scope for confusion, we shall explicitly write out the parameters, e.g. Ti = (pi = 50 mSec, ei = 8 mSec, di = 40 mSec).
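The four-tuple notation above can be captured in a small sketch. This is only an illustration; the class and method names below are our own, not from the text:

```python
from dataclasses import dataclass

# A minimal sketch of the four-tuple task notation (phi_i, p_i, e_i, d_i).
@dataclass
class PeriodicTask:
    phase: int            # phi_i: arrival time of the first instance (mSec)
    period: int           # p_i (mSec)
    exec_time: int        # e_i (mSec)
    deadline: int = None  # d_i (mSec); an omitted fourth element means d_i = p_i

    def __post_init__(self):
        if self.deadline is None:
            self.deadline = self.period

    def arrival(self, k: int) -> int:
        """Arrival time of the k-th instance Ti(k), k = 1, 2, ..."""
        return self.phase + (k - 1) * self.period

    def absolute_deadline(self, k: int) -> int:
        """Absolute deadline of Ti(k): its arrival time plus the relative deadline."""
        return self.arrival(k) + self.deadline

# The rocket track-correction task (2000 mSec, 50 mSec, 8 mSec):
track = PeriodicTask(phase=2000, period=50, exec_time=8)
```

For instance, `track.arrival(1)` is 2000 mSec (the phase), and the absolute deadline of the first instance is 2000 + 50 = 2050 mSec, matching the relative-versus-absolute deadline discussion above.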
A vast majority of the tasks present in a typical real-time system are periodic. The reason is that many activities carried out by real-time systems are periodic in nature, for example monitoring certain conditions, polling information from sensors at regular intervals, and carrying out certain actions at regular intervals (such as driving actuators). We shall consider examples of such tasks found in a typical chemical plant. In a chemical plant several temperature monitors, pressure monitors, and chemical concentration monitors periodically sample the current temperature, pressure, and chemical concentration values, which are then communicated to the plant controller. The instances of the temperature, pressure, and chemical concentration monitoring tasks normally get generated through the interrupts received from a periodic timer. These inputs are used to compute corrective actions required to maintain the chemical reaction at a certain rate. The corrective actions are then carried out through actuators.
Sporadic Task: A sporadic task is one that recurs at random instants. A sporadic task Ti can be represented by a three-tuple: Ti = (ei, gi, di), where ei is the worst case execution time of an instance of the task, gi denotes the minimum separation between two consecutive instances of the task, and di is the relative deadline. The minimum separation gi between two consecutive instances implies that once an instance of a sporadic task occurs, the next instance cannot occur before gi time units have elapsed. That is, gi restricts the rate at which sporadic tasks can arise. As done for periodic tasks, we shall use the convention that the first instance of a sporadic task Ti is denoted by Ti(1) and the successive instances by Ti(2), Ti(3), etc. Many sporadic tasks, such as emergency message arrivals, are highly critical in nature. For example, in a robot, a task that gets generated to handle an obstacle that suddenly appears is a sporadic task.
In a factory, the task that handles fire conditions is a sporadic task. The time of occurrence of these tasks cannot be predicted. The criticality of sporadic tasks varies from highly critical to moderately critical. For example, an I/O device interrupt, or a DMA interrupt, is moderately critical. However, a task handling the reporting of fire conditions is highly critical.
Aperiodic Task: An aperiodic task is in many ways similar to a sporadic task. An aperiodic task can arise at random instants. However, in case of an aperiodic task, the minimum separation gi between two consecutive instances can be 0. That is, two or more instances of an aperiodic task might occur at the same time instant. Also, the deadline for an aperiodic task is expressed either as an average value or statistically. Aperiodic tasks are generally soft real-time tasks. It is easy to realize why aperiodic tasks need to be soft real-time tasks. Aperiodic tasks can recur in quick succession. It therefore becomes very difficult to meet the deadlines of all instances of an aperiodic task. When several aperiodic tasks recur in quick succession, there is a bunching of the task instances, and it might lead to a few deadline misses. As already discussed, soft real-time tasks can tolerate a few deadline misses. An example of an aperiodic task is a logging task in a distributed system. The logging task can be started by different tasks running on different nodes. The logging requests from different tasks may arrive at the logger almost at the same time, or the requests may be spaced out in time. Other examples of aperiodic tasks include operator requests, keyboard presses, mouse movements, etc. In fact, all interactive commands issued by users are handled by aperiodic tasks.
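The minimum-separation property gi that distinguishes a sporadic task from an aperiodic one can be sketched as follows. The class and method names below are hypothetical, chosen only for illustration:

```python
# A sketch of the minimum-separation property of a sporadic task Ti = (ei, gi, di).
class SporadicTask:
    def __init__(self, e: float, g: float, d: float):
        self.e = e                # worst case execution time of an instance
        self.g = g                # minimum separation between consecutive instances
        self.d = d                # relative deadline
        self.last_arrival = None

    def accept(self, t: float) -> bool:
        """Check that an instance arriving at time t respects the minimum
        separation g from the previously accepted instance."""
        if self.last_arrival is not None and t - self.last_arrival < self.g:
            return False
        self.last_arrival = t
        return True

# e.g. an obstacle-handling task that cannot recur within 100 time units
obstacle = SporadicTask(e=5, g=100, d=50)
```

With g = 0 the check above always passes, which is precisely the aperiodic case: instances may bunch up arbitrarily, and deadlines can then only be met on a best-effort (soft real-time) basis.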
Preemptive Scheduler: A preemptive scheduler is one which, when a higher priority task arrives, suspends any lower priority task that may be executing and takes up the higher priority task for execution. Thus, in a preemptive scheduler, it cannot be the case that a higher priority task is ready and waiting for execution while a lower priority task is executing. A preempted lower priority task can resume its execution only when no higher priority task is ready.
Utilization: The processor utilization (or simply utilization) of a task is the average time for which it executes per unit time interval. In notation: for a periodic task Ti, the utilization ui = ei/pi, where ei is the execution time and pi is the period of Ti. For a set of n periodic tasks {Ti}, the total utilization due to all tasks is U = Σi=1..n (ei/pi). It is the objective of any good scheduling algorithm to feasibly schedule even those task sets that have very high utilization, i.e. utilization approaching 1. Of course, on a uniprocessor it is not possible to schedule task sets having utilization more than 1.
Jitter: Jitter is the deviation of a periodic task from its strict periodic behavior. The arrival time jitter is the deviation of the task from arriving at the precise periodic time of arrival. It may be caused by imprecise clocks, or other factors such as network congestion. Similarly, completion time jitter is the deviation of the completion of a task from precise periodic points. The completion time jitter may be caused by the specific scheduling algorithm employed, which takes up a task for scheduling as per convenience and the load at an instant, rather than scheduling at strict time instants. Jitter is undesirable for some applications.
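The utilization formula can be computed directly. A minimal sketch (the function name is ours):

```python
# Total utilization of a set of periodic tasks, each given as an (e_i, p_i)
# pair: U = sum over all tasks of e_i / p_i.
def utilization(tasks):
    return sum(e / p for e, p in tasks)

# Two tasks: e=10 every 20 msec (u=0.5) and e=20 every 50 msec (u=0.4);
# U = 0.9 <= 1, so the set is not ruled out on a uniprocessor.
U = utilization([(10, 20), (20, 50)])
```

Note that U ≤ 1 is only a necessary condition on a uniprocessor; whether a particular scheduler can actually meet all deadlines is the subject of the schedulability tests discussed later.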
Important members of the clock-driven class of schedulers that we discuss in this text are table-driven and cyclic schedulers. Clock-driven schedulers are simple and efficient; therefore, they are frequently used in embedded applications. We investigate these two schedulers in some detail in Sec. 2.5. Important examples of event-driven schedulers are Earliest Deadline First (EDF) and Rate Monotonic Analysis (RMA). Event-driven schedulers are more sophisticated than clock-driven schedulers, and are usually both more proficient and more flexible. They are more proficient because they can feasibly schedule some task sets which clock-driven schedulers cannot. They are more flexible because they can feasibly schedule sporadic and aperiodic tasks in addition to periodic tasks, whereas clock-driven schedulers can satisfactorily handle only periodic tasks. Event-driven scheduling of real-time tasks in a uniprocessor environment was a subject of intense research during the early 1970s, leading to the publication of a large number of research results. Of these, the following two popular algorithms are the essence of all those results: Earliest Deadline First (EDF) and Rate Monotonic Analysis (RMA). If we understand these two schedulers well, we get a good grip on real-time task scheduling on uniprocessors. Several variations of these two basic algorithms exist. Another classification of real-time task scheduling algorithms can be made based on the type of task acceptance test that a scheduler carries out before it takes up a task for scheduling. The acceptance test is used to decide whether a newly arrived task would at all be taken up for scheduling or be rejected.
Based on the task acceptance test used, there are two broad categories of task schedulers: planning-based and best effort. In planning-based schedulers, when a task arrives the scheduler first determines whether the task can meet its deadline if it is taken up for execution. If not, it is rejected. If the task can meet its deadline and does not cause other already scheduled tasks to miss their respective deadlines, then the task is accepted for scheduling. Otherwise, it is rejected. In best effort schedulers, no acceptance test is applied. All tasks that arrive are taken up for scheduling, and a best effort is made to meet their deadlines. But no guarantee is given as to whether a task's deadline would be met. A third classification of real-time task scheduling algorithms is based on the target platform on which the tasks are to run. The different classes of scheduling algorithms according to this scheme are: uniprocessor, multiprocessor, and distributed. Uniprocessor scheduling algorithms are possibly the simplest of the three classes. In contrast to uniprocessor algorithms, in multiprocessor and distributed scheduling algorithms a decision first has to be made regarding which task needs to run on which processor, and then these tasks are scheduled. In contrast to multiprocessors, the processors in a distributed system do not possess shared memory. Also, in contrast to multiprocessors, there is no global up-to-date state information available in distributed systems. This makes scheduling algorithms that assume central state information of all tasks and processors to exist unsuitable for use in distributed systems. Further, in distributed systems, the communication among tasks is through message passing. Communication through message passing is costly. This means that a scheduling algorithm should not incur too much communication overhead. So, carefully designed distributed algorithms are normally considered suitable for use in a distributed system.
In the following sections, we study the different classes of schedulers in more detail.
However, tasks often do have non-zero phase. It would be interesting to determine what the major cycle would be when tasks have non-zero phase. The result of an investigation into this issue is given as Theorem 1 below.
1.5.2. Theorem 1
The major cycle of a set of tasks ST = {T1, T2, …, Tn} is LCM({p1, p2, …, pn}) even when the tasks have arbitrary phasing.
Proof: As per our definition of a major cycle, even when tasks have non-zero phasing, task instances would repeat the same way in each major cycle. Let us consider an example in which the occurrences of a task Ti in a major cycle are as shown in Fig. 29.4. As shown in the example of Fig. 29.4, there are k−1 occurrences of the task Ti during a major cycle. The first occurrence of Ti starts φ time units after the start of the major cycle. The major cycle ends x time units after the last (i.e. (k−1)th) occurrence of the task Ti in the major cycle. Of course, this must be the same in each major cycle.
Fig. 29.4 Major Cycle When a Task Ti has Non-Zero Phasing
Assume that the size of each major cycle is M. Then, from an inspection of Fig. 29.4, for the task to repeat identically in each major cycle:
M = (k−1)·pi + φ + x … (2.1)
Now, for the task Ti to have identical occurrence times in each major cycle, φ + x must equal pi (see Fig. 29.4). Substituting this in Expr. 2.1, we get:
M = (k−1)·pi + pi = k·pi … (2.2)
So, the major cycle M contains an integral multiple of pi. This argument holds for each task in the task set irrespective of its phase. Therefore, M = LCM({p1, p2, …, pn}).
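By the theorem, the major cycle can be computed as the LCM of the task periods regardless of phasing. A minimal sketch (the function name is ours):

```python
from math import gcd
from functools import reduce

# Major cycle M = LCM of all task periods; by the theorem above this holds
# even for tasks with arbitrary (non-zero) phasing.
def major_cycle(periods):
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

M = major_cycle([4, 5, 20])  # LCM(4, 5, 20) = 20
```

For instance, periods of 100, 80, and 150 mSec give a major cycle of 1200 mSec, so a pre-computed schedule need only be stored for that long.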
temperature controller periodically samples the temperature of a room and maintains it at a preset value. Such temperature controllers are embedded in typical computer-controlled air conditioners.
Fig. 29.5 Major and Minor Cycles in a Cyclic Scheduler A cyclic scheduler repeats a pre-computed schedule. The pre-computed schedule needs to be stored only for one major cycle. Each task in the task set to be scheduled repeats identically in every major cycle. The major cycle is divided into one or more minor cycles (see Fig. 29.5). Each minor cycle is also sometimes called a frame. In the example shown in Fig. 29.5, the major cycle has been divided into four minor cycles (frames). The scheduling points of a cyclic scheduler occur at frame boundaries. This means that a task can start executing only at the beginning of a frame. The frame boundaries are defined through the interrupts generated by a periodic timer. Each task is assigned to run in one or more frames. The assignment of tasks to frames is stored in a schedule table. An example schedule table is shown in Figure 29.6.
Frame Number    f1    f2    f3    f4
Task Number     T3    T1    T3    T4
Fig. 29.6 An Example Schedule Table for a Cyclic Scheduler
The size of the frame to be used by the scheduler is an important design parameter and needs to be chosen very carefully. A selected frame size should satisfy the following three constraints.
1. Minimum Context Switching: This constraint is imposed to minimize the number of context switches occurring during task execution. The simplest interpretation of this constraint is that a task instance must complete running within its assigned frame. Unless a task completes within its allocated frame, the task might have to be suspended and restarted in a later frame. This would require a context switch involving some processing overhead. To avoid unnecessary context switches, the selected frame size should be larger than the execution time of each task, so that when a task starts at a frame boundary it is able to complete within the same frame. Formally, we can state this constraint as: max({ei}) < F, where ei is the execution time of task Ti, and F is the frame size. Note that this constraint imposes a lower bound on the frame size, i.e., the frame size F must not be smaller than max({ei}).
2. Minimization of Table Size: This constraint requires that the number of entries in the schedule table should be minimum, in order to minimize the storage requirement of the schedule table. Remember that cyclic schedulers are used in small embedded applications with a very small storage capacity. So, this constraint is important to the commercial success of a product. The number of entries to be stored in the schedule table can be minimized when the minor cycle squarely divides the major cycle. When the minor cycle squarely divides the major cycle, the major cycle contains an integral number of minor cycles (no fractional minor cycles). Unless the minor cycle squarely divides the major cycle, storing the schedule for one major cycle would not be sufficient, as the schedules in the major cycle would not repeat, and this would make the size of the schedule table large. We can formulate this constraint as: ⌊M/F⌋ = M/F … (2.3) In other words, if the floor of M/F equals M/F, then the major cycle contains an integral number of frames.
Fig. 29.7 Satisfaction of a Task Deadline
3. Satisfaction of Task Deadline: This third constraint on frame size is necessary to meet the task deadlines. This constraint imposes that between the arrival of a task and its deadline, there must exist at least one full frame. The constraint is necessary because otherwise a task might miss its deadline: by the time it could be taken up for scheduling, the deadline may already be imminent. Consider this: a task can only be taken up for scheduling at the start of a frame. If not even one full frame exists between the arrival and deadline of a task, a situation as shown in Fig. 29.7 might arise. In this case, the task arrives sometime after the kth frame has started. Obviously it cannot be taken up for scheduling in the kth frame and can only be taken up in the (k+1)th frame. But then it may be too late to meet its deadline, since the execution time of a task can be up to the size of a full frame. This might result in the task missing its deadline, since the task might complete only at the end of the (k+1)th frame, much after the deadline d has passed. We therefore need a full frame to exist between the arrival of a task and its deadline, as shown in Fig. 29.8, so that task deadlines can be met.
Fig. 29.8 A Full Frame Exists Between the Arrival and Deadline of a Task
More formally, this constraint can be formulated as follows: Suppose a task arrives t time units after the start of the last frame (see Fig. 29.8). Then, assuming that a single frame is sufficient to complete the task, the task can complete before its deadline iff (2F − t) ≤ di, or 2F ≤ (di + t). … (2.4)
Remember that the value of t might vary from one instance of the task to another. The worst case scenario (where the task is most likely to miss its deadline) occurs for the task instance having the minimum value of t, such that t > 0. This is the worst case scenario since, under it, the task would have to wait the longest before its execution can start. It should be clear that if a task arrives just after a frame has started, then the task would have to wait for the full duration of the current frame before it can be taken up for execution. If a task misses its deadline at all, it would certainly be under such a situation. In other words, the worst case scenario for a task to meet its deadline occurs for the instance that has the minimum separation from the start of a frame. Determining the minimum separation value (i.e. min(t)) for a task among all its instances helps in determining a feasible frame size. We show by Theorem 2 that min(t) is equal to gcd(F, pi). Consequently, this constraint can be written as: for every Ti, 2F − gcd(F, pi) ≤ di … (2.5)
Note that this constraint defines an upper bound on the frame size for a task Ti, i.e., if the frame size is any larger than this bound, then tasks might miss their deadlines. Expr. 2.5 defines the bound from the consideration of one task only. Considering all tasks, the frame size must satisfy F ≤ min((gcd(F, pi) + di)/2) over all tasks Ti.
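The per-task deadline check of Expr. 2.5 can be sketched directly (the function name is ours, chosen for illustration):

```python
from math import gcd

# Expr. 2.5: a frame size F suits a task with period p and relative
# deadline d iff 2F - gcd(F, p) <= d.
def meets_deadline_constraint(F: int, p: int, d: int) -> bool:
    return 2 * F - gcd(F, p) <= d

# For a task with p = 4, d = 4: F = 2 gives 4 - 2 <= 4 (holds),
# but F = 5 gives 10 - 1 <= 4 (fails)
```

This is the check that is applied repeatedly, task by task and candidate frame size by candidate frame size, in the worked examples that follow.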
1.5.4. Theorem 2
The minimum separation of the task arrival from the corresponding frame start time (min(t)), considering all instances of a task Ti, is equal to gcd(F, pi).
Proof: Let g = gcd(F, pi), where gcd is the function determining the greatest common divisor of its arguments. It follows from the definition of gcd that g must squarely divide each of F and pi. Let Ti be a task with zero phasing. Now, assume that this theorem is violated for certain integers m and n, such that Ti(n) occurs in the mth frame and the difference between the start time of the mth frame and the nth task arrival time is less than g. That is, 0 < (m·F − n·pi) < g. Dividing this expression throughout by g, we get: 0 < (m·F/g − n·pi/g) < 1 … (2.6)
However, F/g and pi/g are both integers because g is gcd(F, pi). Therefore, we can write F/g = I1 and pi/g = I2 for some integral values I1 and I2. Substituting these in Expr. 2.6, we get 0 < m·I1 − n·I2 < 1. Since m·I1 and n·I2 are both integers, their difference cannot be a fractional value lying between 0 and 1. Therefore, this expression can never be satisfied. It can therefore be concluded that the minimum time between a frame boundary and the arrival of the corresponding instance of Ti cannot be less than gcd(F, pi).
For a given task set it is possible that more than one frame size satisfies all the three constraints. In such cases, it is better to choose the shortest frame size. This is because the schedulability of a task set increases as more frames become available over a major cycle. It should however be remembered that the mere fact that a suitable frame size can be determined does not mean that a feasible schedule would be found. It may so happen that there are not enough frames available in a major cycle to be assigned to all the task instances. We now illustrate how an appropriate frame size can be selected for cyclic schedulers through a few examples.
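Theorem 2 can also be checked empirically by enumerating the arrival offsets of a zero-phase task relative to the frame boundaries. The sketch below is illustrative only (the function name is ours), and it considers offsets t > 0 as in the theorem:

```python
# Enumerate the offsets t = (k * p) mod F of the arrivals of a zero-phase
# task with period p relative to frame boundaries of size F, and return
# the minimum positive offset observed.
def min_separation(F: int, p: int, n_instances: int = 1000) -> int:
    offsets = [(k * p) % F for k in range(n_instances)]
    positive = [t for t in offsets if t > 0]
    return min(positive) if positive else 0

# e.g. F = 4, p = 10: arrivals fall 0, 2, 0, 2, ... time units after a
# frame boundary, so the minimum positive separation is 2 = gcd(4, 10)
```

The enumeration agrees with the theorem: the minimum positive offset equals gcd(F, pi), which is the value substituted for t in Expr. 2.5.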
1.5.5. Examples
Example 1: A cyclic scheduler is to be used to run the following set of periodic tasks on a uniprocessor: T1 = (e1=1, p1=4), T2 = (e2=1.5, p2=5), T3 = (e3=1, p3=20), T4 = (e4=2, p4=20). Select an appropriate frame size.
Solution: For the given task set, an appropriate frame size is one that satisfies all the three required constraints. In the following, we determine a suitable frame size F which satisfies all the three required constraints.
Constraint 1: Let F be an appropriate frame size; then max({ei}) ≤ F. From this constraint, we get F ≥ 1.5.
Constraint 2: The major cycle M for the given task set is given by M = LCM(4, 5, 20) = 20. M should be an integral multiple of the frame size F, i.e., M mod F = 0. This consideration implies that F can take on the values 2, 4, 5, 10, 20. A frame size of 1 has been ruled out since it would violate constraint 1.
Constraint 3: To satisfy this constraint, we need to check whether a selected frame size F satisfies the inequality 2F − gcd(F, pi) ≤ di for each task Ti.
Let us first try frame size 2.
For F = 2 and task T1: 2×2 − gcd(2, 4) ≤ 4, i.e. 4 − 2 ≤ 4. Therefore, for T1 the inequality is satisfied.
For F = 2 and task T2: 2×2 − gcd(2, 5) ≤ 5, i.e. 4 − 1 ≤ 5. Therefore, for T2 the inequality is satisfied.
For F = 2 and task T3: 2×2 − gcd(2, 20) ≤ 20, i.e. 4 − 2 ≤ 20. Therefore, for T3 the inequality is satisfied.
For F = 2 and task T4: 2×2 − gcd(2, 20) ≤ 20, i.e. 4 − 2 ≤ 20. For T4 the inequality is satisfied.
Thus, constraint 3 is satisfied by all tasks for frame size 2. So, frame size 2 satisfies all the three constraints. Hence, 2 is a feasible frame size.
Let us try frame size 4.
For F = 4 and task T1: 2×4 − gcd(4, 4) ≤ 4, i.e. 8 − 4 ≤ 4. Therefore, for T1 the inequality is satisfied.
For F = 4 and task T2: 2×4 − gcd(4, 5) ≤ 5, i.e. 8 − 1 ≤ 5. For T2 the inequality is not satisfied. Therefore, we need not look any further. Clearly, F = 4 is not a suitable frame size.
Let us now try frame size 5, to check if that is also feasible. For F = 5 and task T1, we have 2×5 − gcd(5, 4) ≤ 4, i.e. 10 − 1 ≤ 4. The inequality is not satisfied for T1. We need not look any further. Clearly, F = 5 is not a suitable frame size.
Let us now try frame size 10. For F = 10 and task T1, we have 2×10 − gcd(10, 4) ≤ 4, i.e. 20 − 2 ≤ 4. The inequality is not satisfied for T1. We need not look any further. Clearly, F = 10 is not a suitable frame size.
Let us try if 20 is a feasible frame size. For F = 20 and task T1, we have 2×20 − gcd(20, 4) ≤ 4, i.e. 40 − 4 ≤ 4. The inequality is not satisfied. Therefore, F = 20 is also not suitable.
So, only the frame size 2 is suitable for scheduling.
Even though for Example 1 we could successfully find a suitable frame size that satisfies all the three constraints, it is quite probable that a suitable frame size may not exist for many problems. In such cases, to find a feasible frame size we might have to split the task (or a few tasks) that is (are) causing violation of the constraints into smaller sub-tasks that can be scheduled in different frames.
Example 2: Consider the following set of periodic real-time tasks to be scheduled by a cyclic scheduler: T1 = (e1=1, p1=4), T2 = (e2=2, p2=5), T3 = (e3=5, p3=20). Determine a suitable frame size for the task set.
Solution: Using the first constraint, we have F ≥ 5. Using the second constraint, we have the major cycle M = LCM(4, 5, 20) = 20.
So, the permissible values of F are 5, 10 and 20. Checking for a frame size that satisfies the third constraint, we find that no value of F is suitable. To overcome this problem, we need to split the task that is making the task set not
schedulable. It is easy to observe that the task T3 has the largest execution time and, due to constraint 1, makes the feasible frame sizes quite large. We try splitting T3 into two or three tasks. After splitting T3 into three tasks, we have: T3.1 = (20, 1, 20), T3.2 = (20, 2, 20), T3.3 = (20, 2, 20). The possible values of F under the first two constraints are now 2 and 4. We can check that after splitting the tasks, F = 2 satisfies the third constraint as well and therefore becomes a feasible frame size (F = 4 still fails constraint 3 for T2, since 2×4 − gcd(4, 5) = 7 > 5). It is very difficult to come up with a clear set of guidelines to identify the exact task that is to be split, and the parts into which it needs to be split. Therefore, this needs to be done by trial and error. Further, as the number of tasks to be scheduled increases, this method of trial and error becomes impractical since each task needs to be checked separately. However, when the task set consists of only a few tasks, we can easily apply this technique to find a feasible frame size for a set of tasks otherwise not schedulable by a cyclic scheduler.
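The three constraints together amount to a small search over the divisors of the major cycle. The sketch below is ours (function name included), and it uses F ≥ max(ei) for constraint 1; applied to the task sets above it reproduces the conclusions of the two examples:

```python
from math import gcd
from functools import reduce

# Tasks are (e, p, d) triples. Candidate frame sizes are the divisors of
# the major cycle M = LCM of periods that are at least max(e_i); each
# candidate is then checked against the deadline constraint (Expr. 2.5).
def feasible_frame_sizes(tasks):
    M = reduce(lambda a, b: a * b // gcd(a, b), [p for _, p, _ in tasks])
    e_max = max(e for e, _, _ in tasks)
    return [F for F in range(1, M + 1)
            if M % F == 0 and F >= e_max                           # constraints 1, 2
            and all(2 * F - gcd(F, p) <= d for _, p, d in tasks)]  # constraint 3

print(feasible_frame_sizes([(1, 4, 4), (1.5, 5, 5), (1, 20, 20), (2, 20, 20)]))  # Example 1: [2]
print(feasible_frame_sizes([(1, 4, 4), (2, 5, 5), (5, 20, 20)]))                 # Example 2: []
# After splitting T3 into sub-tasks with execution times 1, 2, and 2:
print(feasible_frame_sizes([(1, 4, 4), (2, 5, 5), (1, 20, 20), (2, 20, 20), (2, 20, 20)]))  # [2]
```

Remember that finding a feasible frame size does not by itself guarantee that a feasible task-to-frame assignment exists; that still has to be constructed.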
and if required the sporadic tasks have already been subjected to an acceptance test and only those which have passed the test are available for scheduling.
cyclic-scheduler() {
    current-task T = Schedule-Table[k];
    k = k + 1;
    k = k mod N;                 // N is the total number of tasks in the schedule table
    dispatch-current-task(T);
    schedule-sporadic-tasks();   // current task T completed early;
                                 // sporadic tasks can be taken up
    schedule-aperiodic-tasks();  // at the end of the frame, the running task
                                 // is pre-empted if not complete
    idle();                      // no task to run; idle
}
The cyclic scheduler routine cyclic-scheduler () is activated at the end of every frame by a periodic timer. If the current task is not complete by the end of the frame, then it is suspended and the task to be run in the next frame is dispatched by invoking the routine cyclic-scheduler(). If the task scheduled in a frame completes early, then any existing sporadic or aperiodic task is taken up for execution.
1.6. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. Average response time is an important performance metric for real-time operating systems handling the running of hard real-time tasks.
b. Unlike table-driven schedulers, cyclic schedulers do not need to store a pre-computed schedule.
c. The minimum period for which a table-driven scheduler scheduling n periodic tasks needs to pre-store the schedule is given by max{p1, p2, …, pn}, where pi is the period of the task Ti.
d. A cyclic scheduler is more proficient than a pure table-driven scheduler for scheduling a set of hard real-time tasks.
e. A suitable figure of merit to compare the performance of different hard real-time task scheduling algorithms can be the average task response times resulting from each algorithm.
f. Cyclic schedulers are more proficient than table-driven schedulers.
g. While using a cyclic scheduler to schedule a set of real-time tasks on a uniprocessor, when a suitable frame size satisfying all the three required constraints has been found, it is guaranteed that the task set would be feasibly scheduled by the cyclic scheduler.
h. When more than one frame size satisfies all the constraints on frame size while scheduling a set of hard real-time periodic tasks using a cyclic scheduler, the largest of these frame sizes should be chosen.
i. In table-driven scheduling of three periodic tasks T1, T2, T3, the scheduling table must have schedules for all tasks drawn up to the time interval [0, max(p1, p2, p3)], where pi is the period of the task Ti.
j. When a set of hard real-time periodic tasks is being scheduled using a cyclic scheduler, if a certain frame size is found to be not suitable, then any frame size smaller than this would also not be suitable for scheduling the tasks.
k. When a set of hard real-time periodic tasks is being scheduled using a cyclic scheduler, if a candidate frame size exceeds the execution time of every task and squarely divides the major cycle, then it would be a suitable frame size to schedule the given set of tasks.
l. Finding an optimal schedule for a set of independent periodic hard real-time tasks without any resource-sharing constraints under static priority conditions is an NP-complete problem.
2. Real-time tasks are normally classified into periodic, aperiodic, and sporadic real-time tasks.
a. What are the basic criteria based on which a real-time task can be determined to belong to one of the three categories?
b. Identify some characteristics that are unique to each of the three categories of tasks.
c. Give examples of tasks in practical systems which belong to each of the three categories.
3. What do you understand by an optimal scheduling algorithm? Is it true that the time complexity of an optimal scheduling algorithm for scheduling a set of real-time tasks in a uniprocessor is prohibitively expensive to be of any practical use? Explain your answer.
4. Suppose a set of three periodic tasks is to be scheduled using a cyclic scheduler on a uniprocessor. Assume that the CPU utilization due to the three tasks is less than 1. Also, assume that for each of the three tasks, the deadline equals the respective period. Suppose that we are able to find an appropriate frame size (without having to split any of the tasks) that satisfies the three constraints of minimization of context switches, minimization of schedule table size, and satisfaction of deadlines. Does this imply that it is possible to assert that we can feasibly schedule the three tasks using the cyclic scheduler? If you answer affirmatively, then prove your answer. If you answer negatively, then show an example involving three tasks that disproves the assertion.
5. Consider a real-time system which consists of three tasks T1, T2, and T3, which have been characterized in the following table.
Task    Phase (msec)    Period (msec)
T1      20              20
T2      40              50
T3      70              80

If the tasks are to be scheduled using a table-driven scheduler, what is the length of time for which the schedules have to be stored in the pre-computed schedule table of the scheduler?
6. A cyclic real-time scheduler is to be used to schedule three periodic tasks T1, T2, and T3 with the following characteristics:

Task    Phase (msec)    Execution Time (msec)    Relative Deadline (msec)    Period (msec)
T1      0               20                       100                         100
T2      0               20                       80                          80
T3      0               30                       150                         150
Suggest a suitable frame size that can be used. Show all intermediate steps in your calculations.
7. Consider the following set of three independent real-time periodic tasks.

Task    Start Time (msec)    Processing Time (msec)    Period (msec)    Deadline (msec)
T1      20                   25                        150              100
T2      40                   10                        50               30
T3      60                   50                        200              150

Suppose a cyclic scheduler is to be used to schedule the task set. What is the major cycle of the task set? Suggest a suitable frame size and provide a feasible schedule (task-to-frame assignment for a major cycle) for the task set.
Module 6
Embedded System Software
Lesson 30
Real-Time Task Scheduling Part 2
of the background tasks in every unit of time is 1 − Σ(i=1..n) ei/pi. Hence, Expr. 2.7 follows easily. We now illustrate the applicability of Expr. 2.7 through the following three simple examples.
1.3. Examples
Example 1: Consider a real-time system in which tasks are scheduled using foreground-background scheduling. There is only one periodic foreground task Tf: (φf = 0, pf = 100 msec, ef = 50 msec, df = 100 msec), and the background task is TB = (eB = 1000 msec). Compute the completion time of the background task.
Solution: Using Expr. 2.7 to compute the task completion time, we have:

ctB = 1000 / (1 − 50/100) = 2000 msec

So, the background task TB would take 2000 milliseconds to complete.
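The computation in Expr. 2.7 can be sketched as a small Python function (the function name and the (e, p) task representation are illustrative, not from the text):

```python
def background_completion_time(e_b, foreground):
    """Expr. 2.7: ct_B = e_B / (1 - sum(e_i / p_i)).

    e_b        -- execution time of the background task
    foreground -- list of (e_i, p_i) pairs for the periodic foreground tasks
    """
    utilization = sum(e / p for e, p in foreground)
    if utilization >= 1:
        raise ValueError("foreground tasks leave no CPU time for background work")
    return e_b / (1 - utilization)

# Example 1: T_f = (e_f = 50 msec, p_f = 100 msec), e_B = 1000 msec
print(background_completion_time(1000, [(50, 100)]))  # 2000.0 msec
```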
Example 2: In a simple priority-driven preemptive scheduler, two periodic tasks T1 and T2 and a background task are scheduled. The periodic task T1 has the highest priority and executes once every 20 milliseconds and requires 10 milliseconds of execution time each time. T2 requires 20 milliseconds of processing every 50 milliseconds. T3 is a background task and requires 100 milliseconds to complete. Assuming that all the tasks start at time 0, determine the time at which T3 will complete.
Solution: The total utilization due to the foreground tasks is: Σ(i=1..2) ei/pi = 10/20 + 20/50 = 0.9. This implies that the fraction of time remaining for the background task to execute is: 1 − Σ(i=1..2) ei/pi = 0.1. That is, the background task gets 1 millisecond out of every 10 milliseconds. Thus, the background task would take 100/0.1 = 1000 milliseconds to complete.

Example 3: Suppose in Example 1, an overhead of 1 msec on account of every context switch is to be taken into account. Compute the completion time of TB.
Fig. 30.1 Task Schedule for Example 3 (alternating foreground and background executions over 0 to 100 msec, with context-switching times shown shaded)

Solution: The very first time the foreground task runs (at time 0), it incurs a context-switching overhead of 1 msec. This has been shown as a shaded rectangle in Fig. 30.1. Subsequently, each time the foreground task runs, it preempts the background task and incurs one context switch. On completion of each instance of the foreground task, the background task runs and incurs another context switch. With this observation, to simplify our computation of the actual completion time of TB, we can imagine that the execution time of every foreground task instance is increased by two context-switch times (one due to itself and the other due to the background task running after it completes). Thus, the net effect of context switches can be imagined to be causing the execution time of the foreground task to increase by two context-switch times, i.e. to 52 milliseconds from 50 milliseconds. This has pictorially been shown in Fig. 30.1. Now, using Expr. 2.7, we get the time required by the background task to complete:

1000 / (1 − 52/100) ≈ 2083.3 milliseconds

In the following two sections, we examine two important event-driven schedulers: EDF (Earliest Deadline First) and RMA (Rate Monotonic Algorithm). EDF is the optimal dynamic-priority real-time task scheduling algorithm and RMA is the optimal static-priority real-time task scheduling algorithm.
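The adjustment made in Example 3 can be sketched by inflating each foreground execution time by two context-switch times before applying Expr. 2.7 (the function name is illustrative):

```python
def completion_with_context_switch(e_b, foreground, cs):
    """Background completion time when each context switch costs cs msec.

    Each foreground instance is charged two extra context switches: one when
    it preempts the background task and one when the background task resumes
    after it completes.
    """
    inflated = [(e + 2 * cs, p) for e, p in foreground]
    utilization = sum(e / p for e, p in inflated)
    return e_b / (1 - utilization)

# Example 3: e_f = 50, p_f = 100, 1 msec per context switch
print(round(completion_with_context_switch(1000, [(50, 100)], 1), 1))  # 2083.3
```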
where ui is the average utilization due to the task Ti and n is the total number of tasks in the task set. Expr. 3.2 is both a necessary and a sufficient condition for a set of tasks to be EDF schedulable. EDF has been proven to be an optimal uniprocessor scheduling algorithm. This means that if a set of tasks is not schedulable under EDF, then no other scheduling algorithm can feasibly schedule this task set. In the simple schedulability test for EDF (Expr. 3.2), we assumed that the period of each task is the same as its deadline. However, in practical problems the period of a task may at times be different from its deadline. In such cases, the schedulability test needs to be changed. If pi > di, then each task needs ei amount of computing time every min(pi, di) duration of time. Therefore, we can rewrite Expr. 3.2 as:

Σ(i=1..n) ei / min(pi, di) ≤ 1    (3.3/2.9)

However, if pi < di, it is possible that a set of tasks is EDF schedulable even when the task set fails to meet Expr. 3.3. Therefore, Expr. 3.3 is conservative when pi < di: it is not a necessary condition, but only a sufficient condition for a given task set to be EDF schedulable.

Example 4: Consider the following three periodic real-time tasks to be scheduled using EDF on a uniprocessor: T1 = (e1=10, p1=20), T2 = (e2=5, p2=50), T3 = (e3=10, p3=35). Determine whether the task set is schedulable.

Solution: The total utilization due to the three tasks is given by:

Σ(i=1..3) ei/pi = 10/20 + 5/50 + 10/35 ≈ 0.89

This is less than 1. Therefore, the task set is EDF schedulable. Though EDF is a simple as well as an optimal algorithm, it has a few shortcomings which render it almost unusable in practical applications. The main problems with EDF are discussed in Sec. 3.4.3. Next, we discuss the concept of task priority in EDF and then discuss how EDF can be practically implemented.
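The test of Expr. 3.3 can be sketched as follows (the function name and the (e, p, d) task representation are illustrative):

```python
def edf_schedulable(tasks):
    """Expr. 3.3: sum(e_i / min(p_i, d_i)) <= 1.

    tasks -- list of (e_i, p_i, d_i) tuples.
    Necessary and sufficient when d_i = p_i for every task; when p_i < d_i
    the test is only sufficient, so a False result is not conclusive.
    """
    return sum(e / min(p, d) for e, p, d in tasks) <= 1

# Example 4 (deadlines equal to periods): utilization ~0.89
print(edf_schedulable([(10, 20, 20), (5, 50, 50), (10, 35, 35)]))  # True
```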
queue would contain the absolute deadline of the task. At every preemption point, the entire queue would be scanned from the beginning to determine the task having the shortest deadline. However, this implementation would be very inefficient. Let us analyze the complexity of this scheme. Each task insertion would be achieved in O(1) or constant time, but task selection (to run next) and its deletion would require O(n) time, where n is the number of tasks in the queue.

A more efficient implementation of EDF would be as follows. EDF can be implemented by maintaining all ready tasks in a sorted priority queue. A sorted priority queue can efficiently be implemented by using a heap data structure. In the priority queue, the tasks are always kept sorted according to the proximity of their deadlines. When a task arrives, a record for it can be inserted into the heap in O(log2 n) time, where n is the total number of tasks in the priority queue. At every scheduling point, the next task to be run can be found at the top of the heap. When a task is taken up for scheduling, it needs to be removed from the priority queue; restoring the heap after this removal takes O(log2 n) time.

A still more efficient implementation of EDF can be achieved as follows, under the assumption that the number of distinct deadlines that tasks in an application can have is restricted. In this approach, whenever a task arrives, its absolute deadline is computed from its release time and its relative deadline. A separate FIFO queue is maintained for each distinct relative deadline that tasks can have. The scheduler inserts a newly arrived task at the end of the corresponding relative-deadline queue. Clearly, the tasks in each queue are ordered according to their absolute deadlines. To find the task with the earliest absolute deadline, the scheduler only needs to examine the tasks at the heads of the FIFO queues. If the number of FIFO queues maintained by the scheduler is Q, then the order of searching would be O(Q), which is effectively constant since Q is a small fixed number. The time to insert a task would also be O(1).
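The heap-based ready queue described above can be sketched in Python using the standard heapq module; the class name and method names are illustrative, not from the text:

```python
import heapq

class EDFReadyQueue:
    """EDF ready queue as a min-heap keyed on absolute deadline."""

    def __init__(self):
        self._heap = []

    def add(self, release_time, relative_deadline, task_id):
        # O(log n) insertion; absolute deadline = release time + relative deadline.
        heapq.heappush(self._heap, (release_time + relative_deadline, task_id))

    def next_task(self):
        # O(log n) removal of the task with the earliest absolute deadline.
        deadline, task_id = heapq.heappop(self._heap)
        return task_id

q = EDFReadyQueue()
q.add(0, 50, "T2")
q.add(0, 20, "T1")
q.add(0, 35, "T3")
print(q.next_task())  # T1 -- earliest absolute deadline (20)
```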
Resource Sharing Problem: When EDF is used to schedule a set of real-time tasks, unacceptably high overheads might have to be incurred to support resource sharing among the tasks without making tasks miss their respective deadlines. We examine this issue in some detail in the next lesson.

Efficient Implementation Problem: The efficient implementation that we discussed in Sec. 3.4.2 is often not practicable, as it is difficult to restrict the number of tasks with distinct deadlines to a reasonable number. The implementation that achieves O(1) overhead assumes that the number of relative deadlines is restricted, which may be unacceptable in some situations. For a more flexible EDF algorithm, we need to keep the tasks ordered in terms of their deadlines using a priority queue. Whenever a task arrives, it is inserted into the priority queue. The complexity of insertion of an element into a priority queue is of the order of log2 n, where n is the number of tasks to be scheduled. This represents a high runtime overhead, since most real-time tasks are periodic with small periods and strict deadlines.
Fig. 30.2 Priority assignment: (a) priority versus rate, (b) priority versus period
worst-case execution times and periods of the tasks. A pertinent question at this point is how a system developer can determine the worst-case execution time of a task even before the system is developed. The worst-case execution times are usually determined experimentally or through simulation studies. The following are some important criteria that can be used to check the schedulability of a set of tasks under RMA.
Fig. 30.3 Achievable Utilization with the Number of Tasks under RMA

Evaluation of Expr. 3.4 as n → ∞ involves an indeterminate expression of the type ∞ × 0. By applying L'Hospital's rule, we can verify that the right-hand side of the expression evaluates to loge 2 ≈ 0.693. From the above computations, it is clear that the maximum CPU utilization that can be achieved under RMA is 1. This is achieved when there is only a single task in the system. As the number of tasks increases, the achievable CPU utilization falls and, as n → ∞, the achievable utilization stabilizes at loge 2, which is approximately 0.693. This is pictorially shown in Fig. 30.3. We now illustrate the applicability of the RMA schedulability criteria through a few examples.
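The behaviour of the bound in Expr. 3.4 can be checked numerically (a minimal sketch; the function name is illustrative):

```python
import math

def liu_layland_bound(n):
    """Expr. 3.4 utilization bound under RMA: n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)

for n in (1, 2, 3, 10, 100):
    print(n, round(liu_layland_bound(n), 3))
# The bound falls from 1.0 (n = 1) toward log_e 2 as n grows.
print(round(math.log(2), 3))  # 0.693
```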
1.5.2. Examples
Example 5: Check whether the following set of periodic real-time tasks is schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3 = (e3=60, p3=200).

Solution: Let us first compute the total CPU utilization due to the three given tasks:

Σ(i=1..3) ui = 20/100 + 30/150 + 60/200 = 0.7

This is less than 1; therefore the necessary condition for schedulability of the tasks is satisfied. Now checking for the sufficiency condition, the task set is schedulable under RMA if Liu and Layland's condition given by Expr. 3.4 is satisfied. Checking for satisfaction of Expr. 3.4, the maximum achievable utilization is given by:

3(2^(1/3) − 1) = 0.78

The total utilization has already been found to be 0.7. Substituting these in Liu and Layland's criterion:

Σ(i=1..3) ui ≤ 3(2^(1/3) − 1)

we get 0.7 < 0.78. Expr. 3.4, a sufficient condition for RMA schedulability, is satisfied. Therefore, the task set is RMA-schedulable.

Example 6: Check whether the following set of three periodic real-time tasks is schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3 = (e3=90, p3=200).

Solution: Let us first compute the total CPU utilization due to the given task set:

Σ(i=1..3) ui = 20/100 + 30/150 + 90/200 = 0.85
Now checking the Liu and Layland criterion, Σ(i=1..3) ui ≤ 0.78: since 0.85 > 0.78, the condition is not satisfied and the task set fails the Liu and Layland test. The Liu and Layland test (Expr. 3.4) is pessimistic in the following sense. If a task set passes the Liu and Layland test, then it is guaranteed to be RMA schedulable. On the other hand, even if a task set fails the Liu and Layland test, it may still be RMA schedulable. It follows that even when a task set fails Liu and Layland's test, we should not conclude that it is not schedulable under RMA. We need to test further to check if the task set is RMA schedulable. A test that can be performed to check whether a task set is RMA schedulable when it fails the Liu and Layland test is Lehoczky's test. Lehoczky's test has been expressed as Theorem 3.
1.5.3. Theorem 3
A set of periodic real-time tasks is RMA schedulable under any task phasing, iff all the tasks meet their respective first deadlines under zero phasing.
Fig. 30.4 Worst-Case Response Time for a Task Occurs When It Is in Phase with Its Higher Priority Tasks: (a) T1 in phase with T2, (b) T1 has a 20 msec phase with respect to T2

A formal proof of this theorem is beyond the scope of this discussion. However, we provide an intuitive reasoning as to why Theorem 3 must be true. First let us try to understand the following fact.
The worst-case response time for a task occurs when it is in phase with its higher priority tasks.

To see why this statement must be true, consider the following. Under RMA, whenever a higher priority task is ready, the lower priority tasks cannot execute and have to wait. This implies that a lower priority task will have to wait for the entire duration of execution of each instance of a higher priority task that arises during its own execution. More instances of a higher priority task occur during an instance of a lower priority task when the two are in phase than when they are out of phase. This has been illustrated through a simple example in Fig. 30.4. In Fig. 30.4(a), a higher priority task T1 = (10, 30) is in phase with a lower priority task T2 = (60, 120), and the response time of T2 is 90 msec. However, in Fig. 30.4(b), when T1 has a 20 msec phase, the response time of T2 becomes 80 msec. Therefore, if a task meets its first deadline under zero phasing, then it will meet all its deadlines.

Example 7: Check whether the task set of Example 6 is actually schedulable under RMA.

Solution: Though the results of Liu and Layland's test were negative as per Example 6, we can apply the Lehoczky test and observe the following. For the task T1: e1 < p1 holds, since 20 msec < 100 msec. Therefore, it would meet its first deadline (it does not have any higher priority tasks).

Fig. 30.5 Schedulability check for T3: executions of T1, T2, and T3 up to T3's first deadline (200 msec)
For the task T2: T1 is its higher priority task and, considering zero phasing, it would occur once before the deadline of T2. Therefore, (e1 + e2) < p2 holds, since 20 + 30 = 50 msec < 150 msec. Therefore, T2 meets its first deadline. For the task T3: (2e1 + 2e2 + e3) < p3 holds, since 2 × 20 + 2 × 30 + 90 = 190 msec < 200 msec. We have considered 2e1 and 2e2 since T1 and T2 each occur twice within the first deadline of T3. Therefore, T3 meets its first deadline. So, the given task set is schedulable under RMA. The schedulability test for T3 has pictorially been shown in Fig. 30.5. Since all the tasks meet their first deadlines under zero phasing, they are RMA schedulable according to Lehoczky's results.
Fig. 30.6 Instances of T1 over a single instance of Ti

Let us now try to derive a formal expression for this important result of Lehoczky. Let {T1, T2, …, Ti} be the set of tasks to be scheduled. Let us also assume that the tasks have been ordered in descending order of their priority; that is, the task priorities are related as pr(T1) > pr(T2) > … > pr(Ti), where pr(Ti) denotes the priority of the task Ti. Observe that the task T1 has the highest priority and the task Ti has the least priority. This priority ordering can be assumed without any loss of generality, since the required priority ordering among an arbitrary collection of tasks can always be achieved by a simple renaming of the tasks. Consider that the task Ti arrives at the time instant 0, and consider the example shown in Fig. 30.6. During the first instance of the task Ti, three instances of the task T1 have occurred. Each time T1 occurs, Ti has to wait, since T1 has higher priority than Ti. Let us now determine the exact number of times that T1 occurs within a single instance of Ti. This is given by ⌈pi / p1⌉. Since T1's execution time is e1, the total execution time required by task T1 before the deadline of Ti is ⌈pi / p1⌉ × e1. This expression can easily be generalized to consider the execution times of all tasks having higher priority than Ti (i.e. T1, T2, …, Ti−1). Therefore, the time for which Ti will have to wait due to all its higher priority tasks can be expressed as:

Σ(k=1..i−1) ⌈pi / pk⌉ × ek    (3.5/2.11)

Expr. 3.5 gives the total time required to execute Ti's higher priority tasks, for which Ti would have to wait. So, the task Ti would meet its first deadline iff:

ei + Σ(k=1..i−1) ⌈pi / pk⌉ × ek ≤ pi    (3.6/2.12)

That is, if the sum of the execution times of all higher priority task instances occurring before Ti's first deadline, together with the execution time of the task itself, is less than its period pi, then Ti would complete before its first deadline. Note that in Expr. 3.6 we have implicitly assumed that the
task periods equal their respective deadlines, i.e. pi = di. If pi < di, then Expr. 3.6 would need to be modified as follows:

ei + Σ(k=1..i−1) ⌈di / pk⌉ × ek ≤ di    (3.7/2.13)

Note that even if Expr. 3.7 is not satisfied, there is some possibility that the task set may still be schedulable. This might happen because in Expr. 3.7 we have considered zero phasing among all the tasks, which is the worst case. In a given problem, some tasks may have non-zero phasing. Therefore, even when a task set narrowly fails to meet Expr. 3.7, there is some chance that it may in fact be schedulable under RMA. To understand why this is so, consider a task set where one particular task Ti fails Expr. 3.7, making the task set not schedulable. The task misses its deadline when it is in phase with all its higher priority tasks. However, when the task has non-zero phasing with at least some of its higher priority tasks, it might actually meet its first deadline, contrary to the negative result of Expr. 3.7. Let us now consider two examples to illustrate the applicability of Lehoczky's results.

Example 8: Consider the following set of three periodic real-time tasks: T1 = (10, 20), T2 = (15, 60), T3 = (20, 120), to be run on a uniprocessor. Determine whether the task set is schedulable under RMA.

Solution: First let us try the sufficiency test for RMA schedulability. By Expr. 3.4 (the Liu and Layland test), the task set is schedulable if Σui ≤ 0.78.

Σui = 10/20 + 15/60 + 20/120 ≈ 0.92

This is greater than 0.78. Therefore, the given task set fails the Liu and Layland test. Since Expr. 3.4 is only a sufficient condition, we need to test further. Let us now try Lehoczky's test. The tasks T1, T2, T3 are already ordered in decreasing order of their priorities.

Testing for task T1: Since e1 (10 msec) is less than d1 (20 msec), T1 would meet its first deadline.

Testing for task T2: 15 + ⌈60/20⌉ × 10 ≤ 60, or 15 + 30 = 45 ≤ 60 msec. The condition is satisfied. Therefore, T2 would meet its first deadline.
Testing for task T3: 20 + ⌈120/20⌉ × 10 + ⌈120/60⌉ × 15 = 20 + 60 + 30 = 110 msec. This is less than T3's deadline of 120 msec. Therefore, T3 would meet its first deadline. Since all three tasks meet their respective first deadlines, the task set is RMA schedulable according to Lehoczky's results.

Example 9: RMA is used to schedule a set of periodic hard real-time tasks in a system. Is it possible in this system that a higher priority task misses its deadline whereas a lower priority task meets its deadlines? If your answer is negative, prove your denial. If your answer is affirmative, give an example involving two or three tasks scheduled using RMA where the lower priority task meets all its deadlines whereas the higher priority task misses its deadline.

Solution: Yes, it is possible that under RMA a higher priority task misses its deadline whereas a lower priority task meets its deadline. We show this by constructing an example. Consider the following task set: T1 = (e1=15, p1=20), T2 = (e2=6, p2=35), T3 = (e3=3, p3=100). For the given task set, it is easy to observe that pr(T1) > pr(T2) > pr(T3); that is, T1, T2, T3 are ordered in decreasing order of their priorities.
For this task set, T3 meets its deadline according to Lehoczky's test, since:

e3 + ⌈p3/p2⌉ × e2 + ⌈p3/p1⌉ × e1 = 3 + (⌈100/35⌉ × 6) + (⌈100/20⌉ × 15) = 3 + (3 × 6) + (5 × 15) = 96 ≤ 100 msec

But T2 does not meet its deadline, since:

e2 + ⌈p2/p1⌉ × e1 = 6 + (⌈35/20⌉ × 15) = 6 + (2 × 15) = 36 msec

This is greater than the deadline of T2 (35 msec). As a consequence of the results of Example 9, by observing that the lowest priority task of a given task set meets its first deadline, we cannot conclude that the entire task set is RMA schedulable. On the contrary, it is necessary to check each task individually to see whether it meets its first deadline under zero phasing. If one finds that the lowest priority task meets its deadline and concludes that the entire task set would be feasibly scheduled under RMA, one is likely to be mistaken.
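Lehoczky's first-deadline check (Exprs. 3.5 and 3.6) can be sketched as follows; the function names and the (e, p) task representation are illustrative, and the task list is assumed sorted in decreasing priority (increasing period):

```python
import math

def meets_first_deadline(i, tasks):
    """Expr. 3.6: e_i + sum(ceil(p_i / p_k) * e_k for k < i) <= p_i.

    tasks -- list of (e, p) pairs in decreasing order of priority.
    """
    e_i, p_i = tasks[i]
    demand = e_i + sum(math.ceil(p_i / p_k) * e_k for e_k, p_k in tasks[:i])
    return demand <= p_i

def lehoczky_schedulable(tasks):
    # Every task must meet its first deadline under zero phasing.
    return all(meets_first_deadline(i, tasks) for i in range(len(tasks)))

# Example 8: T1=(10,20), T2=(15,60), T3=(20,120) -- schedulable
print(lehoczky_schedulable([(10, 20), (15, 60), (20, 120)]))  # True

# Example 9: T1=(15,20), T2=(6,35), T3=(3,100) -- T2 misses, T3 meets
ex9 = [(15, 20), (6, 35), (3, 100)]
print(meets_first_deadline(1, ex9))  # False
print(meets_first_deadline(2, ex9))  # True
```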
1.5.5. Theorem 4
For a set of harmonically related tasks HS = {Ti}, the RMA schedulability criterion is given by Σ(i=1..n) ui ≤ 1.

Proof: Let T1, T2, …, Tn be the tasks in the given task set. Let us further assume that the tasks have been arranged in increasing order of their periods; that is, for any i and j, pi < pj whenever i < j. If this relationship is not satisfied, then a simple renaming of the tasks can achieve it. Now, according to Expr. 3.6, a task Ti meets its deadline if ei + Σ(k=1..i−1) ⌈pi / pk⌉ × ek ≤ pi.
However, since the task set is harmonically related, pi can be written as m × pk for some integer m. Therefore, ⌈pi / pk⌉ = pi / pk. Using this, Expr. 3.6 can be written as:

ei + Σ(k=1..i−1) (pi / pk) × ek ≤ pi

For Ti = Tn, we can write:

en + Σ(k=1..n−1) (pn / pk) × ek ≤ pn

Dividing both sides of this expression by pn, we get the required result. Hence, the task set would be schedulable iff Σ(k=1..n) ek/pk ≤ 1, i.e. Σ(i=1..n) ui ≤ 1.
Fig. 30.7 Multi-Level Feedback Queue

The disadvantages of RMA include the following: it is very difficult to support aperiodic and sporadic tasks under RMA; further, RMA is not optimal when task periods and deadlines differ.
Thus, T2 will miss its first deadline. Hence, the given task set cannot be feasibly scheduled under RMA. Now let us check the schedulability using DMA. Under DMA, the priority ordering of the tasks is as follows: pr(T2) > pr(T1) > pr(T3).
Checking for T2: 15 msec < 20 msec. Hence, T2 will meet its first deadline.
Checking for T1: (15 + 10) < 35. Hence, T1 will meet its first deadline.
Checking for T3: (20 + 30 + 40) < 200. Therefore, T3 will meet its deadline.
Therefore, the given task set is schedulable under DMA but not under RMA.
The condition is satisfied; therefore T1 meets its first deadline.
Checking for task T2: (2 × 22) + 32 < 150. The condition is satisfied; therefore T2 meets its first deadline.
Checking for task T3: (2 × 22) + (2 × 32) + 90 < 200. The condition is satisfied; therefore T3 meets its first deadline.
Therefore, the task set can be feasibly scheduled under RMA even when context-switching overhead is taken into consideration.
ei + bti + Σ(k=1..i−1) ⌈pi / pk⌉ × ek ≤ pi    (3.9/2.16)

We have so far implicitly assumed that a task undergoes at most a single self-suspension. However, if a task undergoes multiple self-suspensions, then Expr. 3.9 derived above would need to be changed. We leave this as an exercise for the reader.

Example 14: Consider the following set of periodic real-time tasks: T1 = (e1=10, p1=50), T2 = (e2=25, p2=150), T3 = (e3=50, p3=200) [all in msec]. Assume that the self-suspension times of T1, T2, and T3 are 3 msec, 3 msec, and 5 msec, respectively. Determine whether the tasks would meet their respective deadlines if scheduled using RMA.

Solution: The tasks are already ordered in descending order of their priorities. By using the generalized Lehoczky condition given by Expr. 3.9, we get:
For T1 to be schedulable: (10 + 3) < 50. Therefore, T1 would meet its first deadline.
For T2 to be schedulable: (25 + 6 + 10 × 3) < 150. Therefore, T2 meets its first deadline.
For T3 to be schedulable: (50 + 11 + (10 × 4 + 25 × 2)) < 200. This inequality is also satisfied. Therefore, T3 would also meet its first deadline.
It can therefore be concluded that the given task set is schedulable under RMA even when self-suspension of tasks is considered.
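The generalized check of Expr. 3.9 can be sketched similarly; here bt_i is taken, as in Example 14, to be the task's own self-suspension time plus the self-suspension times of all its higher priority tasks (the function name and task representation are illustrative):

```python
import math

def meets_first_deadline_with_suspension(i, tasks, suspensions):
    """Expr. 3.9: e_i + bt_i + sum(ceil(p_i / p_k) * e_k for k < i) <= p_i.

    tasks       -- list of (e, p) pairs in decreasing order of priority
    suspensions -- self-suspension time of each task (one suspension per task)
    """
    e_i, p_i = tasks[i]
    bt_i = sum(suspensions[: i + 1])  # own suspension + higher-priority suspensions
    interference = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k in tasks[:i])
    return e_i + bt_i + interference <= p_i

# Example 14: T1=(10,50), T2=(25,150), T3=(50,200); suspensions 3, 3, 5 msec
tasks = [(10, 50), (25, 150), (50, 200)]
susp = [3, 3, 5]
print(all(meets_first_deadline_with_suspension(i, tasks, susp)
          for i in range(3)))  # True
```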
1.10. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper bound on achievable utilization improves as the number of tasks in the system being developed increases.
b. If a set of periodic real-time tasks fails Lehoczky's test, then it can safely be concluded that this task set cannot be feasibly scheduled under RMA.
c. A time-sliced round-robin scheduler uses preemptive scheduling.
d. RMA is an optimal static priority scheduling algorithm to schedule a set of periodic real-time tasks on a non-preemptive operating system.
e. Self-suspension of tasks impacts the worst-case response times of the individual tasks much more adversely when preemption of tasks is supported by the operating system compared to the case when preemption is not supported.
f. When a set of periodic real-time tasks is being scheduled using RMA, it cannot be the case that a lower priority task meets its deadline whereas some higher priority task does not.
g. The EDF (Earliest Deadline First) algorithm possesses good transient overload handling capability.
h. A time-sliced round-robin scheduler is an example of a non-preemptive scheduler.
i. The EDF algorithm is an optimal algorithm for scheduling hard real-time tasks on a uniprocessor when the task set is a mixture of periodic and aperiodic tasks.
j. In a non-preemptable operating system employing RMA scheduling for a set of real-time periodic tasks, self-suspension of a higher priority task (due to I/O etc.) may increase the response time of a lower priority task.
k. The worst-case response time for a task occurs when it is out of phase with its higher priority tasks.
l. Good real-time task scheduling algorithms ensure fairness to real-time tasks while scheduling.

2. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. The EDF algorithm is optimal for scheduling real-time tasks on a uniprocessor in a non-preemptive environment.
b. When RMA is used to schedule a set of hard real-time periodic tasks in a uniprocessor environment, if the processor becomes overloaded any time during system execution due to overrun by the lowest priority task, it would be very difficult to predict which task would miss its deadline.
c. While scheduling a set of real-time periodic tasks whose task periods are harmonically related, the upper bound on the achievable CPU utilization is the same for both EDF and RMA algorithms.
d. In a non-preemptive event-driven task scheduler, scheduling decisions are made only at the arrival and completion of tasks.
e. The following is the correct arrangement of the three major classes of real-time scheduling algorithms in ascending order of their run-time overheads: static priority preemptive scheduling algorithms, table-driven algorithms, dynamic priority algorithms.
f. While scheduling a set of independent hard real-time periodic tasks on a uniprocessor, RMA can be as proficient as EDF under some constraints on the task set.
g. RMA should be preferred over the time-sliced round-robin algorithm for scheduling a set of soft real-time tasks on a uniprocessor.
h. Under RMA, the achievable utilization of a set of hard real-time periodic tasks would drop when task periods are multiples of each other, compared to the case when they are not.
i. RMA scheduling of a set of real-time periodic tasks using the Liu and Layland criterion might produce infeasible schedules when the task periods are different from the task deadlines.

3. What do you understand by the scheduling point of a task scheduling algorithm? How are the scheduling points determined in (i) clock-driven, (ii) event-driven, and (iii) hybrid schedulers? How will your definition of scheduling points for the three classes of schedulers change when (a) self-suspension of tasks, and (b) context-switching overheads of tasks are taken into account?

4. What do you understand by the jitter associated with a periodic task? How are these jitters caused?

5. Is the EDF algorithm used for scheduling real-time tasks a dynamic priority scheduling algorithm? Does EDF compute any priority value of tasks at any time? If you answer affirmatively, then explain when the priority is computed and how it is computed. If you answer in the negative, then explain the concept of priority in EDF.

6. What is the sufficient condition for EDF schedulability of a set of periodic tasks whose period and deadline are different? Construct an example involving a set of three periodic tasks whose periods differ from their respective deadlines such that the task set fails the sufficient condition and yet is EDF schedulable. Verify your answer. Show all your intermediate steps.

7. A preemptive static priority real-time task scheduler is used to schedule two periodic tasks T1 and T2 with the following characteristics:

Task    Phase (msec)    Execution Time (msec)    Relative Deadline (msec)    Period (msec)
T1      0               10                       20                          20
T2      0               20                       50                          50

Assume that T1 has higher priority than T2. A background task arrives at time 0 and would require 1000 msec to complete. Compute the completion time of the background task, assuming that context switching takes no more than 0.5 msec.

8. Assume that a preemptive priority-based system consists of three periodic foreground tasks T1, T2, and T3 with the following characteristics:

Task    Phase (msec)    Execution Time (msec)    Relative Deadline (msec)    Period (msec)
T1      0               20                       100                         100
T2      0               30                       150                         150
T3      0               30                       300                         300
T1 has higher priority than T2, and T2 has higher priority than T3. A background task Tb arrives at time 0 and would require 2000 msec to complete. Compute the completion time of the background task Tb, assuming that context switching takes no more than 1 msec.
9. Consider the following set of four independent real-time periodic tasks.

Task    Start Time (msec)    Processing Time (msec)    Period (msec)
T1      20                   25                        150
T2      40                   10                        50
T3      20                   15                        50
T4      60                   50                        200
Assume that task T3 is more critical than task T2. Check whether the task set can be feasibly scheduled using RMA.

10. What is the worst-case response time of the background task of a system in which the background task requires 1000 msec to complete? There are two foreground tasks: the higher priority foreground task executes once every 100 msec and each time requires 25 msec to complete; the lower priority foreground task executes once every 50 msec and requires 15 msec to complete. Context switching requires no more than 1 msec.

11. Construct an example involving more than one hard real-time periodic task whose aggregate processor utilization is 1, and which is yet schedulable under RMA.

12. Determine whether the following set of periodic tasks is schedulable on a uniprocessor using DMA (Deadline Monotonic Algorithm). Show all intermediate steps in your computation.

Task    Start Time (msec)    Processing Time (msec)    Period (msec)    Deadline (msec)
T1      20                   25                        150              140
T2      60                   10                        60               40
T3      40                   20                        200              120
T4      25                   10                        80               25
13. Consider the following set of three independent real-time periodic tasks:

   Task   Start Time (msec)   Processing Time (msec)   Period (msec)   Deadline (msec)
   T1     20                  25                        150             100
   T2     60                  10                        50              30
   T3     40                  50                        200             150

   Determine whether the task set is schedulable on a uniprocessor using EDF. Show all intermediate steps in your computation.
14. Determine whether the following set of periodic real-time tasks is schedulable on a uniprocessor using RMA. Show the intermediate steps in your computation. Is RMA optimal when the task deadlines differ from the task periods?
   Task T1 T2 T3 T4

15. Construct an example involving two periodic real-time tasks that can be feasibly scheduled by both RMA and EDF, but for which the schedule generated by RMA differs from that generated by EDF. Draw the two schedules on a time line and highlight how the two schedules differ. Consider the two tasks such that for each task: (a) the period is the same as the deadline; (b) the period is different from the deadline.
16. Can multiprocessor real-time task scheduling algorithms be used satisfactorily in distributed systems? Explain the basic difference between the characteristics of a real-time task scheduling algorithm for multiprocessors and a real-time task scheduling algorithm for applications running on distributed systems.
17. Construct an example involving a set of hard real-time periodic tasks that are not schedulable under RMA but can be feasibly scheduled by DMA. Verify your answer, showing all intermediate steps.
18. Three hard real-time periodic tasks T1 = (50, 100, 100), T2 = (70, 200, 200), and T3 = (60, 400, 400) [time in msec] are to be scheduled on a uniprocessor using RMA. Can the task set be feasibly scheduled? If a context switch overhead of 1 msec is taken into account, determine the schedulability.
19. Consider the following set of three real-time periodic tasks:

   Task   Start Time (msec)   Processing Time (msec)   Period (msec)   Deadline (msec)
   T1     20                  25                        150             100
   T2     40                  10                        50              50
   T3     60                  50                        200             200

   a. Check whether the three given tasks are schedulable under RMA. Show all intermediate steps in your computation.
   b. Assuming that each context switch incurs an overhead of 1 msec, determine whether the tasks are schedulable under RMA. Also, determine the average context switching overhead per unit of task execution.
   c. Assume that T1, T2, and T3 self-suspend for 10 msec, 20 msec, and 15 msec respectively. Determine whether the task set remains schedulable under RMA. The context switching overhead of 1 msec should be considered in your result. You can assume that each task undergoes self-suspension only once during each of its executions.
   d. Assuming that T1 and T2 are assigned the same priority value, determine the additional delay in response time that T2 would incur compared to the case when they are assigned distinct priorities. Ignore the self-suspension times and the context switch overhead for this part of the question.
Module 6
Embedded System Software
Lesson 31
Concepts in Real-Time Operating Systems
1. Introduction
In the last three lessons, we discussed the important real-time task scheduling techniques. We highlighted that timely production of results, in accordance with a physical clock, is vital to the satisfactory operation of a real-time system. We also pointed out that real-time operating systems are primarily responsible for ensuring that every real-time task meets its timeliness requirements. A real-time operating system in turn achieves this by using appropriate task scheduling techniques. Normally, real-time operating systems give programmers the flexibility to select an appropriate scheduling policy from among several supported policies. Deployment of an appropriate task scheduling technique out of the supported techniques is therefore an important concern for every real-time programmer. To be able to determine the suitability of a scheduling algorithm for a given problem, a thorough understanding of the characteristics of the various real-time task scheduling algorithms is important. We therefore had a rather elaborate discussion on real-time task scheduling techniques and certain related issues, such as sharing of critical resources and handling task dependencies. In this lesson, we examine the important features that a real-time operating system is expected to support. We start by discussing the time services provided by real-time operating systems, since accurate and high-precision clocks are very important to the successful operation of any real-time application. Next, we point out the important features that a real-time operating system needs to support. Finally, we discuss the issues that would arise if we attempted to use a general purpose operating system such as UNIX or Windows in real-time applications.
system clock should have a sufficiently fine resolution to support the necessary time services. However, designers of real-time operating systems find it very difficult to support very fine resolution system clocks. With current technology, the resolution of hardware clocks is usually finer than a nanosecond (contemporary processor speeds exceed 3 GHz), but the clock resolution made available by modern real-time operating systems to programmers is of the order of several milliseconds or worse. Let us first investigate why real-time operating system designers find it difficult to maintain system clocks with sufficiently fine resolution. We then examine the various time services that are built on the system clock and made available to real-time programmers. The hardware clock periodically generates interrupts (often called time service interrupts). After each clock interrupt, the kernel updates the software clock and also performs certain other work (explained in Sec. 4.1.1). A thread can get the current time reading of the system clock by invoking a system call supported by the operating system (such as the POSIX clock_gettime()). The finer the resolution of the clock, the more frequent the time service interrupts need to be, and the larger the amount of processor time the kernel spends in responding to these interrupts. This overhead places a limit on how fine a system clock resolution a computer can support. Another issue that caps the resolution of the system clock is that the response time of the clock_gettime() system call is not deterministic. In fact, every system call (or, for that matter, a function call) has some associated jitter. The jitter arises because interrupts have higher priority than system calls: when an interrupt occurs, the processing of a system call is stalled.
Also, the preemption time of system calls can vary because many operating systems disable interrupts while processing a system call. The variation in the response time (jitter) introduces an error in the accuracy of the time value that the calling thread gets from the kernel. Remember that jitter was defined as the difference between the worst-case response time and the best-case response time (see Sec. 2.3.1). In commercially available operating systems, the jitter associated with system calls can be several milliseconds. A software clock resolution finer than this error is therefore not meaningful. We now examine the different activities that are carried out by a handler routine after a clock interrupt occurs. Subsequently, we discuss how sufficiently fine resolution can be provided in the presence of jitter in function calls.
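The POSIX time services mentioned above are exposed in C as clock_gettime() and clock_getres(); on Unix platforms, Python's time module wraps the same calls, so the idea can be demonstrated directly (a sketch; the actual resolution and jitter figures printed depend entirely on the platform):

```python
import time

# Read the OS-reported resolution of the monotonic clock and two
# successive readings of the clock itself (POSIX clock_getres /
# clock_gettime; available on Unix platforms).
res = time.clock_getres(time.CLOCK_MONOTONIC)
t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
t2 = time.clock_gettime(time.CLOCK_MONOTONIC)

print(f"reported resolution: {res} s")
print(f"elapsed between two calls: {t2 - t1:.9f} s")
# The reported resolution may be as fine as 1 ns, but the jitter of the
# call itself limits the accuracy actually usable by an application.
```

The gap between the advertised resolution and the call-to-call variation is exactly the jitter issue discussed above.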
Fig. 31.1 Structure of a Timer Queue (timers ordered by expiration time, each with its handler)
Clock resolution denotes the time granularity provided by the clock of a computer. It corresponds to the duration of time that elapses between two successive clock ticks.
Each time a clock interrupt occurs, besides incrementing the software clock, the handler routine carries out the following activities:

Process timer events: Real-time operating systems maintain either per-process timer queues or a single system-wide timer queue. The structure of such a timer queue has been shown in Fig. 31.1. A timer queue contains all timers arranged in order of their expiration times. Each timer is associated with a handler routine, the function that should be invoked when the timer expires. At each clock interrupt, the kernel checks the timer data structures in the timer queue to see if any timer event has occurred. If it finds that a timer event has occurred, it queues the corresponding handler routine in the ready queue.

Update ready list: Since the occurrence of the last clock event, some tasks might have arrived or become ready due to the fulfillment of conditions they were waiting for. The tasks in the wait queue are checked, and those found to have become ready are moved to the ready queue. If a task having higher priority than the currently running task is found to have become ready, the currently running task is preempted and the scheduler is invoked.

Update execution budget: At each clock interrupt, the scheduler decrements the time slice (budget) remaining for the executing task. If the remaining budget becomes zero and the task is not complete, the task is preempted and the scheduler is invoked to select another task to run.
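The per-tick processing of the timer queue of Fig. 31.1 can be sketched as follows (illustrative names; a real kernel would hold handler routines rather than strings, and would move them to the ready queue instead of returning them):

```python
import heapq

class TimerQueue:
    """Timers kept ordered by expiration time, as in Fig. 31.1."""
    def __init__(self):
        self._heap = []          # entries: (expiry_tick, seq, handler)
        self._seq = 0            # tie-breaker for equal expiry times

    def add(self, expiry_tick, handler):
        heapq.heappush(self._heap, (expiry_tick, self._seq, handler))
        self._seq += 1

    def tick(self, now):
        """Called from the clock-interrupt handler: collect the handler
        of every timer that has expired by time `now`."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, handler = heapq.heappop(self._heap)
            fired.append(handler)   # in a kernel: enqueue on the ready queue
        return fired

tq = TimerQueue()
tq.add(5, "poll_sensor")
tq.add(3, "check_deadline")
print(tq.tick(4))    # ['check_deadline']
print(tq.tick(10))   # ['poll_sensor']
```

Keeping the queue ordered by expiration time means each tick only inspects the front of the queue, so the work done inside the clock-interrupt handler stays small.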
1.1.3. Timers
We had pointed out that timer service is a vital service that is provided to applications by all real-time operating systems. Real-time operating systems normally support two main types of timers: periodic timers and aperiodic (or one shot) timers. We now discuss some basic concepts about these two types of timers.
Periodic Timers: Periodic timers are used mainly for sampling events at regular intervals or for performing some activities periodically. Once a periodic timer is set, each time it expires the corresponding handler routine is invoked and the timer is reinserted into the timer queue. For example, a periodic timer may be set to 100 msec and its handler set to poll the temperature sensor every 100 msec.

Aperiodic (or One Shot) Timers: These timers are set to expire only once. Watchdog timers are popular examples of one shot timers.

    f() {
        wd_start(t1, exception-handler);
        /* ... body of f(), expected to complete within t1 ... */
        wd_tickle();
    }

Fig. 31.2 Example Use of a Watchdog Timer
Watchdog timers are used extensively in real-time programs to detect when a task misses its deadline, and then to initiate exception handling upon the deadline miss. An example use of a watchdog timer has been illustrated in Fig. 31.2. In Fig. 31.2, a watchdog timer is set at the start of a certain critical function f() through a wd_start(t1) call. The wd_start(t1) call sets the watchdog timer to expire by t1, the specified deadline measured from the start of the task. If the function f() does not complete even after t1 time units have elapsed, the watchdog timer fires, indicating that the task deadline must have been missed, and the exception handling procedure is initiated. In case the task completes before the watchdog timer expires (i.e. the task completes within its deadline), the watchdog timer is reset using a wd_tickle() call.
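The wd_start()/wd_tickle() pattern of Fig. 31.2 can be imitated with an ordinary one-shot timer. The sketch below uses Python's threading.Timer in place of a kernel watchdog; wd_start and wd_tickle are the lesson's names, while the Watchdog class and critical_function are purely illustrative:

```python
import threading

class Watchdog:
    """One-shot watchdog: fires the exception handler unless
    wd_tickle() is called before the deadline expires."""
    def __init__(self):
        self._timer = None
        self.fired = False

    def wd_start(self, deadline_s, exception_handler):
        def _expire():
            self.fired = True
            exception_handler()
        self._timer = threading.Timer(deadline_s, _expire)
        self._timer.start()

    def wd_tickle(self):
        """Task finished within its deadline: cancel the watchdog."""
        self._timer.cancel()

def critical_function(wd, work_s):
    wd.wd_start(0.2, lambda: print("deadline missed!"))
    threading.Event().wait(work_s)   # stand-in for the real work of f()
    wd.wd_tickle()

wd = Watchdog()
critical_function(wd, 0.05)   # completes well within its 0.2 s deadline
print(wd.fired)               # False: watchdog was cancelled in time
```

A real RTOS watchdog would typically be a hardware counter or a kernel timer, but the control flow (arm at entry, cancel at normal exit, handler on expiry) is the same.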
Real-Time Priority Levels: A real-time operating system must support static priority levels. A priority level supported by an operating system is called static when, once the programmer assigns a priority value to a task, the operating system does not change it by itself. Static priority levels are also called real-time priority levels. This is because, as we discuss in Section 4.3, all traditional operating systems dynamically change the priority levels of tasks from the programmer-assigned values to maximize system throughput. Such priority levels that are changed by the operating system dynamically are obviously not static priorities.

Fast Task Preemption: For successful operation of a real-time application, whenever a high priority critical task arrives, an executing low priority task should be made to instantly yield the CPU to it. The time duration for which a higher priority task waits before it is allowed to execute is quantitatively expressed as the corresponding task preemption time. Contemporary real-time operating systems have task preemption times of the order of a few microseconds, whereas in traditional operating systems the worst-case task preemption time is usually of the order of a second. We discuss in the next section that this significantly large latency is caused by a non-preemptive kernel. It goes without saying that a real-time operating system needs to have a preemptive kernel and task preemption times of the order of a few microseconds.

Predictable and Fast Interrupt Latency: Interrupt latency is defined as the time delay between the occurrence of an interrupt and the running of the corresponding ISR (Interrupt Service Routine). In real-time operating systems, interrupt latency must be bounded, and the bound is expected to be less than a few microseconds. Low interrupt latency is achieved by performing the bulk of the ISR's activities in a deferred procedure call (DPC).
A DPC is essentially a task that performs most of the ISR activity and is executed later at a certain priority value. Further, support for nested interrupts is usually desired. That is, a real-time operating system should not only be preemptive while executing kernel routines, but should also be preemptive during interrupt servicing. This is especially important for hard real-time applications with sub-microsecond timing requirements.

Support for Resource Sharing Among Real-Time Tasks: If real-time tasks are allowed to share critical resources among themselves using the traditional resource sharing techniques, then the response times of tasks can become unbounded, leading to deadline misses. This is one compelling reason why every commercial real-time operating system should, at the minimum, provide the basic priority inheritance mechanism. Support for the priority ceiling protocol (PCP) is also desirable if large and moderate sized applications are to be supported.

Requirements on Memory Management: As far as general-purpose operating systems are concerned, it is rare to find one that does not support virtual memory and memory protection. However, embedded real-time operating systems almost never support these features; only those meant for large and complex applications do. Real-time operating systems for large and medium sized applications are expected to provide virtual memory support, not only to meet the memory demands of the heavy-weight tasks of the application, but also to let memory-demanding non-real-time applications, such as text editors and e-mail software, run on the same platform. Virtual memory reduces the average memory access time, but degrades the worst-case memory access time. The penalty of using virtual memory is the overhead associated with storing the address translation table and performing the virtual to physical address translations.
Moreover, fetching pages from the secondary memory on demand incurs significant latency. Therefore, operating systems supporting virtual memory must provide the real-time
applications with some means of controlling paging, such as memory locking. Memory locking prevents a page from being swapped out from memory to hard disk. In the absence of a memory locking feature, the memory access times of even critical real-time tasks can show large jitter, as the access time greatly depends on whether the required page is in physical memory or has been swapped out.

Memory protection is another important issue that needs to be carefully considered. Lack of memory protection among tasks leads to a single address space for all tasks. Arguments for having only a single address space include simplicity, saving memory bits, and light-weight system calls. For small embedded applications, the overhead of a few kilobytes of memory per process can be unacceptable. However, when no memory protection is provided by the operating system, the cost of developing and testing a program becomes very high as the complexity of the application increases. Maintenance cost also increases, as any change in one module requires retesting the entire system.

Embedded real-time operating systems usually do not support virtual memory; instead, they create physically contiguous blocks of memory for an application upon request. However, memory fragmentation is a potential problem for a system that does not support virtual memory. Also, memory protection is difficult to support in a non-virtual memory management system. For this reason, in many embedded systems the kernel and the user processes execute in the same address space, i.e. there is no memory protection. Hence, a system call and a function call within an application are indistinguishable, which makes debugging applications difficult, since a runaway pointer can corrupt the operating system code and make the system freeze.
Additional Requirements for Embedded Real-Time Operating Systems: Embedded applications usually have constraints on cost, size, and power consumption. Embedded real-time operating systems should be capable of diskless operation, since disks are often either too bulky to use or increase the cost of deployment. Further, embedded operating systems should minimize the total power consumption of the system. Embedded operating systems usually reside in ROM. For certain applications requiring faster response, it may be necessary to run the real-time operating system from RAM: since the access time of RAM is lower than that of ROM, this results in faster execution. Irrespective of whether ROM or RAM is used, memory ICs are expensive; it is therefore desirable for a real-time operating system for embedded applications to have as small a footprint (memory usage) as possible. Since embedded products are typically manufactured at large scale, every rupee saved on memory and other hardware requirements translates into millions in profit.
The two most troublesome problems that a real-time programmer faces while using Unix for real-time applications are the non-preemptive Unix kernel and the dynamically changing priorities of tasks.
Fig. 31.3 Invocation of an Operating System Service through a System Call

At the risk of digressing from the focus of this discussion, let us understand an important operating systems concept. Certain operations, such as handling devices, creating processes, and file operations, need to be done in kernel mode only. That is, application programs are prevented from carrying out these operations directly, and need to request the operating system (through a system call) to carry out the required operation. This restriction enables the kernel to enforce discipline among different programs in accessing these objects. If such operations were not performed in kernel mode, different application programs might interfere with each other's operation. An example of an operating system where all operations were performed in user mode is the once popular operating system DOS (though DOS is nearly obsolete now). In DOS, application programs are free to carry out any operation in user mode 2, including crashing the system by deleting the system files. The instability this can bring about is clearly unacceptable in a real-time environment, and is hardly acceptable in general applications as well.
2 In fact, in DOS there is only one mode of operation; kernel mode and user mode are indistinguishable.
A process running in kernel mode cannot be preempted by other processes; in other words, the Unix kernel is non-preemptive. On the other hand, the Unix system does preempt processes running in user mode. A consequence of this is that even when a low priority process makes a system call, high priority processes have to wait until the system call completes. The longest system calls may take up to several hundreds of milliseconds to complete. Worst-case preemption times of several hundreds of milliseconds can easily cause high priority tasks with short deadlines, of the order of a few milliseconds, to miss their deadlines. Let us now investigate why the Unix kernel was designed to be non-preemptive in the first place. Whenever an operating system routine starts to execute, all interrupts are disabled; the interrupts are enabled only after the operating system routine completes. This was a very efficient way of preserving the integrity of the kernel data structures: it saved the overheads associated with setting and releasing locks, and resulted in lower average task preemption times. Though a non-preemptive kernel results in worst-case task response times of up to a second, this was acceptable to the Unix designers, who at the time did not foresee the use of Unix in real-time applications. Of course, it would have been possible to ensure the correctness of kernel data structures by using locks at appropriate places rather than by disabling interrupts, but that would have increased the average task preemption time. In Sec. 4.4.4 we investigate how modern real-time operating systems make the kernel preemptive without unduly increasing the task preemption time.
Fig. 31.4 Multi-Level Feedback Queues (task queues at priority levels 1 to 6)

Unix periodically computes the priority of a task based on the type of the task and its execution history. The priority of a task Ti is recomputed at the end of its j-th time slice using the following two expressions:

    Pr(Ti, j) = Base(Ti) + CPU(Ti, j) + nice(Ti)            (4.1)
    CPU(Ti, j) = U(Ti, j-1)/2 + CPU(Ti, j-1)/2              (4.2)

where Pr(Ti, j) is the priority of the task Ti at the end of its j-th time slice; U(Ti, j) is the utilization of the task Ti for its j-th time slice; and CPU(Ti, j) is the weighted history of CPU utilization of the task Ti at the end of its j-th time slice. Base(Ti) is the base priority of the task Ti and nice(Ti) is the nice value associated with Ti. User processes can have non-negative nice values. Thus, effectively the nice value lowers the priority of a process (i.e. being nice to the other processes). Expr. 4.2 is recursively defined. Unfolding the recursion, we get:

    CPU(Ti, j) = U(Ti, j-1)/2 + U(Ti, j-2)/4 + ...          (4.3)

It can easily be seen from Expr. 4.3 that, in the computation of the weighted history of CPU utilization of a task, the activity (i.e. processing or I/O) of the task in the immediately concluded interval is given the maximum weightage. If the task used up the CPU for the full duration of the slice (i.e. 100% CPU utilization), then CPU(Ti, j) gets a higher value, indicating a lower priority. Observe that the activities of the task in the preceding intervals get progressively lower weightage. It should be clear that CPU(Ti, j) captures the weighted history of CPU utilization of the task Ti at the end of its j-th time slice. Now, substituting Expr. 4.3 in Expr. 4.1, we get:

    Pr(Ti, j) = Base(Ti) + U(Ti, j-1)/2 + U(Ti, j-2)/4 + ... + nice(Ti)    (4.4)

The purpose of the base priority term in the priority computation expression (Expr. 4.4) is to divide all tasks into a set of fixed bands of priority levels.
The values of U(Ti, j) and nice components are restricted to be small enough to prevent a process from migrating from its assigned band. The bands have been designed to optimize I/O, especially
block I/O. The different priority bands under Unix, in decreasing order of priority, are: swapper, block I/O, file manipulation, character I/O and device control, and user processes. Tasks performing block I/O are assigned the highest priority band. As an example of block I/O, consider the I/O that occurs while handling a page fault in a virtual memory system. Such block I/O uses DMA-based transfer, and hence makes efficient use of the I/O channel. Character I/O includes mouse and keyboard transfers. The priority bands were designed to provide the most effective use of the I/O channels. Dynamic recomputation of priorities was motivated by the following consideration. The Unix designers observed that in any computer system, I/O is the bottleneck. Processors are extremely fast compared to the transfer rates of I/O devices. I/O devices such as keyboards are necessarily slow, to match human response times. Other devices such as printers and disks deploy mechanical components that are inherently slow and therefore cannot sustain very high rates of data transfer. Therefore, effective use of the I/O channels is very important for increasing the overall system throughput, and the I/O channels should be kept as busy as possible so that interactive tasks get good response times. To keep the I/O channels busy, any task performing I/O should not be kept waiting for the CPU. For this reason, as soon as a task blocks for I/O, its priority is increased by the priority recomputation rule given in Expr. 4.4. However, if a task makes full use of its last assigned time slice, it is determined to be computation-bound and its priority is reduced. Thus, the basic philosophy of the Unix operating system is that interactive tasks are made to assume higher priority levels and are processed at the earliest. This gives interactive users good response times.
This technique has now become an accepted way of scheduling soft real-time tasks across almost all available general purpose operating systems. From the above observations, we can summarize the overall effect of recomputing priority values using Expr. 4.4 as follows: in Unix, I/O intensive tasks migrate to higher and higher priorities, whereas CPU-intensive tasks sink to lower priority levels. No doubt the approach taken by Unix is very appropriate for maximizing the average task throughput, and it does indeed provide good average response times to interactive (soft real-time) tasks. In fact, almost every modern operating system performs a very similar dynamic recomputation of task priorities to maximize the overall system throughput and to provide good average response times to interactive tasks. However, for hard real-time tasks, dynamic shifting of priority values is clearly not appropriate.
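The decay behaviour of Exprs. 4.1-4.2 can be made concrete with a small sketch. Here a larger Pr() value means a lower priority; utilizations are given as a percentage of the time slice used, and the base and nice values are illustrative, not taken from any particular Unix variant:

```python
# Sketch of the Unix priority recomputation of Exprs. 4.1 and 4.2.
# Larger Pr() means *lower* priority.

def recompute(base, nice, utilizations):
    """Return Pr(Ti, j) after the given per-slice CPU utilizations
    (oldest first), using CPU(Ti,j) = U(Ti,j-1)/2 + CPU(Ti,j-1)/2."""
    cpu = 0.0
    for u in utilizations:
        cpu = u / 2.0 + cpu / 2.0
    return base + cpu + nice

# A CPU-bound task (100% of every slice) drifts to a larger priority
# value (lower priority) than an I/O-bound task that hardly uses the CPU.
print(recompute(base=60, nice=0, utilizations=[100, 100, 100]))  # 147.5
print(recompute(base=60, nice=0, utilizations=[0, 0, 0]))        # 60.0
```

Note how the most recent slice contributes with weight 1/2, the one before with weight 1/4, and so on, exactly as in the unfolded Expr. 4.3.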
Lack of Real-Time File Services: In Unix, file blocks are allocated as and when they are requested by an application. As a consequence, while a task is writing to a file, it may encounter an error when the disk runs out of space. In other words, no guarantee is given that disk space will be available when a task writes a block to a file. Traditional file writing approaches also result in slow writes, since the required space has to be allocated before writing a block. Another problem with traditional file systems is that blocks of the same file may not be contiguously located on the disk. This results in read operations taking unpredictable times, causing jitter in data access. In real-time file systems, significant performance improvement can be achieved by storing files contiguously on the disk. Since the file system pre-allocates space, the times for read and write operations are more predictable.

Inadequate Timer Services Support: In Unix systems, real-time timer support is insufficient for many hard real-time applications. The clock resolution that is provided to applications is 10 milliseconds, which is too coarse for many hard real-time applications.
Fig. 31.5 Schematic Representation of a Host-Target System (host and target board connected over TCP/IP or a serial link)

The main idea behind this approach is that the real-time operating system running on the target board is kept as small and simple as possible. This implies that the operating system on the target board lacks virtual memory management support and does not provide utilities such as compilers, program editors, etc. The processor on the target board runs the real-time operating system. The host system has the program development environment, including compilers, editors, libraries, cross-compilers, debuggers, etc.; these are memory-demanding applications that require virtual memory support. The host is usually connected to the target using a serial port or a TCP/IP connection (see Fig. 31.5). The real-time program is developed on the host. It is then cross-compiled to generate code for the target processor. Subsequently, the executable module is downloaded to the target board. Tasks are executed on the target board and the execution is controlled from the host side using a symbolic cross-debugger. Once the program works successfully, it is fused into a ROM or flash memory and becomes ready to be deployed in applications. Commercial examples of host-target real-time operating systems include PSOS, VxWorks, and VRTX. We examine these commercial products in Lesson 5. We would point out that these operating systems, due to their small size, limited functionality, and optimal design, achieve much better performance figures than full-fledged operating systems. For example, the task preemption times of these systems are of the order of a few microseconds, compared to several hundreds of milliseconds for traditional Unix systems.
the processing of the kernel routine and dispatches the waiting highest priority task immediately. The worst-case preemption latency in this technique therefore becomes the longest time between two consecutive preemption points. As a result, the worst-case response times of tasks are now several fold lower than those for traditional operating systems without preemption points. This makes preemption point-based operating systems suitable for many categories of hard real-time applications, though still not for applications requiring preemption latencies of the order of a few microseconds or less. Another advantage of this approach is that it involves only minor changes to the kernel code. Many operating systems have taken the preemption point approach in the past, a prominent example being HP-UX.
done from efficiency considerations and worked well for non-real-time and uniprocessor applications. Masking interrupts during kernel processing can make even very small critical routines exhibit worst-case response times of the order of a second. Further, this approach does not work in multiprocessor environments: masking the interrupts for one processor does not help, as tasks running on the other processors can still corrupt the kernel data structures. It is now clear that in order to make the kernel preemptive, locks must be used at appropriate places in the kernel code. In fully preemptive Unix systems, two types of locks are normally used: kernel-level locks and spin locks.
Fig. 31.6 Operation of a Spin Lock (task T1 holds the lock on the critical resource while task T2 busy-waits)

A kernel-level lock is similar to a traditional lock. When a task waits for a kernel-level lock to be released, it is blocked and undergoes a context switch; it becomes ready only after the required lock is released by the holding task. This type of lock is inefficient when critical resources are needed only for short durations, of the order of a few milliseconds or less, because in such situations the context switching overhead is not acceptable. Consider a task that requires the lock for carrying out some very small processing (possibly a single arithmetic operation) on a critical resource. If a kernel-level lock is used, another task requesting the lock at that time would be blocked and a context switch would be incurred; in addition, the cache contents, pages of the task, etc. may be swapped. Here the context switching time is comparable to, or even greater than, the time for which the task needs the resource. In such a situation, a spin lock is appropriate. Let us now understand the operation of a spin lock, shown schematically in Fig. 31.6. In Fig. 31.6, a critical resource is required by the tasks T1 and T2 for very short times (comparable to a context switching time), and the resource is protected by a spin lock. The task T1 has acquired the spin lock guarding the resource. Meanwhile, the task T2 requests the resource. When T2 cannot get access to the resource, it simply busy-waits (shown as a loop in the figure); it does not block, and therefore suffers no context switch. T2 gets the resource as soon as T1 relinquishes it.

Real-Time Priorities: Let us now examine how self-host systems address the problem of the dynamic priority levels of traditional Unix systems. In Unix-based real-time operating systems, real-time and idle priorities are supported in addition to dynamic priorities. Fig. 31.7 schematically shows the three available priority levels.
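The busy-wait acquisition of the spin lock in Fig. 31.6 can be sketched as follows. This is a user-level illustration only: a real kernel spin lock relies on an atomic test-and-set instruction, for which a non-blocking Lock.acquire() stands in here, and the SpinLock name is illustrative:

```python
import threading

class SpinLock:
    """Busy-waits instead of blocking: appropriate only when the
    critical section is shorter than a context switch."""
    def __init__(self):
        # acquire(blocking=False) plays the role of an atomic
        # test-and-set here.
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass                 # spin (busy wait); do not block

    def release(self):
        self._flag.release()

lock = SpinLock()
lock.acquire()       # T1 takes the lock
# ... very short critical section ...
lock.release()       # a spinning T2 would proceed immediately
print("lock acquired and released")
```

The trade-off is exactly the one described above: the waiter burns CPU cycles while spinning, but avoids the context switch that a kernel-level lock would incur.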
Fig. 31.7 Priority Changes in Self-host Unix Systems

Idle (Non-Migrating): This is the lowest priority. The task that runs when there are no other tasks to run (the idle task) runs at this level. Idle priorities are static and are not recomputed periodically.

Dynamic: Dynamic priorities are recomputed periodically to improve the average response time of soft real-time tasks. Dynamic recomputation of priorities ensures that I/O-bound tasks migrate to higher priorities and CPU-bound tasks operate at lower priority levels. As shown in Fig. 31.7, dynamic priority levels are higher than the idle priority, but lower than the real-time priorities.

Real-Time: Real-time priorities are static and are not recomputed. Hard real-time tasks operate at these levels, higher than tasks with dynamic priority levels.
Fig. 31.8 Genealogy of Operating Systems from Microsoft's Stable

An organization owning Windows NT systems might be interested in using it for its real-time applications on account of either cost saving or convenience. This is especially true in prototype application development, and also when only a limited number of deployments are required. In the following, we critically analyze the suitability of Windows NT for real-time application development. First, we highlight some features of Windows NT that are very relevant and useful to a real-time application developer. In the subsequent subsection, we point out some of the shortcomings of Windows NT when used in real-time application development.
[Fig.: Windows NT priority levels: the real-time class spans levels 16 (real-time idle) to 31 (time critical); the dynamic class spans levels 1 (dynamic idle) to 15 (dynamic time-critical), including the dynamic normal level; level 0 is the idle level.]
These problems have been avoided by the Windows CE operating system through a priority inheritance mechanism. 2. Support for Resource Sharing Protocols: We had discussed in Chapter 3 that unless appropriate resource sharing protocols are used, tasks accessing shared resources may suffer unbounded priority inversions, leading to deadline misses and even system failure. Windows NT does not provide any support (such as priority inheritance) to let real-time tasks share critical resources among themselves. This is a major shortcoming of Windows NT when used in real-time applications. Since most real-time applications do involve resource sharing among tasks, we outline below the possible ways in which user-level functionalities can be added to the Windows NT system. The simplest approach to let real-time tasks share critical resources without unbounded priority inversions is as follows. As soon as a task succeeds in locking a non-preemptable resource, its priority is raised to the highest priority (31); as soon as it releases the resource, its priority is restored. However, we know that this arrangement would lead to large inheritance-related inversions. Another possibility is to implement the priority ceiling protocol (PCP). To implement this protocol, we need to restrict the real-time tasks to even priorities (i.e. 16, 18, ..., 30). The reason for this restriction is that NT does not support FIFO scheduling among equal-priority tasks. If the highest priority among all tasks needing a resource is 2n, then the ceiling priority of the resource is 2n+1. In Unix, a FIFO option among equal-priority tasks is available; therefore all available priority levels can be used.
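The even-priority scheme described above can be made concrete with a short sketch. The function names are our own, invented for illustration; the rule itself (a resource whose highest requester has even priority 2n gets ceiling 2n+1) is the one stated in the text.

```python
def ceiling_priority(requester_priorities):
    """Ceiling priority of a resource under the even-priority scheme.

    Real-time tasks are restricted to even priority levels
    (16, 18, ..., 30); if the highest priority among all tasks
    needing the resource is 2n, the ceiling is the odd level 2n + 1.
    """
    highest = max(requester_priorities)
    if highest % 2 != 0:
        raise ValueError("real-time tasks must use even priorities")
    return highest + 1

def locking_priority(own_priority, resource_ceiling):
    """While holding the resource, a task runs at the resource ceiling,
    so no equal- or lower-priority task can preempt it inside the
    critical section (NT lacks FIFO scheduling at equal priorities)."""
    return max(own_priority, resource_ceiling)
```

For a resource needed by tasks at priorities 16 and 24, the ceiling is the odd level 25; a holder whose base priority is 16 therefore runs at 25 while inside the critical section.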
Though Windows NT has many of the features desired of a real-time operating system, its implementation of DPCs, together with its lack of protocol support for resource sharing among equal-priority tasks, makes it unsuitable for use in safety-critical real-time applications. A comparison of the extent to which some of the basic features required for real-time programming are provided by Windows NT and Unix V is given in Table 1. With careful programming, Windows NT may be useful for applications that can tolerate occasional deadline misses and have deadlines of the order of hundreds of milliseconds rather than microseconds. Of course, to be used in such applications, the processor utilization must be kept sufficiently low and priority inversion control must be provided at the user level.
1.7. Exercises
1. State whether the following assertions are True or False. Justify your answer in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper bound on achievable utilization improves as the number of tasks in the system being developed increases.
b. Under the Unix operating system, computation-intensive tasks dynamically gravitate towards higher priorities.
c. Normally, task switching time is larger than task preemption time.
d. Suppose a real-time operating system does not support memory protection; then a procedure call and a system call are indistinguishable in that system.
e. Watchdog timers are typically used to start certain tasks at regular intervals.
f. For memory of the same size under segmented and virtual addressing schemes, the segmented addressing scheme would in general incur lower memory access jitter compared to the virtual addressing scheme.
2. Even though the clock frequency of modern processors is of the order of several GHz, why do many modern real-time operating systems not support nanosecond or even microsecond resolution clocks? Is it possible for an operating system to support nanosecond resolution clocks at present? Explain how this can be achieved.
3. Give an example of a real-time application for which simple segmented memory management support by the RTOS is preferred, and another example of an application for which virtual memory management support is essential. Justify your choices.
4. Is it possible to meet the service requirements of hard real-time applications by writing additional layers over the Unix System V kernel? If your answer is no, explain the reason. If your answer is yes, explain what additional features you would implement in the external layer of the Unix System V kernel for supporting hard real-time applications.
5. Briefly indicate how Unix dynamically recomputes task priority values. Why is such recomputation of task priorities required? What are the implications of such priority recomputations on real-time application development?
6. Why is Unix V non-preemptive in kernel mode? How do fully preemptive kernels based on Unix (e.g. Linux) overcome this problem?
7. Briefly describe an experimental set-up that can be used to determine the preemptability of different operating systems by high-priority real-time tasks when a low-priority task has made a system call.
8. Explain how interrupts are handled in Windows NT. Explain how the interrupt processing scheme of Windows NT makes it unsuitable for hard real-time applications. How has this problem been overcome in WinCE?
9. Would you recommend Unix System V for running a few real-time tasks of a data acquisition application? Assume that the computation time for these tasks is of the order of a few hundred milliseconds and their deadlines are of the order of several tens of seconds. Justify your answer.
10. Explain the problems that you would encounter if you tried to develop and run a hard real-time system on the Windows NT operating system.
11. Briefly explain why the traditional Unix kernel is not suitable for use in multiprocessor environments.
12. Define a spin lock and a kernel-level lock and explain their use in realizing a preemptive kernel.
13. What do you understand by a microkernel-based operating system? Explain the advantages of a microkernel-based real-time operating system over a monolithic operating system.
14. What is the difference between a self-host and a host-target based embedded operating system? Give at least one example of a commercial operating system from each category.
15. What problems might a real-time application developer face while using RT-Linux for developing hard real-time applications?
16. What are the important features required in a real-time operating system? Analyze to what extent these features are provided by Windows NT and Unix V.
Module 6
Embedded System Software
Lesson 32
Commercial Real-Time Operating Systems
1. Introduction
Many real-time operating systems are at present available commercially. In this lesson, we analyze some of the popular real-time operating systems and investigate why these popular systems cannot be used across all applications. We also examine the POSIX standards for RTOS and their implications.
1.1. POSIX
POSIX stands for Portable Operating System Interface. The X has been suffixed to the abbreviation to make it sound Unix-like. Over the last decade, POSIX has become an important standard in the operating systems area, including real-time operating systems. The importance of POSIX can be gauged from the fact that nowadays it is uncommon to come across a commercial operating system that is not POSIX-compliant. POSIX started as an open software initiative. Since POSIX has now become overwhelmingly popular, we discuss the POSIX requirements on real-time operating systems. We start with a brief introduction to the open software movement and then trace the historical events that have led to the emergence of POSIX. Subsequently, we highlight the important requirements of real-time POSIX.
Open Source: Provides portability at the source code level. To run an application on a new platform requires only compilation and linking. ANSI and POSIX are important open source standards.

Open Object: This standard provides portability of unlinked object modules across different platforms. To run an application in a new environment, relinking of the object modules is required.

Open Binary: This standard provides complete software portability across hardware platforms based on a common binary language structure. An open binary product is portable at the executable code level. At the moment, no open binary standards exist.

The main goal of POSIX is application portability at the source code level. Before we discuss RT-POSIX, let us explore the historical background under which POSIX was developed.
The POSIX standard has several parts, including:
- system interfaces and system call parameters
- shells and utilities
- test methods for verifying conformance to POSIX
- real-time extensions
Execution scheduling: An operating system, to be POSIX-RT compliant, must provide support for real-time (static) priorities.

Performance requirements on system calls: The standard specifies worst-case execution times for most real-time operating system services.

Priority levels: The number of priority levels supported should be at least 32.

Timers: Periodic and one-shot timers (also called watchdog timers) should be supported. The system clock is called CLOCK_REALTIME when the system supports real-time POSIX.

Real-time files: A real-time file system should be supported. A real-time file system can pre-allocate storage for files and can store file blocks contiguously on the disk. This enables predictable delays in file access even in a virtual memory system.

Memory locking: Memory locking should be supported. POSIX-RT defines the operating system services mlockall() to lock all pages of a process, mlock() to lock a range of pages, and mlockpage() to lock only the current page. The corresponding unlock services are munlockall(), munlock(), and munlockpage(). Memory locking services have been introduced to support deterministic memory access.

Multithreading support: Real-time threading support is mandated. Real-time threads are schedulable entities of a real-time application that have individual timeliness constraints and may have collective timeliness constraints when belonging to a runnable set of threads.
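The one-shot (watchdog) timer requirement can be illustrated with a small sketch. This is not the POSIX timer API itself: Python's threading.Timer stands in for an RTOS timer service, and the class and its kick/stop method names are hypothetical.

```python
import threading
import time

class WatchdogTimer:
    """Toy one-shot watchdog: fires its handler unless kicked in time."""
    def __init__(self, timeout_s, handler):
        self.timeout_s = timeout_s
        self.handler = handler
        self._timer = None

    def start(self):
        self._timer = threading.Timer(self.timeout_s, self.handler)
        self._timer.start()

    def kick(self):
        # A healthy task restarts the countdown before it expires.
        self._timer.cancel()
        self.start()

    def stop(self):
        self._timer.cancel()

expired = []

wd = WatchdogTimer(0.05, lambda: expired.append("task hung"))
wd.start()
time.sleep(0.2)   # the monitored "task" never kicks the watchdog ...
wd.stop()         # ... so the handler has already fired by now
```

A periodic timer would simply re-arm itself inside the handler; the watchdog pattern shown here is the one typically used to detect hung tasks.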
1.6.1. PSOS
PSOS is a popular real-time operating system that is primarily used in embedded applications. It is available from Wind River Systems, a large player in the real-time operating system arena. It is a host-target type of real-time operating system. PSOS is being used in
several commercial embedded products. An example application of PSOS is in the base stations of the cellular systems.
[Fig. 32.1 legend: XRAY+ is the source-level debugger and PROBE is the target debugger. The figure shows the editor, cross-compiler, XRAY+, and libraries on the host computer, connected over TCP/IP to PROBE on the target.]
Fig. 32.1 PSOS-based Development of Embedded Software

PSOS-based application development is shown schematically in Fig. 32.1. The host computer is typically a desktop; both Unix and Windows hosts are supported. The target board contains the embedded processor, ROM, RAM, etc. The host computer runs the editor, cross-compiler, source-level debugger, and library routines. On the target board, PSOS+ and other optional modules such as PNA+, PHILE, and PROBE are installed in RAM. PNA+ is the network manager. It provides TCP/IP communication over Ethernet and FDDI, conforms to Unix 4.3 (BSD) socket syntax, and is compatible with other TCP/IP-based networking standards such as ftp and NFS. Using these, PNA+ provides efficient downloading and debugging communication between the target and the host. PROBE+ is the target debugger and XRAY+ is the source-level debugger. The application is developed on the host machine and downloaded to the target board, where it is debugged using the source-level debugger (XRAY+). Once the application runs satisfactorily, it is fused on a ROM and installed on the target board. We now highlight some important features of PSOS. PSOS supports 32 priority levels. In the minimal configuration, the footprint of the operating system is only 12 KBytes. For sharing critical resources among real-time tasks, it supports the priority inheritance and priority ceiling protocols. It supports segmented memory management and allocates tasks to memory regions. A memory region is a physically contiguous block of memory, created by the operating system in response to a call from an application. In most modern operating systems, control jumps to the kernel when an interrupt occurs. PSOS takes a different approach: the device drivers are outside the kernel and can be loaded and removed at run time.
When an interrupt occurs, the processor jumps directly to the ISR (interrupt service routine) pointed to by the vector table. The intention is not only to gain speed, but also to give the application developer complete control over interrupt handling.
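The direct-to-ISR dispatch of PSOS can be pictured with a toy vector table. The interrupt numbers and handler names below are invented for illustration; the point is only that the table maps an interrupt straight to its service routine, with no kernel hop in between.

```python
# Toy interrupt vector table: each entry maps an interrupt number
# directly to its ISR, mirroring PSOS's kernel-bypassing dispatch.
service_log = []

def timer_isr():
    service_log.append("timer")

def uart_isr():
    service_log.append("uart")

# hypothetical interrupt numbers, chosen only for this sketch
vector_table = {0: timer_isr, 3: uart_isr}

def raise_interrupt(irq):
    # the "processor" jumps straight to the ISR pointed to by the table
    vector_table[irq]()

raise_interrupt(0)
raise_interrupt(3)
```

Besides speed, keeping the table in the application's hands gives the developer the complete control over interrupt handling that the text describes.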
1.6.2. VRTX
VRTX is a POSIX-RT compliant operating system from Mentor Graphics. VRTX has been certified by the US FAA (Federal Aviation Administration) for use in mission- and life-critical applications such as avionics. VRTX has two multitasking kernels: VRTXsa and VRTXmc. VRTXsa is used for large and medium applications. It supports virtual memory, has a POSIX-compliant library, and supports priority inheritance. Its system calls are deterministic and fully preemptable. VRTXmc, on the other hand, is optimized for power consumption and for ROM and RAM sizes, and therefore has a very small footprint. The kernel typically requires only 4 to 8 KBytes of ROM and 1 KByte of RAM, and does not support virtual memory. This version is targeted at cell phones and other small hand-held devices.
1.6.3. VxWorks
VxWorks is a product from Wind River Systems. It is a host-target system; the host can be either a Windows or a Unix machine. It supports most POSIX-RT functionalities. VxWorks comes with an integrated development environment (IDE) called Tornado. In addition to the standard program development tools such as an editor, cross-compiler, and cross-debugger, Tornado contains VxSim and WindView. VxSim simulates a VxWorks target for use as a prototyping and testing environment, and WindView provides debugging tools for the simulator environment. VxMP is the multiprocessor version of VxWorks. VxWorks was deployed in the Mars Pathfinder, which was sent to Mars in 1997. Pathfinder landed on Mars, responded to ground commands, and started to send science and engineering data. However, there was a hitch: it repeatedly reset itself. Using the trace generation, logging, and debugging tools of VxWorks remotely, the cause was found to be unbounded priority inversion: it caused real-time tasks to miss their deadlines, and as a result, the exception handler reset the system each time. Although VxWorks supports priority inheritance, the remote debugging tools revealed that it had been disabled in the configuration file; the problem was fixed by enabling it.
1.6.4. QNX
QNX is a product from QNX Software Systems Ltd. QNX Neutrino offers POSIX-compliant APIs and is implemented using a microkernel architecture, shown in Fig. 32.2. Because of the fine-grained scalability of the microkernel architecture, it can be configured to a very small size, a critical advantage in high-volume devices, where even a 1% reduction in memory costs can return millions of dollars in profit.
1.6.5. µC/OS-II
µC/OS-II is a free RTOS, easily available on the Internet. It is written in ANSI C and contains a small portion of assembly code, which has been kept to a minimum to make it easy to port to different processors. To date, µC/OS-II has been ported to over 100 different processor architectures, ranging from 8-bit to 64-bit microprocessors, microcontrollers, and DSPs. Some important features of µC/OS-II are highlighted in the following.
µC/OS-II was designed so that the programmer can use just a few of the offered services or the entire range of services. This allows the programmer to minimize the amount of memory needed by µC/OS-II on a per-product basis. µC/OS-II has a fully preemptive kernel: it always ensures that the highest priority task that is ready is taken up for execution. µC/OS-II allows up to 64 tasks to be created, and each task operates at a unique priority level; there are 64 priority levels. This means that round-robin scheduling is not supported. The priority levels are used as the PIDs (Process Identifiers) for the tasks. µC/OS-II uses partitioned memory management. Each memory partition consists of several fixed-size blocks. A task obtains memory blocks from a memory partition, and a memory partition must be created before it can be used. Allocation and deallocation of fixed-size memory blocks is done in constant time and is deterministic. A task can create and use multiple memory partitions, so that it can use memory blocks of different sizes. µC/OS-II has been certified by the Federal Aviation Administration (FAA) for use in commercial aircraft, by meeting the demanding requirements of its standard for software used in avionics. To meet the requirements of this standard it was demonstrated through documentation and testing that it is robust and safe.
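The deterministic, constant-time block allocation of µC/OS-II can be sketched as follows. The class below is a conceptual model, not the real µC/OS-II API: blocks are pre-carved when the partition is created, so getting and returning a block are both O(1) list operations.

```python
class MemoryPartition:
    """Toy fixed-size block allocator in the style of uC/OS-II
    partitioned memory management (a sketch, not the real API)."""
    def __init__(self, n_blocks, block_size):
        self.block_size = block_size
        # carve all blocks up front; no searching is ever needed later
        self._free_list = [bytearray(block_size) for _ in range(n_blocks)]

    def get_block(self):
        # O(1), deterministic: pop a block from the free list
        if not self._free_list:
            return None  # the real kernel reports an error code instead
        return self._free_list.pop()

    def put_block(self, block):
        # O(1), deterministic: push the block back onto the free list
        self._free_list.append(block)

    def free_count(self):
        return len(self._free_list)

part = MemoryPartition(n_blocks=4, block_size=32)
blk = part.get_block()
```

A task needing blocks of a different size simply creates another partition; because all blocks in a partition are the same size, repeated get/put cycles never fragment the pool.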
1.6.6. RT Linux
Linux is by and large a free operating system. It is robust, feature-rich, and efficient. Several real-time implementations of Linux (RT-Linux) are available. RT-Linux is a self-host operating system that runs along with a Linux system (see Fig. 32.3). The real-time kernel sits between the hardware and the Linux system and intercepts all interrupts generated by the hardware; Fig. 32.3 schematically shows this aspect. If an interrupt is to cause a real-time task
to run, the real-time kernel preempts Linux, if Linux is running at that time, and lets the real-time task run. Thus, in effect, Linux runs as a task of RT-Linux.
Fig. 32.3 Structure of RT-Linux

The real-time applications are written as loadable kernel modules; in essence, real-time applications run in the kernel space. In the approach taken by RT-Linux, there are effectively two independent kernels: the real-time kernel and the Linux kernel. This approach is therefore also known as the dual-kernel approach, as the real-time kernel is implemented outside the Linux kernel. Any task that requires deterministic scheduling is run as a real-time task. These tasks preempt Linux whenever they need to execute and yield the CPU to Linux only when no real-time task is ready to run. Compared to the microkernel approach, the dual-kernel approach has the following shortcomings.
Duplicated Coding Efforts: Tasks running in the real-time kernel cannot make full use of the Linux system services: file systems, networking, and so on. In fact, if a real-time task invokes a Linux service, it is subject to the same preemption problems that prohibit Linux processes from behaving deterministically. As a result, new drivers and system services must be created specifically for the real-time kernel even when equivalent services already exist for Linux.

Fragile Execution Environment: Tasks running in the real-time kernel do not benefit from the MMU-protected environment that Linux provides to regular non-real-time processes. Instead, they run unprotected in the kernel space. Consequently, any real-time task that contains a coding error, such as a corrupt C pointer, can easily cause a fatal kernel fault. This is a serious problem, since many embedded applications are safety-critical in nature.

Limited Portability: In the dual-kernel approach, the real-time tasks are not Linux processes at all, but programs written using a small subset of the POSIX APIs. To aggravate the matter, different implementations of dual kernels use different APIs. As a result, real-time programs written using one vendor's RT-Linux version may not run on another's.

Programming Difficulty: RT-Linux kernels support only a limited subset of the POSIX APIs, so application development takes more effort and time.
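The interrupt interception at the heart of the dual-kernel approach can be modeled in a few lines. This is a purely conceptual sketch: real interrupt handling happens in kernel space, and the IRQ numbers below are invented.

```python
# Toy model of RT-Linux interrupt interception: the real-time kernel
# sees every interrupt first. Interrupts bound to real-time tasks are
# handled at once; the rest are queued as "soft" interrupts for Linux
# to handle when no real-time task is ready to run.
RT_IRQS = {7}          # hypothetical: IRQ 7 drives a real-time task

rt_handled = []
pending_for_linux = []

def rt_kernel_intercept(irq):
    if irq in RT_IRQS:
        rt_handled.append(irq)         # preempt Linux, run RT handler now
    else:
        pending_for_linux.append(irq)  # defer delivery to Linux

def linux_when_idle():
    """Linux (itself a task of RT-Linux) drains its pending interrupts."""
    drained = list(pending_for_linux)
    pending_for_linux.clear()
    return drained

rt_kernel_intercept(7)   # real-time interrupt: handled immediately
rt_kernel_intercept(1)   # ordinary interrupt: deferred until Linux runs
```

The model makes the scheduling relationship explicit: Linux only makes progress on its interrupts when the real-time side has nothing to do.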
1.6.7. Lynx
Lynx is a self-host system. The currently available version of Lynx (Lynx 3.0) is a microkernel-based real-time operating system, though the earlier versions were based on a monolithic design. Lynx is fully compatible with Linux: with Lynx's binary compatibility, a Linux program's binary image can be run directly on Lynx. In contrast, for other Linux-compatible operating systems such as QNX, Linux applications need to be recompiled in order to run on them. The Lynx microkernel is 28 KBytes in size and provides the essential services in scheduling, interrupt dispatch, and synchronization. The other services are provided as kernel plug-ins (KPIs). By adding KPIs to the microkernel, the system can be configured to support I/O, file systems, sockets, and so on. With full configuration, it can function as a multipurpose Unix machine on which both hard and soft real-time tasks can run. Unlike many embedded real-time operating systems, Lynx supports memory protection.
1.6.8. Windows CE
Windows CE is a stripped-down version of Windows, with a minimum footprint of only 400 KBytes. It provides 256 priority levels. To optimize performance, all threads are run in the kernel mode. The timer accuracy is 1 msec for sleep- and wait-related APIs. The different functionalities of the kernel are broken down into small non-preemptive sections, so during a system call, preemption is turned off only for short periods of time. Also, interrupt servicing is preemptable; that is, nested interrupts are supported. It uses a memory management unit (MMU) for virtual memory management. Windows CE uses a priority inheritance scheme to avoid the priority inversion problem present in Windows NT. Normally, the kernel thread handling a page fault (i.e. the DPC) runs at a priority level higher than NORMAL (refer Sec. 4.5.2). When a thread with priority level NORMAL suffers a page fault, the priority of the kernel thread handling this page fault is raised to the priority of the thread causing the page fault. This ensures that a thread is not blocked by another lower-priority thread even when it suffers a page fault.
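The page-fault inheritance rule can be written down as a one-line policy. The numeric levels below are hypothetical and, for simplicity of the sketch, a larger number means a higher priority (Windows CE itself numbers its 256 levels the other way round, with 0 the highest).

```python
def handler_priority(faulting_thread_prio, handler_base_prio):
    """Windows CE-style inheritance sketch: the kernel thread that
    services a page fault runs at least at the priority of the
    faulting thread, so a high-priority thread is never stalled
    behind the fault handling of a lower-priority one.

    Convention for this sketch only: larger number = higher priority.
    """
    return max(handler_base_prio, faulting_thread_prio)

HANDLER_BASE = 8   # hypothetical base level of the fault-handling thread
```

A thread at level 12 that faults gets its fault serviced at level 12 rather than at the handler's base level 8, while a thread at level 5 leaves the handler at its base level.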
1.6.9. Exercises
1. State whether the following statements are True or False. Justify your answer in each case.
a. In real-time Linux (RT-Linux), real-time processes are scheduled at priorities higher than the kernel processes.
b. EDF scheduling of tasks is commonly supported in commercial real-time operating systems such as PSOS and VRTX.
c. POSIX 1003.4 (the real-time standard) requires that real-time processes be scheduled at priorities higher than kernel processes.
d. POSIX is an attempt by ANSI/IEEE to enable executable files to be portable across different Unix machines.
2. What is the difference between block I/O and character I/O? Give examples of each. Which type of I/O is accorded higher priority by Unix? Why?
3. List four important features that a POSIX 1003.4 (real-time standard) compliant operating system must support.
4. Is preemptability of kernel processes required by POSIX 1003.4? Can a Unix-based operating system using the preemption-point technique claim to be POSIX 1003.4 compliant? Explain your answers.
5. Suppose you are a manufacturer of small embedded components used mainly in consumer electronics goods such as automobiles, MP3 players, and computer-based toys. Would you prefer to use PSOS, WinCE, or RT-Linux in your embedded component? Explain the reasons behind your answer.
6. What is the difference between a system call and a function call? What problems, if any, might arise if system calls were invoked as procedure calls?
7. Explain how a real-time operating system differs from a traditional operating system. Name a few real-time operating systems that are commercially available.
8. What is open software? Does open software mandate portability of the executable files across different platforms? Name an open software standard for real-time operating systems.
9. What is the advantage of using an open software operating system for real-time application development? What are the pros and cons of using an open software product in program development compared to a proprietary product?
10. Identify at least four important advantages of using VxWorks as the operating system for real-time applications compared to using Unix V.3.
11. What is an open source standard? How is it different from the open object and open binary standards? Give some examples of popular open source software products.
12. Can multithreading result in faster response times (compared to single-threaded tasks) even in uniprocessor systems? Explain your answer and identify the reasons that support it.
Module 7
Software Engineering Issues
Lesson 33
Introduction to Software Engineering
1. Introduction
With the advancement of technology, computers have become more powerful and sophisticated. The more powerful a computer is, the more sophisticated the programs it can run. Thus, programmers have been tasked to solve larger and more complex problems. They have coped with this challenge by innovating and by building on their past programming experience. All those past innovations and the experience of writing good quality programs in efficient and cost-effective ways have been systematically organized into a body of knowledge. This body of knowledge forms the basis of the software engineering principles. Thus, we can view software engineering as a systematic collection of past experience, arranged in the form of methodologies and guidelines.
Suppose you have a friend who asked you to build a small wall as shown in fig. 33.1. You would be able to do that using your common sense: you would get building materials like bricks, cement, etc., and you would then build the wall.
Fig. 33.1 A Small Wall But what would happen if the same friend asked you to build a large multistoried building as shown in fig. 33.2?
Fig. 33.2 A Multistoried Building

You do not have a very good idea about building such a huge complex. It would be very difficult to extend your idea about small wall construction into constructing a large building. Even if you tried to build a large building, it would collapse, because you would not have the requisite knowledge about the strength of materials, testing, planning, architectural design, etc. Building a small wall and building a large building are entirely different ball games. You can use your intuition and still be successful in building a small wall, but building a large building requires knowledge of civil, architectural, and other engineering principles. Without using software engineering principles, it would similarly be difficult to develop large programs. In industry, large programs usually need to be developed to accommodate multiple functions. A problem with developing such large commercial programs is that their complexity and difficulty levels increase exponentially with their sizes, as shown in fig. 33.3. For example, a program of size 1,000 lines of code has some complexity, but a program with 10,000 LOC is not just 10 times more difficult to develop; it may well turn out to be 100 times more difficult unless software engineering principles are used. In such situations, software engineering techniques come to the rescue. Software engineering helps to reduce programming complexity. Software engineering principles use two important techniques to reduce problem complexity: abstraction and decomposition. The principle of abstraction (see fig. 33.4) implies that a problem can be simplified by omitting irrelevant details. Once the simpler problem is solved, the omitted details can be taken into consideration to solve the next lower level of abstraction, and so on.
Fig. 33.3 Increase in development time and effort with problem size
The other approach to tackle problem complexity is decomposition. In this technique, a complex problem is divided into several smaller problems, and then the smaller problems are solved one by one. However, any random decomposition of a problem into smaller parts will not help. The problem has to be decomposed such that each component of the decomposed problem can be solved independently and then the solution of the different
Fig. 33.4 A hierarchy of abstraction (figure: the full problem at the base, with successive abstraction levels, 1st through 3rd, above it)
components can be combined to get the full solution. A good decomposition of a problem as shown in fig. 33.5 should minimize interactions among various components. If the different subcomponents are interrelated, then the different components cannot be solved separately and the desired reduction in complexity will not be realized.
Fig. 33.6 Change in the relative cost of hardware and software over time (figure: the hardware cost to software cost ratio falling between 1960 and 2002)
Organizations are spending larger and larger portions of their budget on software. Not only are software products turning out to be more expensive than hardware, but they also present a host of other problems to the customers: software products are difficult to alter, debug, and enhance; use resources non-optimally; often fail to meet user requirements; are far from reliable; frequently crash; and are often delivered late. Among these, the trend of increasing software costs is probably the most important symptom of the present software crisis. Remember that the cost we are talking of here is not on account of increased features, but due to ineffective development of the product, characterized by inefficient resource usage and time and cost over-runs. Many factors have contributed to the making of the present software crisis: larger problem sizes, lack of adequate training in software engineering, increasing skill shortage, and low productivity improvements. It is believed that the only satisfactory solution to the present software crisis can come from a spread of software engineering practices among engineers, coupled with further advancements in the software engineering discipline itself.
where it was introduced and rework those phases - possibly change the design or change the code and so on. Today, software testing has become very systematic and standard testing techniques are available. Testing activity has also become all encompassing in the sense that test cases are being developed right from the requirements specification stage. There is better visibility of design and code. By visibility we mean production of good quality, consistent and standard documents during every phase. In the past, very little attention was paid to producing good quality and consistent documents. In the exploratory style, the design and test activities, even if carried out (in whatever way), were not documented satisfactorily. Today, consciously good quality documents are being developed during product development. This has made fault diagnosis and maintenance smoother. Now, projects are first thoroughly planned. Project planning normally includes preparation of various types of estimates, resource scheduling, and development of project tracking plans. Several techniques and tools for tasks such as configuration management, cost estimation, scheduling, etc. are used for effective software project management. Several metrics are being used to help in software project management and software quality assurance.
prepare the test documents first, and some other engineer might begin with the design phase of the parts assigned to him. This would be one of the perfect recipes for project failure. A software life cycle model defines entry and exit criteria for every phase. A phase can start only if its phase-entry criteria have been satisfied. So without a software life cycle model, the entry and exit criteria for a phase cannot be recognized. Without models (such as classical waterfall model, iterative waterfall model, prototyping model, evolutionary model, spiral model etc.), it becomes difficult for software project managers to monitor the progress of the project. Many life cycle models have been proposed so far. Each of them has some advantages as well as some disadvantages. A few important and commonly used life cycle models are as follows:
Classical Waterfall Model
Iterative Waterfall Model
Prototyping Model
Evolutionary Model
Spiral Model
Fig. 33.7 Classical Waterfall Model (phases: Feasibility Study, Requirements Analysis and Specification, Design, Coding, Testing, Maintenance)
At first project managers or team leaders try to have a rough understanding of what is required to be done by visiting the client side. They study different input data to the system and output data to be produced by the system. They study what kind of processing is needed to be done on these data and they look at the various constraints on the behaviour of the system. After they have an overall understanding of the problem, they investigate the different solutions that are possible. Then they examine each of the solutions in terms of what kinds of resources are required, what would be the cost of development and what would be the development time for each solution. Based on this analysis, they pick the best solution and determine whether the solution is feasible financially and technically. They check whether the customer budget would meet the cost of the product and whether they have sufficient technical expertise in the area of development.
The following is an example of a feasibility study undertaken by an organization. It is intended to give one a feel of the activities and issues involved in the feasibility study phase of a typical software project.
Case Study
A mining company named Galaxy Mining Company Ltd. (GMC) has mines located at various places in India. It has about fifty different mine sites spread across eight states. The company employs a large number of miners at each mine site. Mining being a risky profession, the company intends to operate a special provident fund, which would exist in addition to the standard provident fund that the miners already enjoy. The main objective of having the special provident fund (SPF) would be to quickly distribute some compensation before the standard provident amount is paid. According to this scheme, each mine site would deduct SPF instalments from each miner every month and deposit the same with the CSPFC (Central Special Provident Fund Commissioner). The CSPFC would maintain all details regarding the SPF instalments collected from the miners. GMC employed a reputed software vendor, Adventure Software Inc., to undertake the task of developing software for automating the maintenance of SPF records of all employees. GMC realized that, besides saving manpower on bookkeeping work, the software would help in speedy settlement of claim cases. GMC indicated that the amount it could afford for this software to be developed and installed was 1 million rupees. Adventure Software Inc. deputed their project manager to carry out the feasibility study. The project manager discussed the matter with the top managers of GMC to get an overview of the project. He also discussed the issues involved with the field PF officers at various mine sites to determine the exact details of the project. The project manager identified two broad approaches to solve the problem. One was to have a central database which could be accessed and updated via a satellite connection from the various mine sites. The other approach was to have local databases at each mine site and to update the central database periodically through a dial-up connection.
These periodic updates could be done on a daily or hourly basis depending on the delay acceptable to GMC in invoking the various functions of the software. The project manager found that the second approach was more affordable and more fault-tolerant, as the local mine sites could still operate even when the communication link to the central database temporarily failed. The project manager quickly analyzed the database functionalities required, the user-interface issues, and the software needed to handle communication with the mine sites, and from this analysis arrived at an estimate of the development cost. He found that the solution involving maintenance of local databases at the mine sites and periodic updating of a central database was financially and technically feasible. The project manager discussed his solution with the GMC management and found that the solution was acceptable to them as well.
The goal of the requirements gathering activity is to collect all relevant information from the customer regarding the product to be developed with a view to clearly understand the customer requirements and weed out the incompleteness and inconsistencies in these requirements.
The requirements analysis activity is begun by collecting all relevant data regarding the product to be developed from the users of the product and from the customer through interviews and discussions. For example, to perform the requirements analysis of a business accounting software required by an organization, the analyst might interview all the accountants of the organization to ascertain their requirements. The data collected from such a group of users usually contain several contradictions and ambiguities, since each user typically has only a partial and incomplete view of the system. Therefore it is necessary to identify all ambiguities and contradictions in the requirements and resolve them through further discussions with the customer. After all ambiguities, inconsistencies, and incompleteness have been resolved and all the requirements properly understood, the requirements specification activity can start. During this activity, the user requirements are systematically organized into a Software Requirements Specification (SRS) document. The customer requirements identified during the requirements gathering and analysis activity are organized into an SRS document. The important components of this document are functional requirements, the non-functional requirements, and the goals of implementation.
3.2.3. Design
The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. In technical terms, during the design phase the software architecture is derived from the SRS document. Two distinctly different approaches are available: the traditional design approach and the object-oriented design approach.
Traditional design approach: Traditional design consists of two different activities; first, a structured analysis of the requirements specification is carried out, where the detailed structure of the problem is examined. This is followed by a structured design activity. During structured design, the results of structured analysis are transformed into the software design.
Object-oriented design approach: In this technique, the various objects that occur in the problem domain and the solution domain are first identified, and the different relationships that exist among these objects are identified. The object structure is further refined to obtain the detailed design.
The different modules making up a software product are almost never integrated in one shot. Integration is normally carried out incrementally over a number of steps. During each integration step, the partially integrated system is tested and a set of previously planned modules are added to it. Finally, when all the modules have been successfully integrated and tested, system testing is carried out. The goal of system testing is to ensure that the developed system conforms to the requirements laid out in the SRS document. System testing usually consists of three different kinds of testing activities:
α-testing: the system testing performed by the development team.
β-testing: the system testing performed by a friendly set of customers.
Acceptance testing: the system testing performed by the customer himself after product delivery to determine whether to accept or reject the delivered product.
System testing is normally carried out in a planned manner according to the system test plan document. The system test plan identifies all testing-related activities that must be performed, specifies the schedule of testing, and allocates resources. It also lists all the test cases and the expected outputs for each test case.
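The idea of a test plan that lists test cases alongside their expected outputs can be sketched in code. Everything below (the toy `withdraw` function and the test-case data) is illustrative and not taken from the lesson:

```python
# Illustrative sketch: system test cases recorded as (input, expected output)
# pairs, as a system test plan prescribes, then run against the system.

def withdraw(balance, amount):
    """Toy system under test: returns (new_balance, message)."""
    if amount > balance:
        return balance, "error: insufficient funds"
    return balance - amount, "cash dispensed"

# Each entry mirrors a test-plan row: test id, input, expected output.
test_cases = [
    ("TC1", (500, 200), (300, "cash dispensed")),
    ("TC2", (100, 200), (100, "error: insufficient funds")),
]

for tc_id, (balance, amount), expected in test_cases:
    actual = withdraw(balance, amount)
    status = "PASS" if actual == expected else "FAIL"
    print(tc_id, status)
```

Recording expected outputs up front, as the test plan document requires, is what lets the pass/fail decision be mechanical.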
3.2.6. Maintenance
Maintenance of a typical software product requires much more effort than the effort needed to develop the product itself. Many studies carried out in the past confirm this and indicate that the relative effort of development of a typical software product to its maintenance effort is roughly in a 40:60 ratio. Maintenance involves performing any one or more of the following three kinds of activities:
Correcting errors that were not discovered during the product development phase. This is called corrective maintenance.
Improving the implementation of the system, and enhancing the functionalities of the system according to the customer's requirements. This is called perfective maintenance.
Porting the software to work in a new environment. For example, porting may be required to get the software to work on a new computer platform or with a new operating system. This is called adaptive maintenance.
how the screens might look,
how the user interface would behave, and
how the system would produce outputs, etc.
This is something similar to what the architectural designers of a building do; they show a prototype of the building to their customer. The customer can evaluate whether he likes it or not, and decide on the changes that he would need in the actual product. A similar thing happens in the case of a software product and its prototyping model.
The spiral model of software development is shown in fig. 33.8. The diagrammatic representation of this model appears like a spiral with many loops. The exact number of loops in the spiral is not fixed. Each loop of the spiral represents a phase of the software process. For example, the innermost loop might be concerned with feasibility study, the next loop with requirements specification, the next one with design, and so on. Each phase in this model is split into four sectors (or quadrants) as shown in fig. 33.8. The following activities are carried out during each phase of a spiral model.
First quadrant (Objective Setting): During the first quadrant, we identify the objectives of the phase and examine the risks associated with these objectives.
Second quadrant (Risk Assessment and Reduction): A detailed analysis is carried out for each identified project risk, and steps are taken to reduce the risks. For example, if there is a risk that the requirements are inappropriate, a prototype system may be developed.
Third quadrant (Development and Validation): Develop and validate the next level of the product after resolving the identified risks.
Fourth quadrant (Review and Planning): Review the results achieved so far with the customer and plan the next iteration around the spiral.
With each iteration around the spiral, a progressively more complete version of the software gets built.
technically challenging software products that are prone to several kinds of risks. However, this model is much more complex than the other models. This is probably a factor deterring its use in ordinary projects.
3.6. Exercises
1. Mark the following as True or False. Justify your answer.
   a. All software engineering principles are backed by either a scientific basis or theoretical proof.
   b. There are well-defined steps through which a problem is solved using an exploratory style.
   c. The evolutionary life cycle model is ideally suited for development of very small software products typically requiring a few months of development effort.
   d. The prototyping life cycle model is the most suitable one for undertaking a software development project susceptible to schedule slippage.
   e. The spiral life cycle model is not suitable for products that are vulnerable to a large number of risks.
2. For the following, mark all options which are true.
   a. Which of the following problems can be considered to be contributing to the present software crisis?
      large problem size
      lack of rapid progress of software engineering
      lack of intelligent engineers
      shortage of skilled manpower
   b. Which of the following are essential program constructs (i.e. it would not be possible to develop programs for any given problem without using the construct)?
      Sequence
      Selection
      Jump
      Iteration
   c. In a classical waterfall model, which phase precedes the design phase?
      Coding and unit testing
      Maintenance
      Requirements analysis and specification
      Feasibility study
   d. Among the development phases of the software life cycle, which phase typically consumes the maximum effort?
      Requirements analysis and specification
      Design
      Coding
      Testing
   e. Among all the phases of the software life cycle, which phase consumes the maximum effort?
      Design
      Maintenance
      Testing
      Coding
   f. In the classical waterfall model, during which phase is the Software Requirements Specification (SRS) document produced?
      Design
      Maintenance
      Requirements analysis and specification
      Coding
   g. Which phase is the last development phase of the classical waterfall software life cycle?
      Design
      Maintenance
      Testing
      Coding
   h. Which development phase in the classical waterfall life cycle immediately follows the coding phase?
      Design
      Maintenance
      Testing
      Requirements analysis and specification
3. Identify the problems one would face if one tried to develop a large software product without using software engineering principles.
4. Identify the two important techniques that software engineering uses to tackle the problem of exponential growth of problem complexity with its size.
5. State five symptoms of the present software crisis.
6. State four factors that have contributed to the making of the present software crisis.
7. Suggest at least two possible solutions to the present software crisis.
8. Identify at least four basic characteristics that differentiate a simple program from a software product.
9. Identify two important features that a program must satisfy to be called a structured program.
10. Explain the exploratory program development style.
11. Show at least three important drawbacks of the exploratory programming style.
12. Identify at least two advantages of using high-level languages over assembly languages.
13. State at least two basic differences between control flow-oriented and data flow-oriented design techniques.
14. State at least five advantages of object-oriented design techniques.
15. State at least three differences between the exploratory style and modern styles of software development.
16. Explain the problems that might be faced by an organization if it does not follow any software life cycle model.
17. Differentiate between structured analysis and structured design.
18. Identify at least three activities undertaken in an object-oriented software design approach.
19. State why it is a good idea to test a module in isolation from other modules.
20. Identify why the different modules making up a software product are almost never integrated in one shot.
21. Identify the necessity of integration and system testing.
22. Identify the six different phases of the classical waterfall model.
23. Mention the reasons for which the classical waterfall model can be considered impractical and cannot be used in real projects.
24. Explain what a software prototype is. Identify three reasons for the necessity of developing a prototype during software development, and explain the situations under which it is beneficial to develop a prototype.
25. Identify the activities carried out during each phase of the spiral model, and discuss the advantages of using the spiral model.
Module 7
Software Engineering Issues
Lesson 34
Requirements Analysis and Specification
1. Introduction
The requirements analysis and specification phase starts once the feasibility study phase is complete and the project is found to be technically sound and feasible. The goal of the requirements analysis and specification phase is to clearly understand the customer requirements and to systematically organize these requirements in a specification document. This phase consists of the following two activities:
Requirements gathering and analysis
Requirements specification
What is the problem?
Why is it important to solve the problem?
What are the possible solutions to the problem?
What exactly are the data input to the system and what exactly are the data output by the system?
What are the likely complexities that might arise while solving the problem? If there are external software or hardware with which the developed software has to interface, then what exactly would the data interchange formats with the external system be?
After the analyst has understood the exact customer requirements, he proceeds to identify and resolve the various requirements problems. The most important requirements problems that the analyst has to identify and eliminate are the problems of anomalies, inconsistencies, and incompleteness. When the analyst detects any inconsistencies, anomalies or incompleteness in the gathered requirements, he resolves them by carrying out further discussions with the end-users and the customers.
3. SRS Document
After the analyst has collected all the requirements information regarding the software to be developed, and has removed all incompleteness, inconsistencies, and anomalies from the specification, he starts to systematically organize the requirements in the form of an SRS document. The important parts of the SRS document are:
Functional requirements of the system
Non-functional requirements of the system, and
Goals of implementation
Output data: book details
So, the function Search Book (F1) takes the author's name and transforms it into book details. Functional requirements actually describe a set of high-level requirements, where each high-level requirement takes some data from the user and provides some data to the user as an output. Also, each high-level requirement might consist of several other functions.
to be input to the system, its input data domain, the output data domain, and the type of processing to be carried out on the input data to obtain the output data. Let us first try to document the withdraw-cash function of an ATM (Automated Teller Machine) system. Withdraw-cash is a high-level requirement. It has several sub-requirements corresponding to the different user interactions. These different interaction sequences capture the different scenarios.

Example: Withdraw Cash from ATM
R1: withdraw cash
Description: The withdraw-cash function first determines the type of account that the user has and the account number from which the user wishes to withdraw cash. It checks the balance to determine whether the requested amount is available in the account. If enough balance is available, it outputs the required cash; otherwise it generates an error message.
R1.1: select withdraw amount option
Input: withdraw amount option
Output: user prompted to enter the account type
R1.2: select account type
Input: user option
Output: prompt to enter the amount
R1.3: get required amount
Input: amount to be withdrawn, as an integer greater than 100 and less than 10,000, in multiples of 100
Output: the requested cash and a printed transaction statement
Processing: the amount is debited from the user's account if sufficient balance is available; otherwise an error message is displayed.
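As a rough illustration, the processing rule of R1.3 could be implemented as follows. The class name, the return messages, and the balance handling are assumptions not given in the SRS fragment above:

```python
# Illustrative sketch of the R1.3 processing rule from the SRS fragment.

class ATMAccount:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Input domain from R1.3: an integer greater than 100 and less
        # than 10,000, in multiples of 100.
        if not (isinstance(amount, int)
                and 100 < amount < 10000
                and amount % 100 == 0):
            return "error: invalid amount"
        # Processing: debit only if sufficient balance is available,
        # otherwise report an error.
        if amount > self.balance:
            return "error: insufficient balance"
        self.balance -= amount
        return f"dispensed {amount}"
```

For example, on an account holding 1000, `withdraw(500)` succeeds while `withdraw(150)` is rejected because it is not a multiple of 100.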
Response to undesired events: It should characterize acceptable responses to undesired events. These are called system response to exceptional conditions. Verifiable: All requirements of the system as documented in the SRS document should be verifiable. This means that it should be possible to determine whether or not requirements have been met in an implementation.
3.1.10.
A good SRS document should properly characterize the conditions under which different scenarios of interaction occur. Sometimes such conditions are complex and several alternative interaction and processing sequences may exist. There are two main techniques available to analyze and represent complex processing logic: decision trees and decision tables.
1. Decision Trees
A decision tree gives a graphic view of the processing logic involved in decision making and the corresponding actions taken. The edges of a decision tree represent conditions, and the leaf nodes represent the actions to be performed depending on the outcome of testing the conditions.

Example: Consider Library Membership Automation Software (LMS), which should support the following three options:
New member
Renewal
Cancel membership

New member option
Decision: When the 'new member' option is selected, the software asks for details about the member, such as the member's name, address and phone number.
Action: If proper information is entered, a membership record for the member is created and a bill is printed for the annual membership charge plus the security deposit payable.
Renewal option
Decision: If the 'renewal' option is chosen, the LMS asks for the member's name and membership number to check whether he is a valid member.
Action: If the membership is valid, the membership expiry date is updated and the annual membership bill is printed; otherwise an error message is displayed.
Cancel membership option
Decision: If the 'cancel membership' option is selected, the software asks for the member's name and membership number.
Action: The membership is cancelled, a cheque for the balance amount due to the member is printed and, finally, the membership record is deleted from the database.

Decision tree representation of the above example
The tree in fig. 34.3 shows the graphical representation of the above example. After getting information from the user, the system makes a decision and then performs the corresponding actions.
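The decision logic described above maps naturally onto nested conditionals, with each decision an `if` test and each leaf a list of actions. The sketch below is only an illustration; the function name, the validity flags, and the action strings are assumptions, not part of the lesson:

```python
# Illustrative sketch: the LMS decision tree as nested conditionals.
# Each branch tests a condition; each leaf returns the actions to perform.

def lms_main(option, details_valid=True, member_valid=True):
    actions = []
    if option == "new member":
        actions += ["get details"]
        if details_valid:
            actions += ["create record", "print bill"]
    elif option == "renewal":
        actions += ["get details"]
        if member_valid:
            actions += ["update expiry date", "print bill"]
        else:
            actions += ["print error message"]
    elif option == "cancel membership":
        actions += ["get details", "print cheque", "delete record"]
    else:
        # invalid option: the error leaf of the tree
        actions += ["print error message"]
    return actions
```

For instance, `lms_main("renewal", member_valid=False)` follows the renewal branch to its error leaf.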
Fig. 34.3 Decision tree for LMS

2. Decision Tables
A decision table is used to represent complex processing logic in a tabular or matrix form. The upper rows of the table specify the variables or conditions to be evaluated; the lower rows specify the actions to be taken when the corresponding conditions are satisfied.

Example: Consider the previously discussed LMS example. The decision table shown in fig. 34.4 represents the problem in tabular form. The table is divided into two parts: the upper part shows the conditions and the lower part shows the actions taken. Each column of the table is a rule.
Conditions:
  Valid selection
  New member
  Renewal
  Cancellation
Actions:
  Display error message
  Ask member's details
  Build customer record
  Generate bill
  Update expiry date
  Print cheque
  Delete record

Fig. 34.4 Decision table for LMS
From the above table you can easily see that if the 'valid selection' condition is false, the action taken is 'display error message', and so on.
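The same logic can instead be held as data, one rule per entry, which is the programming analogue of a decision table. The rule entries below are reconstructed from the LMS description above and the exact action lists are an assumption:

```python
# Illustrative sketch: the LMS decision table as data rather than code.
# Each key is a condition combination (a rule); each value is its actions.

decision_table = {
    (False, None):          ["display error message"],
    (True, "new member"):   ["ask member's details", "build customer record",
                             "generate bill"],
    (True, "renewal"):      ["ask member's details", "update expiry date",
                             "generate bill"],
    (True, "cancellation"): ["ask member's details", "print cheque",
                             "delete record"],
}

def decide(valid_selection, option):
    # An invalid selection ignores the option, matching the first rule column.
    key = (valid_selection, option if valid_selection else None)
    return decision_table[key]
```

Keeping the rules in a table makes it easy to check that every condition combination has exactly one set of actions.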
Model-oriented approaches are more suited for use in the later phases of the life cycle; in these approaches even minor changes to a specification may lead to drastic changes to the entire specification, and they do not support logical conjunctions (AND) and disjunctions (OR). Property-oriented approaches are suitable for requirements specification because they can be easily changed: they specify a system as a conjunction of axioms, and one axiom can easily be replaced by another.
For example, Fig. 34.7 shows that we can compare node B with node D, but we can't compare node D with node A.
rigorous specification is more important than the formal specification itself. The construction of a rigorous specification clarifies several aspects of system behaviour that are not obvious in an informal specification.
Formal methods usually have a well-founded mathematical basis. Thus, formal specifications are not only more precise, but also mathematically sound, and can be used to reason about the properties of a specification and to rigorously prove that an implementation satisfies its specifications.
Formal methods have well-defined semantics. Therefore, ambiguity in specifications is automatically avoided when one formally specifies a system.
The mathematical basis of formal methods facilitates automating the analysis of specifications. For example, a tableau-based technique has been used to automatically check the consistency of specifications. Also, automatic theorem-proving techniques can be used to verify that an implementation satisfies its specifications. The possibility of automatic verification is one of the most important advantages of formal methods.
Formal specifications can be executed to obtain immediate feedback on the features of the specified system. This concept of executable specifications is related to rapid prototyping. Informally, a prototype is a toy working model of a system that can provide immediate feedback on the behaviour of the specified system, and is especially useful in checking the completeness of specifications.
from the fact that even moderately complicated problems blow up the complexity of the formal specification and its analysis. Also, a large unstructured set of mathematical formulae is difficult to comprehend.
5. Axiomatic Specification
In the axiomatic specification of a system, first-order logic is used to write the pre- and post-conditions that specify the operations of the system in the form of axioms. The pre-conditions basically capture the conditions that must be satisfied before an operation can successfully be invoked. In essence, the pre-conditions capture the requirements on the input parameters of a function. The post-conditions are the conditions that must be satisfied when a function completes execution for the function to be considered to have executed successfully. Thus, the post-conditions are essentially the constraints on the results produced for the function execution to be considered successful.
1. Establish the range of input values over which the function should behave correctly. Also find out other constraints on the input parameters and write them in the form of a predicate.
2. Specify a predicate defining the conditions which must hold on the output of the function if it behaved properly.
3. Establish the changes made to the function's input parameters after execution of the function. Pure mathematical functions do not change their input, and therefore this type of assertion is not necessary for pure functions.
4. Combine all of the above into the pre- and post-conditions of the function.
5.2. Examples
Example 1
Specify the pre- and post-conditions of a function that takes a real number as argument and returns half the input value if the input is less than or equal to 100, or else returns double the value.

f (x : real) : real
pre : x ∈ R
post : {(x ≤ 100) ∧ (f(x) = x/2)} ∨ {(x > 100) ∧ (f(x) = 2x)}

Example 2
Axiomatically specify a function named search which takes an integer array and an integer key value as its arguments and returns the index in the array where the key value is present.

search(X : IntArray, key : Integer) : Integer
pre : ∃ i ∈ [Xfirst … Xlast] such that X[i] = key
post : {(X[search(X, key)] = key) ∧ (X = X′)}

Here the convention followed is that, if a function changes any of its input parameters, and that parameter is named X, then the parameter after the function completes execution is referred to as X′.
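The pre- and post-conditions of Example 1 can be checked mechanically as runtime assertions. The sketch below is illustrative and not part of the lesson; the function body is one obvious implementation satisfying the axioms:

```python
# Illustrative sketch: Example 1's axioms checked as runtime assertions.

def f(x: float) -> float:
    # pre-condition: x is a real number
    assert isinstance(x, (int, float))

    result = x / 2 if x <= 100 else 2 * x

    # post-condition from the specification:
    # (x <= 100 and f(x) = x/2) or (x > 100 and f(x) = 2x)
    assert (x <= 100 and result == x / 2) or (x > 100 and result == 2 * x)
    return result
```

Any implementation violating the post-condition would trip the assertion, which is the practical value of writing the axioms down.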
6. Algebraic Specification
In the algebraic specification technique an object class or type is specified in terms of relationships existing between the operations defined on that type. It was first brought into prominence by Guttag [1980, 1985] in specification of abstract data types. Various notations of algebraic specifications have evolved, including those based on OBJ and Larch languages.
2. Exceptions section: This section gives the names of the exceptional conditions that might occur when different operations are carried out. These exception conditions are used in the later sections of an algebraic specification.
3. Syntax section: This section defines the signatures of the interface procedures. The collection of sets that form the input domain of an operator, together with the sort where the output is produced, is called the signature of the operator. For example, PUSH takes a stack and an element and returns a new stack:
push : stack × element → stack
4. Equations section: This section gives a set of rewrite rules (or equations) defining the meaning of the interface procedures in terms of each other. In general, this section is allowed to contain conditional expressions.
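As a sketch, a stack whose operations satisfy the usual rewrite rules of an algebraic stack specification might look as follows. Note that only the push signature appears in the text above; the remaining operations and the equations in the comments are standard textbook examples, not taken from this lesson:

```python
# Illustrative sketch: stack operations obeying typical rewrite rules
# of an algebraic specification. Stacks are immutable tuples so each
# operation returns a new stack, matching the equational style.

def create():
    return ()                  # create : -> stack

def push(s, e):
    return s + (e,)            # push : stack x element -> stack

def pop(s):
    return s[:-1]              # equation: pop(push(s, e)) = s

def top(s):
    return s[-1]               # equation: top(push(s, e)) = e

def empty(s):
    return s == ()             # equation: empty(create()) = true
```

Because the operations are pure functions, each equation can be checked directly on sample values, e.g. `pop(push(s, e)) == s` for any stack `s` and element `e`.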
6.2. Operators
By convention, each equation is implicitly universally quantified over all possible values of the variables. Names not mentioned in the syntax section, such as r or e, are variables. The first step in defining an algebraic specification is to identify the set of required operations. After having identified the required operators, it is helpful to classify them as basic construction operators, extra construction operators, basic inspection operators, or extra inspection operators. The definition of these categories of operators is as follows:
1. Basic construction operators: These operators are used to create or modify entities of a type. The basic construction operators are essential to generate all possible elements of the type being specified. For example, create and append are basic construction operators.
2. Extra construction operators: These are the construction operators other than the basic construction operators. For example, the operator remove is an extra construction operator, because even without using remove it is possible to generate all values of the type being specified.
3. Basic inspection operators: These operators evaluate attributes of a type without
modifying them, e.g., eval, get, etc. Let S be the set of operators whose range is not the data type being specified. The set of basic inspection operators S1 is a subset of S, such that each operator from S - S1 can be expressed in terms of the operators from S1.
4. Extra inspection operators. These are the inspection operators that are not basic
inspectors.
The set of rewrite rules should make the specification complete. Using a complete set of rewrite rules, it is possible to simplify an arbitrary sequence of operations on the interface procedures.

A simple way to determine whether an operator is a constructor (basic or extra) or an inspector (basic or extra) is to check the syntax expression for the operator. If the type being specified appears on the right hand side of the expression, then it is a constructor; otherwise it is an inspection operator. For example, in the following example, create is a constructor because point appears on the right hand side of the expression and point is the data type being specified. But xcoord is an inspection operator, since it does not modify the point type.

Example: Let us specify a data type point supporting the operations create, xcoord, ycoord, and isequal, where the operations have their usual meaning.

Types:
defines point
uses boolean, integer

Syntax:
create : integer × integer → point
xcoord : point → integer
ycoord : point → integer
isequal : point × point → boolean

Equations:
xcoord(create(x, y)) = x
ycoord(create(x, y)) = y
isequal(create(x1, y1), create(x2, y2)) = ((x1 = x2) and (y1 = y2))

In this example, we have only one basic constructor (create) and three basic inspectors (xcoord, ycoord, and isequal). Therefore, we have only three equations.
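The point specification translates almost line for line into code, with the three equations serving as properties that any implementation must satisfy. A minimal C sketch (the struct representation is our choice, not dictated by the specification):

```c
#include <stdbool.h>

/* A direct C model of the algebraic specification of point. */
typedef struct { int x; int y; } point;

point create(int x, int y) { point p = { x, y }; return p; }

int xcoord(point p) { return p.x; }   /* xcoord(create(x, y)) = x */
int ycoord(point p) { return p.y; }   /* ycoord(create(x, y)) = y */

/* isequal(create(x1, y1), create(x2, y2)) = ((x1 = x2) and (y1 = y2)) */
bool isequal(point p1, point p2) {
    return p1.x == p2.x && p1.y == p2.y;
}
```

Checking the three rewrite rules against this implementation is exactly the kind of simplification the equations section enables.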
8. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical proof.
b. Functional requirements address maintainability, portability, and usability issues.
c. The edges of a decision tree represent the corresponding actions to be performed according to conditions.
d. The upper rows of a decision table specify the corresponding actions to be taken when an evaluation test is satisfied.
e. A column in a decision table is called an attribute.
f. Pre-conditions of axiomatic specifications state the requirements on the parameters of the function before the function can start executing.
g. Post-conditions of axiomatic specifications state the requirements on the parameters of the function when the function is completed.
h. Homogeneous algebra is a collection of different sets on which several operations are defined.
i. Applications developed using 4GLs would normally be more efficient and run faster compared to applications developed using 3GLs.

2. For the following, mark all options which are true.
j. An SRS document normally contains:
- Functional requirements of the system
- Module structure
- Configuration management plan
- Non-functional requirements of the system
- Constraints on the system
k. The structured specification technique that is used to reduce the effort in writing specification is:
- Incremental specification
- Specification instantiation
- Both of the above
- None of the above
l. Examples of executable specifications are:
- Third generation languages
- Fourth generation languages
- Second generation languages
- First generation languages

3. Identify the roles of a system analyst.
4. Identify the important parts of an SRS document.
5. Identify the problems an organization might face without developing an SRS document.
6. Identify the non-functional requirement-issues that are considered for a given problem description.
7. Discuss the problems that an unstructured specification would create during software development.
8. Identify the necessity of using formal techniques in the context of requirements specification.
9. Identify the differences between model-oriented and property-oriented approaches in the context of requirements specification.
10. Explain the use of operational semantics.
11. Explain the use of algebraic specifications in the context of requirements specification.
12. Identify the requirements of algebraic specifications to define a system.
13. Identify the essential sections of an algebraic specification to define a system.
14. Explain the steps for developing algebraic specification of simple problems.
15. Identify the properties that every good algebraic specification should possess.
16. Identify the basic properties of a structured specification.
17. Discuss the advantages and disadvantages of algebraic specification.
18. Write down the important features of an executable specification language with examples.
Module 7
Software Engineering Issues
Lesson 35
Modelling Timing Constraints
An event may either be instantaneous or may have certain duration. For example, a button press event is described by the duration for which the button was kept pressed. Some authors argue that durational events are really not a basic type of event, but can be expressed using other events. In fact, it is possible to consider a duration event as a combination of two events: a start event and an end event. For example, the button press event can be described by a combination of start button press and end button press events. However, it is often convenient to retain the notion of a durational event. In this text, we consider durational events as a special class of events. Using the preliminary notions about events discussed in this subsection, we classify various types of timing constraints in subsection 1.7.1.
Fig. 35.2 Deadline Constraint between two events e1 and e2 The deadline and delay constraints can further be classified into two types each based on whether the constraint is imposed on the stimulus or on the response event. This has been explained with some examples in section 1.3.
[Figure: the telephone system, with the Call Initiator and the Call Receiver forming its environment]
Once the receiver handset is lifted, the dial tone must be produced by the system within 2 seconds; otherwise a beeping sound is produced until the handset is replaced. In this example, the lifting of the receiver handset represents a stimulus to the telephone system and the production of the dial tone is the response.

Response-Stimulus (RS): Here the deadline is on the occurrence of the next stimulus, counted from the corresponding response. This is a behavioral constraint, since the constraint is imposed on a stimulus event, i.e. on the environment of the system. An example of an RS type of deadline constraint is the following: Once the dial tone appears, the first digit must be dialled within 30 seconds, otherwise the system enters an idle state and an idle tone is produced.

Response-Response (RR): An RR type of deadline constraint is defined on two response events. In this case, once the first response event occurs, the second response event must occur before a certain deadline. This is a performance constraint, since the timing constraint has been defined on a response event. An example of an RR type of deadline constraint is the following: Once the ring tone is given to the callee, the corresponding ring-back tone must be given to the caller within two seconds, otherwise the call is terminated. Here the ring-back tone and the corresponding ring tone are the two response events.

Delay Constraints: We can identify only one type of delay constraint (SS type) in the telephone system example that we are considering. However, in other problems it may be possible to identify different types of delay constraints. An SS type of delay constraint is a behavioral constraint. An example of an SS type of delay constraint is the following: Once a digit is dialled, the next digit should be dialled after at least 1 second; otherwise a beeping sound is produced until the call initiator replaces the handset.
Here the delay constraint is defined on the event of dialling the next digit (a stimulus) after a digit has been dialled (also a stimulus).

Duration Constraint: A duration constraint on an event specifies the time interval over which the event acts. An example of a duration constraint is the following: If you press the button of the handset for less than 15 seconds, it connects to the local operator. If you press the button for any duration lasting between 15 and 30 seconds, it connects to the international operator. If you keep the button pressed for more than 30 seconds, then on releasing it, the dial tone is produced.
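As a small illustration of our own (the function and enum names are not from the lesson), the durational constraint in the button-press example reduces to a classification on the measured press duration:

```c
typedef enum { LOCAL_OPERATOR, INTERNATIONAL_OPERATOR, DIAL_TONE } action_t;

/* Classify a button press by how long the button was held down,
   following the durational constraint described in the text. */
action_t classify_press(double seconds_held) {
    if (seconds_held < 15.0)  return LOCAL_OPERATOR;         /* < 15 s  */
    if (seconds_held <= 30.0) return INTERNATIONAL_OPERATOR; /* 15-30 s */
    return DIAL_TONE;                                        /* > 30 s  */
}
```

In a real system the duration would be measured between the button-press and button-release events; here we simply assume it is already available in seconds.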
[Fig. 35.4: classification tree of timing constraints. Performance constraints and behavioral constraints each subdivide into delay, deadline, and duration types; the performance delay/deadline leaves are of SR and RR types, and the behavioral delay/deadline leaves are of SS and RS types]
A classification of the different types of timing constraints that we discussed in this section is shown in Fig. 35.4. Note that a performance constraint can be of delay, deadline, or durational type. The delay or deadline constraints on performance can either be SR or RR type. Similarly, the behavioral constraints can be of delay, deadline, or durational type. The delay or deadline constraints on the behavior of the environment can either be SS or RS type.
Fig. 35.5 Conventions Used in Drawing an EFSM We have already discussed that events can be considered to be of two types: stimulus events and response events. We had also discussed different types of timing constraints in Section 1.3. Now we explain how these constraints can be modelled by using EFSMs.
The EFSM model for this constraint is shown in Fig. 35.7. In Fig. 35.7, as soon as the dial tone appears, a timer is set to expire in 30 seconds and the system transits to the Await First Digit state. If the timer expires before the first digit arrives, then the system transits to an idle state where an idle tone is produced. Otherwise, if the digit appears first, then the system transits to the Await Second Digit state.

[Fig. 35.7: EFSM with states Idle, Await First Digit, and Await Second Digit; transitions labelled "dial tone / set timer (30 s)", "first digit", and "timer alarm / idle tone"]
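An EFSM of this kind can be sketched as an event-driven transition function. The state and event names below follow the description in the text; the 30-second timer is only indicated in comments, since arming and fielding a real timer is platform-specific:

```c
typedef enum { IDLE, AWAIT_FIRST_DIGIT, AWAIT_SECOND_DIGIT } state_t;
typedef enum { DIAL_TONE_APPEARS, FIRST_DIGIT, TIMER_ALARM } event_t;

/* One step of the EFSM: on dial tone, (re)arm a 30 s timer and await the
   first digit; if the timer fires before the digit, fall back to Idle. */
state_t step(state_t s, event_t e) {
    switch (s) {
    case IDLE:
        if (e == DIAL_TONE_APPEARS) return AWAIT_FIRST_DIGIT; /* set timer (30 s) */
        break;
    case AWAIT_FIRST_DIGIT:
        if (e == FIRST_DIGIT) return AWAIT_SECOND_DIGIT;
        if (e == TIMER_ALARM) return IDLE;                    /* produce idle tone */
        break;
    default:
        break;
    }
    return s;   /* events with no transition in this state are ignored */
}
```

This is a sketch under stated assumptions, not a complete telephone model; a full implementation would attach the timer-management and tone-generation actions to the transitions.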
A timer is set to expire in 2 seconds. If the ring-back tone appears first, the system transits to the Await First Digit state, else it enters the Await Receiver On-hook state, and the call is terminated.
[Fig. 35.11: EFSM for the durational constraint, with intermediate states Await Event 1 and Await Event 2; a button press sets an alarm (15 s), and depending on when the button is released the system connects to the Local Operator, connects to the International Operator, or produces the Dial Tone]
Fig. 35.11 A Model of a Durational Constraint

The EFSM model for this example is shown in Fig. 35.11. Note that we have introduced two intermediate states, Await Event 1 and Await Event 2, to model the durational constraint.
3. Exercises
1. Mark the following as True or False. Justify your answer.
a. A deadline constraint between two stimuli can be considered to be a behavioural constraint on the environment of the system.

2. Identify and represent the timing constraints in the following air-defense system by means of an extended state machine diagram. Classify each constraint into either a performance or a behavioral constraint. Every incoming missile must be detected within 0.2 seconds of its entering the radar coverage area. The intercept missile should be engaged within 5 seconds of detection of the target missile. The intercept missile should be fired after 0.1 seconds of its engagement but no later than 1 second.

3. Represent a wash-machine having the following specification by means of an extended state machine diagram. The wash-machine waits for the start switch to be pressed. After the user presses the start switch, the machine fills the wash tub with either hot or cold water depending upon the setting of the HotWash switch. The water filling continues until the high level is sensed. The machine starts the agitation motor and continues agitating the wash tub until either the preset timer expires or the user presses the stop switch. After the agitation stops, the machine waits for the user to press the startDrying switch. After the user presses the startDrying switch, the machine starts the hot air blower and continues blowing hot air into the drying chamber until either the user presses the stop switch or the preset timer expires.

4. What is the difference between a performance constraint and a behavioral constraint? Give practical examples of each type of constraint.

5. Represent the timing constraints in a collision avoidance task in an air surveillance system as an extended finite state machine (EFSM) diagram. The collision avoidance task consists of the following activities. The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 msec of receipt of the signal. The track record is transmitted to the data processor within 1 msec after the track record is determined. A subtask on the data processor correlates the received track record with the track records of other targets that come close, to detect a potential collision that might occur within the next 500 msec. If a collision is anticipated, then the corrective action is determined within 10 msec by another subtask running on the data processor. The corrective action is transmitted to the track correction task within 25 msec.

6. Consider the following (partial) specification of a real-time system: The velocity of a spacecraft must be sampled by a computer on board the spacecraft at least once every second (the sampling event is denoted by S). After sampling the velocity, the current position is computed (denoted by event C) within 100 msec; in parallel, the expected position of the spacecraft is retrieved from the database within 200 msec (denoted by event R). Using these data, the deviation from the normal course of the spacecraft must be determined within 100 msec (denoted by event D) and corrective velocity adjustments must be carried out before a new velocity value is sampled in (the velocity adjustment event is denoted by A). Calculated positions must be transmitted to the earth station at least once every minute (the position transmission event is denoted by T). Identify the different timing constraints in the system. Classify these into either performance or behavioral constraints. Construct an EFSM to model the system.

7. Construct the EFSM model of a telephone system whose (partial) behavior is described below: After lifting the receiver handset, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialled within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialling of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.

8. What are the different types of timing constraints that can occur in a system? Give examples of each.
Module 7
Software Engineering Issues
Lesson 36
Software Design Part 1
1. Introduction
The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. A good software design is seldom arrived at by a single-step procedure, but requires several iterations through a series of steps. Design activities can be broadly classified into two important parts: preliminary (or high-level) design and detailed design. High-level design means identification of the different modules, the control relationships among them, and the definition of the interfaces among these modules. The outcome of high-level design is called the program structure or software architecture. During detailed design, the data structures and the algorithms of the different modules are designed. The outcome of the detailed design stage is usually known as the module-specification document.
1.2.1. Cohesion
Most researchers and engineers agree that a good software design implies clean decomposition of the problem into modules, and the neat arrangement of these modules in a hierarchy. The primary characteristics of neat module decomposition are high cohesion and low coupling. Cohesion is a measure of functional strength of a module. A module having high cohesion and low coupling is said to be functionally independent of other modules. By the term functional independence, we mean that a cohesive module performs a single task or function. The different classes of cohesion that a module may possess are depicted in fig. 36.1.
Coincidental (low) → Logical → Temporal → Procedural → Communicational → Sequential → Functional (high)
Fig. 36.1 Classification of Cohesion

Coincidental cohesion: A module is said to have coincidental cohesion if it performs a set of tasks that relate to each other very loosely, if at all. In this case, the module contains a random collection of functions. It is likely that the functions have been put in the module out of pure coincidence, without any thought or design.

Logical cohesion: A module is said to be logically cohesive if all elements of the module perform similar operations, e.g. error handling, data input, data output, etc. An example of logical cohesion is the case where a set of print functions generating different output reports are arranged into a single module.

Temporal cohesion: When a module contains functions that are related by the fact that all the functions must be executed in the same time span, the module is said to exhibit temporal cohesion. The set of functions responsible for initialization, start-up, shutdown of some process, etc. exhibit temporal cohesion.

Procedural cohesion: A module is said to possess procedural cohesion if the set of functions of the module are all part of a procedure (algorithm) in which a certain sequence of steps has to be carried out for achieving an objective, e.g. the algorithm for decoding a message.

Communicational cohesion: A module is said to have communicational cohesion if all functions of the module refer to or update the same data structure, e.g. the set of functions defined on an array or a stack.

Sequential cohesion: A module is said to possess sequential cohesion if the elements of the module form parts of a sequence, where the output from one element of the sequence is input to the next.

Functional cohesion: Functional cohesion is said to exist if different elements of a module cooperate to achieve a single function. For example, a module containing all the functions required to manage employees' pay-roll displays functional cohesion. If a module displays functional cohesion and we are asked to describe what the module does, then we would be able to describe it using a single sentence.
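As a small illustration of our own (not from the lesson), compare a functionally cohesive function with a logically cohesive one that bundles similar-looking conversions behind a mode flag:

```c
/* Functionally cohesive: computes exactly one thing. */
double celsius_to_fahrenheit(double c) {
    return c * 9.0 / 5.0 + 32.0;
}

/* Logically cohesive: similar-looking conversions lumped together and
   selected by a flag; every caller must know the flag's meaning, and
   changing one conversion risks disturbing the others. */
double convert(int mode, double value) {
    switch (mode) {
    case 0:  return value * 9.0 / 5.0 + 32.0;  /* Celsius to Fahrenheit */
    case 1:  return value * 1000.0;            /* kilometres to metres  */
    default: return value;
    }
}
```

The first function can be described in a single sentence; the second cannot, which is exactly the informal test for functional cohesion given above.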
1.2.2. Coupling
Coupling between two modules is a measure of the degree of interdependence or interaction between the two modules. A module having high cohesion and low coupling is said to be functionally independent of other modules. If two modules interchange large amounts of data, then they are highly interdependent. The degree of coupling between two modules depends on their interface complexity. The interface complexity is basically determined by the number and types of parameters that are interchanged while invoking the functions of the module. Even though no techniques exist today to precisely and quantitatively estimate the coupling between two modules, a classification of the different types of coupling helps to roughly estimate the degree of coupling between two modules. Five types of coupling can occur between any two modules, as shown in fig. 36.2:

Data (low) → Stamp → Control → Common → Content (high)

Fig. 36.2 Classification of Coupling

Data coupling: Two modules are data coupled if they communicate using an elementary data item that is passed as a parameter between the two, e.g. an integer, a float, a character, etc.

Stamp coupling: Two modules are stamp coupled if they communicate using a composite data item such as a record in PASCAL or a structure in C.

Control coupling: Control coupling exists between two modules if data from one module is used to direct the order of instruction execution in the other. An example of control coupling is a flag set in one module and tested in another module.

Common coupling: Two modules are common coupled if they share some global data items.

Content coupling: Content coupling exists between two modules if their code is shared, e.g. a branch from one module into another module.
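The contrast between data coupling and stamp coupling shows up directly in function interfaces. In this hypothetical sketch (the names and fields are ours), the first function receives only the elementary items it needs, while the second receives a whole composite record:

```c
/* Data coupling: only elementary data items cross the interface. */
double yearly_pay(double monthly_salary, double bonus) {
    return 12.0 * monthly_salary + bonus;
}

/* Stamp coupling: a composite data item (a struct) crosses the interface,
   even though only two of its fields are actually used by the callee. */
typedef struct {
    int    id;
    char   name[40];
    double monthly_salary;
    double bonus;
} employee;

double yearly_pay_stamped(const employee *e) {
    return 12.0 * e->monthly_salary + e->bonus;
}
```

The stamp-coupled version ties the callee to the layout of the employee record: any change to the struct forces recompilation and re-inspection of the function, which is why data coupling sits lower on the coupling spectrum.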
In the object-oriented design (OOD) approach, the basic abstractions are not real-world functions such as sort, display, track, etc., but real-world entities such as employee, picture, machine, radar system, etc. For example, in OOD an employee pay-roll software is not developed by designing functions such as update-employee-record, get-employee-address, etc., but by designing objects such as employees, departments, etc.
In OOD, state information is not represented in a centralized shared memory but is
distributed among the objects of the system. For example, while developing an employee pay-roll system, the employee data such as the names of the employees, their code numbers, basic salaries, etc. are usually implemented as global data in a traditional programming system; whereas in an object-oriented system these data are distributed among different employee objects of the system. Objects communicate by passing messages. Therefore, one object may discover the state information of another object by interrogating it. Of course, somewhere or the other the real-world functions must be implemented.
Function-oriented techniques such as SA/SD group functions together if, as a group,
they constitute a higher-level function. On the other hand, object-oriented techniques group functions together on the basis of the data they operate on. To illustrate the differences between the object-oriented and the function-oriented design approaches, let us consider an example.

Example: Fire-Alarm System

The owner of a large multi-storied building wants to have a computerized fire alarm system for his building. Smoke detectors and fire alarms would be placed in each room of the building. The fire alarm system would monitor the status of these smoke detectors. Whenever a fire condition is reported by any of the smoke detectors, the fire alarm system should determine the location at which the fire condition has occurred and then sound the alarms only in the neighboring locations. The fire alarm system should also flash an alarm message on the computer console. Fire fighting personnel man the console round the clock. After a fire condition has been successfully handled, the fire alarm system should support resetting of the alarms by the fire fighting personnel.
Function-Oriented Approach:

/* Global data (system state) accessible by various functions */
BOOL detector_status[MAX_ROOMS];
int detector_locs[MAX_ROOMS];
BOOL alarm_status[MAX_ROOMS];      /* alarm activated when status is set */
int alarm_locs[MAX_ROOMS];         /* room number where alarm is located */
int neighbor_alarm[MAX_ROOMS][10]; /* each detector has at most 10 neighboring locations */

The functions which operate on the system state are:

interrogate_detectors();
get_detector_location();
determine_neighbor();
ring_alarm();
reset_alarm();
report_fire_location();

Object-Oriented Approach:

class detector
    attributes: status, location, neighbors
    operations: create, sense_status, get_location, find_neighbors

class alarm
    attributes: location, status
    operations: create, ring_alarm, get_location, reset_alarm

In the object-oriented program, an appropriate number of instances of the classes detector and alarm should be created. If the function-oriented and the object-oriented programs are examined, it can be seen that in the function-oriented program the system state is centralized and several functions are defined on this central data. In the object-oriented program, the state information is distributed among the various objects. It is not necessary that an object-oriented design be implemented using an object-oriented language only. However, an object-oriented language such as C++ supports the definition of all the basic mechanisms of classes, inheritance, objects, methods, etc., and also supports all the key object-oriented concepts that we have just discussed. Thus, an object-oriented language facilitates the implementation of an OOD. However, an OOD can as well be implemented using a conventional procedural language, though it may require more effort to implement an OOD using a procedural language than using an object-oriented language.
Even though object-oriented and function-oriented approaches are remarkably different approaches to software design, they do not replace each other but complement each other in some sense. For example, usually one applies the top-down function oriented techniques to design the internal methods of a class, once the classes are identified. In this case, though outwardly the system appears to have been developed in an object-oriented fashion, inside each class there may be a small hierarchy of functions designed in a top-down manner.
[Fig. 36.3 shows the symbols used for designing DFDs: data store, process, external entity, data flow, and output]
Fig. 36.3 Symbols used for designing DFDs

The main reason why the DFD technique is so popular is probably that DFD is a very simple formalism: it is simple to understand and use. Starting with a set of high-level functions that a system performs, a DFD model hierarchically represents various sub-functions. In fact, any hierarchical model is simple to understand. The human mind can easily grasp a hierarchical model of a system because, starting with a very simple and abstract model of the system, different details are slowly introduced through the different hierarchies. The data flow diagramming technique also follows a very simple set of intuitive concepts and rules. DFD is an elegant modeling technique that turns out to be useful not only for representing the results of structured analysis of a software problem, but also for several other applications, such as showing the flow of documents or items in an organization.
engineers working in a project. A consistent vocabulary for data items is very important, since in large projects different engineers of the project have a tendency to use different terms to refer to the same data, which unnecessarily causes confusion.
The data dictionary provides the analyst with a means to determine the definition of different data items in terms of their component elements.
To develop the context diagram of the system, we have to analyse the SRS document to identify the different types of users who would be using the system, the kinds of data they would be inputting to the system, and the data they would be receiving from the system. Here, the term "users of the system" also includes the external systems which supply data to or receive data from the system. The bubble in the context diagram is annotated with the name of the software system being developed (usually a noun). This is in contrast with the bubbles at all other levels, which are annotated with verbs. This is expected, since the purpose of the context diagram is to capture the context of the system rather than its functionality.

Example 1: RMS Calculating Software

A software system called RMS calculating software would read three integral numbers from the user in the range of -1000 to +1000, determine the root mean square (rms) of the three input numbers, and display it. In this example, the context diagram (fig. 36.5) is simple to draw. The system accepts three integers from the user and returns the result to him.
[Fig. 36.5: the User supplies data-items to the rms Calculator (bubble 0) and receives the rms value]
Fig. 36.5 Context Diagram

Example 2: Tic-Tac-Toe Computer Game

Tic-tac-toe is a computer game in which a human player and the computer make alternate moves on a 3 × 3 square. A move consists of marking a previously unmarked square. The player who is first to place three consecutive marks along a straight line (i.e. along a row, column, or diagonal) on the square wins. As soon as either the human player or the computer wins, a message congratulating the winner should be displayed. If neither player manages to get three consecutive marks along a straight line and all the squares on the board are filled up, then the game is drawn. The computer always tries to win a game. The context diagram of this problem is shown in fig. 36.6.
[Fig. 36.6: the Human Player sends a move to the Tic-Tac-Toe Software and receives the display (context diagram)]
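One function such a game inevitably needs is a test for three consecutive marks. A compact sketch, assuming the board is stored row-major in a 9-character string (our representation, not the lesson's):

```c
/* Returns 1 if `mark` has three in a row on a 3x3 board stored row-major
   in a 9-character string (cells 0..8), else 0. */
int has_won(const char *b, char mark) {
    static const int lines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows      */
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns   */
        {0, 4, 8}, {2, 4, 6}               /* diagonals */
    };
    for (int i = 0; i < 8; i++)
        if (b[lines[i][0]] == mark &&
            b[lines[i][1]] == mark &&
            b[lines[i][2]] == mark)
            return 1;
    return 0;
}
```

Tabulating the eight winning lines keeps the check short and makes the row, column, and diagonal cases uniform.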
Level 1 DFD: To develop the level 1 DFD, examine the high-level functional requirements. If there are between three and seven high-level functional requirements, then these can be directly represented as bubbles in the level 1 DFD. We can then examine the input data to these functions and the data output by these functions, and represent them appropriately in the diagram. If a system has more than seven high-level functional requirements, then some of the related requirements have to be combined and represented as a single bubble in the level 1 DFD. Such a bubble can be split in the lower DFD levels. If a system has fewer than three high-level functional requirements, then some of them need to be split into their sub-functions, so that we have roughly five to seven bubbles on the diagram.

Decomposition: Each bubble in the DFD represents a function performed by the system. The bubbles are decomposed into sub-functions at the successive levels of the DFD. Decomposition of a bubble is also known as factoring or exploding a bubble. Each bubble at any level of the DFD is usually decomposed into between three and seven bubbles. Too few bubbles at any level make that level superfluous. For example, if a bubble is decomposed into just one or two bubbles, then this decomposition becomes redundant. Also, too many bubbles, i.e. more than seven bubbles at any level of a DFD, make the DFD model hard to understand. Decomposition of a bubble should be carried on until a level is reached at which the function of the bubble can be described using a simple algorithm.
Numbering the Bubbles: It is necessary to number the different bubbles occurring in the DFD. These numbers help in uniquely identifying any bubble in the DFD from its bubble number. The bubble at the context level is usually assigned the number 0 to indicate that it is the level 0 DFD. Bubbles at level 1 are numbered 0.1, 0.2, 0.3, etc. When a bubble numbered x is decomposed, its child bubbles are numbered x.1, x.2, x.3, etc. In this numbering scheme, by looking at the number of a bubble we can unambiguously determine its level, its ancestors, and its successors.

Example: Supermarket Prize Scheme

A supermarket needs to develop the following software to encourage regular customers. For this, a customer needs to supply his/her residence address, telephone number, and driving license number. Each customer who registers for this scheme is assigned a unique customer number (CN) by the computer. A customer can present his CN to the checkout staff when he makes any purchase. In this case, the value of his purchase is credited against his CN. At the end of each year, the supermarket intends to award surprise gifts to the 10 customers who make the highest total purchase over the year. Also, it intends to award a 22 carat gold coin to every customer whose purchase exceeds Rs. 10,000. The entries against the CN are reset on the last day of every year, after the prize winners' lists are generated.
[Fig. 36.7: context diagram with external entities Sales-clerk, Manager, and Customer; the Sales-clerk supplies sales-details and the Manager receives the winner-list]

Fig. 36.7 Context diagram for supermarket problem

The context diagram for this problem is shown in fig. 36.7, the level 1 DFD in fig. 36.8, and the level 2 DFD in fig. 36.9.
[Fig. 36.8 Level 1 DFD for the supermarket problem, with data stores customer-data and sales-info]
[Fig. 36.9 Level 2 DFD for the supermarket problem: the winner-generation bubble decomposed into gen-surprise-gift-winner (0.2.1) and find-total-sales (0.2.3), driven by generate-winner-command]
Data Dictionary for the DFD Model:
address: name + house# + street# + city + pin
sales-details: {item + amount}* + CN
CN: integer
customer-data: {address + CN}*
sales-info: {sales-details}*
winner-list: surprise-gift-winner-list + gold-coin-winner-list
surprise-gift-winner-list: {address + CN}*
gold-coin-winner-list: {address + CN}*
gen-winner-command: command
total-sales: {CN + integer}*
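As a concrete illustration of the functions identified in the DFDs above (register-customer, record sales details, generate the winner lists), here is a minimal sketch in Python. The class and method names are my own and not part of the DFD model; the top-10 and Rs. 10,000 rules come from the problem statement.

```python
class PrizeScheme:
    """Illustrative sketch of the supermarket prize scheme, not a specified design."""

    def __init__(self):
        self.next_cn = 1
        self.customers = {}   # CN -> address (stands in for the customer-data store)
        self.totals = {}      # CN -> total purchase value (the total-sales data)

    def register_customer(self, address):
        # Assign a unique customer number (CN) to each registered customer.
        cn = self.next_cn
        self.next_cn += 1
        self.customers[cn] = address
        self.totals[cn] = 0
        return cn

    def record_sale(self, cn, amount):
        # Credit the value of a purchase against the customer's CN.
        self.totals[cn] += amount

    def generate_winner_lists(self):
        # Surprise gifts go to the 10 highest total purchasers of the year.
        ranked = sorted(self.totals, key=self.totals.get, reverse=True)
        surprise_gift_winners = ranked[:10]
        # A gold coin goes to every customer whose purchases exceed Rs. 10,000.
        gold_coin_winners = [cn for cn, t in self.totals.items() if t > 10000]
        # Entries against each CN are reset after the lists are generated.
        for cn in self.totals:
            self.totals[cn] = 0
        return surprise_gift_winners, gold_coin_winners
```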
External entities interacting with the system should be represented only in the context diagram; they should not appear at other levels of the DFD.
It is a common oversight to have either too few or too many bubbles in a DFD. Only three to seven bubbles per diagram should be allowed, i.e. each bubble should be decomposed into between three and seven bubbles.
Many beginners leave the different levels of a DFD unbalanced. Another common mistake while developing a DFD model is
attempting to represent control information in a DFD. It is important to realize that a DFD is the data flow representation of a system and it does not represent control information. The following examples represent some mistakes of this kind:
A book can be searched in the library catalogue by inputting its name. If the book is available in the library, then the details of the book are displayed. If the book is not listed in the catalogue, then an error message is generated. While generating the DFD model for this simple problem, many beginners commit the mistake of drawing an arrow (as shown in fig. 36.10) to indicate that the error function is invoked after the search-book function. But this is control information and should not be shown on the DFD.
[Fig. 36.10 An incorrect attempt to show control information: the search-book bubble with its search-results output]
Another error is trying to represent when or in what order different functions (processes) are invoked, and the conditions under which different functions are invoked. If a bubble A invokes either bubble B or bubble C depending on some condition, we need only represent the data that flows between A and B or between A and C, not the condition under which one or the other is invoked.
A data store should be connected only to bubbles, through data arrows; a data store cannot be connected directly to another data store or to an external entity. Only the functionality of the system specified in the SRS document should be represented, i.e. the designer should not assume functionality of the system not specified by the SRS document and then try to represent it in the DFD.
An improper or unsatisfactory data dictionary. The data and function names must be intuitive. Some students and even practicing engineers use symbolic data names such as a, b, c, etc.; such names hinder understanding of the DFD model.
The function performed by a bubble must be inferred from its label. However, a short label may not capture the entire functionality of a bubble. For example, a bubble named find-book-position has only an intuitive meaning and does not specify several things, e.g. what happens when some input information is missing or is incorrect. Further, the find-book-position bubble may not convey anything regarding what happens when the required book is missing.
Control aspects are not defined by a DFD. For instance, the order in which inputs are
consumed and outputs are produced by a bubble is not specified. A DFD model does not specify the order in which the different bubbles are executed. Representation of such aspects is very important for modeling real-time systems.
The method of carrying out decomposition to arrive at the successive levels and the
ultimate level to which decomposition is carried out are highly subjective and depend on the choice and judgment of the analyst. For this reason, even for the same problem, several alternative DFD representations are possible. Further, it is often not possible to say which DFD representation is superior or preferable to another.
The data flow diagramming technique does not provide any specific guidance as to
how exactly to decompose a given function into its sub-functions and we have to use subjective judgment to carry out decomposition.
For this example, the context diagram was drawn earlier. To draw the level 1 DFD (fig. 36.11), a cursory analysis of the problem description shows that the system needs to perform four basic functions: accept the input numbers from the user, validate the numbers, compute the root mean square (RMS) of the inputs, and display the result.
[Fig. 36.11 Level 1 DFD: data-items flow into validate-input (0.1), valid-data into compute-rms (0.2), and rms into display-result (0.3)]
By observing the level 1 DFD, we identify validate-input as the afferent branch, display-result as the efferent branch, and the remaining bubble (compute-rms) as the central transform. Applying steps 2 and 3 of transform analysis, we get the structure chart shown in fig. 36.12.
[Fig. 36.12 Structure chart: main invokes get-good-data, compute-rms, and write-result; get-good-data in turn invokes read-input and validate-input, with data-items, valid-data, and rms passed between the modules]
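The module structure of fig. 36.12 can be sketched directly as code. The function names mirror the structure-chart modules; the fixed input data and the validation rule (keep only numeric values) are illustrative assumptions.

```python
import math

def read_input():
    # In the real program these would come from the user; a fixed
    # list keeps the sketch self-contained.
    return [3.0, 4.0, "oops", 12.0]

def validate_input(items):
    # Assumed rule: keep only numeric values.
    return [x for x in items if isinstance(x, (int, float))]

def get_good_data():
    # The afferent branch: read, then validate.
    return validate_input(read_input())

def compute_rms(numbers):
    # The central transform.
    return math.sqrt(sum(x * x for x in numbers) / len(numbers))

def write_result(rms):
    # The efferent branch.
    print(f"RMS = {rms:.3f}")

def main():
    data = get_good_data()
    rms = compute_rms(data)
    write_result(rms)
    return rms
```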
[Structure chart for the supermarket problem: a root module invokes register-customer, get-sales-details, record-sales-details, find-total-sales, gen-surprise-gift-list, and gen-gold-coin-winner-list, exchanging customer-details, CN, sales-details, total-sales, and the surprise-gift and gold-coin lists]
3. Exercises
1. Mark the following as True or False. Justify your answer.
a. Coupling between two modules is nothing but a measure of the degree of dependence between them.
b. The primary characteristic of a good design is low cohesion and high coupling.
c. A module having high cohesion and low coupling is said to be functionally independent of other modules.
d. The degree of coupling between two modules does not depend on their interface complexity.
e. In the function-oriented design approach, the system state is decentralized and not shared among different functions.
f. The essence of any good function-oriented design technique is to map the functions performing similar activities into a module.
g. In object-oriented design, the basic abstraction is real-world functions.
h. An OOD (Object-Oriented Design) can be implemented using object-oriented languages only.
i. A DFD model of a system represents the functions performed by the system and the data flow taking place among these functions.
j. A data dictionary lists all data items appearing in the DFD model of a system but does not capture the composition relationships among the data.
k. The context diagram of a system represents it using more than one bubble.
l. A DFD captures the order in which the processes (bubbles) operate.
m. There should be at most one control relationship between any two modules in a properly designed structure chart.

2. For the following, mark all options which are true.
a. The desirable characteristics that every good software design needs are:
- correctness
- understandability
- efficiency
- maintainability
- all of the above
b. A module is said to have logical cohesion if:
- it performs a set of tasks that relate to each other very loosely
- all the functions of the module are executed within the same time span
- all elements of the module perform similar operations, e.g. error handling, data input, data output, etc.
- none of the above
c. High coupling among modules makes:
- it difficult to understand and maintain the product
- it difficult to implement and debug
- it expensive to develop the product, as modules having high coupling cannot be developed independently
- all of the above
d. The desirable characteristics that every good software design needs are:
- error isolation
- scope of reuse
- understandability
- all of the above
e. The purpose of structured analysis is:
- to capture the detailed structure of the system as perceived by the user
- to define the structure of the solution that is suitable for implementation in some programming language
- all of the above
f. The structured analysis technique is based on:
- the top-down decomposition approach
- the bottom-up approach
- the divide and conquer principle
- none of the above
g. A Data Flow Diagram (DFD) is also known as a:
- structure chart
- bubble chart
- Gantt chart
- PERT chart
h. The context diagram of a DFD is also known as:
- level 0 DFD
- level 1 DFD
- level 2 DFD
- none of the above
i. Decomposition of a bubble is also known as:
- classification
- factoring
- exploding
- aggregation
j. Decomposition of a bubble should be carried on:
- till the atomic program instructions are reached
- up to two levels
- until a level is reached at which the function of the bubble can be described using a simple algorithm
- none of the above
k. The bubbles in a level-1 DFD represent:
- exactly one high-level functional requirement described in the SRS document
- more than one high-level functional requirement
- part of a high-level functional requirement
- any of the above, depending on the problem
l. By looking at the structure chart, we can:
- say whether a module calls another module just once or many times
- not say whether a module calls another module just once or many times
- tell the order in which the different modules are invoked
- not tell the order in which the different modules are invoked
m. In which of the following ways does a structure chart differ from a flow chart?
- it is always difficult to identify the different modules of the software from its flow chart representation
- data interchange among different modules is not represented in a flow chart
- sequential ordering of tasks inherent in a flow chart is suppressed in a structure chart
- none of the above
n. The input portion in the DFD that transforms input data from physical to logical form is called the:
- central transform
- efferent branch
- afferent branch
- none of the above
o. If during structured design you observe that the data entering a DFD are incident on different bubbles, then you would use:
- transform analysis
- transaction analysis
- a combination of transform and transaction analysis
- neither transform nor transaction analysis
p. During detailed design, which of the following activities take place?
- the pseudo code for the different modules of the structure chart is developed in the form of MSPECs
- data structures are designed for the different modules of the structure chart
- the module structure is designed
- none of the above

3. State the major design activities. Identify separately the activities undertaken during high-level design and detailed design.
4. Why is functional independence of a module a key factor for a good software design?
5. What are the salient features of the function-oriented design approach and the object-oriented design approach? Differentiate between these two approaches.
6. Identify the aim of the structured analysis activity. Which documents are produced at the end of the structured analysis activity?
7. Identify the necessity of constructing DFDs in the context of a good software design.
8. Write down the importance of the data dictionary in the context of good software design.
9. Explain the term "balancing a DFD" with an example.
10. Discuss the essential activities required to develop the DFD of a system more systematically.
11. What do you understand by top-down decomposition in the context of structured analysis? Explain with a suitable example.
12. Identify the common errors made during construction of a DFD model.
13. Identify the shortcomings of the DFD model.
14. Differentiate between a structure chart and a flow chart.
15. Explain transform analysis with a suitable example.
16. Explain transaction analysis with an example.
Module 7
Software Engineering Issues
Lesson 37
Software Design Part 2
[Fig. 37.1 A model of an object: the data sits at the centre, surrounded by the methods m1 through m6 of the object]
The data internal to an object are called the attributes of the object, and the functions supported by an object are called its methods. Fig. 37.2 shows the LibraryMember class with eight attributes and five methods.
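For illustration, a class with attributes and methods might be sketched as below. Since fig. 37.2 itself is not reproduced here, the attribute and method names are plausible guesses rather than the figure's exact list.

```python
class LibraryMember:
    """Sketch of a class: attributes are the object's internal data,
    methods are the functions it supports. Names are illustrative."""

    def __init__(self, name, address, membership_number):
        # Attributes (data internal to the object).
        self.name = name
        self.address = address
        self.membership_number = membership_number
        self.books_issued = []

    # Methods (functions supported by the object).
    def issue_book(self, title):
        self.books_issued.append(title)

    def return_book(self, title):
        self.books_issued.remove(title)

    def find_books_issued(self):
        return list(self.books_issued)
```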
1.3. Inheritance
The inheritance feature allows us to define a new class by extending or modifying an existing class. The original class is called the base class (or super class) and the new class obtained through inheritance is called the derived class (or sub class). A base class is a generalization of its derived classes: it contains only those properties that are common to all of them. Each derived class, in turn, is a specialization of its base class because it modifies or extends the basic properties of the base class in certain ways. Thus, the inheritance relationship can be viewed as a generalization-specialization relationship. Using the inheritance relationship, different classes can be arranged in a class hierarchy (or class tree). In addition to inheriting all properties of the base class, a derived class can define new properties, i.e. new data and methods. It can even give new definitions to methods which already exist in the base class; such redefinition is called method overriding. In fig. 37.3, LibraryMember is the base class for the derived classes Faculty, Student, and Staff. Similarly, Student is the base class for the derived classes Undergraduate, Postgraduate, and Research. Each derived class inherits all the data and methods of its base class, and can also define additional data and methods or modify some of the inherited ones. The different classes in a library automation system and the inheritance relationships among them are shown in fig. 37.3, where the inheritance relationship is represented by a directed arrow drawn from a derived class to its base class. In fig. 37.3, the LibraryMember base class might define the data name, address, and library membership number for each member. Though the Faculty, Student, and Staff classes inherit these data, they might have to redefine their respective issue-book methods, because the number of books that can be borrowed and the duration of the loan may differ across the categories of library members. Thus, the issue-book method is overridden by each of the derived classes, and the derived classes might define additional data max-number-books and max-duration-of-issue, which may vary for the different member categories.
[Fig. 37.3 Library class hierarchy: base class LibraryMember; derived classes Faculty, Student, and Staff; Student further specialized into Undergraduate, Postgraduate, and Research]
An important advantage of inheritance is reuse: instead of defining the same methods and data redundantly in each of the derived classes separately, they are defined only once in the base class and are inherited by each of its subclasses. For example, in the Library Information System example of fig. 37.3, each category of member objects (Faculty, Student, and Staff) needs the data member-name, member-address, and membership-number; therefore these data are defined in the base class LibraryMember and inherited by its subclasses. Another advantage of the inheritance mechanism is the conceptual simplification that comes from reducing the number of independent features of the classes.
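A minimal sketch of inheritance and method overriding in the spirit of fig. 37.3: the common data lives in the base class, while each derived class overrides the borrowing limit. The limit values are assumed for illustration; a real class would carry more data and methods.

```python
class LibraryMember:
    # Data common to all member categories lives in the base class.
    def __init__(self, name, membership_number):
        self.name = name
        self.membership_number = membership_number

    def max_books(self):
        # Default borrowing limit (assumed value).
        return 2

class Faculty(LibraryMember):
    def max_books(self):          # method overriding
        return 10

class Student(LibraryMember):
    def max_books(self):          # method overriding
        return 5
```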
[Fig. 37.4 A class hierarchy of the library classes (Faculty, Student, Staff, Undergraduate, Postgraduate, Research) illustrating multiple inheritance]
1.4. Encapsulation
The property of an object by which it interfaces with the outside world only through messages is referred to as encapsulation. The data of an object are encapsulated within its methods and are available only through message-based communication. This concept is schematically represented in fig. 37.5.
[Fig. 37.5 Schematic representation of encapsulation: the data is enclosed by the methods m1 through m6 and can be accessed only through them]
An important advantage of encapsulation is the protection of an object's data: this includes protection from unauthorized access and protection from the problems that arise from concurrent access to data, such as deadlock and inconsistent values.
Encapsulation hides the internal structure of an object so that interaction with the
object is simple and standardized. This facilitates reuse of objects across different projects. Furthermore, if the internal structure or procedures of an object are modified, other objects are not affected. This results in easy maintenance.
Since objects communicate with one another using messages only, they are weakly coupled. The fact that objects are inherently weakly coupled enhances understanding of the design, since each object can be studied and understood almost in isolation from other objects.
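A small sketch of encapsulation: the balance of the hypothetical Account class below can be read or changed only through its methods, never directly. The class and method names are my own.

```python
class Account:
    """Data is reachable only through methods (message-based access)."""

    def __init__(self, balance):
        # Double underscore triggers Python name mangling, so the attribute
        # is not part of the public interface.
        self.__balance = balance

    def deposit(self, amount):
        # The method can enforce invariants that direct access could not.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def balance(self):
        return self.__balance
```

Name mangling is only a convention-level shield in Python (the data is still reachable as `_Account__balance`), but it makes the intended interface explicit.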
1.5. Polymorphism
Polymorphism literally means "many forms" (poly: many, morph: form). Broadly speaking, polymorphism denotes the following:
The same message can result in different actions when received by different objects. This is also referred to as static binding, and it occurs when multiple methods with the same operation name exist. Further, when we have an inheritance hierarchy, an object of a derived class can be assigned to a variable of its ancestor class. When such an assignment occurs, a method call through the ancestor variable results in the invocation of the appropriate method of the derived-class object. The exact method to which a method call is bound cannot be known at compile time and is decided dynamically at run time. This is known as dynamic binding.
[Fig. 37.7 Class hierarchy of shapes: Circle, Rectangle, Line, Ellipse, and Square derived from Shape]
[Fig. 37.8 Traditional code versus object-oriented code using dynamic binding: the object-oriented version reduces the type dispatch to a single call, shape.draw()]
When a new shape such as Ellipse is introduced, the traditional code must be modified by adding a new if-then-else clause. However, in the case of the object-oriented program, the code need not change; only a new class called Ellipse has to be defined.
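A sketch of dynamic binding with the Shape hierarchy discussed above. The draw methods simply return strings so the example stays self-contained; in a real drawing program they would render the shape.

```python
class Shape:
    def draw(self):
        raise NotImplementedError

class Circle(Shape):
    def draw(self):
        return "circle"

class Rectangle(Shape):
    def draw(self):
        return "rectangle"

class Ellipse(Shape):
    # Adding a new shape requires no change to the dispatching code below.
    def draw(self):
        return "ellipse"

def draw_all(shapes):
    # shape.draw() is bound to the right method at run time (dynamic binding);
    # no if-then-else on the shape's type is needed.
    return [shape.draw() for shape in shapes]
```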
[Fig. 37.9 Different types of diagrams and views supported in UML]
Structural view: The structural view defines the kinds of objects (classes) important to the understanding of the working of a system and to its implementation. It also captures the relationships among the classes (objects). The structural model is also called the static model, since the structure of a system does not change with time.
Behavioural view: The behavioural view captures how objects interact with each other to realize the system behaviour. The system behaviour captures the time-dependent (dynamic) behaviour of the system.
Implementation view: This view captures the important components of the system and their dependencies.
Environmental view: This view models how the different components are implemented on different pieces of hardware.
The line joining an actor and a use case is called the communication relationship. It indicates that the actor makes use of the functionality provided by the use case. Both human users and external systems can be represented by stick person icons. When a stick person icon represents an external system, it is annotated by the stereotype <<external system>>. Example: The use case model for the Tic-Tac-Toe problem is shown in fig. 37.10. This software has only one use case, play-move. Note that a use case named get-user-move is not used here; that name would be inappropriate because use cases should be named from the user's perspective.
[Fig. 37.10 Use case model for the Tic-tac-toe game: the actor Player communicates with the single use case play-move]
Text Description
Each ellipse on the use case diagram should be accompanied by a text description. The text description should define the details of the interaction between the user and the computer and other aspects of the use case. It should include all the behaviour associated with the use case: the mainline sequence, different variations to the normal behaviour, the system responses associated with the use case, the exceptional conditions that may occur, etc. The behaviour description is often written in a conversational style describing the interactions between the actor and the system. The text description may be informal, but some structuring is recommended. In addition to the mainline sequence and the alternative scenarios, a use case text description may include the following:
- Contact persons: This section lists personnel of the client organization with whom the use case was discussed, the date and time of the meeting, etc.
- Actors: In addition to identifying the actors, some information about the actors that may help the implementation of the use case may be recorded.
- Pre-condition: The preconditions describe the state of the system before the use case execution starts.
- Post-condition: This captures the state of the system after the use case has successfully completed.
- Non-functional requirements: This could contain the important constraints for the design and implementation, such as platform and environment conditions, qualitative statements, response time requirements, etc.
- Exceptions, error situations: This contains only the domain-related errors, such as lack of the user's access rights, invalid entries in the input fields, etc. Errors that are not domain related, such as software errors, need not be discussed here.
- Sample dialogs: These serve as examples illustrating the use case.
- Specific user interface requirements: These contain specific requirements for the user interface of the use case. For example, they may contain forms to be used, screen shots, the interaction style, etc.
- Document references: This part contains references to specific domain-related documents which may be useful for understanding the system operation.
[Fig. 37.12 Representation of use case inclusion]
Includes
The includes relationship in the older versions of UML (prior to UML 1.1) was known as the uses relationship. The includes relationship involves one use case including the behaviour of another use case in its sequence of events and actions. It occurs when a chunk of behaviour is similar across a number of use cases. Factoring out such behaviour avoids repeating its specification and implementation across different use cases. Thus, the includes relationship explores the issue of reuse by factoring out the commonality across use cases. It can also be gainfully employed to decompose a large and complex use case into more manageable parts. As shown in fig. 37.12, the includes relationship is represented using the predefined stereotype <<include>>. In the includes relationship, a base use case compulsorily and automatically includes the behaviour of the common use case. As shown in the example of fig. 37.13, issue-book and renew-book both include the check-reservation use case. A base use case may include several use cases; in such cases, it may interleave their associated common use cases together. The common use case becomes a separate use case, and an independent text description should be provided for it.
[Fig. 37.13 Example of use case inclusion: issue-book and renew-book both include check-reservation]
Extends
The main idea behind the extends relationship among use cases is that it allows you to show optional system behaviour. An optional system behaviour is executed only under certain conditions. This relationship among use cases is also predefined as a stereotype, as shown in fig. 37.14. The extends relationship is similar to generalization, but unlike generalization, the extending use case can add additional behaviour only at an extension point, and only when certain conditions are satisfied. The extension points are points within the use case where variation to the mainline (normal) action sequence may occur. The extends relationship is normally used to capture alternate paths or scenarios.
[Fig. 37.14 Representation of use case extension: the <<extends>> stereotype]
Organization of Use Cases
When the use cases are factored, they are organized hierarchically. The high-level use cases are refined into a set of smaller and more refined use cases, as shown in fig. 37.15. Top-level use cases are super-ordinate to the refined use cases, and the refined use cases are sub-ordinate to the top-level use cases. Note that only the complex use cases should be decomposed and organized in a hierarchy; it is not necessary to decompose simple use cases. The functionality of the super-ordinate use cases is traceable to their sub-ordinate use cases; thus, the functionality provided by the super-ordinate use cases is a composite of the functionality of the sub-ordinate use cases. At the highest level of the use case model, only the fundamental use cases are shown and the focus is on the application context; therefore, this level is also referred to as the context diagram, in which the system limits are emphasized. The top-level diagram contains only those use cases with which the external users of the system interact. The subsystem-level use cases specify the services offered by the subsystems to the other subsystems; any number of levels involving the subsystems may be utilized. At the lowest level of the use case hierarchy, the class-level use cases specify the functional fragments or operations offered by the classes.
[Fig. 37.15 Hierarchical organization of use cases: external users interact with the top-level use cases, which are refined into sub-ordinate use cases]
2.4.1. Association
Associations are needed to enable objects to communicate with each other. An association describes a connection between classes. The association relation between two objects is called an object connection or link; links are instances of associations. A link is a physical or conceptual connection between object instances. For example, suppose Amit has borrowed the book Graph Theory. Here, borrowed is the connection between the objects Amit and Graph Theory book. Mathematically, a link can be considered to be a tuple, i.e. an ordered list of object instances. An association describes a group of links with a common structure and common semantics. For example, consider the statement that Library Member borrows Books. Here, borrows is the association between the class LibraryMember and the class Book. Usually, an association is a binary relation (between two classes); however, three or more different classes can be involved in an association. A class can have an association relationship with itself (called a recursive association); in this case, it is usually assumed that two different objects of the class are linked by the association relationship. Association between two classes is represented by drawing a straight line between the concerned classes. Fig. 37.16 illustrates the graphical representation of the association relation. The name of the association is written alongside the association line. An arrowhead may be placed on the association line to indicate the reading direction of the association; the arrowhead should not be misunderstood as indicating the direction of a pointer implementing the association. On each side of the association relation, the multiplicity is noted as an individual number or as a value range. The multiplicity indicates how many instances of one class are associated with instances of the other. Value ranges of multiplicity are noted by specifying the minimum and maximum value, separated by two dots, e.g. 1..5.
An asterisk is a wild card and means many (zero or more). The association of fig. 37.16 should be read as: many Books may be borrowed by one LibraryMember. Observe that associations (and links) appear as verbs in the problem statement.
[Fig. 37.16 Association between two classes: a LibraryMember (1) is associated with many (*) Books via borrowed-by]
Associations are usually realized by assigning appropriate reference attributes to the classes involved. Thus, associations can be implemented using pointers from one object class to another. Links and associations can also be implemented by using a separate class that stores which objects of one class are linked to which objects of another class. Some CASE tools use the role names of the association relation for the corresponding automatically generated attributes.
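The borrowed-by association of fig. 37.16 might be realized with reference attributes, as sketched below. The attribute names are illustrative; the 1-to-many multiplicity shows up as a single reference on the Book side and a list on the LibraryMember side.

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.borrowed_by = None     # reference attribute realizing the association

class LibraryMember:
    def __init__(self, name):
        self.name = name
        self.borrowed = []          # one member may borrow many (*) books

    def borrow(self, book):
        # Establishing a link: one instance of the borrowed-by association.
        book.borrowed_by = self
        self.borrowed.append(book)
```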
2.4.2. Aggregation
Aggregation is a special type of association where the involved classes represent a whole-part relationship. The aggregate takes the responsibility of forwarding messages to the appropriate parts. Thus, the aggregate takes the responsibility of delegation and leadership. When an instance of one object contains instances of some other objects, then aggregation (or composition) relationship exists between the composite object and the component object. Aggregation is
represented by the diamond symbol at the composite end of a relationship. The number of instances of the component class aggregated can also be shown, as in fig. 37.17(a).
[Fig. 37.17(a) Representation of aggregation: a Document (1) aggregates many (*) Paragraphs, and a Paragraph (1) aggregates many (*) Lines]
The aggregation relationship cannot be reflexive (i.e. recursive): an object cannot contain objects of the same class as itself. The aggregation relation is also not symmetric: two classes A and B cannot contain instances of each other. However, the aggregation relationship can be transitive; in this case, the aggregation may consist of an arbitrary number of levels.
2.4.3. Composition
Composition is a stricter form of aggregation, in which the parts are existence-dependent on the whole. This means that the lives of the parts are closely tied to the life of the whole: when the whole is created, the parts are created, and when the whole is destroyed, the parts are destroyed. A typical example of composition is an invoice object with invoice items. As soon as the invoice object is created, all the invoice items in it are created, and as soon as the invoice object is destroyed, all the invoice items in it are also destroyed. The composition relationship is represented as a filled diamond drawn at the composite end. An example of the composition relationship is shown in fig. 37.17(b).
[Fig. 37.17(b) Composition: an Order (1) is composed of many (*) Items]
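A sketch of composition using the invoice example from the text: the InvoiceItem parts are created together with the Invoice and, since nothing else holds a reference to them, they die with it. The class names follow the example; the method names are my own.

```python
class InvoiceItem:
    def __init__(self, description, amount):
        self.description = description
        self.amount = amount

class Invoice:
    """Composition: the items exist only as parts of their invoice."""

    def __init__(self, item_data):
        # Parts are created when the whole is created ...
        self._items = [InvoiceItem(d, a) for d, a in item_data]

    def total(self):
        return sum(item.amount for item in self._items)

    # ... and are destroyed with the whole: no other object holds a
    # reference to the InvoiceItem instances, so garbage collection of
    # the Invoice reclaims its items as well.
```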
The objects participating in an interaction are shown at the top of the chart as boxes attached to a vertical dashed line. Inside the box, the name of the object is written with a colon separating it from the name of the class, and both names are underlined. The objects appearing at the top signify that they already existed when the use case execution was initiated. However, if some object is created during the execution of the use case and participates in the interaction (e.g. receives a method call), then the object should be shown at the appropriate place on the diagram where it is created. The vertical dashed line is called the object's lifeline; it indicates the existence of the object at any particular point of time. The rectangle drawn on the lifeline is called the activation symbol and indicates that the object is active as long as the rectangle exists. Each message is indicated as an arrow between the lifelines of two objects. The messages are shown in chronological order from top to bottom: reading the diagram from top to bottom shows the sequence in which the messages occur. Each message is labeled with the message name. Some control information can also be included; two types are particularly valuable. A condition (e.g. [invalid]) indicates that a message is sent only if the condition is true. An iteration marker shows that the message is sent many times to multiple receiver objects, as would happen when a collection or the elements of an array are being iterated over. The basis of the iteration can also be indicated, e.g. [for every book object].
[Fig. 37.18 Sequence diagram for the renew book use case: the objects LibraryBoundary, LibraryBookRenewalController, LibraryBookRegister, Book, and LibraryMember exchange the messages renewBook, displayBorrowing, findMemberBorrowing, confirm, and updateMemberBorrowing]
The sequence diagram for the book renewal use case for the Library Automation Software is shown in fig. 37.18. The development of the sequence diagram in the development methodology would help us in determining the responsibilities of the different classes; i.e. what methods should be supported by each class.
[Fig. 37.19 Collaboration diagram for the renew book use case; the message labels include 6: *find and 9: update on the Book object]
An activity is a state with an internal action and one or more outgoing transitions which automatically follow the termination of the internal activity. If an activity has more than one outgoing transition, then these must be identified through conditions. An interesting feature of activity diagrams is the swim lanes. Swim lanes enable you to group activities based on who is performing them, e.g. academic department vs. hostel office; thus swim lanes subdivide activities based on the responsibilities of some components. The activities in a swim lane can be assigned to some model elements, e.g. classes or components. Activity diagrams are normally employed in business process modelling, which is carried out during the initial stages of requirements analysis and specification. Activity diagrams can be very useful for understanding complex processing activities involving many components. Later, these diagrams can be used to develop interaction diagrams, which help to allocate activities (responsibilities) to classes.
Fig. 37.20 Activity diagram for student admission procedure at IIT

The student admission process at IIT is shown as an activity diagram in fig. 37.20. This shows the part played by different components of the Institute in the admission procedure. After the fees are received at the accounts section, parallel activities start at the hostel office, the hospital, and the Department. After all these activities are completed (this synchronization is represented as a horizontal line), the identity card can be issued to the student by the academic section.
Fig. 37.21 State chart diagram for an order object

The basic elements of the state chart diagram are as follows:
Initial state. This is represented as a filled circle.
Final state. This is represented by a filled circle inside a larger circle.
State. These are represented by rectangles with rounded corners.
Transition. A transition is shown as an arrow between two states. Normally, the name of the event which causes the transition is placed alongside the arrow. A guard can also be assigned to the transition. A guard is a Boolean logic condition, and the transition can take place only if the guard evaluates to true. The label of a transition has three parts: event[guard]/action. An example state chart for the order object of the Trade House Automation software is shown in fig. 37.21.
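As an illustration, the guarded transitions of the order object's state chart can be sketched in code. This is a hypothetical Python rendering; the class and method names are assumptions for illustration, not part of the original design:

```python
class Order:
    def __init__(self, items_available):
        self.state = "Accepted"           # initial state after order acceptance
        self.items_available = items_available

    def process(self):
        # event: processed; the guard selects the target state
        if self.state == "Accepted":
            if self.items_available:      # [all items available]
                self.deliver()            # /deliver action
                self.state = "Fulfilled"
            else:                         # [some items not available]
                self.state = "Pending"

    def new_supply(self):
        # event: newsupply is meaningful only in the Pending state
        if self.state == "Pending":
            self.items_available = True
            self.deliver()
            self.state = "Fulfilled"

    def deliver(self):
        pass                              # stand-in for the delivery action
```

For instance, an order created with `items_available=False` moves to "Pending" on `process()` and to "Fulfilled" once `new_supply()` fires.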
A design pattern is documented in terms of four parts:
- The problem
- The context in which the problem occurs
- The solution
- The context within which the solution works
Solution: Assign the responsibility to the information expert, i.e. the class that has the information necessary to fulfill the required responsibility. The expert pattern expresses the common intuition that objects do things related to the information they have. The class diagram and collaboration diagram for this solution to the problem of which class should compute the total sales are shown in fig. 37.22.

Fig. 37.22 Expert pattern: (a) Class diagram (b) Collaboration diagram
Creator Pattern
Problem: Which class should be responsible for creating a new instance of some class?
Solution: Assign a class C1 the responsibility to create an instance of class C2, if one or more of the following are true:
- C1 is an aggregation of objects of type C2
- C1 contains objects of type C2
- C1 closely uses objects of type C2
- C1 has the data that would be required to initialize the objects of type C2, when they are created
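The idea can be sketched with hypothetical Order/OrderItem classes (assumed for illustration, not taken from the text): since Order aggregates OrderItem objects and holds the data needed to initialize them, the creator pattern assigns Order the responsibility of creating them.

```python
class OrderItem:
    def __init__(self, name, quantity):
        self.name = name
        self.quantity = quantity

class Order:
    def __init__(self):
        self.items = []            # Order is an aggregation of OrderItem objects

    def add_item(self, name, quantity):
        # Order, not its clients, instantiates OrderItem (creator pattern)
        item = OrderItem(name, quantity)
        self.items.append(item)
        return item
```

A client simply calls `order.add_item("resistor", 100)` and never constructs an OrderItem directly.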
Controller Pattern
Problem: Who should be responsible for handling the actor requests?
Solution: For every use case, there should be a separate controller object responsible for handling requests from the actor. Also, the same controller should be used for all the actor requests pertaining to one use case, so that it becomes possible to maintain the necessary information about the state of the use case. The state information maintained by a controller can be used to identify out-of-sequence actor requests, e.g. whether a voucher request is received before an arrange-payment request.

Model View Separation Pattern

Problem: How should the non-GUI classes communicate with the GUI classes?
Context in which the problem occurs: This is a very commonly occurring pattern which is found in almost every problem. Here, model is a synonym for the domain layer objects, and view is a synonym for the presentation layer objects, such as the GUI objects.
Solution: The model view separation pattern states that model objects should not have direct knowledge of (or be directly coupled to) the view objects. This means that there should not be any direct calls from other objects to the GUI objects. This results in a good solution, because the GUI classes are specific to a particular application whereas the other classes may be reused. There are actually two solutions to this problem, which work in different circumstances.

Solution 1: Polling, or pull-from-above. It is the responsibility of a GUI object to ask for the relevant information from the other objects, i.e. the GUI objects pull the necessary information from the other objects whenever required. This model is frequently used. However, it is inefficient for certain applications. For example, in simulation applications which require visualization, the GUI objects would not know when the necessary information becomes available. Other examples are monitoring applications such as network monitoring, stock market quotes, and so on. In these situations, a push-from-below model of display update is required. Since a direct push from below would violate the model view separation principle, an indirect mode of communication from the other objects to the GUI objects is required.

Solution 2: Publish-subscribe pattern. An event notification system is implemented through which the publisher can indirectly notify the subscribers as soon as the necessary information becomes available. An event manager class can be defined which keeps track of the subscribers and the types of events they are interested in. An event is published by the publisher by sending a message to the event manager object. The event manager notifies all registered subscribers, usually via a parameterized message (called a callback). Some languages specifically support event manager classes. For example, Java provides the EventListener interface for such purposes.
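A minimal sketch of the publish-subscribe solution, with illustrative class and method names assumed for this example (Java's EventListener framework mentioned above is the production analogue):

```python
class EventManager:
    """Keeps track of subscribers per event type and notifies them via callbacks."""
    def __init__(self):
        self._subscribers = {}            # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, data):
        # The publisher never calls the GUI objects directly; it only
        # sends a message to the event manager, which calls back the views.
        for callback in self._subscribers.get(event_type, []):
            callback(data)

# A GUI (view) object registers interest in model updates...
manager = EventManager()
received = []
manager.subscribe("stock_quote", received.append)

# ...and the model publishes as soon as new information is available.
manager.publish("stock_quote", 101.5)
```

The model stays decoupled from the views: it only knows the event manager, never the GUI classes.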
they normally do not include any processing logic. However, they may be responsible for validating inputs, formatting outputs, etc. The boundary objects were earlier called interface objects. However, the term interface class is used with different meanings in Java, COM/DCOM, and UML. A recommendation for the initial identification of the boundary classes is to define one boundary class per actor/use case pair.
3.2.4. Example
Let's consider the query book availability use case of the Library Information System (LIS). Realization of this use case involves only matching the given book name against the books available in the catalog. More complex use cases may require more than one controller object to realize the use case; a complex use case can have several controller objects, such as a transaction manager, a resource coordinator, and an error handler. There is another situation where a use case can have more than one controller object: sometimes a use case requires the controller object to transit through a number of states, and in such cases one controller object might have to be created for each execution of the use case.
Fig. 37.23 A typical realization of a use case through the collaboration of boundary, controller, and entity objects
single method, because an object having only a single data element or method is usually implemented as a part of another object. Common operations: A set of operations can be defined for potential objects. If these operations apply to all occurrences of the object, then a class can be defined. An attribute or operation defined for a class must apply to each instance of the class. If some of the attributes or operations apply only to some specific instances of the class, then one or more subclasses may be needed for these special objects. Normally, the actors themselves and the interactions among them should be excluded from the entity identification exercise. However, sometimes there is a need to maintain information about an actor within the system. This is not the same as modeling the actor. Such classes are sometimes called surrogates. For example, in the Library Information System (LIS) we would need to store information about each library member. This is independent of the fact that the library member also plays the role of an actor of the system. Although the grammatical approach is simple and intuitively appealing, a naive use of the approach makes it very difficult to achieve high-quality results. In particular, it is very difficult to come up with useful abstractions simply by doing grammatical analysis of the problem description. Useful abstractions usually result from clever factoring of the problem description into independent and intuitively correct elements.
Class diagram is shown in fig. 37.24. The messages of the sequence diagram have
Fig. 37.24 (a) Initial domain model (b) Refined domain model
Fig. 37.26 Sequence diagram for the play move use case
4. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical proof.
b. Data abstraction helps in easy code maintenance and code reuse.
c. Classes can be considered equivalent to Abstract Data Types (ADTs).
d. The inheritance relationship describes a "has a" relationship among classes.
e. The inheritance feature of the object oriented paradigm helps in code reuse.
f. An important advantage of polymorphism is facilitation of reuse.
g. Using dynamic binding a programmer can send a generic message to a set of objects which may be of different types, i.e. belonging to different classes.
h. In dynamic binding, the address of an invoked method is known only at compile time.
i. For any given problem, one should construct all the views using all the diagrams provided by UML.
j. Use cases are explicitly dependent among themselves.
k. Each actor can participate in one and only one use case.
l. Class diagrams developed using UML can serve as the functional specification of a system.
m. The terms method and operation are equivalent concepts and can be used interchangeably.
n. The aggregation relationship can be recursively defined, i.e. an object can contain instances of itself.
o. In a UML class diagram, the aggregation relationship defines an equivalence relationship among objects.
p. The aggregation relationship can be considered to be a special type of association relationship.
q. Normally, you use an interaction diagram to represent how the behaviour of an object changes over its life time.
r. The interaction diagrams can be effectively used to describe how the behaviour of an object changes across several use cases.
s. A state chart diagram is good at describing behaviour that involves multiple objects cooperating with each other to achieve some behaviour.
t. The facade pattern tells how non-GUI classes should communicate with the GUI classes.
u. The use cases should be tightly tied to the GUI.
v. The responsibilities assigned to a controller object are closely related to the realization of a specific use case.
w. There is a one-to-one correspondence between the classes of the domain model and the final class diagram.
x. A large number of message exchanges between objects indicates good delegation and is a sure sign of a design well done.
y. Deep class hierarchies are the hallmark of any good OOD.
z. Cohesiveness of the data and methods within a class is a sign of good OOD.

2. For the following, mark all options which are true.
a. In the object-oriented approach, each object essentially consists of
- some data that are private to the object
- a set of functions (or operations) that operate on those data
- the set of methods it provides to the other objects for accessing and manipulating the data
- none of the above
b. Redefinition of methods in a derived class which existed in the base class is called
- function overloading
- operator overloading
- method overriding
- none of the above
c. The mechanism by which a subclass inherits attributes and methods from more than one base class is called
- single inheritance
- multiple inheritance
- multi-level inheritance
- hierarchical inheritance
d. In the object-oriented approach, the same message can result in different actions when received by different objects. This feature is referred to as
- static binding
- dynamic binding
- genericity
- overloading
e. UML is a
- language to model syntax
- an object-oriented development methodology
- an automatic code generation tool
- none of the above
f. In the context of use case diagrams, the stick person icon is used to represent
- human users
- external systems
- internal systems
- none of the above
g. The design pattern solutions are typically described in terms of
- class diagrams
- object diagrams
- interaction diagrams
- both class and interaction diagrams
h. The class that should be responsible for doing certain things for which it has the necessary information is the solution proposed by the
- creator pattern
- controller pattern
- expert pattern
- facade pattern
i. The class that should be responsible for creating a new instance of some class is the solution proposed by the
- creator pattern
- controller pattern
- expert pattern
- facade pattern
j. The objects identified during domain analysis can be classified into
- boundary objects
- controller objects
- entity objects
- all of the above
k. The most critical part of the domain modelling activity is to identify
- controller objects
- boundary objects
- entity objects
- none of the above
l. The objects which effectively decouple the boundary and entity objects from one another, making the system tolerant to changes of the user interface and processing logic, are
- controller objects
- boundary objects
- entity objects
- none of the above

3. What is the basic difference between a class and its object? Also, identify the basic difference between methods and messages.
4. Explain what you understand by data abstraction. Identify its advantages.
5. Explain the different types of inheritance with examples. Identify the advantages of inheritance.
6. Explain encapsulation in the context of OO programming. State the advantages of encapsulation.
7. Identify the differences between static binding and dynamic binding. What are the advantages of dynamic binding?
8. Explain the advantages of object-oriented design.
9. Explain the need of a model in the context of software development.
10. Describe the different types of views of a system captured by UML diagrams.
11. What is the purpose of a use case? What is the necessity for developing a use case diagram?
12. Which diagrams in UML capture the behavioural view of the system?
13. Which UML diagrams capture the structural aspects of a system?
14. Which UML diagrams capture the important components of the system and their dependencies?
15. Represent the following relations among classes using UML diagrams.
a. Students credit 5 courses each semester. Each course is taught by one or more teachers.
b. A bill contains a number of items. Each item describes some commodity, the price per unit, and the total price.
c. An order consists of one or more order items. Each order item contains the name of the item, its quantity and the date by which it is required. Each order item is described by an item type specification object having details such as its vendor addresses, its unit price, and the manufacturer.
16. How should you identify the use cases of a system?
17. What is the difference between an operation and a method in the context of the OOD technique?
18. What does the association relationship among classes represent? Give examples of the association relationship.
19. What does the aggregation relationship between classes represent? Give examples of the aggregation relationship between classes.
20. Why are objects always passed by reference in all popular programming languages?
21. What are design patterns? What are the advantages of using design patterns? Write down some popular design patterns and their necessities.
22. Give an outline of the object-oriented development process.
23. What is meant by domain modelling? Differentiate the different types of objects that are identified during domain analysis.
Module 8
Testing of Embedded System
Lesson 38
Testing Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to:
- Distinguish between the terms testing and verification
- Describe the common types of faults that occur in embedded systems
- Explain the various types of models that are used to represent the faults
- Describe the methodology of testing systems with embedded cores
- Distinguish among terms like DFT, BIST and on-line testing
- Explain the need and mechanism of Automatic Test Pattern Generation in the context of testing embedded hardware-software systems
What is testing?
Testing is an organized process to verify the behavior, performance, and reliability of a device or system against designed specifications. It ensures that a device or system is as defect-free as possible. Expected behavior, performance, and reliability must be both formally described and measurable.
Test application is performed on every manufactured device, and is responsible for the quality of the devices shipped.
Real-Time System
Most, if not all, embedded systems are "real-time"; the terms "real-time" and "embedded" are often used interchangeably. A real-time system is one in which the correctness of a computation depends not only on its logical correctness, but also on the time at which the result is produced. In hard real-time systems, a missed timing constraint can result in system failure; for example, in mission-critical applications where failure is not an option, time deadlines must be met. In soft real-time systems, no catastrophe occurs if a deadline is missed, and the time limits are negotiable.
In spite of the progress of hardware/software codesign, hardware and software in an embedded system are usually considered separately in the design process. There is a strong interaction between hardware and software in their failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not break in the traditional sense; however, it can perform inappropriately due to faults in the underlying hardware, as well as specification or design flaws in either the hardware or the software. At the same time, the software can be exploited to test for and respond to the presence of faults in the underlying hardware. As the functions of embedded systems have become more complicated, it is necessary to understand the importance of testing them; however, the studies related to embedded system testing are not yet adequate.
2.
Test methodologies and test goals differ in the hardware and software domains. Embedded software development uses specialized compilers and development software that offer means for debugging. Developers build application software on more powerful computers and eventually test the application in the target processing environment.
In contrast, hardware testing is concerned mainly with functional verification and self-test after the chip is manufactured. Hardware developers use tools to simulate the correct behavior of circuit models. Vendors design chips for self-test, which mainly ensures proper operation of circuit models after their implementation. Test engineers who are not the original hardware developers test the integrated system.

This conventional, divided approach to software and hardware development does not address the embedded system as a whole during the system design process. It instead focuses on these two critical issues of testing separately. New problems arise when developers integrate the components from these different domains. In theory, unsatisfactory performance of the system under test should lead to a redesign. In practice, a redesign is rarely feasible because of the cost and delay involved in another complete design iteration. A common engineering practice is to compensate for problems within the integrated system prototype by using software patches. These changes can unintentionally affect the behavior of other parts in the computing system.

At a higher abstraction level, executable specification languages provide an excellent means to assess embedded-system designs. Developers can then test system-level prototypes with either formal verification techniques or simulation. A current shortcoming of many approaches, however, is that the transition from testing at the system level to testing at the implementation level is largely ad hoc. To date, system testing at the implementation level has received attention in the research community only as coverification, which simulates both hardware and software components conjointly. Coverification runs simulations of specifications on powerful computer systems. Commercially available coverification tools link hardware simulators and software debuggers in the implementation phase of the design process.
Since embedded systems are frequently employed in mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Some embedded systems, such as those in automotive applications, are exposed to extremely harsh environments. Preparing embedded systems to meet new and more stringent requirements of safety and reliability in such applications is a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for on-line testing.
3.
Incorrectness in hardware systems may be described in different terms: defect, error, and fault. These three terms are often confused. We define them as follows [1]:
Defect: A defect in a hardware system is the unintended difference between the implemented hardware and its intended design. It may be a process defect, a material defect, an age defect, or a package defect.
Error: A wrong output signal produced by a defective system is called an error. An error is an effect whose cause is some defect. Errors induce failures, that is, deviations from appropriate system behavior. If a failure can lead to an accident, it is a hazard.
Fault: A representation of a defect at the abstraction level is called a fault. Faults are physical or logical defects in the design or implementation of a device.
Fig. 38.1 An example of a stuck-at fault

Bridging faults: These are due to a short between a group of signals. The logic value of the shorted net may be modeled as 1-dominant (OR bridge), 0-dominant (AND bridge), or intermediate, depending upon the technology in which the circuit is implemented.
Stuck-open and stuck-short faults: The MOS transistor is considered an ideal switch, and two types of faults are modeled. In a stuck-open fault a single transistor is permanently stuck in the open state, and in a stuck-short fault a single transistor is permanently shorted irrespective of its gate voltage. These faults are caused by bad connections of signal lines.
Power disturbance faults: These are caused by inconsistent power supplies and affect the whole system.
Spurious current faults: These are caused by exposure to heavy ions and affect the whole system.

Operational faults are usually classified according to their duration:
Permanent faults exist indefinitely if no corrective action is taken. These are mainly manufacturing faults and do not frequently occur due to changes in system operation or environmental disturbances.
Intermittent faults appear, disappear, and reappear frequently. They are difficult to predict, but their effects are highly correlated. Most of these faults are due to marginal design or manufacturing steps. These faults occur under atypical environmental disturbances.
Transient faults appear for an instant and disappear quickly. They are not correlated with each other and occur due to random environmental disturbances. Power disturbance faults and spurious current faults are transient faults.
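The stuck-at model can be illustrated with a small simulation sketch (a hypothetical three-input circuit assumed for this example, not the circuit of fig. 38.1): forcing one net to a fixed value and comparing the faulty response with the fault-free one shows which input patterns detect the fault.

```python
from itertools import product

def circuit(a, b, c, stuck=None):
    """Fault-free function y = (a AND b) OR c; `stuck` = (net, value)
    optionally models a single stuck-at fault on one net."""
    def force(net, value):
        return stuck[1] if stuck and stuck[0] == net else value
    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)               # internal net: output of the AND gate
    return n1 | c

# Find every input pattern that detects net n1 stuck-at-0.
tests = [p for p in product([0, 1], repeat=3)
         if circuit(*p) != circuit(*p, stuck=("n1", 0))]
print(tests)  # -> [(1, 1, 0)]: a=b=1 activates the fault, c=0 propagates it
```

Only the pattern a=1, b=1, c=0 both activates the fault (the fault-free net carries 1) and propagates its effect to the output.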
currently applied to hardware-software designs have their origins in either the hardware [9] or the software [10] domains.
4.
The system-on-chip test is a single composite test comprised of the individual core tests of each core, the UDL tests, and the interconnect tests. Each individual core or UDL test may involve surrounding components. Certain operational constraints (e.g., safe mode, low power mode, bypass mode) are often required, which necessitates access and isolation modes. In a core-based system-on-chip [5], the system integrator designs the User Defined Logic (UDL) and assembles the pre-designed cores provided by the core vendors. A core is typically a hardware description of a standard IC, e.g., a DSP, a RISC processor, or a DRAM core. Embedded cores represent intellectual property (IP), and in order to protect IP, core vendors do not release the detailed structural information to the system integrator. Instead, a set of test patterns is provided by the core vendor that guarantees a specific fault coverage. Though the cores are tested as part of overall system performance by the system integrator, the system integrator treats each core as a black box. These test patterns must be applied to the cores in a given order, using a specific clock strategy. The core internal test developed by a core provider needs to be adequately described, ported, and ready for plug and play, i.e., for interoperability, with the system chip test. For an internal test to accompany its corresponding core and be interoperable, it needs to be described in a commonly accepted, i.e., standard, format. Such a standard format is currently being developed by IEEE P1500 and is referred to as standardization of a core test description language [22]. In SOCs, cores are often embedded in several layers of user-defined or other core-based logic, and direct physical access to their peripheries is not available from the chip I/Os. Hence, an electronic access mechanism is needed.
This access mechanism requires additional logic, such as a wrapper around the core, and wiring, such as a test access mechanism, to connect the core peripheries to the test sources and sinks. The wrapper performs switching between normal mode and the test mode(s), and the wiring connects the wrapper surrounding the core to the test source and sink. The wrapper can also be utilized for core isolation. Typically, a core needs to be isolated from its surroundings in certain test modes; core isolation is often required on the input side, the output side, or both.
Fig. 38.2 Overview of the three elements in an embedded-core test approach: (1) test pattern source, (2) test access mechanism, and (3) core test wrapper [5]

A conceptual architecture for testing embedded-core-based SOCs is shown in fig. 38.2. It consists of three structural elements: a test pattern source, a test access mechanism, and a core test wrapper.
5. On-Line Testing
On-line testing addresses the detection of operational faults, and is found in computers that support critical or high-availability applications [23]. The goal of on-line testing is to detect fault effects, that is, errors, and take appropriate corrective action. On-line testing can be performed by external or internal monitoring, using either hardware or software; internal monitoring is referred to as self-testing. Monitoring is internal if it takes place on the same substrate as the circuit under test (CUT); nowadays, this usually means inside a single IC, a system-on-a-chip (SOC). There are four primary parameters to consider in the design of an on-line testing scheme:
Error coverage (EC): This is defined as the fraction of all modeled errors that are detected, usually expressed in percent. Critical and highly available systems require very high error coverage to minimize the impact of errors that lead to system failure.
Error latency (EL): This is the difference between the first time an error is activated and the first time it is detected. EL is affected by the time taken to perform a test and by how often tests are executed. A related parameter is fault latency (FL), defined as the difference between the onset of the fault and its detection. Clearly, FL ≥ EL, so when EL is difficult to determine, FL is often used instead.
Space redundancy (SR): This is the extra hardware or firmware needed to perform on-line testing.
Time redundancy (TR): This is the extra time needed to perform on-line testing.

An ideal on-line testing scheme would have 100% error coverage, an error latency of 1 clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT, and impose no functional or structural restrictions on the CUT. To cover all of the fault types described earlier, two different modes of on-line testing are employed: concurrent testing, which takes place during normal system operation, and non-concurrent testing, which takes place while normal operation is temporarily suspended. These operating modes must often be overlapped to provide a comprehensive on-line testing strategy at acceptable cost.
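As a concrete illustration of these trade-offs, consider a deliberately simplified parity scheme (an assumed example, not from the text): one parity bit stored with each memory word gives concurrent testing with a single bit of space redundancy, and detects any single-bit error on the very next read.

```python
def parity(word):
    """Even/odd parity: number of 1-bits modulo 2."""
    return bin(word).count("1") % 2

def write(memory, addr, word):
    memory[addr] = (word, parity(word))      # space redundancy: 1 extra bit

def read(memory, addr):
    word, p = memory[addr]
    if parity(word) != p:                    # check runs on every access, so
        raise RuntimeError("parity error")   # error latency is one access
    return word

memory = {}
write(memory, 0, 0b1011)
assert read(memory, 0) == 0b1011

# A transient fault flips one bit of the stored word...
word, p = memory[0]
memory[0] = (word ^ 0b0100, p)
try:
    read(memory, 0)
except RuntimeError:
    print("single-bit error detected")       # 100% coverage of 1-bit errors
```

The scheme has full coverage of single-bit errors but none for double-bit errors, showing how coverage is traded against redundancy.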
For critical or highly available systems, it is essential to have a comprehensive approach to on-line testing that covers all expected permanent, intermittent, and transient faults. In recent years, built-in self-test (BIST) has emerged as an important method for testing manufacturing faults, and it is increasingly promoted for on-line testing as well.
6.
(with or without heuristics), or by pseudo-random methods. On the other hand, for (2), a test is subsequently applied many times to each integrated circuit and thus must be efficient both in space (storage requirements for the patterns) and in time. The main considerations in evaluating a test set are: (i) the time to construct a minimal test set; (ii) the size of the test set; (iii) the time involved to carry out the test; and (iv) the equipment required (if external).

Most algorithmic test pattern generators are based on the concept of sensitized paths. The sensitized path method is a heuristic approach to generating tests for general combinational logic networks. The circuit is assumed to have only a single fault in it. The method consists of two parts:
1. The creation of a sensitized path from the fault to the primary output. This involves assigning logic values to the gate inputs in the path from the fault site to a primary output, such that the fault effect is propagated to the output.
2. The justification operation, where the assignments made to gate inputs on the sensitized path are traced back to the primary inputs. This may require several backtracks and iterations.

In the case of sequential circuits the same logic is applied, but before that the sequential elements are explicitly driven to a required state using scan-based design-for-test (DFT) circuitry [1,24]. The best-known algorithms are the D-algorithm, PODEM and FAN [1,24]. Three steps can be identified in most automatic test pattern generation (ATPG) programs: (a) listing the signals on the inputs of a gate controlling the line on which a fault should be detected; (b) determining the primary input conditions necessary to obtain these signals (back propagation) and sensitizing the path to the primary outputs such that the signals and faults can be observed; (c) repeating this procedure until all detectable faults in a given fault set have been covered.
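The goal of ATPG, a test set covering every detectable fault, can be sketched by brute force over a tiny made-up circuit (this is not the D-algorithm or PODEM; the netlist and names are assumptions for illustration): for each single stuck-at fault, search the input space for a pattern that distinguishes the faulty circuit from the fault-free one.

```python
from itertools import product

NETS = ["a", "b", "c", "n1", "y"]

def evaluate(a, b, c, fault=None):
    """Fault-free circuit y = (a AND b) OR c, with an optional
    single stuck-at fault given as (net, stuck_value)."""
    def f(net, value):
        return fault[1] if fault and fault[0] == net else value
    a, b, c = f("a", a), f("b", b), f("c", c)
    n1 = f("n1", a & b)                      # AND-gate output net
    return f("y", n1 | c)                    # OR-gate output net

# For every single stuck-at fault, find one detecting input pattern.
test_set = {}
for net in NETS:
    for stuck_value in (0, 1):
        for pattern in product([0, 1], repeat=3):
            if evaluate(*pattern) != evaluate(*pattern, fault=(net, stuck_value)):
                test_set[(net, stuck_value)] = pattern   # fault detected
                break

coverage = 100.0 * len(test_set) / (2 * len(NETS))
print(f"fault coverage: {coverage:.0f}%")
```

Real ATPG algorithms reach the same goal by path sensitization and justification instead of exhaustive search, which is essential once the input space is too large to enumerate.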
concatenating onto it the shortest path to an uncovered transition [26]. A significant limitation of state machine test generation techniques is the time complexity of the state enumeration process performed during test generation. Coverage-directed algorithms seek to improve coverage without targeting any specific fault. These algorithms heuristically modify an existing test set to improve total coverage, and then evaluate the fault coverage produced by the modified test set. If the modified test set corresponds to an improvement in fault coverage, then the modification is accepted. Otherwise the modification is either rejected or another heuristic is used to determine the acceptability of the modification. The modification method is typically either random or directed random. An example of such a technique is presented in [25], which uses a genetic algorithm to successively improve the population of test sequences.
7.
is a significant part of the overall system. This is considered white-box testing. Therefore, software validation testing is also the responsibility of the developer.
8.
In an embedded system, where hardware and software are combined, unexpected situations can occur owing to interaction faults between hardware and software. As the functions of embedded systems become more complicated, it becomes more difficult to detect the faults that cause such failures. Fault injection techniques are therefore strongly recommended: system behavior is observed while faults are injected into the target system, so that interaction faults between hardware and software can be detected. The test data selection technique discussed in [21] first simulates the behavior of the embedded system as a software program derived from the requirement specification. Hardware faults, after being converted to software faults, are then injected into the simulated program. Finally, effective test data are selected to detect faults caused by the interactions between hardware and software.
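The idea of converting a hardware fault into a software fault and injecting it into a simulated program can be sketched as follows. All names here are hypothetical illustrations; a real harness would inject faults into the simulated program derived from the requirement specification, as in [21].

```python
def controller(read_sensor):
    """Toy embedded control loop: commands a heater based on a
    temperature reading (hypothetical application logic)."""
    t = read_sensor()
    if t is None or not (0 <= t <= 150):
        return "FAILSAFE"          # implausible reading detected
    return "HEAT_ON" if t < 20 else "HEAT_OFF"

def healthy_sensor():
    return 18

def make_stuck_sensor(stuck_value):
    """Fault injector: models a hardware fault (e.g. an ADC line
    stuck at full scale) as a software fault -- the conversion step
    described above."""
    return lambda: stuck_value

print(controller(healthy_sensor))            # HEAT_ON
print(controller(make_stuck_sensor(4095)))   # FAILSAFE
```

Test data that drive the faulty and fault-free versions to different outcomes (here, any run at all, since the stuck value is always out of range) are the "effective" data the selection step looks for.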
9. Conclusion
Rapid advances in test development techniques are needed to reduce the test cost of million-gate SOC devices. In this chapter a number of state-of-the-art techniques for testing embedded systems have been discussed. Modular test techniques for digital, mixed-signal, and hierarchical SOCs must develop further to keep pace with design complexity and integration density. The test data bandwidth needs of analog cores differ significantly from those of digital cores; therefore, unified top-level testing of mixed-signal SOCs remains a major challenge. This chapter also described a granularity-based embedded software testing technique.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[2] E. A. Lee, "What's Ahead for Embedded Software?", IEEE Computer, pp. 18-26, September 2000.
[3] E. A. Lee, "Computing for embedded systems", Proc. IEEE Instrumentation and Measurement Technology Conference, Budapest, Hungary, May 2001.
[4] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 2001 Edition, http://public.itrs.net/Files/2001ITRS/Home.html
[5] Y. Zorian, E. J. Marinissen, and S. Dey, "Testing Embedded-Core Based System Chips", IEEE Computer, vol. 32, pp. 52-60, 1999.
[6] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer, "Fault Injection Techniques and Tools", IEEE Computer, pp. 75-82, April 1997.
[7] V. Encontre, "Testing Embedded Systems: Do You Have The GuTs for It?", www-128.ibm.com/developerworks/rational/library/content/03July/1000/1050/1050.pdf
[8] D. D. Gajski and F. Vahid, "Specification and design of embedded hardware-software systems", IEEE Design and Test of Computers, vol. 12, pp. 53-67, 1995.
[9] S. Dey, A. Raghunathan, and K. D. Wagner, "Design for testability techniques at the behavioral and register-transfer level", Journal of Electronic Testing: Theory and Applications (JETTA), vol. 13, pp. 79-91, October 1998.
[10] B. Beizer, Software Testing Techniques, Second Edition, Van Nostrand Reinhold, 1990.
[11] G. Al Hayek and C. Robach, "From specification validation to hardware testing: A unified method", Proc. International Test Conference, pp. 885-893, October 1996.
[12] A. von Mayrhauser, T. Chen, J. Kok, C. Anderson, A. Read, and A. Hajjar, "On choosing test criteria for behavioral level hardware design verification", Proc. High Level Design Validation and Test Workshop, pp. 124-130, 2000.
[13] L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil, "A formal evaluation of data flow path selection criteria", IEEE Trans. on Software Engineering, vol. SE-15, pp. 1318-1332, 1989.
[14] S. C. Ntafos, "A comparison of some structural testing strategies", IEEE Trans. on Software Engineering, vol. SE-14, pp. 868-874, 1988.
[15] J. Laski and B. Korel, "A data flow oriented program testing strategy", IEEE Trans. on Software Engineering, vol. SE-9, pp. 33-43, 1983.
[16] Q. Zhang and I. G. Harris, "A domain coverage metric for the validation of behavioral VHDL descriptions", Proc. International Test Conference, October 2000.
[17] D. Moundanos, J. A. Abraham, and Y. V. Hoskote, "Abstraction techniques for validation coverage analysis and test generation", IEEE Transactions on Computers, vol. 47, pp. 2-14, January 1998.
[18] N. Malik, S. Roberts, A. Pita, and R. Dobson, "Automaton: an autonomous coverage-based multiprocessor system verification environment", Proc. IEEE International Workshop on Rapid System Prototyping, pp. 168-172, June 1997.
[19] K.-T. Cheng and A. S. Krishnakumar, "Automatic functional test bench generation using the extended finite state machine model", Proc. Design Automation Conference, pp. 1-6, 1993.
[20] J. P. Bergmann and M. A. Horowitz, "Improving coverage analysis and test generation for large designs", Proc. International Conference on Computer-Aided Design, pp. 580-583, 1999.
[21] A. Sung and B. Choi, "An Interaction Testing Technique between Hardware and Software in Embedded Systems", Proc. Ninth Asia-Pacific Software Engineering Conference, 4-6 Dec. 2002, pp. 457-464.
[22] IEEE P1500 Web Site, http://grouper.ieee.org/groups/1500/
[23] H. Al-Asaad, B. T. Murray, and J. P. Hayes, "Online BIST for embedded systems", IEEE Design & Test of Computers, vol. 15, no. 4, Oct.-Dec. 1998, pp. 17-24.
[24] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
[25] F. Corno, M. Sonza Reorda, G. Squillero, A. Manzone, and A. Pincetti, "Automatic test bench generation for validation of RT-level descriptions: an industrial experience", Proc. Design Automation and Test in Europe, pp. 385-389, 2000.
[26] R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, "Architecture validation for processors", Proc. International Symposium on Computer Architecture, pp. 404-413, 1995.
[27] P. Van Hentenryck, Constraint Satisfaction in Logic Programming, MIT Press, 1989.
Problems
1. How does testing differ from verification?
2. What is an embedded system? Define hard real-time system and soft real-time system with examples.
3. Why is testing embedded systems difficult?
4. How does hardware testing differ from software testing?
5. What is co-testing?
6. Distinguish between defects, errors and faults with examples.
7. Calculate the total number of single and multiple stuck-at faults for a logic circuit with n lines.
8. In the circuit shown in Fig. P1, which of the following tests, if any, detect the fault x1 s-a-0?
   a) (0,1,1,1)  b) (1,0,1,1)  c) (1,1,0,1)  d) (1,0,1,0)

[Fig. P1: a circuit with inputs x1, x2, x3, x4 and output z]

9. Define the following fault models, using examples where possible:
   a) Single and multiple stuck-at fault
   b) Bridging fault
   c) Stuck-open and stuck-short fault
   d) Operational fault
10. What is meant by a co-validation fault model? Describe the different software fault models.
11. Describe the basic structure of the core-based testing approach for embedded systems.
12. What is concurrent or on-line testing? How does it differ from non-concurrent testing? Define error coverage, error latency, space redundancy and time redundancy in the context of on-line testing.
13. What is a test vector? How are test vectors generated? Describe different techniques for test pattern generation.
14. Define the following for software testing:
    a) Software unit testing
    b) Software integration testing
    c) Software validation testing
    d) System unit testing
    e) System integration testing
    f) System validation testing
Module 8
Testing of Embedded System
Lesson 39
Design for Testability
Instructional Objectives
After going through this lesson the student would be able to:
- Explain the meaning of the term Design for Testability (DFT)
- Describe some ad-hoc and some formal methods of incorporating DFT in a system-level design
- Explain the scan-chain based method of DFT
- Highlight the advantages and disadvantages of scan-based designs and discuss alternatives
The embedded system is an information processing system that consists of hardware and software components. Nowadays, the number of embedded computing systems in areas such as telecommunications, automotive electronics, office automation, and military applications is steadily growing. This market expansion arises from greater memory densities as well as improvements in embeddable processor cores, intellectual-property modules, and sensing technologies. At the same time, these improvements have increased the amount of software needed to manage the hardware components, leading to a higher level of system complexity. Designers can no longer develop high-performance systems from scratch but must use sophisticated system modeling tools. The increased complexity of embedded systems and the reduced access to internal nodes have made it not only more difficult to diagnose and locate faulty components, but also harder to measure the functions of embedded components.

Creating testable designs is key to developing complex hardware and/or software systems that function reliably throughout their operational life. Testability can be defined with respect to a fault. A fault is testable if there exists a well-specified procedure (e.g., test pattern generation, evaluation, and application) to expose it, and the procedure is implementable with a reasonable cost using current technologies. Testability of the fault therefore represents the inverse of the cost of detecting the fault. A circuit is testable with respect to a fault set when each and every fault in this set is testable. Design-for-testability techniques improve the controllability and observability of internal nodes, so that embedded functions can be tested.
Two basic properties determine the testability of a node: 1) controllability, which is a measure of the difficulty of setting internal circuit nodes to 0 or 1 by assigning values to primary inputs (PIs), and 2) observability, which is a measure of the difficulty of propagating a node's value to a primary output (PO) [1-3]. A node is said to be testable if it is easily controlled and observed. For sequential circuits, some have added predictability, which represents the ability to obtain known output values in response to given input stimuli. The factors affecting predictability include initializability, races, hazards, oscillations, etc. DFT techniques include analog test busses and scan methods. Testability can also be improved with BIST circuitry, where signal generators and analysis circuitry are implemented on chip [1, 3-4]. Without testability, design flaws may escape detection until a
product is in the hands of users; equally, operational failures may prove difficult to detect and diagnose. Increased embedded system complexity makes thorough assessment of system integrity by testing external black-box behavior almost impossible. System complexity also complicates test equipment and procedures. Design for testability should increase a system's testability, resulting in improved quality while reducing time to market and test costs.

Traditionally, hardware designers and test engineers have focused on proving the correct manufacture of a design and on locating and repairing field failures. They have developed several highly structured and effective solutions to this problem, including scan design and self-test. Design verification has been a less formal task, based on the designer's skills. However, designers have found that structured design-for-test features aiding manufacture and repair can significantly simplify design verification. These features reduce verification cycles from weeks to days in some cases.

In contrast, software designers and test engineers have targeted design validation and verification. Unlike hardware, software does not break during field use. Design errors, rather than incorrect replication or wear-out, cause operational bugs. Efforts have focused on improving specifications and programming styles rather than on adding explicit test facilities. For example, modular design, structured programming, formal specification, and object orientation have all proven effective in simplifying test. Although these different approaches are effective when we can cleanly separate a design's hardware and software parts, problems arise when boundaries blur. For example, in the early design stages of a complex system, we must define system-level test strategies. Yet, we may not have decided which parts to implement in hardware and which in software.
In other cases, software running on general-purpose hardware may initially deliver certain functions that we subsequently move to firmware or hardware to improve performance. Designers must ensure a testable, finished design regardless of implementation decisions. Supporting hardware-software codesign requires cotesting techniques, which draw hardware and software test techniques together into a cohesive whole.
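As an aside, the controllability and observability notions introduced above are often quantified with SCOAP-style cost measures. The sketch below is a much simplified illustration for a hypothetical circuit z = (a AND b) OR c: inputs cost 1, an AND output's 1-controllability sums its input costs (all inputs must be 1) while its 0-controllability takes the minimum (any one input at 0 suffices), the OR gate is the dual, and observing a line requires the side inputs along its path to hold non-controlling values. Exact SCOAP rules are more detailed than this.

```python
# Simplified SCOAP-style measures for z = (a AND b) OR c.
# Primary inputs have controllability 1; each gate level adds 1.
CC0 = {"a": 1, "b": 1, "c": 1}
CC1 = {"a": 1, "b": 1, "c": 1}

# n = a AND b
CC1["n"] = CC1["a"] + CC1["b"] + 1          # both inputs must be 1
CC0["n"] = min(CC0["a"], CC0["b"]) + 1      # any single input at 0 suffices

# z = n OR c
CC1["z"] = min(CC1["n"], CC1["c"]) + 1      # any single input at 1 suffices
CC0["z"] = CC0["n"] + CC0["c"] + 1          # both inputs must be 0

# Observability: to see n at z, the other OR input (c) must be 0;
# to see a at n, the other AND input (b) must be 1.
CO = {"z": 0}
CO["n"] = CO["z"] + CC0["c"] + 1
CO["a"] = CO["n"] + CC1["b"] + 1

print(CC1["z"], CC0["z"], CO["a"])   # 2 4 4
```

Higher numbers flag harder-to-test nodes; DFT techniques such as test point insertion aim precisely at lowering these costs.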
2.
Design for testability (DFT) refers to those design techniques that make the task of subsequent testing easier. There is definitely no single methodology that solves all embedded system-testing problems, nor is there a single DFT technique that is effective for all kinds of circuits. DFT techniques can largely be divided into two categories: ad-hoc techniques and structured (systematic) techniques.

DFT methods for digital circuits:
- Ad-hoc methods
- Structured methods:
  - Scan
  - Partial scan
  - Built-in self-test (discussed in Lesson 34)
  - Boundary scan (discussed in Lesson 34)
Things to be followed
- Large circuits should be partitioned into smaller sub-circuits to reduce test costs. One of the most important steps in designing a testable chip is to first partition the chip in an appropriate way such that for each functional module there is an effective DFT technique to test it. Partitioning must be done at every level of the design process, from architecture to circuit, whether testing is considered or not. Partitioning can be functional (according to functional module boundaries) or physical (based on circuit topology), and can be implemented using multiplexers and/or scan chains.
- Test access points must be inserted to enhance the controllability and observability of the circuit. Test points include control points (CPs) and observation points (OPs). CPs are active test points, while OPs are passive ones; some test points serve as both. Before exercising tests through test points that are not PIs and POs, one should investigate the additional requirements the test equipment places on those test points.
- Circuits (flip-flops) must be easily initializable to enhance predictability. A power-on reset mechanism controllable from primary inputs is the most effective and widely used approach.
- Test control must be provided for difficult-to-control signals.
- Automatic Test Equipment (ATE) requirements such as pin limitation, tri-stating, timing resolution, speed, memory depth, driving capability, analog/mixed-signal support, internal/boundary scan support, etc., should be considered during the design process to avoid project delays and unnecessary investment in equipment.
- Internal oscillators, PLLs and clocks should be disabled during test. To guarantee tester synchronization, internal oscillator and clock generator circuitry should be isolated during the test of the functional circuitry. The internal oscillators and clocks should also be tested separately.
- Analog and digital circuits should be kept physically separate. Analog circuit testing is very different from digital circuit testing: it involves real measurement, since analog signals are continuous (as opposed to the discrete logic signals of digital circuits). The two require different test equipment and different test methodologies, and should therefore be tested separately.
Things to be avoided
- Asynchronous (unclocked) logic feedback in the circuit must be avoided. A feedback loop in combinational logic can give rise to oscillation for certain inputs. Since no clocking is employed, timing is continuous rather than discrete, which makes tester synchronization virtually impossible; only functional test by application board can then be used.
- Monostables and self-resetting logic should be avoided. A monostable (one-shot) multivibrator produces a pulse of constant duration in response to the rising or falling transition of the trigger input. Its pulse duration is usually controlled externally by a resistor and a capacitor (with current technology, these can also be integrated on chip). One-shots are used mainly for 1) pulse shaping, 2) switch-on delays, 3) switch-off delays, and 4) signal delays. Since a one-shot is not controlled by clocks, synchronization and precise duration control are very difficult, which in turn reduces testability by ATE. Counters and dividers are better candidates for delay control.
- Redundant gates must be avoided.
- High fan-in/fan-out combinations must be avoided, as large fan-in makes the inputs of a gate difficult to observe and makes the gate output difficult to control.
- Gated clocks should be avoided, as they degrade the controllability of circuit nodes.

The above guidelines come from experienced practitioners; they are neither complete nor universal. In fact, these ad-hoc methods have drawbacks:
- There is a lack of experts and tools.
- Test generation is often manual.
- High fault coverage cannot be guaranteed.
- Design iterations may increase.
- They are not suitable for large circuits.
accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the scan insertion operation. The input/output of each scan shift register must be available on a PI/PO. Combinational ATPG is used to obtain tests for all testable faults in the combinational logic. Shift register tests are applied, and the ATPG tests are converted into scan sequences for use in manufacturing test.
Fig. 39.1 Scan structure added to a design

Fig. 39.1 shows a scan structure connected to a design: the scan flip-flops (SFFs) form a chain from SCANIN to SCANOUT, controlled by TC and CLK, alongside the combinational logic between the primary inputs and primary outputs. The scan flip-flops must be interconnected in a particular way. This approach effectively turns the sequential testing problem into a combinational one, so the circuit can be fully tested by compact ATPG patterns. Unfortunately, there are two types of overhead associated with this technique that designers care about very much: hardware overhead (three extra pins, multiplexers for all FFs, and extra routing area) and performance overhead (multiplexer delay and FF delay due to the extra load).
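The shift-capture-shift protocol described above can be modeled behaviorally. This is a sketch: the inverting combinational block is an arbitrary assumption used only to make the captured response visible, and a real flow would convert ATPG vectors into such scan sequences automatically.

```python
def scan_cycle(chain, pattern, comb_logic):
    """One full-scan test cycle: shift `pattern` into the scan chain
    (TC=1), apply one normal clock to capture the combinational
    response (TC=0), then shift the captured state back out."""
    # Shift in: new bits enter at SCANIN, old state falls out at SCANOUT.
    for bit in pattern:
        chain = [bit] + chain[:-1]
    # Capture: the flip-flops load the combinational logic's outputs.
    chain = comb_logic(chain)
    # Shift out: observe the captured response serially.
    out = []
    state = list(chain)
    for _ in range(len(state)):
        out.append(state[-1])
        state = [0] + state[:-1]
    return out

# Hypothetical combinational block: next state is the bitwise
# complement of the present state.
invert = lambda bits: [b ^ 1 for b in bits]

response = scan_cycle([0, 0, 0], [1, 0, 1], invert)
print(response)   # [0, 1, 0]
```

In practice the shift-out of one response is overlapped with the shift-in of the next pattern, so a test of V vectors on an n-flip-flop chain costs roughly V x n shift clocks — the source of the test-time overhead discussed above.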
2.3 Scan Variations

There have been many variations of scan, as listed below; a few of these are discussed here.
- MUXed scan
- Scan path
- Scan-hold flip-flop
- Serial scan
- Level-Sensitive Scan Design (LSSD)
- Scan set
- Random access scan
Fig. 39.2 The shift-register modification approach

Fig. 39.2 shows that when the test mode pin T=0 the circuit is in normal operation mode, and when T=1 it is in test mode (shift-register mode). The scan flip-flops (FFs) must be interconnected in a particular way. This approach effectively turns the sequential testing problem into a combinational one, so the circuit can be fully tested by compact ATPG patterns. There are two types of overhead associated with this method: hardware overhead, due to three extra pins, multiplexers for all FFs, and extra routing area; and performance overhead, which includes multiplexer delay and FF delay due to the extra load.
Fig. 39.3 Logic diagram of the two-port raceless D-FF

This approach gives lower hardware overhead (due to a dense layout) and a smaller performance penalty (due to the removal of the MUX in front of the FF) compared to the MUX scan approach. The real figures, however, depend on the circuit style and technology selected, and on the physical implementation.
Fig. 39.4 The level-sensitive polarity-hold latch and its truth table:

C D | +L
0 0 |  L (hold)
0 1 |  L (hold)
1 0 |  0
1 1 |  1

(+L denotes the next value of the latch output L: while the clock C is 0 the latch holds its state, and while C is 1 the output follows D.)
Fig. 39.5 The polarity-hold shift-register latch (SRL)

LSSD requires that the circuit be level-sensitive, so we need LS memory elements as defined above. Figure 39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) does not depend on the rise/fall time of C, but only on C being '1' for a period of time greater than or equal to the data propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL) used in LSSD as the scan cell. The scan cell is controlled in the following way:
1. Normal mode: A=B=0, C=0->1.
2. SR (test) mode: C=0, AB=10->01 to shift SI through L1 and L2.
Advantages of LSSD
1. Correct operation independent of AC characteristics is guaranteed. 2. FSM is reduced to combinational logic as far as testing is concerned. 3. Hazards and races are eliminated, which simplifies test generation and fault simulation.
Drawbacks of LSSD
1. Complex design rules are imposed on designers. There is no freedom to vary from the overall schemes. It increases the design complexity and hardware costs (4-20% more hardware and 4 extra pins). 2. Asynchronous designs are not allowed in this approach. 3. Sequential routing of latches can introduce irregular structures. 4. Faults changing combinational function to sequential one may cause trouble, e.g., bridging and CMOS stuck-open faults. 5. Test application becomes a slow process, and normal-speed testing of the entire test sequence is impossible. 6. It is not good for memory intensive designs.
Fig. 39.7 The RAM cell

The difference between this approach (random-access scan) and the previous ones is that the state vector can now be accessed in a random sequence. Since neighboring patterns can be arranged so that they differ in only a few bits, and only a few response bits need to be observed, the test application time can be reduced; the test length is thus shorter. This approach also provides the ability to 'watch' a node in normal operation mode, which is impossible with the previous scan methods, and it is suitable for delay testing and embedded-memory testing. The major disadvantage of the approach is its high hardware overhead, due to the address decoder, the gates added to each SFF, the address register, and the extra pins and routing.
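The access-count advantage of random-access scan can be seen in a toy model. The class below is hypothetical; a real implementation reaches individual scan cells through the address decoder mentioned above.

```python
class RandomAccessScan:
    """Toy model: scan flip-flops addressed like a RAM, so test
    patterns that differ in only a few bits need only a few writes."""
    def __init__(self, n):
        self.cells = [0] * n
        self.accesses = 0
    def write(self, addr, bit):
        self.cells[addr] = bit
        self.accesses += 1
    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

ras = RandomAccessScan(8)
# First pattern: only three bits need setting.
for addr, bit in [(0, 1), (3, 1), (7, 1)]:
    ras.write(addr, bit)
# The next pattern differs in a single position: one write, instead
# of a full 8-bit serial shift as in the serial-scan approaches.
ras.write(3, 0)
print(ras.accesses)   # 4 accesses, versus 16 shift clocks serially
```

Ordering the test patterns so that consecutive ones differ in few bits (a Gray-code-like ordering) maximizes this saving, which is exactly what review question 8 below asks about.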
[Fig. 39.9 Design using a partial scan structure: only some of the flip-flops (the SFFs, accessed through TC, SCANIN and SCANOUT) are included in the scan chain, while the remaining FFs (clocked by CK1/CK2) operate normally alongside the combinational circuit between PI and PO.]
3. Conclusions
Accessibility of internal nodes in a complex circuit is becoming a greater problem, so it is essential that the designer consider how the IC will be tested and what extra structures will be incorporated in the design. Scan design has been the backbone of design for testability in industry for a long time. Design automation tools are available that insert scan into a circuit and then generate test patterns. Scan insertion increases the overhead of a circuit; in ASIC design, a scan overhead of 10 to 15% is generally accepted.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[2] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
[3] V. D. Agrawal, C. R. Kime, and K. K. Saluja, "A Tutorial on Built-In Self-Test, Part 1: Principles", IEEE Design and Test of Computers, vol. 10, no. 1, Mar. 1993, pp. 73-82.
[4] V. D. Agrawal, C. R. Kime, and K. K. Saluja, "A Tutorial on Built-In Self-Test, Part 2: Applications", IEEE Design and Test of Computers, vol. 10, no. 2, June 1993, pp. 69-77.
[5] S. DasGupta, R. G. Walther, and T. W. Williams, "An Enhancement to LSSD and Some Applications of LSSD in Reliability", Proc. of the International Fault-Tolerant Computing Symposium.
[6] B. R. Wilkins, Testing Digital Circuits, An Introduction, Berkshire, UK: Van Nostrand Reinhold, 1986.
[7] T. W. Williams, editor, VLSI Testing, Amsterdam, The Netherlands: North-Holland, 1986.
[8] A. Krstic and K.-T. Cheng, Delay Fault Testing for VLSI Circuits, Boston: Kluwer Academic Publishers, 1998.
Review Questions
1. What is Design for Testability (DFT)? What are the different kinds of DFT techniques used for digital circuit testing?
2. What are the things that must be followed for ad-hoc testability? Describe the drawbacks of ad-hoc testing.
3. Describe a full scan structure implemented in a digital design. What are the scan overheads?
4. Suppose that your chip has 100,000 gates and 2,000 flip-flops. A combinational ATPG produced 500 vectors to fully test the logic. A single scan-chain design will require about 10^6 clock cycles for testing. Find the scan test length if 10 scan chains are implemented. Given that the circuit has 10 PIs and 10 POs, and only one extra pin can be added for test, how much more gate overhead will be needed for the new design?
5. For a circuit with 100,000 gates and 2,000 flip-flops connected in a single chain, what will be the gate overhead for a scan design where scan-hold flip-flops are used?
6. Calculate the syndromes for the carry and sum outputs of a full adder cell. Determine whether there is any single stuck fault on any input for which one of the outputs is syndrome-untestable. If there is, suggest an implementation, possibly with added inputs, which makes the cell syndrome-testable.
7. Describe the operation of a level-sensitive scan design implemented in a digital design. What design rules must be followed to make the design race-free and hazard-free? What are the advantages and disadvantages of LSSD?
8. Consider the random-access scan architecture. How would you organize the test data to minimize the total test time? Describe a simple heuristic for ordering these data.
9. Make a comparison of the different scan variations in terms of scan overhead.
10. Consider the combinational circuit below, which has been partitioned into three cones (two CONE Xs and one CONE Y) and one exclusive-OR gate.
[Figure for Problem 10: inputs A, B, C, D, E, F; two CONE X blocks with outputs G and K; a CONE Y block with output H; the exclusive-OR gate produces output J.]
For these two cones, we have the following information. CONE X has a structure which can be tested 100% by using the following 4 vectors, and its output is also specified:

A/G B/H C/F | OUTPUT
 0   0   1  |   0
 0   1   1  |   0
 1   1   0  |   1
 1   0   0  |   1
CONE Y has a structure which can be tested 100% by using the following 4 vectors, and its output is also specified:

C D E | OUTPUT
0 0 1 |   0
0 1 0 |   1
1 0 1 |   1
1 1 1 |   0
Derive a smallest test set for this circuit so that each partition receives its required 4 test vectors. The XOR gate should also be exhaustively tested.
Fill in the blank entries below. (You may not add additional vectors.)

A B C D E F G H J K
0 0 1 _ _ _ 0 _ _ _
0 1 1 _ _ _ 0 _ _ _
1 1 0 _ _ _ 1 _ _ _
1 0 0 _ _ _ 1 _ _ _
Module 8
Testing of Embedded System
Lesson 40
Built-In-Self-Test (BIST) for Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to:
- Explain the meaning of the term Built-In Self-Test (BIST)
- Identify the main components of BIST functionality
- Describe the various methods of test pattern generation for designing embedded systems with BIST
- Define what a Signature Analysis Register is and describe some methods of designing such units
- Explain what a Built-In Logic Block Observer (BILBO) is and describe how to use this block for designing BIST
BIST is a design-for-testability technique that places the testing functions physically with the circuit under test (CUT), as illustrated in Figure 40.1 [1]. The basic BIST architecture requires the addition of three hardware blocks to a digital circuit: a test pattern generator, a response analyzer, and a test controller. The test pattern generator generates the test patterns for the CUT; examples of pattern generators are a ROM with stored patterns, a counter, and a linear feedback shift register (LFSR). A typical response analyzer is a comparator with stored responses or an LFSR used as a signature analyzer; it compacts and analyzes the test responses to determine the correctness of the CUT. A test control block is necessary to activate the test and analyze the responses. However, in general, several test-related functions can be executed through a test controller circuit.

[Fig. 40.1: a test controller drives a hardware pattern generator (e.g. a ROM) whose patterns reach the CUT through a MUX; the CUT's outputs feed an output response compactor, and a comparator checks the resulting signature against a reference signature to give the good/faulty verdict.]
Fig. 40.1 A Typical BIST Architecture

As shown in Figure 40.1, the wires from the primary inputs (PIs) to the MUX and the wires from the circuit outputs to the primary outputs (POs) cannot be tested by BIST. In normal operation, the CUT receives its inputs from other modules and performs the function for which it was designed. During test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT,
and the test responses are evaluated by an output response compactor. In the most common type of BIST, test responses are compacted in the output response compactor to form (fault) signatures. The response signatures are compared with reference golden signatures generated or stored on-chip, and the error signal indicates whether the chip is good or faulty. Four primary parameters must be considered in developing a BIST methodology for embedded systems; these correspond to the design parameters for on-line testing techniques discussed in an earlier chapter [2]:
- Fault coverage: the fraction of faults of interest that can be exposed by the test patterns produced by the pattern generator and detected by the output response monitor. In the presence of input bit-stream errors there is a chance that the computed signature matches the golden signature and the circuit is reported as fault-free. This undesirable property is called masking or aliasing.
- Test set size: the number of test patterns produced by the test generator, which is closely linked to fault coverage: generally, large test sets imply high fault coverage.
- Hardware overhead: the extra hardware required for BIST. In most embedded systems, high hardware overhead is not acceptable.
- Performance overhead: the impact of BIST hardware on normal circuit performance, such as its worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.
Benefits of BIST
- It reduces testing and maintenance cost, as it requires simpler and less expensive ATE.
- It significantly reduces the cost of automatic test pattern generation (ATPG).
- It reduces the storage and maintenance of test patterns.
- It can test many units in parallel.
- It takes shorter test application times.
- It can test at functional system speed.

BIST can be used for non-concurrent, on-line testing of the logic and memory parts of a system [2]. It can readily be configured for event-triggered testing, in which case the BIST control can be tied to the system reset so that testing occurs during system start-up or shutdown. BIST can also be designed for periodic testing with low fault latency. This requires incorporating a testing process into the CUT that guarantees the detection of all target faults within a fixed time. On-line BIST is usually implemented with the twin goals of complete fault coverage and low fault latency. Hence, the test generator (TG) and response monitor (RM) are generally designed
to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable test set size. These goals are met by different techniques in different parts of the system. TG and RM are often implemented by simple, counter-like circuits, especially linear feedback shift registers (LFSRs) [3]. The LFSR is simply a shift register formed from standard flip-flops, with the outputs of selected flip-flops being fed back (modulo 2) to the shift register's inputs. When used as a TG, an LFSR is set to cycle rapidly through a large number of its states. These states, whose choice and order depend on the design parameters of the LFSR, define the test patterns. In this mode of operation, an LFSR is a source of (pseudo) random tests that are, in principle, applicable to any fault and circuit types. An LFSR can also serve as an RM by counting (in a special sense) the responses produced by the tests. An LFSR RM's final contents after applying a sequence of test responses form a fault signature, which can be compared to a known or generated good signature to see if a fault is present.

Ensuring that the fault coverage is sufficiently high and the number of tests is sufficiently low are the main problems with random BIST methods. Two general approaches have been proposed to preserve the cost advantages of LFSRs while making the generated test sequence much shorter. Test points can be inserted in the CUT to improve controllability and observability; however, they can also cause performance loss. Alternatively, some determinism can be introduced into the generated test sequence, for example by inserting specific seed tests that are known to detect hard faults. A typical BIST architecture using an LFSR is shown in Figure 40.2 [4]. Since the output patterns of the LFSR are time-shifted and repeated, they become correlated; this reduces the effectiveness of the fault detection.
Therefore a phase shifter (a network of XOR gates) is often used to decorrelate the output patterns of the LFSR. The response of the CUT is usually compacted by a multiple-input signature register (MISR) into a small signature, which is compared with a known fault-free signature to determine whether the CUT is faulty.
Fig. 40.2 A generic BIST architecture based on an LFSR, an MISR, and a phase shifter
2. Test Pattern Generation
[Figure residue: Fig. 40.3, a counter-based pattern generator built from D flip-flops (Q1-Q3) with Clock and Reset, extended with two five-bit binary counters selected through a 2-to-1 MUX; Fig. 40.4, an 8-input circuit (X1-X8) partitioned into two cones with outputs h and f.]
Circuit partitioning for pseudo-exhaustive pattern generation can be done by cone segmentation, as shown in Figure 40.4. Here, a cone is defined as the set of fan-ins of an output pin. If the size of the largest cone is K, the pattern set must guarantee that the patterns applied to any K inputs contain all possible combinations. In Figure 40.4, the total circuit is divided into two cones based on the cones of influence. In cone 1, the PO h is influenced by X1, X2, X3, X4 and X5, while PO f is influenced by inputs X4, X5, X6, X7 and X8. Therefore, the total number of test patterns needed for exhaustive testing of cone 1 and cone 2 is (2^5 + 2^5) = 64, whereas the original circuit with 8 inputs requires 2^8 = 256 test patterns for an exhaustive test.
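The saving from cone segmentation can be checked with a few lines of Python. The cone input sets below are taken from the Figure 40.4 description above; the counts are the point of the sketch.

```python
from itertools import product

# Cone input sets from Fig. 40.4 (8 primary inputs X1..X8)
cone1 = ["X1", "X2", "X3", "X4", "X5"]   # drives output h
cone2 = ["X4", "X5", "X6", "X7", "X8"]   # drives output f

# Exhaustive test of the whole circuit: every combination of all 8 inputs.
exhaustive = 2 ** 8

# Pseudo-exhaustive test: each cone is tested exhaustively on its own inputs.
pseudo = 2 ** len(cone1) + 2 ** len(cone2)

print(exhaustive, pseudo)  # 256 64

# Each cone still sees every combination of its own inputs.
patterns_cone1 = list(product([0, 1], repeat=len(cone1)))
assert len(patterns_cone1) == 32
```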
Fig. 40.5 Standard Linear Feedback Shift Register
Figure 40.5 shows a standard, external exclusive-OR linear feedback shift register. There are n flip-flops (X_{n-1}, ..., X_0), and this is called an n-stage LFSR. It can be a near-exhaustive test pattern generator, as it cycles through 2^n - 1 states, excluding the all-0 state. Such an LFSR is known as a maximal-length LFSR. Figure 40.6 shows the implementation of an n-stage LFSR with an actual digital circuit [1].
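The maximal-length property is easy to verify in software. The sketch below models an external-XOR LFSR as an integer bit vector; the polynomial x^4 + x^3 + 1 is a known primitive polynomial chosen for the example.

```python
def lfsr_sequence(taps, n, seed=1):
    """External-XOR (standard) LFSR with n stages.
    taps: exponents of the characteristic polynomial with coefficient 1
    (excluding the x^0 term). Returns the states visited until the seed
    state repeats."""
    state = seed
    states = []
    while True:
        states.append(state)
        fb = 0
        for t in taps:                    # modulo-2 sum of the tapped stages
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & ((1 << n) - 1)
        if state == seed:
            return states

# x^4 + x^3 + 1 is primitive, so a 4-stage LFSR cycles through all
# 2^4 - 1 = 15 nonzero states (the all-0 state is excluded).
seq = lfsr_sequence([4, 3], 4)
print(len(seq))  # 15
```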
[Figure residue: Fig. 40.7, a weighted pattern generator in which a 1-of-4 MUX with weight-select inputs W1 and W2 picks taps of probability 1/16, 1/8, 1/4, or 1/2 derived from an LFSR; Fig. 40.8, gate networks (a) and a cellular automaton (b) producing output weights such as 0.8, 0.6, 0.5, 0.4, and 0.3.]
Fig. 40.8 Weighted pseudorandom patterns
Figure 40.7 shows a weighted pseudo-random pattern generator implemented with programmable probabilities of generating zeros and ones at the PIs. As we know, an LFSR generates patterns with equal probability of 1s and 0s. As shown in Figure 40.8(a), if a 3-input AND gate is used, the probability of a 1 becomes 0.125; if a 2-input OR gate is used, the probability becomes 0.75. Alternatively, one can use cellular automata to produce patterns of desired weights, as shown in Figure 40.8(b).
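The gate weights quoted above can be confirmed by exhaustive enumeration, assuming equiprobable and independent LFSR bits:

```python
from itertools import product

def prob_one(gate, k):
    """Probability that a k-input gate of equiprobable bits outputs 1,
    computed by enumerating all 2^k input combinations."""
    hits = sum(gate(bits) for bits in product([0, 1], repeat=k))
    return hits / 2 ** k

p_and3 = prob_one(lambda b: all(b), 3)  # 3-input AND: only (1,1,1) hits
p_or2 = prob_one(lambda b: any(b), 2)   # 2-input OR: 3 of 4 patterns hit

print(p_and3, p_or2)  # 0.125 0.75
```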
(b) CA with null cyclic boundary conditions. Fig. 40.9 The structure of cellular automata
In addition to an LFSR, a straightforward way to compress the test response data and produce a fault signature is to use an FSM or an accumulator. However, the FSM hardware overhead and accumulator aliasing are difficult parameters to control. Keeping the hardware overhead acceptably low and reducing aliasing are the main difficulties in RM design.
3. Response Compaction and Analysis
During BIST, a large amount of data in the CUT responses is applied to the Response Monitor (RM). For example, for a circuit with 200 outputs, if we want to generate 5 million random
patterns, then the CUT response sent to the RM will be 1 billion bits. This is not manageable in practice, so it is necessary to compact this enormous amount of circuit response to a manageable size that can be stored on the chip. The response analyzer compresses a very long test response into a single word. Such a word is called a signature. The signature is then compared with a prestored golden signature obtained from the fault-free response using the same compression mechanism. If the signature matches the golden copy, the CUT is regarded as fault-free; otherwise, it is faulty. There are different response analysis methods, such as ones count, transition count, syndrome count, and signature analysis.
Compression: a reversible process used to reduce the size of the response. It is difficult to realize in hardware.
Compaction: an irreversible (lossy) process used to reduce the size of the response. Common compaction schemes are:
a) Parity compression: computes the parity of the bit stream.
b) Syndrome (ones) count: counts the number of 1s in the bit stream.
c) Transition count: counts the number of times a 0-to-1 or 1-to-0 transition occurs in the bit stream.
d) Cyclic Redundancy Check (CRC): also called a signature; computes a CRC check word on the bit stream.
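The first three compaction schemes in the list above are one-liners; a minimal sketch on an arbitrary example stream:

```python
def parity(bits):
    """Parity compression: modulo-2 sum (XOR) of all response bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def ones_count(bits):
    """Syndrome (ones) count: number of 1s in the stream."""
    return sum(bits)

def transition_count(bits):
    """Number of 0-to-1 and 1-to-0 transitions in the stream."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

stream = [1, 0, 1, 1, 0, 0, 1]   # arbitrary example response stream
print(parity(stream), ones_count(stream), transition_count(stream))
# 0 4 4
```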
Signature analysis: compact the good-machine response into a good-machine signature. The actual signature is generated during testing and compared with the good-machine signature.
Aliasing: compression is like a function that maps a large input space (the responses) into a small output space (the signatures); it is a many-to-one mapping. Errors may occur in the input bit stream, and a faulty response may therefore have a signature that matches the golden signature, so that the circuit is reported as fault-free. Such a situation is referred to as aliasing or masking. The aliasing probability is the probability that a faulty response is treated as fault-free. It is defined as follows. Assume that the possible input patterns are uniformly distributed over the possible signature values. For m-bit responses and r-bit signatures, there are 2^m input patterns and 2^r signatures, so 2^(m-r) input patterns map into a given signature. Then the aliasing or masking probability is
P(M) = (number of erroneous responses that map into the golden signature) / (number of faulty responses)
     = (2^(m-r) - 1) / (2^m - 1) ≈ 2^(-r)
The aliasing probability is the major consideration in response analysis. Due to the many-to-one mapping property of the compression, diagnosis after compression is unlikely to succeed; the diagnostic resolution is very poor. In addition to the aliasing probability, hardware overhead and hardware compatibility are also important issues. Here, hardware compatibility refers to how well the BIST hardware can be incorporated into the CUT or its DFT structure.
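The aliasing formula above is a one-line function; the parameter values in the example are arbitrary:

```python
def aliasing_probability(m, r):
    """P(M) = (2^(m-r) - 1) / (2^m - 1): the fraction of the 2^m - 1
    faulty m-bit responses that map to the same r-bit signature as the
    fault-free response."""
    return (2 ** (m - r) - 1) / (2 ** m - 1)

# For m = 12, r = 4 the result is close to the 2^-r = 0.0625 approximation.
print(aliasing_probability(12, 4))
```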
Fig. 40.10 Ones count compression circuit structure
For an N-bit test length with r ones in the fault-free response, the masking probability is derived as follows. Out of the 2^N possible output sequences, C(N, r) contain exactly r ones, and only one of them is the fault-free response. The number of masking sequences is therefore C(N, r) - 1, out of 2^N - 1 possible faulty sequences.
Fig. 40.11 Transition count compression circuit structure
For an N-bit test length with r transitions, the masking probability is derived as follows. For a test length of N, there are N-1 possible transition positions, so C(N-1, r) sequences with a fixed first bit have exactly r transitions. Since the first output bit can be either one or zero, the total number of sequences with the same transition count is 2·C(N-1, r). Again, only one of them is fault-free, so the number of masking sequences is 2·C(N-1, r) - 1.
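The two masking-probability counts above can be written directly with binomial coefficients; the N and r in the example are arbitrary:

```python
from math import comb

def ones_count_masking(N, r):
    """Of the 2^N - 1 faulty sequences, C(N, r) - 1 also contain r ones."""
    return (comb(N, r) - 1) / (2 ** N - 1)

def transition_count_masking(N, r):
    """Sequences of length N with r transitions: 2 * C(N-1, r), of which
    one is the fault-free response."""
    return (2 * comb(N - 1, r) - 1) / (2 ** N - 1)

print(ones_count_masking(4, 2))        # (6 - 1) / 15
print(transition_count_masking(4, 2))  # (2*3 - 1) / 15
```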
Fig. 40.12 Syndrome testing circuit structure
The original design of syndrome testing applies exhaustive patterns. Hence, the syndrome is S = K / 2^n, where n is the number of inputs and K is the number of minterms. A circuit is syndrome testable if all single stuck-at faults are syndrome detectable. The interesting part of syndrome testing is that any function can be redesigned to be syndrome testable.
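The syndrome S = K / 2^n is straightforward to compute by enumeration. The example function f = AB + BC is the one used in the problems below:

```python
from itertools import product

def syndrome(f, n):
    """S = K / 2^n, where K is the number of minterms (inputs for which
    f evaluates to 1) of an n-input function."""
    K = sum(f(*bits) for bits in product([0, 1], repeat=n))
    return K / 2 ** n

# f = AB + BC has minterms 011, 110, 111, so K = 3 and S = 3/8.
f = lambda a, b, c: (a & b) | (b & c)
print(syndrome(f, 3))  # 0.375
```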
Fig. 40.13 Two types of LFSR
One of the most important properties of LFSRs is their recurrence relation, which guarantees that the states of an LFSR repeat in a fixed order. For a given sequence of numbers a_0, a_1, a_2, ..., a_m, ..., define the generating function

G(x) = a_0 + a_1 x + a_2 x^2 + ... + a_m x^m + ... = Σ_{m=0..∞} a_m x^m

For an n-stage LFSR with feedback coefficients c_1, ..., c_n, each output bit obeys the recurrence a_m = Σ_{i=1..n} c_i a_{m-i} (all arithmetic modulo 2), so

G(x) = Σ_{m=0..∞} ( Σ_{i=1..n} c_i a_{m-i} ) x^m
     = Σ_{i=1..n} c_i x^i Σ_{m=0..∞} a_{m-i} x^{m-i}
     = Σ_{i=1..n} c_i x^i ( a_{-i} x^{-i} + ... + a_{-1} x^{-1} + G(x) )

Solving for G(x):

G(x) = [ Σ_{i=1..n} c_i x^i ( a_{-i} x^{-i} + ... + a_{-1} x^{-1} ) ] / ( 1 + Σ_{i=1..n} c_i x^i )

where a_{-1}, ..., a_{-n} is the initial state of the register (modulo 2, subtraction and addition coincide). G(x) has thus been expressed in terms of the initial state and the feedback coefficients. The denominator of G(x),

f(x) = 1 + Σ_{i=1..n} c_i x^i

is called the characteristic polynomial of the LFSR.
Any divisor polynomial G(x) with two or more non-zero coefficients will detect all single-bit errors.
Figure 40.15 illustrates an m-stage MISR. After test cycle i, the test responses are stable on the CUT outputs, but the shifting clock has not yet been applied. Let R_i(x) be the degree-(m-1) polynomial representing the test responses after test cycle i, and S_i(x) the polynomial representing the state of the MISR after test cycle i:

R_i(x) = r_{i,m-1} x^(m-1) + r_{i,m-2} x^(m-2) + ... + r_{i,1} x + r_{i,0}
S_i(x) = s_{i,m-1} x^(m-1) + s_{i,m-2} x^(m-2) + ... + s_{i,1} x + s_{i,0}

Let G(x) be the characteristic polynomial and assume the initial state of the MISR is 0, so S_0(x) = 0. On each cycle the MISR performs

S_{i+1}(x) = [ R_i(x) + x S_i(x) ] mod G(x)

Hence:

S_1(x) = [ R_0(x) + x S_0(x) ] mod G(x) = R_0(x)
S_2(x) = [ R_1(x) + x S_1(x) ] mod G(x) = [ R_1(x) + x R_0(x) ] mod G(x)
...
S_n(x) = [ x^(n-1) R_0(x) + x^(n-2) R_1(x) + ... + x R_{n-2}(x) + R_{n-1}(x) ] mod G(x)

This is the signature left in the MISR after n patterns are applied. For the masking probability, note that the error polynomial has degree at most m+n-2, giving 2^(m+n-1) - 1 possible non-zero error polynomials, while G(x) has 2^(n-1) - 1 non-zero multiples of degree <= m+n-2. The probability of masking is therefore

P(M) = (2^(n-1) - 1) / (2^(m+n-1) - 1) ≈ 1 / 2^m
[Figure residue: a BIST configuration in which CUTs alternate with BILBO registers; each BILBO register is built from D flip-flops (Q1 ... Qn) with control inputs B1 and B2, a multiplexer select S1, and scan output SO, and can operate as a normal register, scan register, LFSR, or MISR.]
Phase 1
In this mode of operation, BILBO1 operates in MISR mode and BILBO2 operates in LFSR mode. CUT A and CUT C are tested in parallel.
Phase 2
In this mode of operation, BILBO1 operates in LFSR mode and BILBO2 operates in MISR mode. Only CUT B is tested in this phase.
Since each test requires stimuli to be scanned in and responses to be scanned out, the test speed is much slower than in the test-per-clock approach. The number of clocks required for a test cycle is the maximum of the scan-stage counts of the input and output scan registers. Architectures such as CEBS, LOCST, and STUMPS also fall into this category.
[Figure residue: a test-per-scan structure in which an LFSR feeds the serial input (SI) of scan registers wrapped around the CUT.]
3.7.5 Self-Testing Using MISR and Parallel Shift Register Sequence Generator (STUMPS)
The architecture of self-testing using MISR and parallel SRSG (STUMPS) is shown in Figure 40.20. Instead of using only one scan chain, it uses multiple scan chains to minimize the test time. Since the scan chains may have different lengths, the LFSR runs for N cycles (the length of the longest scan chain) to load all the chains. For such a design, the internal-type LFSR is preferred: if the external type is used, the difference between two LFSR output bits is only a time shift, and hence the correlation between two scan chains can be very high.
[Figure residue: Fig. 40.20, the STUMPS architecture: a pseudo-random test pattern generator loads parallel scan chains SR1 ... SRn through the CUT, and the responses are compacted by a MISR.]
Test Procedure of STUMPS
1. Scan in patterns from the LFSR into all scan chains.
2. Switch to normal function mode and apply one clock.
3. Scan out the chains into the MISR.
4. Overlap steps 1 and 3.
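The four steps above can be walked through in a toy simulation. Everything concrete below is invented for the sketch: the chain lengths, the PRPG (stand-in: seeded random bits), the CUT function (per-chain parity), and a 16-bit shift-XOR accumulator standing in for the MISR.

```python
import random

random.seed(42)             # stand-in for the on-chip PRPG
chains = [5, 3, 4]          # scan-chain lengths; the PRPG must run
N = max(chains)             # N = 5 cycles to fill the longest chain
signature = 0

def cut_response(stimulus):
    # Stand-in CUT: each chain captures the parity of its own stimulus.
    return [sum(stimulus) % 2] * len(stimulus)

for pattern in range(10):
    # Step 1: scan pseudo-random bits into every chain.
    loaded = [[random.randint(0, 1) for _ in range(n)] for n in chains]
    # Step 2: one clock in normal function mode captures the responses.
    captured = [cut_response(bits) for bits in loaded]
    # Step 3: scan the chains out into the signature register.
    for bits in captured:
        for b in bits:
            signature = ((signature << 1) ^ b) & 0xFFFF
    # Step 4: in hardware, steps 1 and 3 overlap; shift-in of the next
    # pattern happens while the previous response shifts out.

print(hex(signature))
```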
4. BIST for Structured Circuits
Structured design techniques are the keys to the high integration of VLSI circuits. Structured circuits include read-only memories (ROMs), random access memories (RAMs), programmable logic arrays (PLAs), and many others. In this section we focus on PLAs, because they are tightly coupled with the logic circuits; memories are usually treated as a separate category. Due to the regularity of the structure and the simplicity of the design, PLAs are commonly used in digital systems. PLAs are efficient and effective for the implementation of arbitrary logic functions, combinational or sequential. Therefore, in this section, we discuss BIST for PLAs. A PLA is conceptually a two-level AND-OR realization of a Boolean function. Figure 40.21 shows the general structure of a PLA. A PLA typically consists of four parts: input decoders, the AND plane, the OR plane, and the output buffers. The input decoders are usually implemented as single-bit decoders that produce the direct and complement forms of the inputs. The AND plane generates all the product terms. The OR plane sums the required product terms to form the output bits. In the physical implementation, they are realized as NAND-NAND or NOR-NOR structures.
[Figure residue: Fig. 40.21, the general PLA structure: the PLA inputs feed input decoders, and output buffers drive the PLA outputs.]
As mentioned earlier in the fault model section, PLAs have the following faults: stuck-at faults, bridging faults, and crosspoint faults. Test generation for PLAs is more difficult than for conventional logic, because PLAs have more complicated fault models. Further, a typical PLA may have as many as 50 inputs, 67 outputs, and 190 product terms [10-11]. Functional testing of such PLAs can be a difficult task. PLAs often contain unintentional and unidentifiable redundancy, which might cause fault masking. Furthermore, PLAs are often embedded in the logic, which complicates test application and response observation. Therefore, many people have proposed the use of BIST to handle the testing of PLAs.
5. BIST Applications
Manufacturers are increasingly employing BIST in real products. Examples of such applications illustrate the use of BIST in the semiconductor, communications, and computer industries.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, Kluwer Academic Publishers, Norwell, MA, 2000.
[2] H. Al-Asaad, B. T. Murray, and J. P. Hayes, "On-line BIST for embedded systems," IEEE Design & Test of Computers, Vol. 15, No. 4, Oct.-Dec. 1998, pp. 17-24.
[3] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
[4] R. Zurawski, Embedded Systems Handbook, Taylor & Francis, 2005.
[5] C. V. Krishna, A. Jas, and N. A. Touba, "Test vector encoding using partial LFSR reseeding," in Proc. International Test Conference, 2001, pp. 885-893.
[6] J. Rajski, J. Tyszer, and N. Zacharia, "Test data decompression for multiple scan designs with boundary scan," IEEE Transactions on Computers, Vol. 47, 1998, pp. 1188-1200.
[7] N. A. Touba and E. J. McCluskey, "Altering a pseudo-random bit sequence for scan-based BIST," in Proc. International Test Conference, 1996, pp. 167-175.
[8] S. Wang, "Low hardware overhead scan based 3-weight weighted random BIST," in Proc. International Test Conference, 2001, pp. 868-877.
[9] H.-J. Wunderlich and G. Kiefer, "Bit-flipping BIST," in Proc. International Conference on Computer-Aided Design, 1996, pp. 337-343.
[10] C.-Y. Liu, K. K. Saluja, and J. S. Upadhyaya, "BIST-PLA: A built-in self-test design of large programmable logic arrays," in Proc. 24th Design Automation Conference, June 1987, pp. 385-391.
[11] C.-Y. Liu and K. K. Saluja, "Built-in self-test techniques for programmable logic arrays," in VLSI Fault Modeling and Testing Techniques, G. W. Zobrist, ed., Ablex Publishing, Norwood, NJ, 1993.
[12] P. Gelsinger, "Design and test of the 80386," IEEE Design & Test of Computers, Vol. 4, No. 3, June 1987, pp. 42-50.
[13] I. M. Ratiu and H. B. Bakoglu, "Pseudorandom built-in self-test methodology and implementation for the IBM RISC System/6000 processor," IBM J. Research and Development, Vol. 34, 1990, pp. 78-84.
[14] A. L. Crouch, M. Pressly, and J. Circello, "Testability features of the MC68060 microprocessor," in Proc. International Test Conference, 1994, pp. 60-69.
[15] J. Broseghini and D. H. Lenhert, "An ALU-based programmable MISR/pseudorandom generator for a MC68HC11 family self-test," in Proc. International Test Conference, 1993, pp. 349-358.
Problems
1. What is Built-In Self-Test? Discuss the issues and benefits of BIST. Describe the BIST architecture and its operation.
2. Excluding the circuit under test, what are the four basic components of BIST and what function does each component perform?
3. Which two BIST components are necessary for system-level testing and why?
4. What are the different techniques for test pattern generation?
5. Discuss exhaustive and pseudo-exhaustive pattern generation. Give an example to show that pseudo-exhaustive testing requires fewer test patterns than exhaustive testing.
6. What is pseudorandom pattern generation? What is an LFSR? Describe pattern generation using an LFSR.
7. Make a comparison of different test strategies based on fault coverage, hardware overhead, test time overhead, and design effort.
8. An LFSR-based signature register compresses an n-bit input pattern into an m-bit signature. Derive an expression for the probability of aliasing. Clearly state any assumptions you make.
9. Design a weighted pseudo-random pattern generator with programmable weights 1/2, 1/4, 11/32 and 1/16.
10. Prove that the number of 1s in an m-sequence differs from the number of 0s by one.
11. Consider an LFSR-based pattern generator where the feedback network is a single XOR gate before the first stage. If the number of (feedback) inputs to the XOR is odd, is it possible for the LFSR to generate a maximal-length sequence? Justify or contradict.
12. Show the schematic diagram of a 4-bit BILBO register.
13. A given data path has p n-bit registers. For BIST capability, suppose a% of the registers are converted to BILBO. Estimate the percentage overhead in the registers in terms of extra hardware. All gates may be assumed to have unit cost in your calculation.
14. It is said that by adding some extra hardware, a combinational circuit can be made syndrome testable for single stuck-at faults. Illustrate the process for a circuit realizing the Boolean function f = AB + BC.
15. Define the following: a) Compression b) Compaction c) Signature analysis d) Aliasing or masking
16. Describe different response compaction techniques.
17. What are the different types of LFSR? What is a modular LFSR? What is a characteristic polynomial?
18. Implement a standard LFSR for the characteristic polynomial f(x) = x^8 + x^7 + x^2 + 1.
19. Given the polynomial P(x) = x^4 + x^2 + x + 1:
a. Design an external feedback LFSR with characteristic polynomial P(x).
b. Starting this LFSR in the all-1s state, determine the sequence produced.
c. Is this a maximal-length LFSR?
d. Is the characteristic polynomial primitive?
20. Describe how an LFSR is used in signature analysis for response compaction.
21. For an internal feedback Signature Analysis Register (SAR) with characteristic polynomial P(x) = x^6 + x^2 + 1:
a) Draw a logic diagram for the complete register.
b) Determine the resultant signature that would be obtained for the following serial sequence of output responses produced by a known good CUT, assuming the SAR is initialized to the all-0s state. Give the binary value of the resultant signature as it would be contained in the SAR in your logic diagram above: 101001010010 (time order).
22. What is a MISR?
Give the architecture of an m-stage MISR and derive its signature. What is the masking probability of a MISR?
23. Describe, with examples and diagrams, what test-per-clock and test-per-scan systems are. What is the difference between them?
24. What is BILBO? Describe the BILBO architecture and its operation.
25. Describe how BILBO is implemented in digital circuits.
26. Describe the STUMPS testing system and its test procedure.
27. Give some examples of practical BIST applications in industry.
Module 8
Testing of Embedded System
Lesson 41
Boundary Scan Methods and Standards
Instructional Objectives
After going through this lesson the student would be able to:
- Explain the meaning of the term Boundary Scan
- List the IEEE 1149 series of standards with their important features
- Describe the architecture of IEEE 1149.1 boundary scan and explain the functionality of each of its components
- Explain, with the help of an example, how a board-level design can be equipped with the boundary scan feature
- Describe the advantages and disadvantages of the boundary scan technique
Boundary Scan Methods and Standards
1. Boundary Scan History and Family
Boundary Scan is a family of test methodologies aimed at resolving many test problems: from chip level to system level, from logic cores to interconnects between cores, and from digital circuits to analog or mixed-mode circuits. It is now widely accepted in industry and has become an industry standard in most large IC system designs. Boundary scan, as defined by the IEEE Std. 1149.1 standard [1-3], is an integrated method for testing interconnects on printed circuit boards that is implemented at the IC level. Earlier, most printed circuit board (PCB) testing was done using bed-of-nails in-circuit test equipment. Advances in VLSI technology now enable microprocessors and application-specific integrated circuits (ASICs) to be packaged in fine-pitch, high-pin-count packages. The miniaturization of device packaging, the development of surface-mount packaging, and the use of double-sided and multi-layer boards to accommodate the extra interconnects between the increased density of devices reduce the physical accessibility of test points for traditional bed-of-nails in-circuit testers and pose a great challenge to testing for manufacturing defects. The long-term solution to this reduction in physical probe access was to build the access inside the device, i.e., a boundary scan register. In 1985, a group of European companies formed the Joint European Test Action Group (JETAG), and by 1988 the Joint Test Action Group (JTAG) had been formed by several companies to tackle these challenges. JTAG developed a specification for boundary-scan testing that was standardized in 1990 by the IEEE as IEEE Std. 1149.1-1990. In 1993 a new revision of the IEEE Std. 1149.1 standard was introduced (1149.1a); it contained many clarifications, corrections, and enhancements. In 1994, a supplement containing a description of the Boundary Scan Description Language (BSDL) was added to the standard.
Since that time, this standard has been adopted by major electronics companies all over the world. Applications are found in high volume, high-end consumer products, telecommunication products, defense systems, computers, peripherals, and avionics. Now, due to its economic advantages, smaller companies that cannot afford expensive in-circuit testers are using boundary-scan. Figure 41.1 gives an overview of the boundary scan family, now known as the IEEE 1149.x standards.
Standard       | Year | Description
IEEE 1149.1    | 1990 | Testing of digital chips and interconnections between chips
IEEE 1149.1a   | 1993 | Added Supplement A; rewrite of the chapter describing the boundary register
IEEE 1149.1b   | 1994 | Supplement B: formal description of the Boundary Scan Description Language (BSDL)
IEEE 1149.1c   | 2001 | Corrections, clarifications and enhancements of IEEE Std 1149.1a and Std 1149.1b; combines 1149.1a and 1149.1b
IEEE 1149.2    | -    | Extended Digital Serial Interface; obsolete, has merged with the 1149.1 group
IEEE 1149.5    | 1995 | Standard Module Test and Maintenance (MTM) Bus Protocol; deals with test at the system level
IEEE 1532      | 2000 | A derivative standard for in-system programming (ISP) of digital devices
Fig. 41.1 IEEE 1149 Family
The Std. 1149.1, usually referred to as the digital boundary scan, is the one that has been used most widely. It can be divided into two parts: 1149.1a, the digital Boundary Scan Standard, and 1149.1b, the Boundary Scan Description Language (BSDL) [1,6]. Std. 1149.1a defines the chip-level test architecture for digital circuits, and Std. 1149.1b is a hardware description language used to describe the boundary scan architecture. The 1149.2 defines the extended digital serial interface at the chip level; it has merged with the 1149.1 group. The 1149.3 defines the direct access interface, in contrast to 1149.2; unfortunately, this work has been discontinued. IEEE Std. 1149.4 deals with the Mixed-Signal Test Bus [4]. This standard extends the test structure defined in IEEE Std. 1149.1 to allow testing and measurement of mixed-signal circuits, and describes the architecture and the means of control and access to analog and digital test data. The Std. 1149.5 defines the bus protocol at the module level; by combining this level and Std. 1149.1a, one can easily carry out the testing of a PC board. The IEEE Std. 1149.6 for Boundary-Scan Testing of Advanced Digital Networks was released in 2002. This standard augments 1149.1 for the testing of conventional digital networks and 1149.4 for analog networks. The 1149.6 standard defines boundary-scan structures and methods
required to test advanced digital networks that are not fully covered by IEEE Std. 1149.1, such as networks that are AC-coupled, differential, or both. The IEEE 1532 Standard was developed for In-System Configuration of Programmable Devices [5]. This extension of 1149.1 standardizes programming access and methodology for programmable integrated circuit devices. Devices such as CPLDs and FPGAs, regardless of vendor, that implement this standard may be configured (written), read back, erased and verified, singly or concurrently, with a standardized set of resources based upon the algorithm description contained in the 1532 BSDL file. JTAG Technologies programming tools contain support for 1532-compliant devices and automatically generate the applications. Clearly the testing of mixed-mode circuits at the various levels of integration will be a critical test issue for system-on-chip design. Therefore there is a demand to combine all the boundary scan standards into an integrated one.
2. Boundary Scan Architecture
The boundary-scan test architecture provides a means to test interconnects between integrated circuits on a board without using physical test probes. It adds a boundary-scan cell, which includes a multiplexer and latches, to each pin on the device. Figure 41.2 [1] illustrates the main elements of a universal boundary-scan device:
- A Test Access Port (TAP) with a set of four dedicated test pins: Test Data In (TDI), Test Mode Select (TMS), Test Clock (TCK), Test Data Out (TDO), and one optional test pin, Test Reset (TRST*).
- A boundary-scan cell on each device primary input and primary output pin, connected internally to form a serial boundary-scan register.
- A TAP controller with inputs TCK, TMS, and TRST*.
- An n-bit (n >= 2) instruction register holding the current instruction.
- A 1-bit Bypass register.
- An optional 32-bit Identification register capable of being loaded with a permanent device identification code.
Fig. 41.2 Main Elements of an IEEE 1149.1 Device Architecture
The test access port (TAP), which defines the bus protocol of boundary scan, comprises the additional I/O pins needed for each chip employing Std. 1149.1a. The TAP controller is a 16-state finite state machine that controls each step of the boundary scan operations. Each instruction to be carried out by the boundary scan architecture is stored in the Instruction Register, and the various control signals associated with the instruction are then provided by a decoder. Several Test Data Registers are used to store test data or system-related information such as the chip ID, company name, etc.
Test Clock (TCK): the input that synchronizes the test operations; test instructions and data are loaded from system input pins on the rising edge of TCK and driven through system output pins on its falling edge. TCK is pulsed by the equipment controlling the test and not by the tested device. It can be pulsed at any frequency (up to a maximum of some MHz), even at varying rates.
Test Data Input (TDI): an input line that allows the test instruction and test data to be loaded into the instruction register and the various test data registers, respectively.
Test Data Output (TDO): an output line used to serially output the data from the JTAG registers to the equipment controlling the test.
Test Mode Select (TMS): the test control input to the TAP controller. It controls the transitions of the test interface state machine; the test operations are controlled by the sequence of 1s and 0s applied to this input. Usually this is the most important input that has to be controlled by external testers or the on-board test controller.
Test Reset Input (TRST*): the optional TRST* pin is used to initialize the TAP controller; if the TRST* pin is used, the TAP controller can be asynchronously reset to the Test-Logic-Reset state when a 0 is applied at TRST*. This pin can also be used to reset the circuit under test; however, this is not a recommended application.
[Figure residue: Fig. 41.3, the BC_1 cell: Data_In (PI) enters through front-end multiplexing, with ClockDR and UpdateDR clocking the two internal flip-flops.]
Figure 41.3 [1] shows a basic universal boundary-scan cell, known as BC_1. The cell has four modes of operation: normal, update, capture, and serial shift. The memory elements are two D-type flip-flops with front-end and back-end multiplexing of data. It is important to note that the circuit shown in Figure 41.3 is only an example of how the requirements defined in the Standard could be realized; the IEEE 1149.1 Standard does not mandate the design of the circuit, only its functional specification. The four modes of operation are as follows:
1) During normal mode, Data_In is passed straight through to Data_Out.
2) During update mode, the content of the Update Hold cell is passed through to Data_Out. Signal values already present in the output scan cells are passed out through the device output pins, and signal values already present in the input scan cells are passed into the internal logic.
3) During capture mode, the Data_In signal is routed to the Capture Scan cell and the value is captured by the next ClockDR (a derivative of TCK). Signal values on device input pins are loaded into input cells, and signal values passing from the internal logic to device output pins are loaded into output cells.
4) During shift mode, the Scan_Out of one Capture Scan cell is passed to the Scan_In of the next Capture Scan cell via a hard-wired path.
The test clock, TCK, is fed in via yet another dedicated device input pin, and the various modes of operation are controlled by a dedicated Test Mode Select (TMS) serial control signal. Note that both capture and shift operations do not interfere with the normal passing of data from the parallel-in terminal to the parallel-out terminal. This allows on-the-fly capture of operational values and the shifting out of these values for inspection without interference.
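The four-mode behavior of the cell can be sketched as a small behavioral model. This follows the functional specification described above, not the mandated circuit; the class and method names are invented for the sketch.

```python
class BoundaryScanCell:
    """Behavioral sketch of the BC_1 cell of Fig. 41.3: two flip-flops
    (Capture Scan and Update Hold) exercised in four modes."""
    def __init__(self):
        self.capture_ff = 0   # Capture Scan flip-flop
        self.update_ff = 0    # Update Hold flip-flop

    def normal(self, data_in):
        return data_in                    # Data_In passes straight through

    def capture(self, data_in):
        self.capture_ff = data_in         # next ClockDR loads the pin value

    def shift(self, scan_in):
        scan_out = self.capture_ff        # Scan_Out feeds the next cell
        self.capture_ff = scan_in
        return scan_out

    def update(self):
        self.update_ff = self.capture_ff  # UpdateDR
        return self.update_ff             # drives Data_Out in update mode

# Chain two cells and shift the captured values along, as in a scan chain.
c1, c2 = BoundaryScanCell(), BoundaryScanCell()
c1.capture(1); c2.capture(0)
bit = c2.shift(c1.shift(0))   # c1's captured 1 moves into c2
print(bit)  # 0  (c2's previously captured value comes out first)
```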
This application of the boundary-scan register has tremendous potential for real-time monitoring of the operational status of a system (a sort of electronic camera taking snapshots) and is one reason why TCK is kept separate from any system clocks.
[Figure residue: Fig. 41.4, several chips on the system interconnect linked in a single serial boundary-scan chain, sharing TMS and TCK, with serial data out on TDO.]
Fig. 41.4 MCM with Serial Boundary Scan Chain
The advantage of this configuration is that only two pins on the PCB/MCM are needed for boundary-scan data register support. The disadvantage is very long shifting sequences to deliver test patterns to each component and to shift out the test responses, which leads to expensive time on the external tester. As shown in Figure 41.5 [1], the single scan chain can be broken into two parallel boundary-scan chains that share a common test clock (TCK); the extra pin overhead is one more pin. As there are two boundary-scan chains, the test patterns are half as long and the test time is roughly halved. Here both chains share common TDI and TDO pins, so when the top two chips are being shifted, the bottom two chips must be disabled so that they do not drive their TDO lines. The opposite must hold when the bottom two chips are being tested.
[Figure residue: Fig. 41.5, two parallel boundary-scan chains sharing common TDI and TDO pins and a common TCK.]
[Figure residue: the TAP controller, a 16-state Moore-type FSM with inputs TMS, TCK, and the optional TRST*, producing ClockDR, ShiftDR, UpdateDR, Reset*, Select, ClockIR, ShiftIR, UpdateIR, and Enable.]
Fig. 41.6 Top-level view of the TAP Controller
Figure 41.6 shows a top-level view of the TAP controller. TMS and TCK (and the optional TRST*) go to a 16-state finite-state machine controller, which produces the various control signals. These include dedicated signals to the Instruction register (ClockIR, ShiftIR, UpdateIR) and generic signals to all data registers (ClockDR, ShiftDR, UpdateDR). The data register that actually responds is the one enabled by the conditional control signals generated at the parallel outputs of the Instruction register, according to the particular instruction. The other signals are distributed as follows: Reset goes to the Instruction register and to the target Data Register; Select goes to the output multiplexer; Enable goes to the output driver amplifier.
It must be noted that the Standard uses the term Data Register to mean any target register except the Instruction register.
Fig. 41.7 State transition diagram of TAP controller

Figure 41.7 shows the 16-state transition diagram for the TAP controller. The value on each transition arc is the value of TMS. A state transition occurs on the positive edge of TCK, and the controller output values change on the negative edge of TCK. The 16 states can be divided into three parts. The first part contains the reset and idle states; the second and third parts control the operations of the data and instruction registers, respectively. Since the second and third parts differ only in the registers they deal with, only the states in the first and second parts are described below; a similar description applies to the third part.
1. Test-Logic-Reset: In this state, the boundary scan circuitry is disabled and the system performs its normal function. Whenever a Reset* signal is applied to the BS circuit, it also returns to this state. Note also that whatever state the TAP controller is in, it will return to this state if 5 consecutive 1's are applied through TMS.
2. Run-Test/Idle: This is a state in which the boundary scan circuitry waits for some test operation, such as a BIST operation, to complete. For example, if a BIST operation requires 2^16 cycles to complete, then after setting up the initial condition for the BIST operation, the TAP controller will return to this state and wait for 2^16 cycles before it starts to shift out the test results.
3. Select-DR-Scan: This is a temporary state that allows the test data sequence for the selected test-data register to be initiated.
4. Capture-DR: In this state, data can be loaded in parallel into the data registers selected by the current instruction.
5. Shift-DR: In this state, test data are scanned in series through the data registers selected by the current instruction. The TAP controller may stay in this state as long as TMS=0. For each clock cycle, one data bit is shifted into (out of) the selected data register through TDI (TDO).
6. Exit1-DR: All parallel-loaded (from the Capture-DR state) or shifted (from the Shift-DR state) data are held in the selected data register in this state.
7. Pause-DR: The BS circuitry pauses here to wait for some external operation. For example, when long test data are to be loaded into the chip(s) under test, the external tester may need to reload the data from time to time. Pause-DR allows the boundary scan architecture to wait for more data to shift in.
8. Exit2-DR: This state marks the end of the Pause-DR operation and allows the TAP controller to go back to the Shift-DR state for more data to shift in.
9. Update-DR: The test data stored in the first stage of the boundary scan cells is loaded into the second stage in this state.
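The state sequencing described above can be captured in a small simulator. This is an illustrative sketch of the 16-state TAP next-state function; the state names and transitions follow IEEE 1149.1, but the code itself is an assumption-free-standing example, not part of the lesson.

```python
# Next-state table of the TAP controller, indexed by (state, TMS).
# A transition occurs on each rising edge of TCK.

TAP_NEXT = {
    #  state:            (TMS=0,            TMS=1)
    "Test-Logic-Reset": ("Run-Test/Idle",  "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",     "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",       "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",       "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",       "Update-DR"),
    "Pause-DR":         ("Pause-DR",       "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",       "Update-DR"),
    "Update-DR":        ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",     "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",       "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",       "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",       "Update-IR"),
    "Pause-IR":         ("Pause-IR",       "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",       "Update-IR"),
    "Update-IR":        ("Run-Test/Idle",  "Select-DR-Scan"),
}

def tap_walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK rising edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

The table makes two properties of the text easy to check: the sequence TMS = 0,1,0,0 from reset reaches Shift-DR, and five consecutive 1's on TMS return the controller to Test-Logic-Reset from any state.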
It is also possible to load (Capture) internal hard-wired values into the shift section of the Instruction register. The Instruction register must be at least two bits long to allow coding of the four mandatory instructions Extest, Bypass, Sample, Preload, but the maximum length of the Instruction register is not defined. In capture mode, the two least significant bits must capture a 01 pattern. (Note: by convention, the least-significant bit of any register connected between the device TDI and TDO pins is always the bit closest to TDO.) The values captured into higher-order bits of the Instruction register are not defined in the Standard. One possible use of these higher-order bits is to capture an informal identification code if the optional 32-bit Identification register is not implemented. In practice, the only mandated bits for the Instruction register capture are the 01 pattern in the two least-significant bits. We will return to the value of capturing this pattern later in the tutorial.
[Figure: Instruction register structure. IR control signals from the TAP controller drive the Instruction register, which sits between TDI and TDO; its decode logic routes the DR select and control signals to the selected target register. The two least-significant bits capture 01; the higher-order bits may hold the current instruction, status bits, an informal ident, or the results of a power-up self-test.]
Standard Instructions
Instruction: Selected Data Register
Mandatory:
Extest: Boundary scan (formerly all-0s code)
Bypass: Bypass (initialized state, all-1s code)
Sample: Boundary scan (device in functional mode)
Preload: Boundary scan (device in functional mode)
Optional:
Intest: Boundary scan
Idcode: Identification (initialized state if present)
Usercode: Identification (for PLDs)
Runbist: Result register
Clamp: Bypass (output pins in safe state)
HighZ: Bypass (output pins in high-Z state)
NB. All unused instruction codes must default to Bypass.

EXTEST: This instruction is used to test the interconnect between two chips. The code for Extest used to be defined as the all-0s code. The EXTEST instruction places an IEEE 1149.1 compliant device into an external boundary test mode and selects the boundary scan register to be connected between TDI and TDO. During this instruction, the boundary scan cells associated with outputs are preloaded with test patterns to test downstream devices. The input boundary cells are set up to capture the input data for later analysis.

BYPASS: A device's boundary scan chain can be skipped using the BYPASS instruction, allowing the data to pass through the bypass register. The Bypass instruction must be assigned the all-1s code and, when executed, causes the Bypass register to be placed between the TDI and TDO pins. This allows efficient testing of a selected device without incurring the overhead of traversing other devices. The BYPASS instruction allows an IEEE 1149.1 compliant device to remain in functional mode while serial data is transferred from the TDI pin to the TDO pin without affecting the operation of the device.

SAMPLE/PRELOAD: The Sample and Preload instructions, and their predecessor the combined Sample/Preload instruction, select the Boundary-Scan register when executed. The instruction sets up the boundary-scan cells either to sample (capture) values or to preload known values into the boundary-scan cells prior to some follow-on operation. During this instruction, the boundary scan register can be accessed via a data scan operation to take a sample of the functional data entering and leaving the device. This instruction is also used to preload test data into the boundary-scan register prior to loading an EXTEST instruction.

INTEST: With this instruction the boundary scan register (BSR) is connected between the TDI and TDO signals. The chip's internal core-logic signals are sampled and captured by the BSR cells on entry to the Capture-DR state, as shown in the TAP state transition diagram. The contents of the BSR are shifted out via the TDO line on exit from the Shift-DR state. As the contents of the BSR (the captured data) are shifted out, new data are shifted in on entry to the Shift-DR state. The new contents of the BSR are applied to the chip's core-logic signals during the Update-DR state.
IDCODE: This is used to select the Identification register between TDI and TDO, preparatory to loading the internally-held 32-bit identification code and reading it out through TDO. The 32 bits identify the manufacturer of the device, its part number and its version number.

USERCODE: This instruction selects the same 32-bit register as IDCODE, but allows an alternative 32 bits of identity data to be loaded and serially shifted out. This instruction is used for dual-personality devices, such as Complex Programmable Logic Devices and Field Programmable Gate Arrays.

RUNBIST: An important optional instruction is RunBist. Because of the growing importance of internal self-test structures, the behavior of RunBist is defined in the Standard. The self-test routine must be self-initializing (i.e., no external seed values are allowed), and the execution of RunBist essentially targets a self-test result register between TDI and TDO. At the end of the self-test cycle, the targeted data register holds the Pass/Fail result. With this instruction one can control the execution of a memory BIST from the TAP controller, thereby reducing the hardware overhead of the BIST controller.

CLAMP: Clamp is an instruction that uses boundary-scan cells to drive preset values, established initially with the Preload instruction, onto the outputs of devices, and then selects the Bypass register between TDI and TDO (unlike the Preload instruction, which leaves the boundary-scan register selected until a new instruction is executed or the device is returned to the Test-Logic-Reset state). Clamp can be used to set up safe guarding values on the outputs of certain devices, for example to avoid bus contention problems.

HIGH-Z: This is similar to the Clamp instruction, but it leaves the device output pins in a high-impedance state rather than driving fixed logic-1 or logic-0 values. HighZ also selects the Bypass register between TDI and TDO.
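The effect of BYPASS described above, a single-bit register per skipped device, can be sketched in a few lines. This is an illustrative model only (the function name and the convention that each bypass flip-flop starts at 0, the value loaded in Capture-DR, are assumptions):

```python
# Serial data moving through the 1-bit Bypass registers of k bypassed
# devices: each register adds exactly one TCK cycle of delay.

def shift_through_bypass(bits, n_bypassed):
    regs = [0] * n_bypassed          # one Bypass flip-flop per device
    out = []
    for b in bits:
        out.append(regs[-1])         # bit appearing on the board's TDO
        regs = [b] + regs[:-1]       # shift TDI -> device 1 -> ... -> TDO
    return out
```

Shifting a stream through three bypassed devices yields the same stream delayed by three cycles, which is why BYPASS makes access to a selected device so much cheaper than traversing every device's full boundary register.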
3. Board-Level Test Control

So far the test architecture of boundary scan inside the chip under test has been discussed. A major problem remains: who is going to control the whole boundary scan test procedure? In general there are two solutions: using an external tester, or using a special on-board controller. The former is usually expensive because it involves an IC tester. The latter provides an economical way to complete the whole test procedure. As is clear from the above description, in addition to the test data, the most important signal that a test controller has to provide is the TMS signal. There are two methods of providing this signal on a board: the star configuration and the ring configuration, as shown in Figure 41.10. In the star configuration the TMS is broadcast to all chips, so all chips must execute the same operation at any time. In the ring structure, the test controller provides one independent TMS signal for each chip, giving the test procedure great flexibility.
Fig. 41.10 BUS master for chips with BS: (a) star structure, (b) ring structure
4. Board-Level Scan Chain Configurations

In a board design there can be many JTAG-compliant devices. All these devices can be connected together to form a single scan chain, as illustrated in Figure 41.11, "Single Boundary Scan Chain on a Board." Alternatively, multiple scan chains can be established so that devices can be checked in parallel. Figure 41.11 also shows the onboard TAP controllers connected to an offboard TAP control device, such as a personal computer, through a TAP access connector. The offboard TAP control device can perform different tests during board manufacturing without the need for bed-of-nails equipment.
5. Board-Level Test Strategy

One of the first tests that should be performed on a PCB is the infrastructure test. This test is used to determine whether all the components are installed correctly. It relies on the fact that the last two bits of the instruction register (IR) are always "01". By shifting out the IR of each device in the chain, it can be determined whether the device is properly installed. This is accomplished by sequencing the TAP controller for an IR read. After the infrastructure test is successful, the board-level interconnect test can begin. This is accomplished through the EXTEST command. This test can be used to check for "opens" and "shorts" on the PCB. The test patterns are preloaded into the output pins of the driving devices. Then they are propagated to the receiving devices and captured in the input boundary scan cells. The result can then be shifted out through the TDO pin for analysis. These patterns can be generated and analyzed automatically via software programs. This feature is normally offered through tools like Automatic Test Pattern Generation (ATPG) or Boundary Scan Test Pattern Generation (BTPG).
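The infrastructure check can be sketched as a pure software step on the shifted-out IR bits. The function name, the bit-list representation, and the convention that the stream arrives LSB-first from the device nearest TDO are all illustrative assumptions; per-device IR lengths would in practice come from the BSDL files.

```python
# Infrastructure test sketch: every 1149.1 device must capture "01" in the
# two least-significant bits of its Instruction Register, so a readout of
# the whole IR chain reveals missing, misoriented, or dead devices.

def infrastructure_ok(ir_stream, ir_lengths):
    """ir_stream: bits shifted out of the board TDO, LSB of the device
    nearest TDO first. ir_lengths: IR length per device, TDO to TDI."""
    pos = 0
    for length in ir_lengths:
        word = ir_stream[pos:pos + length]   # this device's captured IR
        if word[:2] != [1, 0]:               # LSB first: "01" -> 1 then 0
            return False
        pos += length
    return pos == len(ir_stream)             # chain length must match
```

A stream whose per-device words all begin 1,0 passes; any other low-order pattern, or a chain length that disagrees with the BSDL data, fails the board immediately.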
6. Boundary Scan Description Language (BSDL)

Boundary Scan Description Language (BSDL) has been approved as IEEE Std. 1149.1b (the original boundary scan standard being IEEE Std. 1149.1a) [1,6]. This VHDL-compatible language can greatly reduce the effort needed to incorporate boundary scan into a chip, and hence is quite useful when a designer wishes to design boundary scan in his own style. Basically, for the parts that are mandatory in Std. 1149.1a, such as the TAP controller and the BYPASS register, the designer does not need to describe them; they can be generated automatically. The designer only has to describe the specifications related to his own design, such as the length of the boundary scan register, the user-defined boundary scan instructions, the decoder for those instructions, and the I/O pin assignment. In general these descriptions are quite easy to prepare. In fact, many CAD tools already implement the boundary scan generation procedure, so a designer may not even need to write the BSDL file: the tools can automatically generate the needed boundary scan circuitry for any circuit design as long as the I/O of the design is specified. Any manufacturer of a JTAG-compliant device must provide a BSDL file for that device. The BSDL file contains information on the function of each of the pins on the device - which are used as I/Os, power or ground. BSDL files describe the boundary scan architecture of a JTAG-compliant device, and are written in VHDL. The BSDL file includes:
1. Entity Declaration: a VHDL construct used to identify the name of the device described by the BSDL file.
2. Generic Parameter: specifies which package is described by the BSDL file.
3. Logical Port Description: lists all of the pads on a device, and states whether each pin is an input (in bit;), output (out bit;), bidirectional (inout bit;) or unavailable for boundary scan (linkage bit;).
4. Package Pin Mapping: shows how the pads on the device die are wired to the pins on the device package.
5. Use Statements: call VHDL packages that contain attributes, types, constants, etc. that are referenced in the BSDL file.
6. Scan Port Identification: identifies the JTAG pins: TDI, TDO, TMS, TCK and TRST (if used).
7. TAP Description: provides additional information on the device's JTAG logic: the Instruction Register length, instruction opcodes, device IDCODE, etc. These characteristics are device specific.
8. Boundary Register Description: provides the structure of the boundary scan cells on the device. Each pin on a device may have up to three boundary scan cells, each cell consisting of a register and a latch.
Fig. 41.12 Example to illustrate BSDL (a) core logic (b) after BS insertion
7. Benefits and Penalties of Boundary Scan

The decision whether to use boundary scan usually comes down to economics. Designers often hesitate to use boundary scan because of the additional silicon involved. In many cases it may appear that the penalties outweigh the benefits for an ASIC. However, in an analysis spanning all assembly levels and all test phases of the system's life, the benefits will usually outweigh the penalties.
Benefits
The benefits provided by boundary-scan include the following:
lower test generation costs
reduced test time
reduced time to market
simpler and less costly testers
compatibility with tester interfaces
accommodation of high-density packaging devices
By providing access to the scan chain I/Os, the need for physical test points on the board is eliminated or greatly reduced, leading to significant savings as a result of simpler board layouts, less costly test fixtures, reduced time on in-circuit test systems, increased use of standard interfaces, and faster time-to-market. In addition to board testing, boundary scan allows programming almost all types of CPLDs and flash memories, regardless of size or package type, on the board, after PCB assembly. In-system programming saves money and improves throughput by reducing device handling, simplifying inventory management, and integrating the programming steps into the board production line.
Penalties
The penalties incurred in using boundary-scan include the following:
extra silicon due to boundary scan circuitry
added pins
additional design effort
degradation in performance due to gate delays through the additional circuitry
increased power consumption
Table 1: Gate requirements for a gate-array boundary-scan design

It must be noted that in Table 1 the boundary-scan implementation requires 868 gates, an estimated 8 percent overhead. It should also be noted that the cells used in this example were created prior to publication of the IEEE 1149.1 standard. If specific cell designs had been available to support the standard, or if the vendor had placed the boundary-scan circuitry in areas of the ASIC not available to the user, then the design would have required less overhead.
9. Conclusion
Board-level testing has become more complex with the increasing use of fine-pitch, high pin count devices. With boundary scan, however, board-level testing can be implemented more efficiently and at lower cost. This standard provides a unique opportunity to simplify the design, debug, and test processes by enabling a simple and standard means of automatically creating and applying tests at the device, board, and system levels. Boundary scan is the only solution for MCMs and limited-access SMT/ML boards. The standard supports external testing with an ATE. The IEEE 1532-2000 In-System Configuration (ISC) standard makes use of 1149.1 boundary-scan structures within CPLD and FPGA devices.
References
[1] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001 (Revision of IEEE Std 1149.1-1990), IEEE-SA Standards Board, 3 Park Avenue, New York, NY 10016-5997, USA. http://grouper.ieee.org/groups/1149/1 or http://standards.ieee.org/catalog/
[2] K. Parker, The Boundary-Scan Handbook: Analog and Digital, 2nd edition, Kluwer Academic Publishers, 1998.
[3] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[4] IEEE 1149.4 Mixed-Signal Test Bus Standard web site: http://grouper.ieee.org/groups/1149/4
[5] IEEE 1532 In-System Configuration Standard web site: http://grouper.ieee.org/groups/1532/
[6] Agilent Technologies BSDL verification service: http://www.agilent.com/see/bsdl_service
Problems
1. What is boundary scan? What is the motivation behind boundary scan?
2. How does the boundary scan technique differ from so-called bed-of-nails techniques?
3. What are the different device packaging styles?
4. What is JTAG?
5. Give an overview of the boundary scan family, i.e., IEEE 1149.
6. Show the boundary scan architecture and describe the functions of its elements.
7. Show the basic cell of a boundary-scan register. Describe the different modes of its operation.
8. A board is composed of 100 chips with 100 pins each, so the length of the total scan chain is 10,000 bits. Find a possible testing strategy to reduce the scan chain length.
9. What is the TAP controller? What are its main functions?
10. Describe a serial boundary scan chain and its operation. What are its disadvantages? Discuss a strategy to overcome them.
11. Discuss the different instructions and their functions.
12. Considering a board populated by IEEE 1149.1-compliant devices (a "pure" boundary-scan board), summarize a board-test strategy.
13. What is the goal of the infrastructure test? Is the infrastructure test mandatory or optional? What are the main steps of an infrastructure test?
14. Consider the example depicted in the following figure. This circuit has two primary inputs, two primary outputs and two nets that connect the ICs to each other. There is only 1 TAP, which connects the TDI and TDO of both ICs. Prepare a test plan for this circuit.
15. Consider a board composed of 100 40-pin boundary-scan devices, 2,000 interconnects, an 8-bit Instruction Register per device, a 32-bit Identification Register per device, and a 10 MHz test application rate. Compute the test time to execute a test session.
16. What is BSDL? What are the different parts of a BSDL file?
Module 8
Testing of Embedded System
Version 2 EE IIT, Kharagpur 1
Lesson 42
On-line Testing of Embedded Systems
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would be able to:
Explain the meaning of the term On-line Testing
Describe the main issues in on-line testing and identify applications where on-line testing is required for embedded systems
Distinguish between concurrent and non-concurrent testing and their relation to BIST and on-line testing
Describe an application of on-line testing for System-on-Chip
EMBEDDED SYSTEMS are computers incorporated in consumer products or other devices to perform application-specific functions. The product user is usually not even aware of the existence of these systems. From toys to medical devices, from ovens to automobiles, the range of products incorporating microprocessor-based, software-controlled systems has expanded rapidly since the introduction of the microprocessor in 1971. The lure of embedded systems is clear: they promise previously impossible functions that enhance the performance of people or machines. As these systems gain sophistication, manufacturers are using them in increasingly critical applications: products that can result in injury, economic loss, or unacceptable inconvenience when they do not perform as required. Embedded systems can contain a variety of computing devices, such as microcontrollers, application-specific integrated circuits, and digital signal processors. A key requirement is that these computing devices continuously respond to external events in real time. Makers of embedded systems take many measures to ensure safety and reliability throughout the lifetime of products incorporating the systems. Here, we consider techniques for identifying faults during normal operation of the product, that is, online-testing techniques. We evaluate them on the basis of error coverage, error latency, space redundancy, and time redundancy.
2. The Need for On-line Testing
Cost constraints in consumer products typically translate into stringent constraints on product components. Thus, embedded systems are particularly cost sensitive. In many applications, low production and maintenance costs are as important as performance. Moreover, as people become dependent on computer-based systems, their expectations of these systems' availability increase dramatically. Nevertheless, most people still expect significant downtime with computer systems, perhaps a few hours per month. People are much less patient with computer downtime in other consumer products, since the items in question did not demonstrate this type of failure before embedded systems were added. Thus, complex consumer products with high availability requirements must be quickly and easily repaired. For this reason, automobile manufacturers, among others, are increasingly providing online detection and diagnosis, capabilities previously found only in very complex and expensive applications such as aerospace systems. Using embedded systems to incorporate functions previously considered exotic in low-cost, everyday products is a growing trend. Since embedded systems are frequently components of mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Embedded systems in automotive applications are exposed to extremely harsh environments, even beyond those experienced by most portable devices. These applications are proliferating rapidly, and their more stringent safety and reliability requirements pose a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for online testing. Embedded systems consist of hardware and software, each usually considered separately in the design process, despite progress in the field of hardware-software co-design. A strong synergy exists between hardware and software failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not break in the common sense of the term. However, it can perform inappropriately due to faults in the underlying hardware, or due to specification or design flaws in either hardware or software. At the same time, one can exploit the software to test for and respond to the presence of faults in the underlying hardware. Online software testing aims at detecting design faults (bugs) that escape detection before the embedded system is incorporated and used in a product. Even with extensive testing and formal verification of the system, some bugs escape detection. Residual bugs in well-tested software typically behave as intermittent faults, becoming apparent only in rare system states. Online software testing relies on two basic methods: acceptance testing and diversity [1].
Acceptance testing checks for the presence or absence of well-defined events or conditions, usually expressed as true-or-false conditions (predicates), related to the correctness or safety of preceding computations. Diversity techniques compare replicated computations, either with minor variations in data (data diversity) or with procedures written by separate, unrelated design teams (design diversity). This chapter focuses on digital hardware testing, including techniques by which hardware tests itself: built-in self-test (BIST). Nevertheless, we must consider the role of software in detecting, diagnosing, and handling hardware faults. If we can use software to test hardware, why should we add hardware to test hardware? There are two possible answers. First, it may be cheaper or more practical to use hardware for some tasks and software for others. In an embedded system, programs are stored online in hardware-implemented memories such as ROMs (for this reason, embedded software is sometimes called firmware). This program storage space is a finite resource whose cost is measured in exactly the same way as other hardware. A function such as a test is soft only in the sense that it can easily be modified or omitted in the final implementation. The second answer involves the time that elapses between a fault's occurrence and a problem arising from that fault. For instance, a fault may induce an erroneous system state that can ultimately lead to an accident. If the elapsed time between the fault's occurrence and the corresponding accident is short, the fault must be detected immediately. Acceptance tests can detect many faults and errors in both software and hardware. However, their exact fault coverage is hard to measure, and even when coverage is complete, acceptance tests may take a long time to detect some faults. BIST typically targets relatively few hardware faults, but it detects them quickly.
These two issues, cost and latency, are the main parameters in deciding whether to use hardware or software for testing and which hardware or software technique to use. This decision requires system-level analysis. We do not consider software methods here. Rather, we emphasize the appropriate use of widely implemented BIST methods for online hardware testing. These methods are components in the hardware-software trade-off.
3. Online Testing
Faults are physical or logical defects in the design or implementation of a digital device. Under certain conditions, they lead to errors, that is, incorrect system states. Errors induce failures, deviations from appropriate system behavior. If the failure can lead to an accident, it is a hazard. Faults can be classified into three groups: design, fabrication, and operational. Design faults are made by human designers or CAD software (simulators, translators, or layout generators) during the design process. Fabrication defects result from an imperfect manufacturing process. For example, shorts and opens are common manufacturing defects in VLSI circuits. Operational faults result from wear or environmental disturbances during normal system operation. Such disturbances include electromagnetic interference, operator mistakes, and extremes of temperature and vibration. Some design defects and manufacturing faults escape detection and combine with wear and environmental disturbances to cause problems in the field. Operational faults are usually classified by their duration:
Permanent faults remain in existence indefinitely if no corrective action is taken. Many are residual design or manufacturing faults. The rest usually occur during changes in system operation such as system start-up or shutdown, or as a result of a catastrophic environmental disturbance such as a collision.
Intermittent faults appear, disappear, and reappear repeatedly. They are difficult to predict, but their effects are highly correlated. When intermittent faults are present, the system works well most of the time but fails under atypical environmental conditions.
Transient faults appear and disappear quickly and are not correlated with each other. They are most commonly induced by random environmental disturbances.
One generally uses online testing to detect operational faults in computers that support critical or high-availability applications.
The goal of online testing is to detect fault effects, or errors, and take appropriate corrective action. For example, in some critical applications, the system shuts down after an error is detected. In other applications, error detection triggers a reconfiguration mechanism that allows the system to continue operating, perhaps with some performance degradation. Online testing can take the form of external or internal monitoring, using either hardware or software. Internal monitoring, also called self-testing, takes place on the same substrate as the circuit under test (CUT). Today, this usually means inside a single IC, a system on a chip. There are four primary parameters to consider in designing an online-testing scheme:
Error coverage: the fraction of modeled errors detected, usually expressed as a percentage. Critical and highly available systems require very good error coverage to minimize the probability of system failure.
Error latency: the difference between the first time an error becomes active and the first time it is detected. Error latency depends on the time taken to perform a test and how often tests are executed. A related parameter is fault latency, the difference between the onset of the fault and its detection. Clearly, fault latency is greater than or equal to error latency, so when error latency is difficult to determine, test designers often consider fault latency instead.
Space redundancy: the extra hardware or firmware needed for online testing.
Time redundancy: the extra time needed for online testing.
The ideal online-testing scheme would have 100% error coverage, error latency of 1 clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT and impose no functional or structural restrictions on it. Most BIST methods meet some of these constraints without addressing others. Considering all four parameters in the design of an online-testing scheme may create conflicting goals. High coverage requires high error latency, space redundancy, and/or time redundancy. Schemes with immediate detection (error latency equaling 1) minimize time redundancy but require more hardware. On the other hand, schemes with delayed detection (error latency greater than 1) reduce time and space redundancy at the expense of increased error latency. Several proposed delayed-detection techniques assume equiprobability of input combinations and try to establish a probabilistic bound on error latency [2]. As a result, certain faults remain undetected for a long time because tests for them rarely appear at the CUT's inputs. To cover all the operational fault types described earlier, test engineers use two different modes of online testing: concurrent and non-concurrent. Concurrent testing takes place during normal system operation, and non-concurrent testing takes place while normal operation is temporarily suspended. One must often overlap these test modes to provide a comprehensive online-testing strategy at acceptable cost.
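The dependence of fault latency on test frequency and duration can be made concrete with a back-of-the-envelope bound. This is an illustrative sketch under a simplifying assumption not stated in the text: a periodic test detects a covered permanent fault only at the end of the first full pass that runs after the fault appears, so in the worst case the fault arises just as a pass begins.

```python
# Worst-case fault latency for periodic (time-triggered) testing of a
# covered permanent fault: nearly one full test period can elapse before
# the next pass starts, plus the duration of that pass itself.

def worst_case_fault_latency(test_period_ms, test_duration_ms):
    return test_period_ms + test_duration_ms

# Illustrative numbers: a 5 ms self-test scheduled once per second.
latency = worst_case_fault_latency(test_period_ms=1000.0, test_duration_ms=5.0)
```

The example makes the trade-off visible: shortening the period reduces fault latency but raises time redundancy, since the test consumes a larger fraction of every period.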
4. Non-concurrent testing
This form of testing is either event-triggered (sporadic) or time-triggered (periodic) and is characterized by low space and time redundancy. Event-triggered testing is initiated by key events or state changes, such as start-up or shutdown, and its goal is to detect permanent faults. Detecting and repairing permanent faults as soon as possible is usually advisable. Event-triggered tests resemble manufacturing tests. Any such test can be applied online, as long as the required testing resources are available. Typically, the hardware is partitioned into components, each exercised by specific tests. RAMs, for instance, are tested with manufacturing tests such as March tests [3].

Time-triggered testing occurs at predetermined times in the operation of the system. It detects permanent faults, often using the same types of tests applied by event-triggered testing. The periodic approach is especially useful in systems that run for extended periods during which no significant events occur to trigger testing. Periodic testing is also essential for detecting intermittent faults. Such faults typically behave as permanent faults for short periods. Since they usually represent conditions that must be corrected, diagnostic resolution is important. Periodic testing can identify latent design or manufacturing flaws that appear only under certain environmental conditions. Time-triggered tests are frequently partitioned and interleaved so that only part of the test is applied during each test period.
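A March test sweeps every RAM address with a fixed pattern of reads and writes in prescribed address orders. As an illustration only (not the tool flow of [3]), here is a minimal Python sketch of the March C- algorithm; the one-bit-wide `FaultyRam` model and its `read`/`write` interface are assumptions made for this demo:

```python
def march_c_minus(ram, n):
    """March C-: {up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); (r0)}.
    Returns True if the n-cell RAM passes, False if a fault is detected."""
    up, down = range(n), range(n - 1, -1, -1)
    for a in up:                       # M0: write 0 everywhere
        ram.write(a, 0)
    for a in up:                       # M1: ascending, read 0 then write 1
        if ram.read(a) != 0:
            return False
        ram.write(a, 1)
    for a in up:                       # M2: ascending, read 1 then write 0
        if ram.read(a) != 1:
            return False
        ram.write(a, 0)
    for a in down:                     # M3: descending, read 0 then write 1
        if ram.read(a) != 0:
            return False
        ram.write(a, 1)
    for a in down:                     # M4: descending, read 1 then write 0
        if ram.read(a) != 1:
            return False
        ram.write(a, 0)
    for a in up:                       # M5: final read 0 sweep
        if ram.read(a) != 0:
            return False
    return True

class FaultyRam:
    """Hypothetical one-bit-wide RAM model with an optional stuck-at cell."""
    def __init__(self, n, stuck_at=None):
        self.cells = [0] * n
        self.stuck_at = stuck_at              # (address, stuck value) or None
    def write(self, a, bit):
        self.cells[a] = bit
        if self.stuck_at and self.stuck_at[0] == a:
            self.cells[a] = self.stuck_at[1]  # the fault overrides the write
    def read(self, a):
        return self.cells[a]
```

A fault-free RAM passes all six sweeps, while a single stuck-at cell is caught by the first read that expects the opposite value; the same recipe scales to any memory size, which is why March tests suit both manufacturing and periodic online testing.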
5. Concurrent testing
Non-concurrent testing cannot detect transient or intermittent faults whose effects disappear quickly. Concurrent testing, on the other hand, continuously checks for errors due to such faults. However, concurrent testing is not particularly useful for diagnosing the source of errors, so test designers often combine it with diagnostic software. They may also combine concurrent and non-concurrent testing to detect or diagnose complex faults of all types.

A common method of providing hardware support for concurrent testing, especially for detecting control errors, is a watchdog timer [4]. This is a counter that the system resets repeatedly to indicate that the system is functioning properly. The watchdog concept assumes that the system is fault-free, or at least alive, if it can reset the timer at appropriate intervals. The ability to perform this simple task implies that control flow is correctly traversing timer-reset points. One can monitor system sequencing very precisely by guarding the watchdog-reset operations with software-based acceptance tests that check signatures computed while control flow traverses various checkpoints. To implement this last approach in hardware, one can construct more complex hardware watchdogs.

A key element of concurrent testing for data errors is redundancy. For example, the duplication-with-comparison (DWC) technique [5] detects any single error at the expense of 100% space redundancy. This technique requires two copies of the CUT, which operate in tandem with identical inputs. Any discrepancy in their outputs indicates an error. In many applications, DWC's high hardware overhead is unacceptable. Moreover, it is difficult to prevent minor timing variations between duplicated modules from invalidating the comparison. A possible lower-cost alternative is time redundancy. A technique called double execution, or retry, executes critical operations more than once at diverse time points and compares their results. Transient faults are likely to affect only one instance of the operation and thus can be detected. Another technique, recomputing with shifted operands (RESO) [5], achieves almost the same error coverage as DWC with 100% time redundancy but very little space redundancy. However, no one has demonstrated the practicality of double execution and RESO for online testing of general logic circuits. A third, widely used form of redundancy is information redundancy: the addition of redundant coded information such as a parity-check bit [5]. Such codes are particularly effective for detecting memory and data transmission errors, since memories and networks are susceptible to transient errors. Coding methods can also detect errors in data computed during critical operations.
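As a minimal sketch of the information-redundancy idea just described, the following Python fragment (illustrative only; the 8-bit word width and function names are assumptions for the demo) appends an even-parity check bit to a data word and shows that a single-bit transient error is caught:

```python
def add_parity(word: int, width: int = 8) -> int:
    """Append an even-parity check bit to a `width`-bit data word."""
    parity = bin(word & ((1 << width) - 1)).count("1") & 1
    return (word << 1) | parity        # coded word: data bits plus parity bit

def check_parity(coded: int) -> bool:
    """True if the overall parity is even, i.e. no error is detected."""
    return bin(coded).count("1") % 2 == 0

coded = add_parity(0b10110010)         # store or transmit the coded word
assert check_parity(coded)             # fault-free word passes the check
corrupted = coded ^ (1 << 4)           # inject a single-bit transient fault
assert not check_parity(corrupted)     # the error is detected
```

A single parity bit detects any odd number of bit flips but misses double-bit errors; stronger codes such as Hamming or CRC codes trade additional check bits for wider detection or even correction.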
6. Built-in self-test
For critical or highly available systems, a comprehensive online-testing approach that covers all expected permanent, intermittent, and transient faults is essential. In recent years, BIST has emerged as an important method of testing manufacturing faults, and researchers increasingly promote it for online testing as well. BIST is a design-for-testability technique that places test functions physically on chip with the CUT, as illustrated in Figure 42.1. In normal operating mode, the CUT receives its inputs from other modules and performs the function for which it was designed. In test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT, and a response monitor evaluates the test responses. In the most common type of BIST, the response monitor compacts the test responses to form fault signatures. It compares the fault signatures with reference signatures generated or stored on chip, and an error signal indicates any discrepancies detected. We assume this type of BIST in the following discussion. In developing a BIST methodology for embedded systems, we must consider four primary parameters related to those listed earlier for online-testing techniques:

- Fault coverage: the fraction of faults of interest that the test patterns produced by the test generator can expose and the response monitor can detect. Most monitors produce a fault-free signature for some faulty response sequences, an undesirable property called aliasing.
- Test set size: the number of test patterns produced by the test generator. Test set size is closely linked to fault coverage; generally, large test sets imply high fault coverage. However, for online testing, test set size must be small to reduce fault and error latency.
- Hardware overhead: the extra hardware needed for BIST. In most embedded systems, high hardware overhead is not acceptable.
- Performance penalty: the impact of BIST hardware on normal circuit performance, such as worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.
System designers can use BIST for non-concurrent, online testing of a system's logic and memory [6]. They can readily configure the BIST hardware for event-triggered testing, tying the BIST control to the system reset so that testing occurs during system start-up or shutdown. BIST can also be designed for periodic testing with low fault latency. This requires incorporating a test process that guarantees the detection of all target faults within a fixed time. Designers usually implement online BIST with the goals of complete fault coverage and low fault latency. Hence, they generally design the test generator and the response monitor to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable test set size. Different parts of the system meet these goals by different techniques.

Test generator and response monitor implementations often consist of simple, counter-like circuits, especially linear-feedback shift registers (LFSRs) [5]. An LFSR is formed from standard flip-flops, with outputs of selected flip-flops being fed back (modulo 2) to its inputs. When used as a test generator, an LFSR is set to cycle rapidly through a large number of its states. These states, whose choice and order depend on the LFSR's design parameters, define the test patterns. In this mode of operation, an LFSR is a source of pseudorandom tests that are, in principle, applicable to any fault and circuit types. An LFSR can also serve as a response monitor by counting (in a special sense) the responses produced by the tests. After receiving a sequence of test responses, an LFSR response monitor forms a fault signature, which it compares to a known or generated good signature to determine whether a fault is present. Ensuring that fault coverage is sufficiently high and the number of tests is sufficiently low are the main problems with random BIST methods.
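The dual use of an LFSR described above, as a pseudorandom test generator and as a signature-forming response monitor, can be sketched behaviorally in Python. This is an illustration under stated assumptions (a 16-bit Galois-style LFSR with the well-known maximal-length tap mask 0xB400, and a toy bitwise-complement CUT), not the exact hardware structure:

```python
def lfsr_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR; tap mask 0xB400 (taps 16, 14,
    13, 11) gives a maximal-length sequence of period 2^16 - 1."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def pseudorandom_tests(seed: int, count: int) -> list:
    """Test-generator mode: successive LFSR states are the test patterns."""
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns

def signature(responses, seed: int = 0xFFFF) -> int:
    """Response-monitor mode: fold each response into the register to
    compact the whole response stream into a 16-bit fault signature."""
    state = seed
    for r in responses:
        state = lfsr_step(state ^ (r & 0xFFFF))
    return state

# Toy CUT: bitwise complement of its input.
tests = pseudorandom_tests(0xACE1, 100)
responses = [~t & 0xFFFF for t in tests]
good = signature(responses)            # reference (fault-free) signature
faulty = list(responses)
faulty[5] ^= 0x0004                    # one transient bit-flip in one response
assert signature(faulty) != good       # a single response error never aliases
```

Because the register update is linear and invertible over GF(2), an error confined to a single response word can never alias to the good signature, which the final assertion exercises; aliasing only becomes possible when multiple responses are corrupted.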
Researchers have proposed two general approaches to preserve the cost advantages of LFSRs while greatly shortening the generated test sequence. One approach is to insert test points in the CUT to improve controllability and observability. However, this approach can result in performance loss. Alternatively, one can introduce some determinism into the generated test sequence, for example, by inserting specific seed tests known to detect hard faults. Some CUTs, including data path circuits, contain hard-to-detect faults that are detectable by only a few test patterns, denoted Thard. An N-bit LFSR can generate a sequence that eventually includes 2^N - 1 patterns (essentially all possibilities). However, the probability that the tests in Thard will appear early in the sequence is low. In such cases, one can use deterministic testing, which tailors the generated test sequence to the CUT's functional properties, instead of random testing. Deterministic testing is especially suited to RAMs, ROMs, and other highly regular components. A deterministic technique called transparent BIST [3] applies BIST to RAMs while preserving the RAM contents, a particularly desirable feature for online testing.

Keeping hardware overhead acceptably low is the main difficulty with deterministic BIST. A straightforward way to generate a specific test set is to store it in a ROM and address each stored test pattern with a counter. Unfortunately, ROMs tend to be much too expensive for storing entire test sequences. An alternative method is to synthesize a finite-state machine that directly generates the test set. However, the relatively large test set size and test vector width, as well as the test set's irregular structure, are much more than current FSM synthesis programs can handle. Another group of test generator design methods, loosely called deterministic, attempt to embed a complete test set in a specific generated sequence.
Again, the generated tests must meet the coverage, overhead, and test size constraints we've discussed. An earlier article [7] presents a representative BIST design method for data path circuits that meets these requirements. The test generator's structure, based on a twisted-ring counter, is tailored to produce a regular, deterministic test sequence of reasonable size. One can systematically rescale the test generator as the size of a non-bit-sliced data path CUT, such as a carry-look-ahead adder, changes.

Instead of using an LFSR, a straightforward way to compress test response data and produce a fault signature is to use an FSM or an accumulator. However, FSM hardware overhead and accumulator aliasing are difficult parameters to control. Keeping hardware overhead acceptably low and reducing aliasing are the main difficulties in response monitor design.
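The twisted-ring (Johnson) counter mentioned above steps through a very regular sequence of walking blocks of 1s and 0s, which is what makes the generator easy to rescale with the data path width. A minimal behavioral sketch (illustrative only; the actual generator of [7] is more elaborate) in Python:

```python
def twisted_ring_sequence(width: int) -> list:
    """All 2*width states of a twisted-ring (Johnson) counter: the
    register shifts left one position and feeds the complement of the
    bit shifted out back into the low end."""
    state, seq = 0, []
    for _ in range(2 * width):
        seq.append(state)
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) & ((1 << width) - 1)) | (msb ^ 1)
    return seq

# Width 4 walks 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000, then repeats.
```

Rescaling to a wider data path only changes `width`: an n-bit counter yields exactly 2n distinct, highly regular patterns, in contrast to the 2^n - 1 states of a maximal-length LFSR.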
[Fig. 42.1: Generic BIST architecture. A multiplexer, steered by a Control line, selects between the normal Inputs and the test pattern sequence from the Test generator and feeds the Circuit under test (CUT); a Response monitor checks the CUT's Outputs and raises an Error signal.]
An Example
IEEE 1149.4 based Architecture for OLT of a Mixed Signal SoC. Analog/mixed-signal blocks like DC-DC converters, PLLs, ADCs, etc., and digital modules like application-specific processors, microcontrollers, UARTs, bus controllers, etc., typically exist in SoCs. These have been used as cores of the SoC benchmark, a Controller for Electro-Hydraulic Actuators, which is being used as the case study. It is to be noted that this case study is used only for illustration; the architecture is generic and applies to all mixed-signal SoCs. All the digital blocks, like the instruction-specific processor, microcontroller, bus controller, etc., have been designed with OLT capability using the CAD tool described in [8]. Further, all these digital cores are IEEE 1149.1 compliant. In other words, all the digital cores are designed with a blanket comprising an on-line monitor and IEEE 1149.1 compliance circuitry. For the analog modules, the observers have been designed using ADCs and digital logic [9]. The test blanket for the analog/mixed-signal cores comprises IEEE 1149.4 circuitry. A dedicated test controller is designed and placed on-chip that schedules the various on-line tests during the operation of the SoC. The block diagram of the SoC being used as the case study is illustrated in Figure 42.2. The basic functionality of the SoC under consideration is discussed below.
motion of the dual tandem hydraulic jack. The motion of the spool of the hydraulic servo valve (Master Control Valve) regulates the flow of oil to the tandem jacks, thereby determining the ram position. The spool and ram positions are controlled by means of feedback loops. The actuator system is controlled by the on-board flight electronics. A lot of work has been done on on-line fault detection and diagnosis of the mechanical system; however, OLT of the electronic subsystems has hardly been looked into. It is to be noted that, as electro-hydraulic actuators are mainly used in mission-critical systems like avionics, on-line fault detection and diagnosis is required for reliable operation of both the mechanical and the electronic subsystems. The IEEE 1149.1 and 1149.4 circuitry is utilized to perform the BIST of the interconnecting buses between the cores. It may be noted that on-line tests are carried out only for cores that are more susceptible to failures. However, the interconnecting buses are tested during start-up and at intervals when the cores connected by them are idle. The test scheduling logic can be designed as suggested in [10]. The following three classes of tests are carried out in the SoC:
7. References
1) M.R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995.
2) K.K. Saluja, R. Sharma, and C.R. Kime, "A Concurrent Testing Technique for Digital Circuits," IEEE Trans. Computer-Aided Design, Vol. 7, No. 12, Dec. 1988, pp. 1250-1259.
3) M. Nicolaidis, "Theory of Transparent BIST for RAMs," IEEE Trans. Computers, Vol. 45, No. 10, Oct. 1996, pp. 1141-1156.
4) A. Mahmood and E. McCluskey, "Concurrent Error Detection Using Watchdog Processors: A Survey," IEEE Trans. Computers, Vol. 37, No. 2, Feb. 1988, pp. 160-174.
5) B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley, Reading, Mass., 1989.
6) B.T. Murray and J.P. Hayes, "Testing ICs: Getting to the Core of the Problem," Computer, Vol. 29, No. 11, Nov. 1996, pp. 32-45.
7) H. Al-Asaad, J.P. Hayes, and B.T. Murray, "Scalable Test Generators for High-Speed Data Path Circuits," J. Electronic Testing: Theory and Applications, Vol. 12, No. 1/2, Feb./Apr. 1998, pp. 111-125 (reprinted in On-Line Testing for VLSI, M. Nicolaidis, Y. Zorian, and D.K. Pradhan, eds., Kluwer, Boston, 1998).
8) S. Biswas, S. Mukhopadhyay, and A. Patra, "A Formal Approach to On-Line Monitoring of Digital VLSI Circuits: Theory, Design and Implementation," J. Electronic Testing: Theory and Applications, Vol. 20, Oct. 2005, pp. 503-537.
9) S. Biswas, B. Chatterjee, S. Mukhopadhyay, and A. Patra, "A Novel Method for On-Line Testing of Mixed Signal System On a Chip: A Case Study of Base Band Controller," 29th National Systems Conference, IIT Mumbai, India, 2005, pp. 2.1-2.23.
10) A.T. Dahbura, M.U. Uyar, and C.W. Yau, "An Optimal Test Sequence for the JTAG/IEEE P1149.1 Test Access Port Controller," International Test Conference, USA, 1998, pp. 55-62.
[Fig. 42.2 legend: ADC, DAC; analog buses (IEEE 1149.4) AB1 and AB2; power supply to the cores; data and control paths; IEEE 1149.4/1149.1 boundary-scan bus; digital cores with on-line digital monitors [6] (FPGA); analog/mixed-signal cores with analog monitors [3] (ASIC); program running in a PC with data I/O cards (HILS).]
Fig. 42.2 Block Diagram of the SoC Representing On-Line Test Capability