Introduction
Version 2 EE IIT, Kharagpur 1
Lesson 1
Introduction to Real Time Embedded Systems Part I
Pre-Requisite
Digital Electronics, Microprocessors
Introduction
In day-to-day life we come across a wide variety of consumer electronic products, and we are accustomed to using them easily and flawlessly to our advantage. Common examples are TV remote controls, mobile phones, FAX machines, Xerox machines etc. However, we seldom ponder over the technology behind each of them. Each of these devices has one or more programmable devices waiting to interact with the environment as effectively as possible. These are a class of embedded systems, and they provide service in real time, i.e. we do not have to wait long for the action. Let us see how an embedded system is characterized and how complex it can be. Take the example of a mobile telephone (Fig. 1.1).
When we want to purchase one of them, what do we look for? Let us see what choices are available (cells marked "--" were not specified):

Feature       Phone 1                        Phone 2                                    Phone 3
Price         Rs 5000/-                      Rs 6000/-                                  Rs 5000/-
Weight/Size   88.1 x 47.6 x 23.6 mm, 116 g   89 x 49 x 24.8 mm, 123 g                   --
Screen        TFT(1) 65k-color, 96x32        TFT 65k-color, 176x220                     backlit, 176 x 208 pixels, 4096 colors
Games         --                             Stuntman & Monopoly included; more         downloadable Symbian and Java games,
                                             downloadable J2ME games                    or packaged on MMC cards
Camera        Yes, 4x zoom                   --                                         --
Radio         No                             No                                         FM Stereo
Ring tones    Polyphonic                     --                                         --
Memory        No                             --                                         --
Besides the above tabulated facts about the mobile handset, as a student of technology you may also like to know the following:
Network type: GSM(2) or CDMA(3) (bandwidth)
Battery: type and ampere-hour rating
Talk time per charge, standby time
1. Short for thin film transistor, a type of LCD flat-panel display screen in which each pixel is controlled by one to four transistors. TFT technology provides the best resolution of all flat-panel techniques, but it is also the most expensive. TFT screens are sometimes called active-matrix LCDs.
2. Short for Global System for Mobile Communications, one of the leading digital cellular systems. GSM uses narrowband Time Division Multiple Access (TDMA), which allows eight simultaneous calls on the same radio frequency. GSM was first introduced in 1991. As of the end of 1997, GSM service was available in more than 100 countries, and it has become the de facto standard in Europe and Asia.
3. Short for Code-Division Multiple Access, a digital cellular technology that uses spread-spectrum techniques. Unlike competing systems such as GSM, which use TDMA, CDMA does not assign a specific frequency to each user. Instead, every channel uses the full available spectrum, and individual conversations are encoded with a pseudo-random digital sequence. CDMA is a military technology first used during World War II by the Allies to foil German attempts at jamming transmissions. The Allies decided to transmit over several frequencies, instead of one, making it difficult for the Germans to pick up the complete signal.
From the above specifications it is clear that a mobile phone is a very complex device which houses a number of miniature gadgets functioning coherently in a single device. Moreover, each of these embedded gadgets, such as the digital camera or the FM radio, along with the telephone itself, has a number of operating modes. For example:
- you may like to adjust the zoom of the digital camera,
- you may like to reduce the screen brightness,
- you may like to change the ring tone,
- you may like to relay a specific song from your favorite FM station to a friend using your mobile,
- you may like to use it as a calculator, address book, emailing device etc.
These variations in functionality can only be achieved by a very flexible device. This flexible device sitting at the heart of the circuits is none other than a customized microprocessor, better known as an Embedded Processor, and the mobile phone housing a number of functionalities is known as an Embedded System. Since it satisfies the requirements of a number of users at the same time (you and your friend, you and the radio station, you and the telephone network etc.), it works within a time constraint, i.e. it has to satisfy everyone with the minimum acceptable delay. We call this working in Real Time. This is unlike your holidaying attitude, when you take the clock in your stride. We can also say that it does not make us wait long to take our words and relay them, as well as receive them, unlike an email server, which might take days to receive or deliver your message when the network is congested or slow. Thus we can call the mobile telephone a Real Time Embedded System (RTES).
Definitions
Now we are ready for some definitions.
Real Time
Real time usually means time as prescribed by external sources, for example the time struck by a clock (however fast or slow it might be), or the timings generated by your requirements. You may like to call someone at midnight and send him a picture. These external timing requirements imposed by the user constitute the real time for the embedded system.
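The idea of an externally imposed deadline can be sketched in C. This is a minimal illustration, not part of any real RTOS API; the structure and function names are our own, and times are kept as plain millisecond counts for simplicity.

```c
#include <stdbool.h>

/* A real-time task is "correct" only if it completes within the window
   imposed by the external world. Times are in milliseconds. */
typedef struct {
    long release_ms;   /* when the external request arrived */
    long deadline_ms;  /* latest acceptable completion time */
} rt_task;

/* Returns true when the task met its externally imposed deadline. */
bool deadline_met(const rt_task *t, long finish_ms)
{
    return finish_ms >= t->release_ms && finish_ms <= t->deadline_ms;
}
```

A task finishing at 80 ms against a 100 ms deadline is correct; the same result arriving at 120 ms is a real-time failure, even though the computed value is identical.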
Embedded (Embodiment)
Embodied phenomena are those that by their very nature occur in real time and real space. In other words, a number of systems coexist to discharge a specific function in real time. Thus a Real Time Embedded System (RTES) is precisely the union of subsystems discharging a specific task coherently. Henceforth we shall call them RTES. RTES as a generic term may mean a wide variety of systems in the real world. However, we will be concerned with those which use programmable devices such as microprocessors or microcontrollers and have specific functions. We shall characterize them as follows.
Tightly Constrained
The constraints on the design and marketability of an RTES are more rigid than those of their non-real-time, non-embedded counterparts. Time-domain constraints are the first thing taken care of while developing such a system. Size, weight, power consumption and cost(4) are the other major factors.
4. Very few in India will be interested in buying a mobile phone if it costs Rs 50,000/-, even if it provides a faster processor with 200 MB of memory to store your addresses and your favorite mp3 music and play it, acts as a small-screen TV whenever you desire, and takes your calls intelligently. In the USA, however, the majority can afford it!
System
Subsystems
Components
= interfaces   = key interface   = uses open standards
Fig. 1.2 The System Interface and Architecture

The red and grey spheres in Fig. 1.2 represent interface standards. When a system is assembled, it starts with some chassis or a single subsystem. Subsequently, subsystems are added to it to make it a complete system.

Let us take the example of a desktop computer. Though not an embedded system, it gives us a nice example of assembling a system from its subsystems. You can start assembling a desktop computer (Fig. 1.3) with the chassis, then add the SMPS (switched mode power supply) and motherboard, followed by the hard disk drive, CD-ROM drive, graphics cards, Ethernet cards etc. Each of these subsystems consists of several components, e.g. Application Specific Integrated Circuits (ASICs), microprocessors, analog as well as digital VLSI circuits, miniature motors and their control electronics, multilevel power supply units, crystal clock generators, surface-mounted capacitors and resistors etc. In the end you close the chassis and connect the keyboard, mouse, speakers, visual display unit, Ethernet cable, microphone, camera etc., fitting them into certain well-defined sockets.

As we can see, each of the subsystems inside or outside the desktop has cables fitting into the slots meant for them. These cables and slots are uniform for almost any desktop you choose to assemble. The connection of one subsystem to another is known as interfacing. Assembly is so easy because the interfaces are all standardized. Therefore, standardization of the interfaces is essential for the universal applicability of a system and its compatibility with other systems. There can be open standards, which let a subsystem exchange information with products from other companies, and there can be key standards, which are meant only for the specific company which manufactures them.
Fig. 1.3 Inside a Desktop Computer

A desktop computer will have more open standards than an embedded system. This is because of the level of integration in the latter. Many of the components of an embedded system are integrated onto a single chip. This concept is known as System-on-Chip (SoC) design. Thus there are only a few subsystems left to be connected. Following the assembly process of a desktop, let us comparatively assess the possible subsystems of a typical RTES. One such segregation is shown in Fig. 1.4. The various parts are explained as follows:
User Interface: for interacting with users. May consist of a keyboard, touch pad etc.
ASIC (Application Specific Integrated Circuit): for specific functions such as motor control, data modulation etc.
Microcontroller (uC): a family of microprocessors
Real Time Operating System (RTOS): contains all the software for system control and the user interface
Digital Signal Processor (DSP): a typical family of microprocessors
DSP assembly code: code for the DSP, stored in program memory
Dual-ported memory: data memory accessible by two processors at the same time
CODEC: compressor/decompressor of the data
User Interface Process: the part of the RTOS that runs the software for user-interface activities
Controller Process: the part of the RTOS that runs the software for timing and control amongst the various units of the embedded system; it executes the overall control algorithm for the external process
Fig. 1.4 Architecture of an Embedded System

The above architecture represents a hypothetical embedded system (we will see more realistic ones in subsequent examples). More than one microprocessor (2 DSPs and 1 uC) is employed here to carry out different tasks. As we will learn later, the uC is generally meant for simpler and slower jobs, such as carrying out a Proportional-Integral (PI) control action or interpreting user commands. The DSP is a heavier-duty processor capable of real-time signal processing and control. The two DSPs, along with their operating systems and code, are independent of each other. They share the same memory without interfering with each other; this kind of memory is known as dual-ported memory or two-way post-box memory. The Real Time Operating System (RTOS) controls the timing requirements of all the devices. It executes the overall control algorithm of the process while diverting more complex tasks to the DSPs. It also specifically controls the uC for the necessary user interactivity. The ASICs are specialized units capable of specialized functions such as motor control, voice encoding, modulation/demodulation (MODEM) action etc. They can be digital, analog or mixed-signal VLSI circuits. CODECs are generally used for interfacing low-power serial Analog-to-Digital Converters (ADCs). The analog signals from the controlled process can be monitored through an ADC interfaced through this CODEC.
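The "post-box" behaviour of dual-ported memory can be sketched in C. This is an illustrative model only: real dual-port RAMs are hardware parts, and the structure, flag protocol and function names below are our own assumptions, not taken from any particular device.

```c
#include <stdbool.h>
#include <stdint.h>

/* One mailbox per direction, so the uC and a DSP never write the same
   word: the producer owns `data` until `full` is set, the consumer owns
   it afterwards. */
typedef struct {
    volatile uint32_t data;
    volatile bool     full;  /* set by producer, cleared by consumer */
} mailbox;

/* Producer side (e.g. the uC). Returns false while the previous
   message is still unread. */
bool mailbox_post(mailbox *mb, uint32_t word)
{
    if (mb->full)
        return false;
    mb->data = word;
    mb->full = true;   /* flag set last, so data is valid when seen */
    return true;
}

/* Consumer side (e.g. a DSP). Returns false when the box is empty. */
bool mailbox_fetch(mailbox *mb, uint32_t *word)
{
    if (!mb->full)
        return false;
    *word = mb->data;
    mb->full = false;
    return true;
}
```

Because each processor only ever writes its own side of the flag, the two never interfere, which is exactly the property the dual-ported memory provides in hardware.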
Embedded System
An embedded system is a special-purpose system in which the computer is completely encapsulated by the device it controls. Unlike a general-purpose computer, such as a personal computer, an embedded system performs pre-defined tasks, usually with very specific requirements. Since the system is dedicated to a specific task, design engineers can optimize it, reducing the size and cost of the product. Embedded systems are often mass-produced, so the cost savings may be multiplied by millions of items.
Handheld computers or PDAs are generally considered embedded devices because of the nature of their hardware design, even though they are more expandable in software terms. This line of definition continues to blur as devices expand.

Q2. Write five advantages and five disadvantages of embodiment.
Ans:
Five advantages:
1. Smaller size
2. Smaller weight
3. Lower power consumption
4. Lower electromagnetic interference
5. Lower price
Five disadvantages:
1. Lower mean time between failures
2. Repair and maintenance are not possible
3. Faster obsolescence
4. Unmanageable heat loss
5. Difficult to design
Q3. What do you mean by "reactive" in real time? Cite an example.
Ans: Many embedded systems must continually react to changes in the system's environment and must compute certain results in real time without delay. For example, a car's cruise controller continually monitors and reacts to the speed and brake sensors. It must compute acceleration or deceleration amounts repeatedly within a limited time; a delayed computation could result in a failure to maintain control of the car. In contrast, a desktop computer system typically focuses on computations, with relatively infrequent (from the computer's perspective) reactions to input devices. In addition, a delay in those computations, while perhaps inconvenient to the user, typically does not result in a system failure.

Q4. Give at least five examples of embedded systems you use or see in your day-to-day life.
Ans: (i) Mobile telephone (ii) Digital camera (iii) A programmable calculator (iv) An iPod (v) A digital blood pressure machine
iPod: The iPod is a brand of portable media players designed and marketed by Apple Computer. Devices in the iPod family are designed around a central scroll wheel (except for the iPod shuffle) and provide a simple user interface. The full-sized model stores media on a built-in hard drive, while the smaller iPods use flash memory. Like many digital audio players, iPods can serve as external data storage devices when connected to a computer.
Q5. Write the model number and detailed specification of your/a friend's mobile telephone.
Manufacturer:
Model:
Network types: EGSM / GSM / CDMA
Form factor: the industry standard that defines the physical, external dimensions of a particular device; the size, configuration, and other specifications used to describe hardware.
Battery life, talk (hrs):
Battery life, standby (hrs):
Battery type:
Measurements -- Weight: / Dimensions:
Display -- Display type (colour or black & white): / Display size (px): / Display colours:
General options -- Camera: / Mega pixel: / Email client: / Games: Yes / High-speed data: / MP3 player: / PC sync: Yes / Phonebook: / Platform series: / Polyphonic ring tones: / Predictive text: / Streaming multimedia: / Text messages: / Wireless internet: Opera
Other options -- Alarm: / Bluetooth: / Calculator: / Calendar: / Data capable: / EMS: / FM radio: / Graphics (custom): / Infrared: / Speaker phone: / USB: / Vibrate:
Module 1
Introduction
Lesson 2
Introduction to Real Time Embedded Systems Part II
Pre-Requisite
Digital Electronics, Microprocessors
Home appliances: microwave ovens, answering machines, thermostats, home security systems, washing machines, lighting systems etc.
Business equipment: electronic cash registers, curbside check-in terminals, alarm systems, card readers, product scanners, and automated teller machines.
Automobiles: the Electronic Control Unit (ECU), which includes transmission control, cruise control, fuel injection, antilock brakes, and active suspension in the same or separate modules.
Mobile Phone
Let us take the same mobile phone discussed in Lesson 1 as an example to illustrate the typical architecture of an RTES. In general, a cell phone is composed of the following components:
- Circuit board (Fig. 2.2)
- Antenna
- Microphone
- Speaker
- Liquid crystal display (LCD)
- Keyboard
- Battery
Fig. 2.3 The block diagram

A typical mobile phone handset (Fig. 2.3) includes standard I/O devices (keyboard, LCD), plus a microphone, speaker and antenna for wireless communication. The Digital Signal Processor (DSP) performs the signal processing, and the microcontroller handles the user interface, battery management, call setup etc. The performance specification of the DSP is crucial, since the conversion has to take place in real time. This is why almost all cell phones contain such a special processor dedicated to digital-to-analog (DA) and analog-to-digital (AD) conversion and to real-time processing such as modulation and demodulation. The Read Only Memory (ROM) and flash memory (electrically erasable and programmable memory) chips provide storage for the phone's operating system (RTOS) and various data such as phone numbers, calendar information, games etc.
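The kind of per-sample work a phone's DSP performs in real time can be illustrated with a short FIR filter applied to each incoming audio sample. This is a generic sketch, not the algorithm of any actual handset; the 4-tap moving-average coefficients are an illustrative choice.

```c
#include <stddef.h>

/* A 4-tap FIR filter: each new output is a weighted sum of the last
   four input samples. This must finish before the next sample arrives,
   which is what makes the processing "real time". */
#define TAPS 4

typedef struct {
    float  hist[TAPS];  /* most recent input samples */
    size_t idx;         /* next slot to overwrite */
} fir_state;

float fir_step(fir_state *s, float in)
{
    /* Moving-average coefficients, chosen for illustration. */
    static const float coeff[TAPS] = {0.25f, 0.25f, 0.25f, 0.25f};
    s->hist[s->idx] = in;
    s->idx = (s->idx + 1) % TAPS;
    float acc = 0.0f;
    for (size_t k = 0; k < TAPS; k++)
        acc += coeff[k] * s->hist[k];
    return acc;
}
```

At an 8 kHz voice sampling rate, fir_step has 125 microseconds to complete; a DSP's multiply-accumulate hardware exists precisely to make such loops fast.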
1. Microprocessor
This is the heart of any RTES. The microprocessors used here are different from general-purpose microprocessors like the Pentium or Sun SPARC. They are designed to meet specific requirements. For example, the Intel 8048 is a special-purpose microprocessor found in the keyboard of your desktop computer. It is used to scan the keystrokes and send them in a synchronous manner to the PC. Similarly, mobile phones and digital cameras use special-purpose processors for voice and image processing. A washer-dryer may use yet another type of processor for real-time control and instrumentation.
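The keystroke-scanning job mentioned above can be sketched as follows. A keyboard controller drives one matrix row at a time and reads back which columns are pulled low. The code below models only the decoding step; the matrix dimensions and scan-code scheme are illustrative assumptions, not the actual 8048 firmware.

```c
#include <stdint.h>

/* Hypothetical key matrix: 8 rows x 16 columns. While one row is
   driven, the controller reads a 16-bit column bitmap; a set bit means
   the key at that row/column is pressed. */
#define ROWS 8
#define COLS 16

/* Decode a column bitmap for a given row into a scan code
   (row * COLS + col), or -1 when no key in that row is pressed. */
int decode_row(int row, uint16_t col_bits)
{
    for (int col = 0; col < COLS; col++)
        if (col_bits & (1u << col))
            return row * COLS + col;
    return -1;
}
```

The real firmware repeats this for every row, debounces the result, and then clocks the scan code out to the PC synchronously.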
2. Memory
The microprocessor and memory must co-exist on the same printed circuit board (PCB) or the same chip. Compactness, speed and low power consumption are the characteristics required of the memory used in an RTES. Therefore, very low-power semiconductor memories are used in almost all such devices. Read Only Memory (ROM) is used for housing the operating system. The program or data loaded might exist for a considerable duration. It is like changing the setup of your desktop computer; similar user-defined setups exist in an RTES. For example, you may like to change the ring tone of your mobile and keep it for some time, or you may like to change the screen color. In these cases the memory should be capable of retaining the information even after the power is removed. In other words, the memory should be non-volatile, and it should be easily programmable too. This is achieved by using flash(1) memories.
1. A memory technology similar in characteristics to EPROM (Erasable Programmable Read Only Memory), with the exception that erasing is performed electrically instead of via ultraviolet light and, depending upon the organization of the flash memory device, erasing may be accomplished in blocks (typically 64 kbytes at a time) instead of over the entire device.
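The programming rule behind this footnote can be sketched in software. In flash, programming can only clear bits (1 to 0); restoring a bit to 1 requires erasing the whole block. The sizes and function names below are illustrative assumptions, modelling the rule rather than driving any real device.

```c
#include <stdint.h>
#include <stdbool.h>

/* A toy flash block. Real devices erase ~64 kbytes at a time; we use a
   small block so the model is easy to follow. */
#define BLOCK_SIZE 64

/* Erasing sets every byte to 0xFF (all bits 1). */
void flash_erase_block(uint8_t *block)
{
    for (int i = 0; i < BLOCK_SIZE; i++)
        block[i] = 0xFF;
}

/* Programming can only clear bits. Returns false if the requested value
   would need a 0 -> 1 transition, i.e. the block must be erased first. */
bool flash_program_byte(uint8_t *cell, uint8_t value)
{
    if ((*cell & value) != value)
        return false;
    *cell &= value;
    return true;
}
```

This erase-before-write constraint is why an RTES groups user settings (ring tone, screen color) into blocks and rewrites a whole block when any setting changes.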
4. Software
Without a program, the RTES is just the physical body; it is like a human body without life. Whenever you switch on your mobile telephone you may have noticed some activity on the screen. Whenever you move from one city to another you may have noticed changes on your screen, or when you go for a picnic away from your city you may have noticed the no-signal sign. These activities are taken care of by the Real Time Operating System sitting in the non-volatile memory of the RTES. Besides the above, an RTES may have various other components and Application Specific Integrated Circuits (ASICs) for specialized functions such as motor control, modulation, demodulation and CODECs. The design of a Real Time Embedded System is subject to a number of constraints; the following section discusses these issues.
Design Issues
The constraints in the embedded systems design are imposed by external as well as internal specifications. Design metrics are introduced to measure the cost function taking into account the technical as well as economic considerations.
Design Metrics
A design metric is a measurable feature of the system's performance, cost, implementation time, safety etc. Most of these are conflicting requirements, i.e. optimizing one does not optimize another: e.g. a cheaper processor may have poor performance as far as speed and throughput are concerned. The following metrics are generally taken into account while designing embedded systems.
Unit cost
The monetary cost of manufacturing each copy of the system, excluding NRE (non-recurring engineering) cost.
Size
The physical space required by the system, often measured in bytes for software, and gates or transistors for hardware.
Performance
The execution time of the system
Power Consumption
It is the amount of power consumed by the system, which may determine the lifetime of a battery, or the cooling requirements of the IC, since more power means more heat.
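The link between this metric and battery lifetime can be shown with the basic capacity relation: run time is the battery's charge capacity divided by the average current drawn. The numbers in the sketch are illustrative.

```c
/* Estimated run time of a battery-powered RTES, ignoring secondary
   effects such as temperature and discharge-rate derating. */
double battery_life_hours(double capacity_mAh, double avg_current_mA)
{
    return capacity_mAh / avg_current_mA;
}
```

For example, a 1000 mAh battery supplying an average of 200 mA lasts about 5 hours, which is why halving average power consumption roughly doubles standby time.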
Flexibility
The ability to change the functionality of the system without incurring heavy NRE cost. Software is typically considered very flexible.
Time-to-prototype
The time needed to build a working version of the system, which may be bigger or more expensive than the final system implementation, but which can be used to verify the system's usefulness and correctness and to refine the system's functionality.
Time-to-market
The time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time. This metric has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability.
Maintainability
It is the ability to modify the system after its initial release, especially by designers who did not originally design the system.
Correctness
This is the measure of the confidence that we have implemented the system's functionality correctly. We can check the functionality throughout the process of designing the system, and we can insert test circuitry to check that manufacturing was correct.
Throughput
This is the number of tasks that can be processed per unit time. For example, a camera may be able to process 4 images per second.

These are some of the cost measures for developing an RTES. Optimization of the overall design cost includes each of these factors taken with some multiplying factor depending on its importance, and the importance of each factor depends on the type of application. For instance, in defense-related applications, while designing an anti-ballistic system the execution time is the deciding factor. On the other hand, for de-noising a photograph in the embedded camera of your mobile handset, the execution time may be a little relaxed if it can bring down the cost and complexity of the embedded Digital Signal Processor. The design flow of an RTES involves several steps: the cost and performance are tuned and fine-tuned in a recursive manner. An overall design methodology is enumerated below.
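The weighted combination of metrics described above can be written out explicitly. This is a schematic cost function: the metric values, normalization, and weights are illustrative assumptions chosen by the designer for each application, not fixed constants.

```c
/* Overall design cost as a weighted sum of normalized metrics
   (unit cost, power, execution time, size, ...). Larger weight means
   the application cares more about that metric. */
double design_cost(const double value[], const double weight[], int n)
{
    double cost = 0.0;
    for (int i = 0; i < n; i++)
        cost += weight[i] * value[i];
    return cost;
}
```

An anti-ballistic system would put most of its weight on execution time, while a consumer camera might weight unit cost most heavily, so the same candidate design scores very differently in the two applications.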
Conclusion
The scope of embedded systems has been encompassing more and more diverse disciplines of technology day by day. Obsolescence of technology occurs at a much faster pace here than in other areas. The development of ultra-low-power mixed-signal VLSI technology is the prime factor in the miniaturization and performance enhancement of existing systems. More and more systems are tending to be compact and portable with RTES technology. The future course of embedded systems depends on advancements in sensor technology, mechatronics and battery technology. The design of these RTES is by and large application specific. The time gap between the conception of the design problem and marketing has been the key factor for the industry. In most cases, for very specific applications, the system needs to be developed using available processors rather than going for a custom design.
Questions
Q1. Give one example of a typical embedded system other than those listed in this lecture. Draw the block diagram and discuss the function of the various blocks. What type of embedded processor does it use?
Ans:
For details please visit http://www.gpsworld.com/. A GPS receiver receives signals from a constellation of at least four out of a total of 24 satellites. Based on the timing and other information sent by these satellites, the digital signal processor calculates the position using triangulation.
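The timing idea behind this can be sketched in one line of arithmetic: the range to each satellite is the signal's travel time multiplied by the speed of light. With four such pseudoranges the receiver can solve for its three position coordinates plus its own clock bias; the function below shows only the range step, with illustrative names.

```c
/* Speed of light in vacuum, metres per second. */
#define SPEED_OF_LIGHT_M_S 299792458.0

/* Pseudorange to one satellite, from the satellite's broadcast transmit
   time and the receiver's (biased) receive time, both in seconds. */
double pseudorange_m(double t_transmit_s, double t_receive_s)
{
    return (t_receive_s - t_transmit_s) * SPEED_OF_LIGHT_M_S;
}
```

A travel time of about 70 ms corresponds to roughly 21,000 km, which matches the altitude of the GPS constellation; the DSP must extract these timings from a signal buried well below the noise floor, which is why a dedicated processor is used.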
The block diagram is divided into (1) the active antenna system, (2) the RF/IF front end and (3) the Digital Signal Processor (DSP). The active antenna system houses the antenna, a band-pass filter and a low-noise amplifier (LNA). The RF/IF front end houses another band-pass filter, the RF amplifier, the demodulator and the A/D converter. The DSP accepts the digital data and decodes the signal to retrieve the information sent by the GPS satellites.

Q2. Discuss the hard disk drive housed in your PC. Is it an RTES?
Ans: Hard drives have two kinds of components: internal and external. External components are located on a printed circuit board called the logic board, while internal components are located in a sealed chamber called the HDA, or Hard Drive Assembly. For details browse http://www.hardwaresecrets.com/article/177/3. The big circuit is the controller. It is in charge of everything: exchanging data between the hard drive and the computer, controlling the motors of the hard drive, commanding the heads to read or write data, etc. All these tasks are carried out as demanded by the processor sitting on the motherboard. The drive can be verified to be single-functioned and tightly constrained; therefore one can say that a hard disk drive is an RTES.
Q3. Elaborate on the time-to-market design metric.
Ans: Time-to-market is the time required to develop a system to the point that it can be released and sold to customers. The main contributors are design time, manufacturing time, and testing time. This metric has become especially demanding in recent years. Introducing an embedded system to the marketplace early can make a big difference in the system's profitability.

Q4. What is Moore's Law? How was it conceived?
Ans: Moore's Law is the empirical observation that the complexity of integrated circuits, with respect to minimum component cost, doubles every 24 months. It is attributed to Gordon E. Moore, a co-founder of Intel.
Module 1
Introduction
Lesson 3
Embedded Systems Components Part I
Pre-Requisite
Digital Electronics, Microprocessors
Introduction
The various components of an embedded system can be hierarchically grouped, from system-level components down to transistor-level components. A system (subsystem) component is different from what is considered a "standard" electronic component. Standard components are the familiar active devices such as integrated circuits, microprocessors, memory, diodes and transistors, along with passives such as resistors, capacitors and inductors. These are the basic elements mounted on a circuit board for a customized, application-specific design. A system component, on the other hand, has active and passive components mounted on circuit boards configured for a specific task (Fig. 3.1). System components can be either single- or multi-function modules that serve as highly integrated building blocks of a system. A system component can be as simple as a digital I/O board or as complex as a computer with video, memory, networking, and I/O all on a single board. System components support industry standards and are available from multiple sources worldwide.
System
Subsystems (PCBs)
Processor Level Components (Integrated Circuits) (Microprocessors, Memory, I/O devices etc)
Gate Level Components (generally inside the integrated circuits, rarely outside)
Fig. 3.1 The Hierarchical Components
AD Converter: Analog-to-Digital Converter; UART: Universal Asynchronous Receiver and Transmitter
Fig. 3.2 The typical structure of an Embedded System
Fig. 3.3 The structural layout of a desktop computer (microprocessor, primary memory, keyboard, hard disk drive, network card, video display unit)
Typical Example
A Single Board Computer (SBC)
Since you are familiar with desktop computers, we should see how to make a desktop PC on a single printed circuit board. Such boards are called Single Board Computers, or SBCs. These SBCs are typical embedded systems, custom-made generally for industrial applications. In the introductory lectures you should have done some exercises on your PC; now try to compare this SBC with your desktop. Let us look at an example of a single board computer, the EBC-C3PLUS SBC from WinSystems(1).
Fig. 3.4 The Single Board Computer (SBC)
(1. Courtesy WinSystems, Inc., 715 Stadium Drive, Arlington, Texas 76011, http://sbc.winsystems.com/products/sbcs/ebcc3plus.html)

Let us discuss and try to understand the features of the above single-board embedded computer. This will pave the way to understanding more complex System-on-Chip (SoC) type systems. The various units and their specifications are as follows.

VIA 733 MHz or 1 GHz low-power C3 processor, EBX-compliant board (Fig. 3.5)
This is the processor on this SBC. VIA is the company which manufactures the processor (www.via.com.tw); 733 MHz or 1 GHz is the clock frequency of this processor. C3 is the brand name, as P3 and P4 are for Intel. (You must be familiar with Intel processors, as your PC has one.)
Fig. 3.5 The Processor

32 to 512 MB of PC133 SDRAM supported in a 168-pin DIMM socket
32 to 512 MB is the possible random access memory size on the SBC. SDRAM stands for Synchronous Dynamic RAM; we will learn more about this in the memory chapter. The 168-pin DIMM (Dual In-line Memory Module) holds the memory chips and fits into the board easily.
Fig. 3.6 DIMM

Socket for up to 1 GB bootable DiskOnChip, or 512 KB SRAM, or 1 MB EPROM
These are static RAM (SRAM) or EPROM devices which house the operating system, just like the hard disk in a desktop computer.

Type I and Type II Compact Flash (CF) cards supported
Compact Flash is otherwise known as a semiconductor hard disk or floppy disk. Flash memory is an advanced form of Electrically Erasable and Programmable Read Only Memory (EEPROM). Type I and Type II are just two different designs, Type II being more compact and a more recent version.
Fig. 3.7 Flash Memory

PC-compatible: supports Linux, Windows CE.NET and XP, plus other x86-compatible RTOS
This indicates the different types of operating systems supported on this SBC platform.

High-resolution video controller: color panels supported with up to 36 bits/pixel; resolutions up to 1920 x 1440
This is the video quality supported by the on-board video chips.

Simultaneous CRT and LCD operation; 4X AGP local bus for high-speed operation; LVDS supported
CRT stands for cathode ray tube terminal and LCD for liquid crystal display terminal. AGP (Accelerated Graphics Port) is an extremely fast expansion slot and bus (64-bit) designed for high-performance graphics cards; 4X represents the speed of the graphics port. LVDS (Low Voltage Differential Signaling) is a low-noise, low-power, low-amplitude method for high-speed (gigabits per second) data transmission over copper wire on printed circuit boards.

Dual 10/100 Mbps Intel PCI Ethernet controllers
The networking interface.

4 RS-232 serial ports with FIFO, COM1 & COM2 with RS-422/485 support
The serial interface. FIFO stands for First In First Out. RS-232/RS-422/RS-485 are serial communication standards which you will study in due course. COM1 and COM2 are names for RS-232 ports (your desktop has COM ports).

Bi-directional LPT port supports EPP/ECP
LPT stands for Line Printer Terminal; EPP and ECP stand for Enhanced Parallel Port and Extended Capabilities Port.

48 bi-directional TTL digital I/O lines with 24 pins capable of event-sense interrupt generation
These are extra digital input/output lines; 24 lines are capable of sensing interrupts.

Four USB ports onboard
USB (Universal Serial Bus) is an external bus standard that supports data transfer rates of 12 Mbps. A single USB port can be used to connect up to 127 peripheral devices, such as mice, modems and keyboards.
Two dual Ultra DMA 33/66/100 EIDE connectors: DMA stands for Direct Memory Access, a mode of transferring bulk data between memory and the hard drive and vice versa. EIDE is short for Enhanced Integrated Drive Electronics, a newer version of the IDE mass-storage interface. It supports data rates about three to four times faster than the old IDE standard, and mass-storage devices of up to 8.4 gigabytes, whereas the old standard was limited to 528 MB. The numbers 33/66/100 indicate the transfer rates in MB/s.

Floppy disk controller supports 1 or 2 drives.

AC97 audio: Audio Codec '97 (AC'97) is the specification for the 20-bit audio architecture used in many desktop PCs. The specification was developed in the old Intel Architecture Labs in 1997 to provide system developers with a standardized specification for integrated PC audio devices. AC'97 defines a high-quality audio architecture capable of delivering up to 96 kHz/20-bit playback in stereo and 48 kHz/20-bit playback in multi-channel modes.

PC/104 and PC/104-Plus expansion connectors: PC/104 gets its name from the popular desktop personal computers initially designed by IBM, called the PC, and from the number of pins used to connect the cards together (104). PC/104 cards are much smaller than the ISA-bus cards found in PCs and stack together, which eliminates the need for a motherboard, backplane, and/or card cage.

AT keyboard controller and PS/2 mouse support: the AT keyboard was an 84-key keyboard introduced with the PC/AT; it was later replaced by the 101-key Enhanced Keyboard.

Two interrupt controllers and 7 DMA channels; three 16-bit counter/timers; Real Time Clock; Watchdog Timer; Power-On Self Test: the interrupt controllers, DMA channels, counter/timers and Real Time Clock are used for real-time applications.
Specifications

+5 volt only operation
Mechanical dimensions: 5.75" x 8.0" (146 mm x 203 mm)
Jumpers: 0.025" square posts
Connectors:
Serial, Parallel, Keyboard: 50-pin on 0.100" grid
COM3 & COM4: 20-pin on 0.100" grid
Floppy disk interface: 34-pin on 0.100" grid
EIDE interface: 40-pin on 0.100" grid (primary); 44-pin on 2 mm grid (primary); 40-pin on 0.100" grid (secondary)
Flash: 50-pin 2 mm connector
Parallel I/O: two, 50-pin on 0.100" grid
CRT: 14-pin on 2 mm grid
FP-100 panel: two, 50-pin on 2 mm grid
LVDS: 20-pin on 0.100" grid
Ethernet: two RJ-45
PC/104 bus: 64-pin 0.100" socket, 40-pin 0.100" socket
PC/104-Plus: 120-pin (4 x 30; 2 mm) stackthrough with shrouded header
USB: four, 4-pin on 0.100" grid
Audio: three, 3.5 mm stereo phone jacks
Power: 9-pin in-line Molex
Environmental:
Operating temperature: -40 to +85 C (733 MHz); -40 to +60 C (1 GHz)
Non-condensing relative humidity: 5% to 95%
Conclusion
It is apparent from the above example that a typical embedded system consists, by and large, of the following units housed on a single board or chip.
1. Processor
2. Memory
3. Input/Output interface chips
4. I/O devices, including sensors and actuators
5. A-D and D-A converters
6. Software in the form of an operating system
7. Application software
One or more of the above units can be housed on a single PCB or a single chip. In a typical embedded system the microprocessor, a large part of the memory and the major I/O devices are housed on a single chip called a microcontroller. Being custom-made, embedded systems are required to function for specific purposes with little user programmability. The user interaction is converted into a series of commands which are executed by the RTOS by calling various subroutines. The RTOS is stored in a flash memory or read-only memory. There will be additional scratch-pad memory for temporary data storage. If the CPU sits on the same chip as the memory, then a part of that memory can be used for scratch-pad purposes; otherwise a number of CPU registers will be required for the same. The CPU communicates with the memory through the address and data buses. The timing and control of these data exchanges are handled by the control unit of the CPU via the control lines. The memory housed on the same chip as the CPU has the fastest transfer rate, also known as the memory bandwidth or bit rate. Memory outside the processor chip is slower and hence has a lower transfer rate. Input/Output devices, on the other hand, have widely varying bandwidths. These varying data transfer rates are handled in different ways by the processor: the slower devices need interface chips, and chips faster than the microprocessor are generally not used. The architecture of a typical embedded system is shown in Fig. 3.8. The hardware unit consists of the above units along with a digital as well as an analog subsystem. The software, in the form of an RTOS, resides in the memory.
[Fig. 3.8: an embedded system comprising hardware (digital subsystem, analog subsystem, sensors, actuators, mechanical and optical subsystems) and software]
Questions-Answers
Q1. What are the hierarchical components in an embedded system design?
Ans: The hierarchical components are:
System
Subsystems (PCBs)
Processor-level components (integrated circuits: microprocessors, memory, I/O devices etc.)
Gate-level components (generally inside the integrated circuits, rarely outside)

Q2. What is LVDS?
Ans: LVDS is Low Voltage Differential Signaling. The advantages of this standard are low noise and low interference, which allow the data transmission rate to be increased. Instead of 0 V and 5 V, a voltage level of about 1.5 V or 3.3 V is used for High and about 0 to 1 V for Low. The smaller low-to-high voltage swing reduces interference, and the differential mode rejects common-mode noise.

Q3. Is there any actuator in your mobile phone?
Ans: There is a vibrator in a mobile phone which can be activated to indicate an incoming call or message. Generally there is a coreless motor which is operated by the microcontroller to generate the vibration.
Module 1
Introduction
Lesson 4
Embedded Systems Components Part II
Pre-Requisite
Digital Electronics, Microprocessors

You are now almost familiar with the various components of an embedded system. In this chapter we shall discuss some of the general components, such as Processors, Memory and Input/Output Devices.
Processors
The central processing unit is the most important component in an embedded system. It exists in an integrated manner along with memory and other peripherals. Depending on the type of application, processors are broadly classified into 3 major categories:
1. General Purpose Microprocessors
2. Microcontrollers
3. Digital Signal Processors
For more specific applications customized processors can also be designed, but unless the demand is high the design and manufacturing cost of such processors will be high. Therefore, in most applications the design is carried out using processors already available in the market. However, Field Programmable Gate Arrays (FPGA) can be used to implement simple customized processors easily. An FPGA is a type of logic chip that can be programmed. FPGAs support thousands of gates which can be connected and disconnected, much as an EPROM (Erasable Programmable Read Only Memory) is programmed and erased. They are especially popular for prototyping integrated circuit designs. Once the design is finalized, hardwired chips are produced for faster performance.
General purpose processors are generally cheap because they are manufactured in large numbers: the NRE (Non-Recurring Engineering) cost (Lesson 1) is spread over a large number of units. Being cheaper, the manufacturer can invest more in improving the VLSI design with advanced, optimized architectural features; thus the performance, size and power consumption can all be improved. In most cases the design tools for such processors are provided by the manufacturer, and the supporting hardware is cheap and easily available. However, only a part of the processor's capability may be needed in a specific design, and hence the overall embedded system will not be as optimized as it could have been as far as space, power and reliability are concerned.
[Fig. 4.1 blocks: control unit (controller, PC, IR, control/status) and datapath (ALU, registers), connected to I/O and memory]
Fig. 4.1 The architecture of a General Purpose Processor

Pentium IV is such a general purpose processor, with the most advanced architectural features; compared to its overall performance its cost is also low. A general purpose processor consists of a datapath and a control unit, tightly linked with the memory (Fig. 4.1). The datapath consists of circuitry for transforming data and for storing temporary data. It contains an arithmetic logic unit (ALU) capable of transforming data through operations such as addition, subtraction, logical AND, logical OR, inversion and shifting. The datapath also contains registers capable of storing temporary data generated by the ALU or related operations. The internal data bus carries data within the datapath, while the external data bus carries data to and from the data memory. The size of the datapath indicates the bit-size of the CPU: an 8-bit datapath means an 8-bit CPU, such as the 8085. The control unit consists of circuitry for retrieving program instructions and for moving data to, from, and through the datapath according to those instructions. It has a program counter (PC) to hold the address of the next program instruction to fetch and an instruction register (IR) to hold
the fetched instruction. It also has a timing unit in the form of state registers and control logic. The controller sequences through the states and generates the control signals necessary to read instructions into the IR and to control the flow of data in the datapath. Generally the address size is specified by the control unit, as it is responsible for communicating with the memory. For each instruction the controller typically sequences through several stages, such as fetching the instruction from memory, decoding it, fetching the operands, executing the instruction in the datapath, and storing the results. Each stage takes a few clock cycles.
Microcontroller
Just as the major components of a desktop PC can be put onto a Single Board Computer (SBC), if you put all the major components of a single board computer onto a single chip, it is called a microcontroller. Because of the limitations of VLSI design, most of the input/output functions exist in a simplified manner. The typical architecture of such a microcontroller is shown in Fig. 4.2.
[Fig. 4.2 blocks: C500 core, IRAM, XRAM, ROM, interrupt controller, access control, housekeeper and A/D converter, linked by the address bus, data bus and peripheral bus]
Fig. 4.2 The architecture of a typical microcontroller, the C500 from Infineon Technologies, Germany. The double-lined blocks are the processor core; the other blocks are on-chip.

The various units of the processor (Fig. 4.2) are as follows. The C500 core contains the CPU, which consists of the instruction decoder, the arithmetic logic unit (ALU) and the program control section. The housekeeper unit generates internal signals for controlling the functions of the individual internal units within the microcontroller. Port 0 and Port 2 are required for accessing external code and data memory and for emulation purposes.
The external control block handles the external control signals and clock generation. The access control unit is responsible for the selection of the on-chip memory resources. The IRAM provides the internal RAM, which includes the general purpose registers. The XRAM is an additional internal RAM that is sometimes provided. Interrupt requests from the peripheral units are handled by an interrupt controller unit. Serial interfaces, timers, capture/compare units, A/D converters, watchdog units (WDU) and multiply/divide units (MDU) are typical examples of on-chip peripheral units. The external signals of these peripheral units are available at multifunctional parallel I/O ports or at dedicated pins.
Digital Signal Processors

[Fig. 4.3 blocks: a processing unit exchanging results/operands with a separate data memory, and a control unit (receiving status and opcode) fetching instructions from a separate program memory over its own address lines]
Fig. 4.3 The modified Harvard architecture

MACD-type instructions (multiply, accumulate and data move) can be executed faster by parallel implementation. This is possible by accessing the program and data memory separately, in parallel, as accomplished in the modified Harvard architecture shown in Fig. 4.3. DSP units generally use multiple-access and multi-ported memory units. A multiple-access memory allows more than one access in one clock period, while a multi-ported memory provides multiple address as well as data ports; this too increases the number of accesses per clock cycle.
[Fig. 4.4 blocks: a memory with two independent ports, each with its own address bus and data bus]
Fig. 4.4 Dual-Ported Memory

The Very Long Instruction Word (VLIW) architecture is also suitable for signal processing applications. It has a number of functional units and datapaths, as seen in Fig. 4.5. The long instruction words are fetched from the memory; the operands and the operations to be performed by the various units are specified in the instruction itself. The multiple functional units share a common multi-ported register file for fetching the operands and storing the results. Parallel random access to the register file is possible through the read/write crossbar. Execution in the functional units is carried out concurrently with the load/store operation of data between the RAM and the register file.
[Fig. 4.5: the VLIW architecture, with functional units 1 to n sharing a multi-ported register file and fed from an instruction cache]
Microprocessors vs Microcontrollers
A microprocessor is the central processing unit of a general-purpose digital computer. To make a complete microcomputer, you add memory (ROM and RAM), memory decoders, an oscillator, and a number of I/O devices. The prime use of a microprocessor is to read data, perform extensive calculations on that data, and store the results in a mass storage device or display them. These processors have complex architectures with multiple stages of pipelining and parallel processing, and the memory is divided into stages such as multi-level cache and RAM. The development time of general purpose microprocessors is high because of the very complex VLSI design.
[Fig. 4.6 blocks: microprocessor with external ROM, EEPROM and RAM, serial I/O, parallel I/O, A/D and D/A converters for analog I/O, timer and PWM]
Fig. 4.6 A Microprocessor-based System

The design of the microprocessor is driven by the desire to make it as expandable and flexible as possible. Microcontrollers, in contrast, usually have on-chip RAM and ROM (or EPROM) in addition to on-chip I/O hardware, to minimize the chip count in single-chip solutions. As a result of using on-chip hardware for I/O, RAM and ROM, they usually have a relatively low-performance CPU. Microcontrollers also often have timers that generate interrupts and can thus be used with the CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of a microcontroller is to control the operations of a machine using a fixed program that is stored in ROM and does not change over the lifetime of the system. The microcontroller is concerned with getting data from and to its own pins; the architecture and instruction set are optimized to handle data of bit and byte size.
[Fig. 4.7 blocks: microcontroller with on-chip ROM, EEPROM, RAM and timer; A/D converter for analog input; PWM with filter for analog output; digital PWM]

Fig. 4.7 A Microcontroller

The contrast between a microcontroller and a microprocessor is best exemplified by the fact that most microprocessors have many operation codes (opcodes) for moving data from external memory to the CPU, while microcontrollers may have one or two. Conversely, microprocessors may have one or two types of bit-handling instructions; microcontrollers will have many.

A basic Microprocessor vs a basic DSP
Fig. 4.8 The memory organization in a DSP

DSP characterization:
1. Microprocessors specialized for signal processing applications
2. Harvard architecture
3. Two to four memory accesses per cycle
4. Dedicated hardware performs all key arithmetic operations in 1 cycle
5. Very limited SIMD (Single Instruction Multiple Data) features; specialized, complex instructions
6. Multiple operations per instruction
7. Dedicated address generation units
8. Specialized addressing: auto-increment, modulo (circular), bit-reversed
9. Hardware looping
10. Interrupts disabled during certain operations
11. Limited or no register shadowing
12. Rarely have dynamic features
13. Relatively narrow range of DSP-oriented on-chip peripherals and I/O interfaces
14. Synchronous serial port
Fig. 4.9 Memory organization in a General Purpose Processor

Characterization of a General Purpose Processor:
1. CPUs for PCs and workstations, e.g. Intel Pentium IV
2. Von Neumann architecture
3. Typically 1 memory access per cycle
4. Most operations take more than 1 cycle
5. General-purpose instructions, typically only one operation per instruction
6. Often no separate address generation units
7. General-purpose addressing modes
8. Software loops only
9. Interrupts rarely disabled
10. Register shadowing common
11. Dynamic caches are common
12. Wide range of on-chip and off-chip peripherals and I/O interfaces
13. Asynchronous serial port
Memory
Memory serves the processor's short- and long-term information storage requirements, while registers serve the processor's short-term storage requirements. Both the program and the data are stored in the memory. When data and program occupy the same memory, this is known as the Princeton architecture; in the Harvard architecture the program and the data occupy separate memory blocks. The former leads to a simpler architecture; the latter needs two separate sets of connections, so program and data accesses can proceed in parallel, leading to parallel processing. General purpose processors use the Princeton architecture. Memory may be Read Only Memory (ROM) or Random Access Memory (RAM). It may exist on the same chip as the processor or outside the chip; on-chip memory is faster than off-chip memory. To reduce the access (read/write) time, a local copy of a portion of memory can be kept in a small but fast memory called the cache memory. Memory can also be categorized as dynamic or static. Dynamic memories dissipate less power and hence can be compact and cheaper, but their access times are longer than those of their static counterparts. In a dynamic RAM (DRAM) the data is retained by a periodic refresh operation, while in a static RAM (SRAM) the data is retained continuously. SRAMs are much faster than DRAMs but consume more power; the intermediate cache memory is an SRAM. In a typical processor, when the CPU needs data it first looks in its own data registers. If the data isn't there, the CPU looks in the nearby Level 1 cache; if that fails, it's off to the Level 2 cache. If the data is nowhere in cache, the CPU looks in main memory, and failing that, gets it from disk. All the while the clock is ticking and the CPU is sitting there waiting.
Conclusion
Besides the above units, some real-time embedded systems may have specific circuits included on the same chip or circuit board. These are known as Application Specific Integrated Circuits (ASIC). Some examples are:
3. Filters
Filters are used to condition the incoming signal by eliminating out-of-band noise and other unwanted signals. A specific class of filters, called anti-aliasing filters, is used before the A-D converters to prevent aliasing while acquiring a broad-band signal (a signal with a very wide frequency spectrum).
4. Controllers
These are specific circuits for controlling motors, actuators, light intensities, etc.
Questions-Answers
Q1. Enumerate the similarities and differences between the Microcontroller and the Digital Signal Processor.
Ans: Microcontrollers usually have on-chip RAM and ROM (or EPROM) in addition to on-chip I/O hardware, to minimize chip count in single-chip solutions. As a result of using on-chip hardware for I/O, RAM and ROM, they usually have a relatively low-performance CPU. Microcontrollers also often have timers that generate interrupts and can thus be used with the CPU and on-chip A/D, D/A or parallel ports to get regularly timed I/O. The prime use of a microcontroller is to control the operations of a machine using a fixed program that is stored in ROM and does not change over the lifetime of the system. The microcontroller is concerned with getting data from and to its own pins; the architecture and instruction set are optimized to handle data in bit and byte sizes.
Digital Signal Processors have been designed on the basis of the modified Harvard architecture to handle real-time signals. The features of these processors are suitable for implementing signal processing algorithms. One of the common operations required in such applications is array multiplication; for example, convolution and correlation require it. This is accomplished by multiplication followed by accumulation and addition, generally carried out by Multiplier and Accumulator (MAC) units. Sometimes this is known as MACD, where D stands for Data move. Generally all the instructions are executed in a single cycle. These DSP units generally use multiple-access and multi-ported memory units: a multiple-access memory allows more than one access in one clock period, and a multi-ported memory allows multiple address as well as data ports, which also increases the number of accesses per clock cycle.
Q2.
Name a few chips in each of the following processor families: Microcontroller, Digital Signal Processor, General Purpose Processor.
Ans:
Microcontroller: Intel 8051, Intel 80196, Motorola 68705
Digital Signal Processor: TI TMS320C6711, TI TMS320C5000 series
General Purpose Processor: Intel Pentium IV, PowerPC
Q3. List the following in increasing order of access speed: Flash Memory, Dynamic Memory, Cache Memory, CDROM, Hard Disk, Magnetic Tape, Processor Memory.
Ans: Magnetic Tape, CDROM, Hard Disk, Dynamic Memory, Flash Memory, Cache Memory, Processor Memory
Q4. Draw the circuit of an anti-aliasing Filter using Operational amplifiers Ans:
Low Pass Sallen-Key Butterworth Filter

Q5. Is it possible to implement an anti-aliasing filter in digital form?
Ans: No, it is not possible to implement an anti-aliasing filter in digital form, because aliasing is an error introduced at the sampling phase of analog-to-digital conversion. If the sampling frequency is less than twice the highest frequency present, the higher signal frequencies fold back into the lower frequency band and hence cannot be distinguished in the digital/discrete domain.

Q6. Download any free emulator of a simple microcontroller such as the 8051 or 68705 and learn about it.
Home work.

Q7. Draw the internal architecture of the 8051 and explain the functions of its various units.
See http://www.atmel.com/products/8051/

Q8. State, with justification, whether the following statements are right or wrong.
Cache memory can be a static RAM
Dynamic RAMs occupy more space per word of storage
The full form of SDRAM is static-dynamic RAM
The BIOS in your PC is not a Random Access Memory (RAM)
Ans:
Cache memory can be a static RAM: Right. The cache memory needs a very fast access time, which is possible with static RAM.
Dynamic RAMs occupy more space per word of storage: Wrong. DRAM cells are basically simple MOS capacitors and therefore occupy much less space than static RAM cells.
The full form of SDRAM is static-dynamic RAM: Wrong. SDRAM is Synchronous Dynamic RAM (covered in later chapters).
The BIOS in your PC is not a Random Access Memory (RAM): Wrong. The BIOS is a CMOS-based memory which can be accessed uniformly.
Q9. Explain the function of the following units in a general purpose processor: Instruction Register, Program Counter, Instruction Queue, Control Unit.
Ans:
Instruction Register: a register inside the CPU which holds the instruction code temporarily before sending it to the decoding unit.
Program Counter: a register inside the CPU which holds the address of the next instruction code in a program. It is updated automatically by the address generation unit.
Instruction Queue: a set of memory locations inside the CPU that hold instructions in a pipeline before sending them to the instruction decoding unit.
Control Unit: responsible for generating the timing and control signals for the various operations inside the CPU. It is very closely associated with the instruction decoding unit.
Module 2
Embedded Processors and Memory
Lesson 5
Memory-I
Instructional Objectives
After going through this lesson the student would know about:
Different kinds of memory
Processor memory
Primary memory
Memory interfacing
Pre-Requisite
Digital Electronics, Microprocessors
5.1 Introduction
This chapter describes memory. Most modern computer systems have been designed on the basis of an architecture called the von Neumann architecture.1
Memory
The memory stores the instructions as well as the data. Nothing in the memory itself distinguishes an instruction from data; the CPU has to be directed to the address of the instruction codes. The memory is connected to the CPU through the following lines:
1. Address
2. Data
3. Control
http://en.wikipedia.org/wiki/John_von_Neumann. The so-called von Neumann architecture is a model for a computing machine that uses a single storage structure to hold both the set of instructions on how to perform the computation and the data required or generated by the computation. Such machines are also known as stored-program computers. The separation of storage from the processing unit is implicit in this model. By treating the instructions in the same way as the data, a stored-program machine can easily change its instructions; in other words, the machine is reprogrammable. One important motivation for such a facility was the need for a program to increment or otherwise modify the address portion of instructions. This became less important when index registers and indirect addressing became customary features of machine architecture.
Fig. 5.2 The Memory Interface (the CPU connected to the memory through address lines, data lines and control lines)
In a memory read operation the CPU loads the address onto the address bus. In most cases these lines are fed to a decoder which selects the proper memory location. The CPU then sends a read control signal, and the data stored in that location is transferred to the processor via the data lines. In a memory write operation, after the address is loaded the CPU sends a write control signal followed by the data to the requested memory location. Memory can be classified in various ways, e.g. based on location, power consumption, or the way data is stored. At the basic level memory can be classified as:
1. Processor memory (register array)
2. Internal on-chip memory
3. Primary memory
4. Cache memory
5. Secondary memory
Primary Memory
This is the memory which sits just outside the CPU, though it can also reside on the same chip as the CPU. These memories can be static or dynamic.
Cache Memory
This is situated between the processor and the primary memory. It serves as a buffer for the immediate instructions or data which the processor anticipates needing. There can be more than one level of cache memory.
Secondary Memory
These are generally treated as input/output devices. They are much cheaper mass-storage devices, slower, and connected through input/output interface circuits. They are generally magnetic or optical memories, such as hard disk and CDROM devices. Memory can also be divided into volatile and non-volatile memory.
Volatile Memory
The contents are erased when the power is switched off. Semiconductor Random Access Memories fall into this category.
Non-volatile Memory
The contents remain intact even if the power is switched off. Magnetic memories (hard disks), optical disks (CDROMs) and Read Only Memories (ROM) fall under this category.
[Fig. 5.3 The Internal Registers: CPU (control unit, ALU, registers) connected to input, output and memory]

Fig. 5.4 Data Array (m words of n bits per word)
Memory access
A memory location is accessed by placing its address on the address lines; the read/write control line then selects a read or a write. Some memory devices are multi-port, i.e. they allow multiple simultaneous accesses to different locations.

[Figure: memory external view, with r/w and enable controls, address lines A0 to Ak-1, and data lines from Q0 upwards]
Memory Specifications
The specifications of a typical memory are as follows.
The storage capacity: the number of bits/bytes or words it can store.
The memory access time (read access and write access): how long the memory takes to place the data on its data lines after it has been addressed, or how fast it can store data supplied through its data lines. The reciprocal of the memory access time is known as the memory bandwidth.
The power consumption and voltage levels: power consumption is a major factor in embedded systems; the lower the power consumption, the higher the possible packing density.
Size: size is directly related to the power consumption and the data storage capacity.
Fig. 5.6 Four generations of RAM chips

There are two important specifications of memory as far as real-time embedded systems are concerned:
Write ability
Storage permanence
Write ability
It is the manner and speed with which a particular memory can be written.
Ranges of write ability:
High end: the processor writes to the memory simply and quickly, e.g. RAM.
Middle range: the processor writes to the memory, but more slowly, e.g. FLASH, EEPROM (Electrically Erasable and Programmable Read Only Memory).
Lower range: special equipment, a programmer, must be used to write to the memory, e.g. EPROM, OTP ROM (One Time Programmable Read Only Memory).
Low end: bits are stored only during fabrication, e.g. mask-programmed ROM.
In-system programmable memory can be written to by a processor in the embedded system that uses the memory; this covers memories in the high end and middle range of write ability.
Storage permanence
It is the ability to hold the stored bits. Range of storage permanence:
High end: essentially never loses bits, e.g. mask-programmed ROM.
Middle range: holds bits for days, months, or years after the memory's power source is turned off, e.g. NVRAM.
Lower range: holds bits as long as power is supplied to the memory, e.g. SRAM.
Low end: begins to lose bits almost immediately after they are written, e.g. DRAM.
Nonvolatile memory holds bits after power is no longer supplied; this covers the high end and middle range of storage permanence.
ROM (Read Only Memory)

ROMs store constant data needed by the system and can also implement combinational circuits. Externally, a 2^k x n ROM presents an enable input, address lines A0 to Ak-1, and data lines Qn-1 to Q0.

Example
The figure shows the structure of a ROM. The horizontal lines represent the words; the vertical lines give out the data, and the two are connected only at the circles. If the address input is 010, the decoder sets word line 2 to 1. Data lines Q3 and Q1 are set to 1 because there is a programmed connection with word 2's line; word 2 is not connected to data lines Q2 and Q0. Thus the output is 1010.

Internal view: an 8 x 4 ROM.
Fig. 5.8 The example of a ROM with decoder and data storage (a 3-to-8 decoder on inputs A0, A1, A2 with enable selects one of the word lines; outputs Q3, Q2, Q1, Q0)
[Figure: an 8 x 2 ROM implementing a combinational circuit; enable and inputs c, b, a select one of words 0 to 7, and the two data columns store the outputs y = 00011111 and z = 01100111 for inputs 000 to 111]
Mask-programmed ROM
The connections are programmed at fabrication, using a set of masks. Such a ROM can be written only once (in the factory), but it stores its data for ever: it has the highest storage permanence, and the bits never change unless damaged. Mask-programmed ROMs are typically used for the final design of high-volume systems.
[Figure: EPROM floating-gate cell operation, showing the 0 V/floating-gate state, programming at +15 V, and erasure taking 5-30 min]
EEPROM
EEPROM stands for Electrically Erasable and Programmable Read Only Memory. It is typically erased by using a higher-than-normal voltage, and it can program and erase individual words, unlike the EPROM, where exposure to UV light erases everything. An EEPROM:
can be in-system programmable, with a built-in circuit to provide the higher-than-normal voltage;
commonly has a built-in memory controller to hide the write details from the memory user, with a busy pin to indicate to the processor that the EEPROM is still writing;
can be erased and programmed tens of thousands of times;
has similar storage permanence to EPROM (about 10 years);
is far more convenient than EPROM, but more expensive.
Flash Memory
Flash memory is an extension of EEPROM: it uses the same floating-gate principle and has the same write ability and storage permanence. It can, however, be erased at a faster rate, because large blocks of memory are erased at once rather than one word at a time. The blocks are typically several thousand bytes large, so writes to single words may be slower: the entire block must be read, the word updated, and then the entire block written back. Flash is used in embedded systems that store large data items in nonvolatile memory, e.g., digital cameras, TV set-top boxes, and cell phones.
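The block-level read-modify-write just described can be sketched as follows. The 4096-byte block size and the list-based "flash" are illustrative assumptions, not a real device driver.

```python
BLOCK_SIZE = 4096  # bytes per erase block (an assumed, typical value)

# Simulate a flash device as a list of erase blocks.
flash = [bytearray(BLOCK_SIZE) for _ in range(4)]

def flash_write_word(block_no, offset, word_bytes):
    """Update one word inside a block using read-modify-write."""
    block = bytearray(flash[block_no])                   # 1. read the entire block
    block[offset:offset + len(word_bytes)] = word_bytes  # 2. update the word
    flash[block_no] = bytearray(b"\xff") * BLOCK_SIZE    # 3. simulate the block erase
    flash[block_no][:] = block                           # 4. write the block back

flash_write_word(1, 100, b"\x12\x34")
```

Updating two bytes still costs a whole-block erase and rewrite, which is why single-word writes to flash are slow.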
RAM

Bits are not held without a power supply, and RAM is read and written easily by the embedded system during execution. Its internal structure is more complex than that of a ROM:
- a word consists of several memory cells, each storing 1 bit
- each input and output data line connects to each cell in its column
- rd/wr is connected to every cell
- when a row is enabled by the decoder, each cell has logic that stores the input data bit when rd/wr indicates a write, or outputs the stored bit when rd/wr indicates a read
[Figure: external view of a RAM (address lines A0, A1, ..., data lines Q0..Qn-1, rd/wr control) and internal view showing the memory cells, with rd/wr routed to every cell and data lines Q3-Q0.]
SRAM: Static RAM — holds data as long as power is supplied.
DRAM: Dynamic RAM — the memory cell uses a MOS transistor and a capacitor to store a bit, so it is more compact than SRAM, but a refresh is required because the capacitor leaks. The typical refresh rate is once every 15.625 microseconds, and DRAM is slower to access than SRAM.
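The quoted 15.625-microsecond figure follows from a common retention spec (an assumption here, not stated in the text): if every row must be refreshed within 64 ms and the array has 4096 rows, distributed refresh issues one row refresh every 64 ms / 4096.

```python
RETENTION_MS = 64  # assumed: every row must be refreshed within 64 ms
NUM_ROWS = 4096    # assumed: a 4096-row DRAM array

# One row refresh every retention-window / number-of-rows.
refresh_interval_us = RETENTION_MS * 1000 / NUM_ROWS
print(refresh_interval_us)  # -> 15.625
```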
[Figure: memory-cell internals — an SRAM cell with complementary Data'/Data lines, and a DRAM cell with a single Data line, one transistor, and a capacitor, both selected by word line W.]
RAM variations
- PSRAM: Pseudo-static RAM — a popular low-cost, high-density alternative to SRAM.
- NVRAM: Nonvolatile RAM — holds data after external power is removed. Two forms:
  - Battery-backed RAM: an SRAM with its own permanently connected battery; writes are as fast as reads, and there is no limit on the number of writes, unlike nonvolatile ROM-based memory.
  - SRAM with EEPROM or flash: stores the complete RAM contents on the EEPROM or flash before power is turned off.
[Figure: a RAM device (HM6264) — pin assignments, device characteristics, and read/write timing diagrams showing the data, addr, /OE, /WE, /CS1, and CS2 signals.]
[Figure: the TC55V2325FF-100 synchronous SRAM — device characteristics, block diagram, and timing diagrams with signals /WE, /OE, /ADSP, /ADSC, /ADV, MODE, CLK, /CS1, /CS2, CS3, addr<15...0>, and data<31...0>.]
[Figure: composing memory from smaller parts — several 2^m x n ROM devices with a common enable share address lines A0-Am-1; connecting three in parallel widens the word to form a 2^m x 3n ROM with outputs Q0-Q3n-1, while an extra address line Am can select between devices to increase the number of words.]
5.7 Conclusion
In this chapter you have learnt about the following:
1. Basic Memory Types
2. Basic Memory Organization
3. Definitions of RAM, ROM and Cache Memory
4. Difference between Static and Dynamic RAM
5. Various Memory Control Signals
6. Memory Specifications
7. Basics of Memory Interfacing
5.8 Questions
Q1. Discuss the various control signals in a typical RAM device (say the HM6264).

[Figure: HM6264 RAM pin assignment — data<7:0>, addr<15...0>, /OE, /WE, /CS1, CS2.]

Ans:
/OE (output enable, active low): the output is enabled when this line is low; it serves as the read line.
/WE (write enable, active low): this line has to be made low while writing to the device.
/CS1 (chip select 1, active low): this line has to be made low, along with CS2, to enable the chip.

Q2. Download the datasheet of the TC55V2325FF chip and indicate the various signals.
Module 2
Embedded Processors and Memory
Lesson 6
Memory-II
Instructional Objectives
After going through this lesson the student would learn the following:
- Memory Hierarchy
- Cache Memory
  - Different types of Cache Mappings
  - Cache Impact on System Performance
- Dynamic Memory
  - Different types of Dynamic RAMs
- Memory Management Unit
Pre-Requisite
Digital Electronics, Microprocessors
[Fig. 6.1 The memory hierarchy: processor registers, cache, main memory, disk, tape]
6.2 Cache
Cache is usually designed with SRAM, which is faster but more expensive than DRAM, and it usually sits on the same chip as the processor; space is limited, so it is much smaller than off-chip main memory, but access is faster (1 cycle vs. several cycles for main memory).
Cache operation: on a request for main memory access (read or write), first check the cache for a copy. A cache hit means the copy is in the cache, giving quick access; a cache miss means the copy is not in the cache, so the address (and possibly its neighbors) is read into the cache.
There are several cache design choices: cache mapping, replacement policies, and write techniques.
Direct Mapping
The main memory address is divided into fields:
- Index: contains the cache address; the number of bits is determined by the cache size.
- Tag: compared with the tag stored in the cache at the address indicated by the index; if the tags match, check the valid bit.
- Valid bit: indicates whether the data in the slot has been loaded from memory.
- Offset: used to find the particular word in the cache line.
[Figure: direct-mapped cache — the address is split into Tag, Index, and Offset; the index selects a line holding a valid bit (V), tag (T), and data; the stored tag is compared (=) with the address tag and qualified by the valid bit.]
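Splitting an address into tag, index, and offset can be sketched directly; the cache geometry below (1024 lines of 32 bytes, 32-bit addresses) is an assumption for illustration only.

```python
OFFSET_BITS = 5   # 32-byte cache line  -> 5 offset bits
INDEX_BITS = 10   # 1024 cache lines    -> 10 index bits

def split_address(addr):
    """Split a 32-bit address into (tag, index, offset) fields."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset

tag, index, offset = split_address(0x12345678)
print(tag, index, offset)  # -> 9320 691 24
```

Reassembling the fields (tag << 15 | index << 5 | offset) gives back the original address, which is a quick sanity check on the widths.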
Set-Associative Mapping
This is a compromise between direct mapping and fully associative mapping: the index is the same as in direct mapping, but each cache address contains the content and tags of two or more memory address locations. The tags of that set are compared simultaneously, as in fully associative mapping. A cache with set size N is called N-way set-associative; 2-way, 4-way, and 8-way are common.
[Figure: two-way set-associative cache — the address is split into Tag, Index, and Offset; the index selects a set containing two (V, T, data) entries, and both stored tags are compared (=) with the address tag in parallel.]
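A 2-way set-associative lookup can be sketched as follows; this is a toy model with assumed sizes, not any particular processor's cache.

```python
# Sketch of a 2-way set-associative lookup: each set holds two
# (valid, tag) entries whose tags are compared in parallel.
NUM_SETS = 4  # illustrative, tiny cache

cache = [[{"valid": False, "tag": None} for _ in range(2)]
         for _ in range(NUM_SETS)]

def lookup(tag, index):
    """Return True on a hit: any valid way in the set with a matching tag."""
    return any(w["valid"] and w["tag"] == tag for w in cache[index])

def fill(tag, index, way):
    """Load a line into the given way of the given set."""
    cache[index][way] = {"valid": True, "tag": tag}

fill(tag=0x1A, index=2, way=0)
fill(tag=0x2B, index=2, way=1)  # two different tags can share set 2
print(lookup(0x2B, 2), lookup(0x3C, 2))  # -> True False
```

In a direct-mapped cache the two fills above would have evicted each other; the second way is what removes that conflict.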
Cache choices affecting performance include the total size of the cache (the total number of data bytes the cache can hold; tag, valid, and other housekeeping bits are not included in this total), the degree of associativity, and the data block size. Larger caches achieve lower miss rates but higher access cost. For example:
- 2 Kbyte cache: miss rate = 15%, hit cost = 2 cycles, miss cost = 20 cycles; avg. cost of memory access = (0.85 * 2) + (0.15 * 20) = 4.7 cycles
- 4 Kbyte cache: miss rate = 6.5%, hit cost = 3 cycles, miss cost unchanged; avg. cost of memory access = (0.935 * 3) + (0.065 * 20) = 4.105 cycles (an improvement)
- 8 Kbyte cache: miss rate = 5.565%, hit cost = 4 cycles, miss cost unchanged; avg. cost of memory access = (0.94435 * 4) + (0.05565 * 20) = 4.8904 cycles (worse than the 4 Kbyte cache)
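The three averages above can be reproduced directly from the model avg = (1 - miss rate) * hit cost + miss rate * miss cost:

```python
def avg_access_cycles(miss_rate, hit_cost, miss_cost=20):
    """Average memory access cost in cycles under the hit/miss model above."""
    return (1 - miss_rate) * hit_cost + miss_rate * miss_cost

# (cache size in Kbytes, miss rate, hit cost) from the worked example
for kb, miss_rate, hit_cost in [(2, 0.15, 2), (4, 0.065, 3), (8, 0.05565, 4)]:
    print(kb, "Kbyte:", round(avg_access_cycles(miss_rate, hit_cost), 4))
```

The 8 Kbyte case shows the trade-off: the miss rate keeps falling, but the rising hit cost makes the average worse than the 4 Kbyte cache.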
[Figure: internal structure of a DRAM — row and column decoders select a cell from the multiplexed address under the ras and cas strobes and clock, with rd/wr control and the data lines. Timing: one row address (ras) followed by several column addresses (cas), each returning data.]
6.12 Questions
Q1. Discuss different types of cache mappings.
Ans: Direct, Fully Associative, and Set Associative.
Q2. Discuss the effect of the size of the cache memory on the system performance.
Ans:
[Figure: % cache miss (0 to 0.16) plotted against cache size (1 Kb to 128 Kb) for 1-way, 2-way, 4-way, and 8-way associativity.]
EDO RAM
[Figure: EDO RAM timing diagram — signals ras, cas, address, and data; after one row address, successive column addresses each return data.]
SDRAM
[Figure: SDRAM timing diagram — signals clock, ras, cas, address, and data; after a single row and column address, data words are delivered on successive clock edges.]
Module 2
Embedded Processors and Memory
Lesson 7
Digital Signal Processors
Instructional Objectives
After going through this lesson the student would learn:
o Architecture of a Real Time Signal Processing Platform
o Different errors introduced during the A-D and D-A converter stages
o Digital Signal Processor architecture
o Difference in the complexity of programs between a General Purpose Processor and a Digital Signal Processor
Pre-Requisite
Digital Electronics, Microprocessors
This lesson also covers the evolution of digital signal processors and their comparative performance with general purpose processors.
7.1 Introduction
Digital signal processing deals with algorithms for handling large chunks of data. The branch identified itself as a separate subject in the 1970s, when engineers thought about processing the signals arising from nature in discrete form. The development of sampling theory followed, and the design of analog-to-digital converters gave an impetus in this direction. The early applications of digital signal processing were mainly in speech, followed by communication, seismology, biomedical applications, etc. Later the field of image processing emerged as another important area in signal processing.
The following broadly defines the different processor classes:
- General purpose, high performance (Pentium, Alpha, SPARC): used for general purpose software under a heavyweight OS (UNIX, NT); found in workstations and PCs.
- Embedded processors and processor cores (ARM, 486SX, Hitachi SH7000, NEC V800): run a single program under a lightweight real-time OS, often with DSP support; found in cellular phones and consumer electronics (e.g., CD players).
- Microcontrollers: extremely cost sensitive, with a small word size (8 bit is common); the highest volume processors by far, found in automobiles, toasters, thermostats, ...
A digital signal processor is required to do the following digital signal processing tasks in real time:
- Signal modeling: difference equations, convolution, transfer functions, frequency response
- Signal processing: data manipulation, algorithms, filtering, estimation
What is digital signal processing? It is the application of mathematical operations to digitally represented signals. Signals are represented digitally as sequences of samples, obtained from physical signals via transducers (e.g., microphones) and analog-to-digital converters (ADC), and converted back to physical signals via digital-to-analog converters (DAC). A digital signal processor (DSP) is an electronic system that processes digital signals.

[Figure: the signal processing chain — measurand, sensor, conditioner, analog processor (LPF), ADC, digital signal processor, DAC.]
Fig. 7.1 The basic signal processing platform

The above figure represents a real-time digital signal processing system. The measurand can be temperature, pressure, or a speech signal, which is picked up by a sensor (a thermocouple, microphone, load cell, etc.). The conditioner is required to filter, demodulate, and amplify the signal. The analog processor is generally a low-pass filter used for anti-aliasing. The ADC block converts the analog signal into digital form. The DSP block represents the signal processor. The DAC (digital-to-analog converter) converts the digital signal back into analog form, and the analog low-pass filter eliminates the noise introduced by the interpolation in the DAC.
[Figure: the A-D conversion chain — x(t) passes through a sampler (xs(t), sampling function p(t)), a quantizer (xq(t), xq(n)), and a coder producing b-bit samples xb(n); the D-A chain passes xb(n) through a decoder and sample/hold to give y(n).]
Fig. 7.2 The D-A and A-D conversion process

The performance of the signal processing system depends to a large extent on the ADC. The ADC is specified by its number of bits, which defines the resolution, and its conversion time, which decides the sampling time. The errors in the ADC are due to the finite number of bits and the finite conversion time; sometimes noise may also be introduced by the switching circuits. Similarly, the DAC is characterized by its number of bits and the settling time at its output.
A DSP task requires:
- Repetitive numeric computations
- Attention to numeric fidelity
- High memory bandwidth, mostly via array accesses
- Real-time processing
And the DSP design should minimize:
- Cost
- Power
- Memory use
- Development time
Take the example of FIR filtering, both by a general purpose processor and by a DSP.
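The resolution set by the number of bits can be made concrete: a b-bit ADC spanning a full-scale range V_FS resolves voltage steps of V_FS / 2^b. The 5 V range and bit counts below are assumed examples.

```python
def quantization_step(v_full_scale, bits):
    """Smallest voltage difference a b-bit ADC can resolve."""
    return v_full_scale / (2 ** bits)

print(quantization_step(5.0, 8))   # -> 0.01953125     (about 19.5 mV)
print(quantization_step(5.0, 12))  # -> 0.001220703125 (about 1.2 mV)
```

Every extra bit halves the step size, which is why the finite number of bits appears first in the list of ADC error sources.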
An FIR (finite impulse response) filter is represented as shown in the following figure. For input x(k) and output y(k):

y(k) = (h0 + h1 z^-1 + h2 z^-2 + ... + h(N-1) z^-(N-1)) x(k)
     = h0 x(k) + h1 x(k-1) + h2 x(k-2) + ... + h(N-1) x(k-N+1)
     = sum_{i=0}^{N-1} h_i x(k-i)
     = h(k) * x(k)

The output of the filter is a linear combination of the present and past values of the input. The FIR filter has several advantages, such as linear phase, stability, and improved computational time.
[Figure: tapped-delay-line realization of the FIR filter — the input x(k) passes through a chain of z^-1 delays; the taps are weighted by h0, h1, h2, ..., h(N-1) and summed to form y(k).]
The FIR filter can be implemented on a general purpose processor as follows. The program assumes that the finite window of the input signal is stored at the memory location starting from the address specified by r1, and that the equal number of filter coefficients is stored at the memory location starting from the address specified by r0. The result will be stored at the memory location specified by r2. The program assumes the content of register b is 0 before the start of the loop.

loop: lw x0,(r0)    ; load x0 from the address in r0
      lw y0,(r1)    ; load y0 from the address in r1
      mul a,x0,y0   ; multiply x0 with y0, result in a
      add b,a,b     ; accumulate: b = a + b
      inc r0        ; point to the next coefficient
      inc r1        ; point to the next data value
      dec ctr       ; decrement the loop counter
      tst ctr       ; test whether the filter order has been reached
      jnz loop      ; jump to the start of the loop if not zero
      sw b,(r2)     ; store the final result
      inc r2        ; point r2 to the next location

The two load instructions fill x0 and y0 with values from the memory locations specified by r0 and r1. The multiply and add instructions accumulate the products in b, which already contains the result of the previous iterations. The increments, decrement, and test advance the pointers and check the loop count, and the final store writes the accumulated result and advances r2 to the next location.
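The same multiply-accumulate loop can be written in a high-level language. This Python sketch mirrors the program above, producing one output sample from a coefficient array and an equally long input window:

```python
def fir_output(h, x):
    """One FIR output sample: the multiply-accumulate loop of the program above.

    h -- filter coefficients h0..h(N-1)
    x -- input window x(k), x(k-1), ..., x(k-N+1)
    """
    assert len(h) == len(x)
    b = 0                      # the accumulator, cleared before the loop
    for hi, xi in zip(h, x):   # one iteration per tap
        b += hi * xi           # the mul + add of the assembly loop
    return b

print(fir_output([1, 2, 3], [4, 5, 6]))  # -> 32  (1*4 + 2*5 + 3*6)
```

Each tap costs a separate multiply, add, and pointer update on a general purpose processor, which is exactly the overhead a DSP removes.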
Let us see the program for an early DSP, the TMS32010, developed by Texas Instruments in the 1980s. It has the following features:
- 16-bit fixed point
- Harvard architecture: separate instruction and data memories
- Accumulator
[Figure: instruction memory and data memory connected to the processor; the datapath contains a T-register, multiplier, P-register, ALU, and accumulator.]
Fig. 7.4 Basic TMS32010 architecture

The program for the FIR filter (for a 3rd order filter) is given as follows. Here X4, H4, ... are direct (absolute) memory addresses, and ";" introduces a comment:

LT X4   ;Load T with x(n-4)
MPY H4  ;P = H4*X4; Acc = Acc + P
LTD X3  ;Load T with x(n-3); x(n-4) = x(n-3)
MPY H3  ;P = H3*X3; Acc = Acc + P
LTD X2
MPY H2
...

This takes two instructions per tap, but requires unrolling the loop.
LT X4: loading from the direct address X4. MPY H4: multiply and accumulate. LTD X3: loading and shifting the data points in memory.
The advantages of the DSP over the general purpose processor can be stated as follows: multiplication and accumulation take place at the same time, so the architecture directly supports filtering tasks, and the loading and subsequent shifting of data also take place at the same time.

II. Questions
1. Discuss the different errors introduced in a typical real-time signal processing system.
Answers
The various errors are:
In the ADC:
i. Sampling error
ii. Quantization error
iii. Coding error
In the algorithm:
iv. Inaccurate modeling
v. Finite word length
vi. Round-off errors
vii. Delay due to the finite execution time of the processor
In the DAC:
viii. Decoding error
ix. Transients in sampling time
Module 2
Embedded Processors and Memory
Lesson 8
General Purpose Processors - I
In this lesson the student will learn the following:
- Architecture of a General Purpose Processor
- Various Labels of Pipelines
- Basic Ideas on Different Execution Units
- Branch Prediction
Pre-requisite
Digital Electronics
8.1 Introduction
The first single-chip microprocessor came from Intel Corporation in 1971. It was called the Intel 4004, and it was the first single-chip CPU ever built; we can say it was the first general purpose processor, and now the terms microprocessor and processor are synonymous. The 4004 was a 4-bit processor capable of addressing 1K of data memory and 4K of program memory, and it was meant to be used in a simple calculator. The 4004 had 46 instructions, used only 2,300 transistors in a 16-pin DIP, and ran at a clock rate of 740 kHz (eight clock cycles per CPU cycle of 10.8 microseconds). In 1974, Motorola introduced the 6800, a chip with 78 instructions and probably the first microprocessor with an index register. In 1979, Motorola introduced the 68000: it had internal 32-bit registers and a 32-bit address space, but its bus was still 16 bits due to hardware costs. On the other hand, in 1976 Intel designed the 8085, with more instructions to enable/disable three added interrupt pins (and the serial I/O pins); Intel also simplified the hardware so that it used only a +5V supply, and added clock-generator and bus-controller circuits on the chip. In 1978, Intel introduced the 8086, a 16-bit processor which gave rise to the x86 architecture; it did not contain floating-point instructions. In 1980 the company released the 8087, the first math coprocessor it had developed. Next came the 8088, the processor for the first IBM PC. Even though IBM engineers at the time wanted to use the Motorola 68000 in the PC, the company already had the rights to produce the 8086 line (by trading rights to Intel for its bubble memory) and it could use modified 8085-type components (68000-style components were much scarcer).
Table 1 Development History of Intel Microprocessors

Intel Processor  Year of Introduction  Initial Clock Speed  Number of Transistors  Circuit Line Width
4004             1971                  108 kHz              2,300                  10 micron
8008             1972                  500-800 kHz          3,500                  10 micron
8080             1974                  2 MHz                4,500                  6 micron
8086             1978                  5 MHz                29,000                 3 micron
8088             1979                  5 MHz                29,000                 3 micron
Intel286         1982                  6 MHz                134,000                1.5 micron
Intel386         1985                  16 MHz               275,000                1.5 micron
Intel486         1989                  25 MHz               1.2 million            1 micron
Pentium          1993                  66 MHz               3.1 million            0.8 micron
Pentium Pro      1995                  200 MHz              5.5 million            0.35 micron
Pentium II       1997                  300 MHz              7.5 million            0.25 micron
                                       266 MHz              7.5 million            0.25 micron
                                       500 MHz              9.5 million            0.25 micron
                                       1.5 GHz              42 million             0.18 micron
                                       800 MHz              25 million             0.18 micron
                                       1.7 GHz              42 million             0.18 micron
                                       1 GHz                220 million            0.18 micron
                                       1.5 GHz              140 million            90 nm
The development history of Intel family of processors is shown in Table 1. The Very Large Scale Integration (VLSI) technology has been the main driving force behind the development.
Fig. 8.2 The photograph

The photograph and architecture of a modern general purpose processor from VIA (the C3) are shown in Fig. 8.2 and Fig. 8.3 respectively (please refer to the lesson on embedded components).
[Fig. 8.3: VIA C3 architecture — a 64 KB 4-way I-cache with I-TLB feeds the I-fetch stage (I); a return stack, 3 BHTs, and a BTB perform branch prediction; the bus unit connects a 64 KB 4-way L2 cache; instructions pass through the decode buffer, decode (4-entry instruction queue), translate (4-entry instruction queue), register file and address calculation with ROM (stages R, A, D, G), execute (integer ALU, store-branch), and write back (stages E, S, W), with an MMX/3D unit, an FP queue and FP unit (stages X, F), store buffers, and write buffers.]
Specification
Name: VIA C3TM in EBGA. VIA is the name of the company and C3 the processor; EBGA stands for Enhanced Ball Grid Array, and the clock speed is 1 GHz.
Ball Grid Array (sometimes abbreviated BGA): a type of microchip connection methodology. Ball grid array chips typically use a group of solder dots, or balls, arranged in concentric rectangles to connect to a circuit board. BGA chips are often used in mobile applications where Pin Grid Array (PGA) chips would take up too much space due to the length of the pins used to connect the chips to the circuit board.
[Figure: common package types — SIMM, DIP, and PGA.]
The Architecture
The processor has a 12-stage integer pipelined structure.
Pipeline: this is a very important characteristic of a modern general purpose processor. A program is a set of instructions stored in memory. During execution the processor has to fetch these instructions from memory, decode them, and execute them; this process takes a few clock cycles. To increase the speed of such processes the processor is divided into different units: while one unit gets the instructions from memory, another unit decodes them, and some other unit executes them. This is called pipelining. It can be described as segmenting a functional unit such that it can accept new operands every cycle even though the total execution of an instruction may take many cycles. The pipeline works like a conveyor belt, accepting units until the pipeline is filled and then producing results every cycle. The above processor has such a pipeline divided into 12 stages.
There are four major functional groups: I-fetch, decode and translate, execution, and data cache. The I-fetch components deliver instruction bytes from the large I-cache or the external bus. The decode and translate components convert these instruction bytes into internal execution forms; if there is any branching operation in the program it is identified here, and the processor starts fetching new instructions from a different location. The execution components issue, execute, and retire internal instructions.
The data cache components manage the efficient loading and storing of execution data to and from the caches, bus, and internal components
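The conveyor-belt behaviour of a pipeline can be quantified with an idealised model (an assumption that ignores stalls and branches, not the C3's measured timing): an unpipelined processor needs about N*S cycles for N instructions through S stages, while a filled pipeline needs S + (N - 1) cycles.

```python
def unpipelined_cycles(n_instr, stages):
    """Each instruction occupies the whole machine for S cycles."""
    return n_instr * stages

def pipelined_cycles(n_instr, stages):
    """The first instruction fills the pipe; then one result per cycle."""
    return stages + (n_instr - 1)

n, s = 100, 12  # 100 instructions through a 12-stage pipeline
print(unpipelined_cycles(n, s), pipelined_cycles(n, s))  # -> 1200 111
```

For long instruction streams the speedup approaches the stage count S, which is why deep pipelines are attractive despite the branch-prediction machinery they require.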
[Figure: the I-fetch stages — the 64 KB 4-way I-cache and the I-TLB (128-entry 8-way, with an 8-entry PDC) feed pipeline stages I, B, V.]
The first three pipeline stages (I, B, V) deliver aligned instruction data from the I-cache (instruction cache) or the external bus into the instruction decode buffers. The primary I-cache contains 64 KB organized as four-way set associative with 32-byte lines. The associated large I-TLB (Instruction Translation Look-aside Buffer) contains 128 entries organized as 8-way set associative.
TLB: a translation look-aside buffer is a table in the processor's memory that contains information about the pages in memory the processor has accessed recently. The table cross-references a program's virtual addresses with the corresponding absolute addresses in physical memory that the program has most recently used. The TLB enables faster computing because it allows address processing to take place independently of the normal address-translation pipeline. The instruction data is predecoded as it comes out of the cache; this predecode is overlapped with other required operations and thus effectively takes no time. The fetched instruction data is placed sequentially into multiple buffers. Starting with a branch, the first branch-target byte is left-adjusted into the instruction decode buffer.
[Fig. 8.10: the decode, translate, and execution stages — branch prediction (BTB, return stack, 3 BHTs) and predecode feed the decode buffer; decode and translate lead to the integer unit; register and address calculation occupy stages R, A, D, G and access the D-cache and D-TLB (64 KB 4-way, 128-entry 8-way, 8-entry PDC); execution and write back occupy stages E, S, W, supported by write buffers; the bus unit connects the 64 KB 4-way L2 cache.]
Decode stage (R): micro-instructions are decoded, integer register files are accessed, and resource dependencies are evaluated.
Addressing stage (A): memory addresses are calculated and sent to the D-cache (data cache).
Cache access stages (D, G): the D-cache and D-TLB (Data Translation Look-aside Buffer) are accessed, and aligned load data is returned at the end of the G-stage.
Execute stage (E): integer ALU operations are performed. All basic ALU functions take one clock except multiply and divide.
Store stage (S): integer store data is grabbed in this stage and placed in a store buffer.
Write-back stage (W): the results of operations are committed to the register file.
[Figure: the cache-access and execute stages (R, A, D, G, E, S, W) — the 64 KB 4-way D-cache sits alongside the integer ALU.]
Fig. 8.11

The D-cache contains 64 KB organized as four-way set associative with 32-byte lines. The associated large D-TLB contains 128 entries organized as 8-way set associative. The cache, TLB, and page directory cache all use a pseudo-LRU (least recently used) replacement algorithm.
[Fig. 8.12: the 64 KB 4-way L2 cache]

The L2 cache holds lines that, at any point in time, are not contained in the two 64-KB L1 caches. As lines are displaced from the L1 caches (due to bringing in new lines from memory), the displaced lines are placed in the L2 cache. Thus, a future L1-cache miss on a displaced line can be satisfied by returning the line from the L2 cache instead of having to access external memory.
[Fig. 8.13: the floating-point, MMX, and 3D execution units alongside stages E, S, W]

FP: floating point processing unit. MMX: Multimedia Extension or Matrix Math Extension unit.
3D: a special set of instructions for 3D graphics capabilities. In addition to the integer execution unit, there is a separate 80-bit floating-point execution unit that can execute floating-point instructions in parallel with integer instructions. Floating-point instructions proceed through the integer R, A, D, and G stages, and are passed from the integer pipeline to the FP unit through a FIFO queue. This queue, which runs at the processor clock speed, decouples the slower running FP unit from the integer pipeline so that the integer pipeline can continue to process instructions overlapped with FP instructions. Basic arithmetic floating-point instructions (add, multiply, divide, square root, compare, etc.) are represented by a single internal floating-point instruction. Certain little-used and complex floating-point instructions (sin, tan, etc.), however, are implemented in microcode and are represented by a long stream of instructions coming from the ROM; these instructions tie up the integer instruction pipeline such that integer execution cannot proceed until they complete. This processor also contains a separate execution unit for the MMX-compatible instructions. MMX instructions proceed through the integer R, A, D, and G stages, and one MMX instruction can issue into the MMX unit every clock. The MMX multiplier is fully pipelined and can start one non-dependent MMX multiply[-add] instruction (which consists of up to four separate multiplies) every clock; other MMX instructions execute in one clock, and multiplies followed by a dependent MMX instruction require two clocks. Architecturally, the MMX registers are the same as the floating-point registers; however, there are actually two different register files (one in the FP unit and one in the MMX unit) that are kept synchronized by hardware. Finally, there is a separate execution unit for some specific 3D instructions.
These instructions provide assistance for graphics transformations via new SIMD (Single Instruction Multiple Data) single-precision floating-point capabilities. These instruction codes proceed through the integer R, A, D, and G stages, and one 3D instruction can issue into the 3D unit every clock. The 3D unit has two single-precision floating-point multipliers and two single-precision floating-point adders; other functions such as conversions, reciprocal, and reciprocal square root are also provided. The multiplier and adder are fully pipelined and can start any non-dependent 3D instruction every clock.
8.3 Conclusion
This lesson discussed the architecture of a typical modern general purpose processor (the VIA C3), which is similar to the x86 family of microprocessors from Intel; in fact this processor uses the same x86 instruction set as the Intel processors. It is a pipelined architecture. The general purpose processor architecture has the following characteristics:
- Multiple stages of pipeline
- More than one level of cache memory
- A branch prediction mechanism at the early stage of the pipeline
- Separate and independent processing units (integer, floating point, MMX, 3D, etc.)
- Because of the uncertainties associated with branching, the overall instruction execution time is not fixed (it is therefore not suitable for some real-time applications which need accurate execution speed)
- It handles a very complex instruction set
- The overall power consumption, because of the complexity of the processor, is higher
In the next lesson we shall discuss the signals associated with such a processor.
Answers
Q1.
[Figure: Intel P4 NetBurst architecture — the system bus connects a bus unit with an optional 3rd-level cache and an 8-way 2nd-level cache; the front end (fetch/decode, trace cache, microcode ROM) feeds an out-of-order execution core and retirement logic with a 4-way 1st-level cache, supported by BTBs/branch prediction; frequently used and less frequently used paths are distinguished.]
Q2. Superscalar architecture refers to the use of multiple execution units, to allow the processing of more than one instruction at a time. This can be thought of as a form of "internal multiprocessing", since there really are multiple parallel processors inside the CPU. Most modern processors are superscalar; some have more parallel execution units than others. A superscalar processor can be said to consist of multiple pipelines.
Q3. Some MMX instructions from the x86 family:
MOVQ: move quadword
PUNPCKHWD: unpack high-order words
PADDUSW: add packed unsigned word integers with unsigned saturation
These can also be classed as SIMD instructions.
Q4. (a) Look-up table (b) Taylor series (c) From the complex exponential
Q5. This is done by averaging the instruction execution times over various programming models, including latency and overhead; it is a statistical measure.
Q6. All x86 family instructions will work.
Q7. Around 7.5 watts.
Q8.
Parameter                          Min          Max      Units  Notes
VIL (Input Low Voltage)            -0.58        0.700    V      (2)
VIH1.5 (Input High Voltage)        VREF + 0.2   VTT      V      (3)
VIH2.5 (Input High Voltage)        2.0          3.18     V
VOL (Low Level Output Voltage)                  0.40     V      @IOL
VOH (High Level Output Voltage)                 VCMOS    V      (1)
IOL (Low Level Output Current)                           mA     @VCL
ILI (Input Leakage Current)                     100      uA
ILO (Output Leakage Current)                    100      uA
Q9. Refer to the text.
Q10. Refer to the text.
Module 2
Embedded Processors and Memory
Lesson 9
General Purpose Processors - II
Signals
In this lesson the student will learn the following:
- Signals of a General Purpose Processor
- Multiplexing
- Address Signals
- Data Signals
- Control Signals
- Bus Arbitration Signals
- Status Signal Indicators
- Sleep State Indicators
- Interrupts
Pre-requisite
Digital Electronics
9.1 Introduction
The input/output signals of a processor chip are the matter of discussion in this chapter. We shall take up the same VIA C3 processor as discussed in the last chapter. In the design flow of a processor the internal architecture is determined first and simulated for optimal performance.
[Fig. 9.1 The overall design flow for a typical processor — parallel ASIC hardware and software tools flows]
The basic architecture decides the signals. Broadly, the signals can be classified as:
1. Address Signals
2. Data Signals
3. Control Signals
4. Power Supply Signals
Some of these signals are multiplexed in time to make the VLSI design easier and more efficient without affecting the overall performance.
Inquire Cycles: These are bus cycles, initiated by external logic, that cause the processor to look up an address in its physical cache tags.
Internal Snooping: These are internal actions by the processor (rather than external logic) that are taken during certain types of cache accesses in order to detect self-modifying code.
Bus Watching: Some caching devices watch their address and data bus continuously while they are held off the bus, comparing every address driven by another bus master with their internal cache tags, and optionally updating their cached lines on the fly during write-backs by the other master.
A20M#: A20 Mask causes the CPU to mask (force to 0) the A20 address bit when driving the external address bus or performing an internal cache access. A20M# is provided to emulate the 1 MByte address wrap-around that occurs on the x86. Snoop addressing is not affected. It is an input signal; if it is not used, it is connected to the power supply. It is not synchronized with the bus clock.
ADS#: Address Strobe begins a memory/I/O cycle and indicates that the address bus (A31#-A3#) and the transaction request signals (REQ#) are valid. This is an output signal during the addressing cycle and an input/output signal during transaction request cycles. It is synchronized with the bus clock.
Memory/I/O cycle: the memory and input/output data transfers (read or write) are carried out in different clock cycles. The address is first loaded on the address bus; the processor, being faster, waits till the memory or input/output device is ready to send or receive the data through the data bus. Normally this takes more than one clock cycle.
Transaction request cycle: when an external device requests the CPU to transmit data, the request comes through this line.
BCLK: Bus Clock provides the fundamental timing for the CPU. The frequency of the input clock determines the operating frequency of the CPU's bus. External timing is defined with reference to the rising edge of BCLK. It is an input clock signal.
BNR#: Block Next Request signals a bus stall by a bus agent unable to accept new transactions. This is an input/output signal and is synchronized with the bus clock.
BPRI#: Priority Agent Bus Request arbitrates for ownership of the system bus. It is an input and is synchronized with the bus clock.
Bus arbitration: at times external devices signal the processor to release the system address/data/control bus from its control. This is achieved by an external request, which normally comes from external devices such as a DMA controller or a coprocessor.
BR[4:0]#: hardware strapping options for setting the processor's internal clock multiplier, set by strapping these wires to the supply or ground (sometimes they can be kept open to make them 1). This option divides the input clock.
BSEL[1:0]: the bus frequency select balls (BSEL0 and BSEL1) identify the appropriate bus speed (100 MHz or 133 MHz). These are output signals.
BR0#: drives the BREQ[0]# signal in the system to request access to the system bus.
D[63:0]#: the Data Bus signals are bidirectional signals which provide the data path between the CPU and external memory and I/O devices. The data bus driver must assert DRDY# to indicate a valid data transfer. These are both inputs and outputs.
DBSY#: Data Bus Busy is asserted by the data bus driver to indicate that the data bus is in use. Input/output.
DEFER#: Defer is asserted by the target agent to indicate that the transaction cannot be guaranteed in-order completion. Input.
DRDY#: Data Ready is asserted by the data driver to indicate that valid data is on the data bus. Input/output.
FERR#: FPU Error Status indicates that an unmasked floating-point error has occurred. FERR# is asserted during execution of the FPU instruction that caused the error. Output.
FLUSH#: Flush causes the CPU to flush its internal caches, writing back all data in the modified state. Input.
HIT#: Snoop Hit indicates that the current cache inquiry address has been found in the cache. Input/output.
HITM#: Snoop Hit Modified indicates that the current cache inquiry address has been found in the cache and dirty data exists in the cache line (modified state). Input/output.
INIT#: Initialization resets the integer registers; it does not affect the internal cache or the floating-point registers. Input.
INTR: Maskable interrupt input to the CPU.
NMI: Non-maskable interrupt input.
LOCK#: Lock Status is used by the CPU to signal to the target that the operation is atomic. An atomic operation is any operation that a CPU can perform such that all results are made visible to every CPU at the same time and whose execution is safe from interference by other CPUs. For example, reading or writing a word of memory is an atomic operation.
NCHCTRL: The CPU uses this ball to control the integrated I/O pull-ups. A resistor is connected here to control the current on the input/output pins.
PWRGD: Power Good indicates that the processor's VCC is stable. Input.
REQ[4:0]#: Request Command is asserted by the bus driver to define the current transaction type.
RESET#: An input that resets the processor and invalidates the internal caches without writing them back.
RTTCTRL: The CPU uses this ball to control the output impedance.
RS[2:0]#: Response Status is an input that signals the completion status of the current transaction when the CPU is the response agent.
SLP#: Sleep, when asserted in the stop-grant state, causes the CPU to enter the sleep state.
9.3 Conclusion
In this chapter the various signals of a typical general-purpose processor have been discussed. Broadly, they can be classified into the following categories.
Address Signals: These are used to address memory as well as input/output devices. They are often multiplexed with other control signals; in such cases external bus controllers latch the address lines and hold them valid for the memory and I/O devices while the CPU changes their state. The bus controllers drive their CPU-side connections to high impedance so as not to interfere with the current state of these lines as driven by the CPU.
Data Signals: These lines carry data to and from the processor and the memory or I/O devices. Transceivers are connected on the data path to control the data flow. A data transfer may follow bus-transaction signals, which are necessary to negotiate the speed mismatch between the input/output devices and the processor.
Control Signals: These can generally be divided into the following groups.
Read/Write Control:
Memory Write: issued by the processor while sending data to the memory
Memory Read: issued by the processor while reading data from the memory
I/O Read: the input/output read signal, generally preceded by some bus-transaction signals
I/O Write: the input/output write signal, generally followed by some bus-transaction signals
These read/write signals are generally not directly available from the CPU; they are decoded from a set of status signals by an external bus controller.
A bus transaction includes two parts: sending the address and receiving or sending the data. The master is the one who starts the bus transaction by sending the address. The slave is the one who responds to the address: it sends data to the master if the master asks for data, and receives data from the master if the master wants to send data. These transfers are controlled by signals such as Ready, Defer, etc.
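The master/slave split can be sketched as a small Python model (illustrative only; real buses add the Ready/Defer signalling described above):

```python
class Slave:
    """A bus slave: responds to an address by sourcing or sinking data."""
    def __init__(self, size):
        self.mem = [0] * size

    def respond(self, addr, data=None):
        if data is None:
            return self.mem[addr]   # read: slave drives the data bus
        self.mem[addr] = data       # write: slave latches the data
        return None

def transaction(master_wants_data, slave, addr, data=None):
    """The master starts the transaction by sending the address;
    the slave responds with (read) or accepts (write) the data."""
    if master_wants_data:
        return slave.respond(addr)
    slave.respond(addr, data)

mem = Slave(16)
transaction(False, mem, 3, data=0xAB)   # write transaction
print(hex(transaction(True, mem, 3)))   # read transaction -> 0xab
```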
Bus Slave
Bus arbitration is the process of requesting and obtaining access to the bus. It is achieved with the following lines.
Bus Request: the requesting device asks for the access grant
Bus Grant: the CPU signals that the bus has been granted
Lock: for specific (atomic) operations the bus requests are not granted, as the CPU might be performing some critical operation.
Interrupt Control
In a multitasking environment, interrupts are external signals to the CPU requesting urgent service. The CPU acknowledges an interrupt and executes the corresponding interrupt service routine. Interrupts are processed according to their priority. More discussion is available in subsequent lessons.
Processor Control
These lines are activated at power-on or when the processor wakes from a power-saving mode such as sleep. They include the Reset and Test lines, etc. Some of the above signals will be discussed in the subsequent lessons.
Ans: It is called the Power-On Self-Test (POST). This routine is executed when the computer is powered on to check the proper functioning of the hard disk, CD-ROM drive, floppy disk, and many other on-board and off-board components.
Q3. Describe the various power-saving modes in a general-purpose CPU.
Ans: Refer to the discussion of Sleep mode in the text.
Q4. What could be the differences in the design of a processor to be used in the following applications: laptop, desktop, motor control?
Ans:
Laptop: a complex general-purpose processor with low power consumption and various power-saving modes.
Desktop: a high-performance processor with far less stringent limits on power consumption.
Motor control: a simple, low-power, specialized processor with on-chip peripherals, running a real-time operating system.
Q5. What is the advantage of reducing the high-state voltage from 5 V to 3.5 V? What are the disadvantages?
Ans: It reduces the interference, but it also decreases the noise margin.
Q6. What is the use of the Power-Good signal?
Ans: It indicates the quality of the supply inside the CPU. If the supply is not good, there may be mal-operation and data loss.
Module 2
Embedded Processors and Memory
Lesson 10
Embedded Processors - I
In this lesson the student will learn the following:
Architecture of an Embedded Processor
The Architectural Overview of the Intel MCS-96 family of Microcontrollers
Pre-requisite
Digital Electronics
10.1 Introduction
It is generally difficult to draw a clear-cut boundary between the class of microcontrollers and general-purpose microprocessors. Distinctions can be made or assumed on the following grounds.
Microcontrollers are generally associated with embedded applications.
Microprocessors are associated with desktop computers.
Microcontrollers have a simpler memory hierarchy: the RAM and ROM may exist on the same chip, and cache memory is generally absent.
The power consumption and temperature rise of a microcontroller are restricted because of the constraints on its physical dimensions.
8-bit and 16-bit microcontrollers are very popular, with a simpler design as compared to the large word-length (32-bit, 64-bit) complex general-purpose processors.
However, recently, the market for 32-bit embedded processors has been growing. Further the issues such as power consumption, cost, and integrated peripherals differentiate a desktop CPU from an embedded processor. Other important features include the interrupt response time, the amount of on-chip RAM or ROM, and the number of parallel ports. The desktop world values processing power, whereas an embedded microprocessor must do the job for a particular application at the lowest possible cost.
(Fig. 10.2 block diagrams: a microprocessor-based system with separate ROM, EEPROM, RAM, A/D and D/A converters and I/O ports, versus a microcontroller with on-chip ROM, EEPROM, RAM, serial and parallel I/O, PWM output and analog input.)
Fig. 10.2 Microprocessor versus microcontroller
Fig. 10.1 shows the performance-cost plot of the available microprocessors. Naturally, the higher the performance, the higher the cost. The embedded controllers occupy the lower left-hand corner of the plot. Fig. 10.2 shows the architectural difference between two systems, one with a general-purpose microprocessor and one with a microcontroller. The hardware requirement of the former system is greater than that of the latter: separate chips or circuits for the serial interface, parallel interface, memory, and A/D-D/A converters are necessary. On the other hand, the functionality, flexibility, and complexity of information handling are greater in the former.
Fig. 10.3 The Architectural Block diagram of Intel 8XC196 Microcontroller PTS: Peripheral Transaction Server; I/O: Input/Output Interface; EPA: Event Processor Array; PWM: Pulse Width Modulated Outputs; WG: Waveform Generator; A/D- Analog to Digital Converter; FG: Frequency Generator; SIO: Serial Input/Output Port Fig. 10.3 shows the functional block diagram of the microcontroller. The core of the microcontroller consists of the central processing unit (CPU) and memory controller. The CPU contains the register file and the register arithmetic-logic unit (RALU). A 16-bit internal bus connects the CPU to both the memory controller and the interrupt controller. An extension of this bus connects the CPU to the internal peripheral modules. An 8-bit internal bus transfers instruction bytes from the memory controller to the instruction register in the RALU.
Fig. 10.4 The Architectural Block diagram of the core CPU: Central Processing Unit; RALU: Register Arithmetic Logic Unit; ALU: Arithmetic Logic Unit; Master PC: Master Program Counter; PSW: Processor Status Word; SFR: Special Function Registers
CPU Control
The CPU is controlled by the microcode engine, which instructs the RALU to perform operations using bytes, words, or double-words from either the 256-byte lower register file or through a window that directly accesses the upper register file. Windowing is a technique that maps blocks of the upper register file into a window in the lower register file. CPU instructions move from the 4-byte prefetch queue in the memory controller into the RALU's instruction register. The microcode engine decodes the instructions and then generates the sequence of events that cause the desired functions to occur.
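Windowing can be illustrated with a toy address-translation model. The window base, window size, and select encoding below are invented for illustration and do not match the actual MCS-96 window-selection register; only the idea (a slice of the lower file redirected into the upper file) is faithful:

```python
WINDOW_BASE = 0xC0   # assumed window position in the 256-byte lower file
WINDOW_SIZE = 64     # assumed window size

def effective_address(direct_addr, window_select):
    """Translate a direct address into the register file.
    Addresses inside the window are redirected to the block of the
    upper register file chosen by window_select; everything else
    accesses the lower file directly. Illustrative only."""
    if WINDOW_BASE <= direct_addr < WINDOW_BASE + WINDOW_SIZE:
        upper_base = window_select * WINDOW_SIZE
        return upper_base + (direct_addr - WINDOW_BASE)
    return direct_addr

print(hex(effective_address(0x10, window_select=8)))  # below window: unchanged
print(hex(effective_address(0xC4, window_select=8)))  # inside window: remapped
```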
Register File
The register file is divided into an upper and a lower file. In the lower register file, the lowest 24 bytes are allocated to the CPU's special-function registers (SFRs) and the stack pointer, while the remainder is available as general-purpose register RAM. The upper register file contains only general-purpose register RAM. The register RAM can be accessed as bytes, words, or double-words. The RALU accesses the upper and lower register files differently. The lower register file is always accessible with direct addressing. The upper register file is accessible with direct addressing only when windowing is enabled.
Code Execution
The RALU performs most calculations for the microcontroller, but it does not use an accumulator. Instead it operates directly on the lower register file, which essentially provides 256 accumulators. Because data does not flow through a single accumulator, the microcontrollers code executes faster and more efficiently.
Instruction Format
These microcontrollers combine general-purpose registers with a three-operand instruction format. This format allows a single instruction to specify two source registers and a separate destination register. For example, the following instruction multiplies two 16-bit variables and stores the 32-bit result in a third variable.
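The semantics of such a three-operand multiply can be modelled as follows. The mnemonic MULU shown in the comment is an assumption about the MCS-96 syntax and should be checked against the manual; the model only shows the two-16-bit-sources, 32-bit-destination behaviour:

```python
def mulu_16(dst_regs, src1, src2):
    """Model of a three-operand 16-bit unsigned multiply, i.e. an
    instruction of the form  MULU result, var1, var2  (assumed form).
    Two 16-bit sources produce a 32-bit result stored in a separate
    destination, here represented as a low word and a high word."""
    product = (src1 & 0xFFFF) * (src2 & 0xFFFF)
    dst_regs["lo"] = product & 0xFFFF
    dst_regs["hi"] = (product >> 16) & 0xFFFF
    return dst_regs

regs = mulu_16({}, 0x1234, 0x0010)
print(hex(regs["hi"]), hex(regs["lo"]))   # 0x1234 * 0x10 = 0x00012340
```

Because the destination is named explicitly, neither source register is overwritten, which is the point of the three-operand format.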
When the bus controller receives a request from the queue, it fetches the code from the address contained in the slave PC. The slave PC increases execution speed because the next instruction byte is available immediately and the processor need not wait for the master PC to send the address to the memory controller. If a jump, interrupt, call, or return changes the address sequence, the master PC loads the new address into the slave PC, then the CPU flushes the queue and continues processing.
Interrupt Service
The interrupt-handling system has two main components: the programmable interrupt controller and the peripheral transaction server (PTS). The programmable interrupt controller has a hardware priority scheme that can be modified by software. Interrupts that go through the interrupt controller are serviced by interrupt service routines that you provide. The peripheral transaction server (PTS), a microcoded hardware interrupt processor, provides efficient interrupt handling.
(Fig. 10.5 Clock circuitry: the XTAL1/XTAL2 oscillator input feeds a divide-by-two circuit; the clock generators derive the CPU clocks and peripheral clocks (PH1, PH2) and CLKOUT, with disable paths for the idle and powerdown modes.)
Internal Timing
The clock circuitry (Fig. 10.5) receives an input clock signal on XTAL1 provided by an external crystal or oscillator and divides the frequency by two. The clock generators accept the divided input frequency from the divide-by-two circuit and produce two non-overlapping internal timing signals, Phase 1(PH1) and Phase 2 (PH2). These signals are active when high.
Fig. 10.6 The internal clock phases
The rising edges of PH1 and PH2 generate the internal CLKOUT signal (Fig. 10.6). The clock circuitry routes separate internal clock signals to the CPU and the peripherals to provide flexibility in power management. Because of the complex logic in the clock circuitry, the signal on the CLKOUT pin is a delayed version of the internal CLKOUT signal. This delay varies with temperature and voltage.
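The relationship between the XTAL1 input, the divide-by-two circuit, and one state time works out as below (a simple calculation; the 16 MHz crystal is just an example value):

```python
def clock_frequencies(f_xtal_hz):
    """The divide-by-two circuit halves the XTAL1 input frequency;
    one state time is one period of the divided internal clock."""
    f_internal = f_xtal_hz / 2
    state_time_s = 1.0 / f_internal
    return f_internal, state_time_s

f, t = clock_frequencies(16_000_000)   # 16 MHz crystal (example)
print(f)        # 8000000.0 -> 8 MHz internal clock
print(t * 1e9)  # 125.0 -> 125 ns state time
```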
I/O Ports
Individual I/O port pins are multiplexed to serve as standard I/O or to carry special-function signals associated with an on-chip peripheral or an off-chip component. If a particular special-function signal is not used in an application, the associated pin can be individually configured to serve as a standard I/O pin. Ports 3 and 4 are exceptions; they are controlled at the port level. When the bus controller needs to use the address/data bus, it takes control of the ports. When the address/data bus is idle, you can use the ports for I/O. Port 0 is an input-only port that is also the analog input for the A/D converter. For more details the reader is referred to the data manual at www.intel.com/design/mcs96/manuals/27218103.pdf.
Frequency Generator
Some microcontrollers of this class have a frequency generator. This peripheral produces a waveform with a fixed duty cycle (50%) and a programmable frequency (ranging from 4 kHz to 1 MHz with a 16 MHz input clock).
Waveform Generator
A waveform generator simplifies the task of generating synchronized, pulse-width modulated (PWM) outputs. This waveform generator is optimized for motion control applications such as driving 3-phase AC induction motors, 3-phase DC brushless motors, or 4-phase stepping motors. The waveform generator can produce three independent pairs of complementary PWM outputs, which share a common carrier period, dead time, and operating mode. Once it is initialized, the waveform generator operates without CPU intervention unless you need to change a duty cycle.
Analog-to-digital Converter
The analog-to-digital (A/D) converter converts an analog input voltage to a digital equivalent. Resolution is either 8 or 10 bits; sample and convert times are programmable. Conversions can be performed on the analog ground and reference voltage, and the results can be used to calculate gain and zero-offset errors. The internal zero-offset compensation circuit enables automatic zero offset adjustment. The A/D also has a threshold-detection mode, which can be used to generate an interrupt when a programmable threshold voltage is crossed in either direction. The A/D scan mode of the PTS facilitates automated A/D conversions and result storage.
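The gain and zero-offset calculation mentioned above can be sketched as follows. The exact formulas used by the hardware compensation circuit are not given in the text, so the ones below are a plausible illustration: ideally, analog ground converts to code 0 and the reference voltage converts to full scale.

```python
def adc_errors(code_at_gnd, code_at_ref, full_scale=1023):
    """Estimate zero-offset and gain errors of a 10-bit A/D from
    conversions of analog ground and the reference voltage.
    (Illustrative formulas, not the manual's exact method.)"""
    zero_offset = code_at_gnd              # counts read when input is 0 V
    span = code_at_ref - code_at_gnd       # measured span in counts
    gain = span / full_scale               # 1.0 would be an ideal gain
    return zero_offset, gain

offset, gain = adc_errors(code_at_gnd=3, code_at_ref=1020)
print(offset, gain)   # a 3-count zero offset and a slight gain error
```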
Watchdog Timer
The watchdog timer is a 16-bit internal timer that resets the microcontroller if the software fails to operate properly.
10.3 Conclusion
This lesson discussed the architecture of a typical high-performance microcontroller. The next lesson shall discuss the signals of a typical microcontroller from the Intel MCS-96 family.
much harder to test and debug the code. As a result, the microcode that shipped with machines was often buggy and had to be patched numerous times out in the field. It was the difficulties involved in using microcode for control that spurred Patterson and others to question whether implementing all of these complex, elaborate instructions in microcode was really the best use of limited transistor resources.
2. What is the function of the watchdog timer?
Ans: It is a fail-safe mechanism that intervenes if a system stops functioning: a hardware timer that is periodically reset by software. If the software crashes or hangs, the watchdog timer expires and the entire system is reset automatically. The watchdog unit contains a watchdog timer.
A watchdog timer (WDT) is a device or electronic card that performs a specific operation after a certain period of time if something goes wrong with an electronic system and the system does not recover on its own. A common problem is for a machine or operating system to lock up when two parts or programs conflict, or, in an operating system, when memory-management trouble occurs. In some cases the system will eventually recover on its own, but this may take an unknown and perhaps extended length of time. A watchdog timer can be programmed to perform a warm boot (restarting the system) after a certain number of seconds during which a program or computer fails to respond to the most recent mouse click or keyboard action. The timer can also be used for other purposes, for example to actuate the refresh (or reload) button in a Web browser if a Web site does not fully load within a certain time after the entry of a Uniform Resource Locator (URL).
A WDT contains a digital counter that counts down to zero at a constant speed from a preset number. The counter speed is kept constant by a clock circuit.
If the counter reaches zero before the computer recovers, a signal is sent to designated circuits to perform the desired action.
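The countdown-and-service behaviour described in this answer can be modelled directly:

```python
class WatchdogTimer:
    """Down-counter: software must service ('kick') it periodically;
    if it ever reaches zero, a system reset is triggered."""
    def __init__(self, preset):
        self.preset = preset
        self.count = preset
        self.reset_occurred = False

    def service(self):            # done periodically by healthy software
        self.count = self.preset

    def tick(self):               # driven at constant speed by the clock
        self.count -= 1
        if self.count == 0:
            self.reset_occurred = True
            self.count = self.preset

wdt = WatchdogTimer(preset=5)
for _ in range(4):
    wdt.tick()
wdt.service()                     # software is alive: no reset
for _ in range(5):                # software hangs: counter expires
    wdt.tick()
print(wdt.reset_occurred)         # True
```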
Module 2
Embedded Processors and Memory
Lesson 11
Embedded Processors - II
Pre-requisite
Digital Electronics
11.1 Introduction
Microcontrollers are required to operate in the real world without much interface circuitry. The input-output signals of such a processor are both analog and digital. Digital data transmission can be both parallel and serial, and the voltage levels can also differ. The architecture of a basic microcontroller is shown in Fig. 11.1, which illustrates the various modules inside a microcontroller. Common processors have digital input/output, timer, and serial input/output lines. Some microcontrollers also support multi-channel analog-to-digital converter (ADC) as well as digital-to-analog converter (DAC) units; thus analog signal input and output pins are also present in typical microcontroller units. Address and data lines are also provided for external memory and I/O chips.
(Fig. 11.1/11.2 block diagrams: CPU with microcode engine and ALU, I/O ports, EPORT, watchdog timer, A/D converter, pulse-width modulators, serial I/O units SIO0/SIO1 and SSIO0/SSIO1 with baud-rate generators, chip-select unit, and the AD15:0 address/data lines.)
The external signals fall into the following groups: address/data lines, bus control signals, interrupt signals, timer/event-manager signals, digital input/output ports, and analog input/output ports.
EPORT: The extended-port lines support extended addressing. The EPORT is an 8-bit port which can operate either as a general-purpose I/O signal (I/O mode) or as a special-function signal (special-function mode).
AD15:0 Address/Data Lines: These lines serve as input as well as output pins. The function of these pins depends on the bus width and mode. When a bus access is not occurring, these pins revert to their I/O port function. AD15:0 drive address bits 0-15 during the first half of the bus cycle and drive or receive data during the second half of the bus cycle.
WRH Write High: Output Signal: During 16-bit bus cycles, this active-low output signal is asserted for high-byte writes and word writes to external memory. WRL Write Low: Output Signal: During 16-bit bus cycles, this active-low output signal is asserted for low-byte writes and word writes to external memory.
XTAL2: Output: Inverted output for the crystal/resonator; the output of the on-chip oscillator inverter. Leave XTAL2 floating when the design uses an external clock source instead of the on-chip oscillator.
Analog Inputs
ACH15:0: Input Analog Channels: These signals are analog inputs to the A/D converter. The ANGND and VREF pins are also required for the standard A/D converter to function. Other important signals of a typical microcontroller include: power supply and ground pins at multiple points; signals from the internal programmable timer; and debug pins. The reader is requested to follow the link www.intel.com/design/mcs96/manuals/272804.htm or www.intel.com/design/mcs96/manuals/27280403.pdf for more details.
11.3 Conclusions
This chapter discussed the important signals of a typical microcontroller. The detailed electrical and timing specifications are available in the respective manuals.
11.4 Questions
1. Which ports of the 80C196EA can generate PWM pulses? What is the voltage level of such pulses? Ans:
2. Why is the power supply given to multiple points on a chip?
Ans: Multiple power-supply points ensure the following.
The voltages at the devices (transistors and cells) stay better than a set target under a specified set of varying load conditions in the design. This ensures correct operation of circuits at the expected level of performance.
The current supplied by any pad, pin, or voltage regulator remains within a specified limit under any of the specified loading conditions. This is required: (a) so as not to exceed the design capacity of the regulators and pads; and (b) to distribute currents more uniformly among the pads, so that the L di/dt voltage variations due to parasitic inductance in the package's substrate, ball-grid array, and bond wires are minimized.
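The L di/dt term in point (b) is just V = L * di/dt: the voltage developed across a parasitic inductance when the chip's current draw changes. A quick calculation with assumed, purely illustrative package values shows why it matters:

```python
def supply_bounce_v(inductance_h, delta_i_a, delta_t_s):
    """V = L * di/dt: voltage developed across the parasitic
    inductance of a bond wire or package trace when the current
    drawn by the chip changes."""
    return inductance_h * (delta_i_a / delta_t_s)

# Assumed example values: 5 nH of bond-wire inductance,
# a 100 mA current swing in 1 ns
v = supply_bounce_v(5e-9, 0.1, 1e-9)
print(v)   # 0.5 V of supply bounce -- why multiple supply pins help
```

Splitting the same current swing across several supply pads divides di/dt per pad, and with it the bounce.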
Module 2
Embedded Processors and Memory
Lesson 12
Memory-Interfacing
Instructional Objectives
After going through this lesson the student would learn:
Requirement of External Memory
Different modes of a typical Embedded Controller
Standard Control Signals for the Memory Interface
A typical Example
Pre-Requisite
Digital Electronics, Microprocessors
12.1 Introduction
A Single Chip Microcontroller
CPU: The processing module of the microcontroller
Fig. 12.1 The basic architecture of a microcontroller
Fig. 12.1 shows the internal architecture of a single-chip microcontroller with internal RAM as well as ROM. Most of these microcontrollers do not require external memory for simpler tasks: the programs, being small, can easily fit into the internal memory, so the device often provides a single-chip solution. However, the amount of internal memory cannot be increased beyond a certain limit, for two reasons: power consumption and size.
Extra memory consumes more power and hence causes a higher temperature rise, and the die size has to be increased to house the additional memory. The need for extra memory space arises in some specific applications. Fig. 12.2 shows the basic block diagram of the memory interface to a processor.
(Fig. 12.2 The memory interface: the CPU connects to the memory through data lines, address lines, and control lines.)
Microcontroller Mode
The processor accesses only the on-chip FLASH memory. The External Memory Interface functions are disabled. Attempts to read above the physical limit of the on-chip FLASH cause a read of all 0s (a NOP instruction).
Microprocessor Mode
The processor permits execution and access only through external program memory; the contents of the on-chip FLASH memory are ignored.
A16-A19: the four most significant bits of the address. BA0: Byte Address 0. (Fig. 12.5 also shows the OE, WRL, WRH, UB, and LB control lines.)
Fig. 12.5 The address, data and control lines of the PIC18F8XXX microcontroller required for external memory interfacing
The address, data and control lines of this PIC family of microcontrollers are shown in Fig. 12.5 and are explained below.
AD0-AD15: 16 bits of data and 16 bits of address, multiplexed
ALE: Address Latch Enable signal to latch the multiplexed address in the first clock cycle
WRL Write Low Control Pin to make the memory write the lower byte of the data when it is low WRH Write High Control Pin to make the memory write the higher byte of the data when it is low OE Output Enable is made low when valid data is made available to the external memory CE Chip enable line is made low to access the external memory chip
LB Lower Byte Enable Control is kept low when the lower byte is available for the memory.
UB Upper Byte Enable Control is kept low when the upper byte is available for the memory.
The microcontroller has a 16-bit wide bus for data transfer. These data lines are shared with address lines and are labeled AD<15:0>. Because of this, 16 bits of latching are necessary to demultiplex the address and data. There are four additional address lines labeled A<19:16>. The PIC18 architecture provides an internal program counter of 21 bits, offering a capability of 2 Mbytes of addressing. Seven control lines are used in the External Memory Interface: ALE, WRL, WRH, OE, CE, LB, and UB. All of these lines except OE may be used during data writes. All of these lines except WRL and WRH may be used during fetches and reads. The application determines which control lines are necessary. The basic connection diagram is shown in Fig. 12.6; the 16-bit byte-select mode is shown here.
Fig. 12.6 The connection diagram for the external memory interface in 16-bit byte-select mode
The PIC18 family runs from a clock that is four times faster than its instruction cycle. The four clock pulses are a quarter of the instruction cycle in length and are referred to as Q1, Q2, Q3, and Q4. During Q1, ALE is enabled while address information A<15:0> is placed on pins AD<15:0>. At the same time, the upper address information A<19:16> is available on the upper address bus. On the negative edge of ALE, the address is latched in the external latch. At the beginning of Q3, the OE output-enable (active-low) signal is generated. Also at the beginning of Q3, BA0 is generated; this signal is active high only during Q3, indicating the state of the program counter's least significant bit. At the end of Q4, OE goes high and the data (16-bit word) is fetched from memory on the low-to-high transition edge of OE. The timing diagram for all signals during external memory code execution and table reads is shown in Fig. 12.7.
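The Q1-Q4 address/data multiplexing can be sketched as a Python model of one bus cycle (the example address and data values below are arbitrary):

```python
def bus_cycle(address, data_word):
    """Model of one PIC18 external fetch on the multiplexed bus.
    Q1: ALE is high and AD<15:0> carry A<15:0>; the external latch
        captures the address on ALE's falling edge.
    Q3/Q4: OE is asserted and AD<15:0> carry the 16-bit data word."""
    ad_bus = address & 0xFFFF        # Q1: address phase on AD<15:0>
    latched = ad_bus                 # falling edge of ALE: latch address
    a_high = (address >> 16) & 0xF   # A<19:16> on dedicated lines
    ad_bus = data_word & 0xFFFF      # Q3/Q4: data phase on the same pins
    full_address = (a_high << 16) | latched
    return full_address, ad_bus

full_addr, data = bus_cycle(0x5ABCD, 0x9256)
print(hex(full_addr), hex(data))    # 0x5abcd 0x9256
```

The latch is what makes the full 20-bit address and the 16-bit data available simultaneously to the memory, even though they share the AD pins in time.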
12.3 Conclusion
This lesson discussed a typical external memory interface example for PIC family of microcontrollers. A typical timing diagram for memory read operation is presented.
12.4 Questions
Q1. Draw the read timing diagram for a typical memory operation.
Ans: Refer to the text.
Q2. Draw the write timing diagram for a typical memory operation.
Module 3
Embedded Systems I/O
Lesson 13
Interfacing bus, Protocols, ISA bus etc.
Instructional Objectives
After going through this lesson the student would learn:
Bus, Wires and Ports
Basic Protocols of data transfer
Bus arbitration
ISA bus signals and handshaking
Memory-mapped I/O and simple I/O
Parallel I/O and Port-based I/O
Example of interfacing memory to the ports of the 8051
Pre-Requisite
Digital Electronics, Microprocessors
13.1 Introduction
The traditional definition of input-output covers the devices that create a medium of interaction with human users. They fall into categories such as: 1. Printers 2. Visual display units 3. Keyboards 4. Cameras 5. Plotters 6. Scanners. In real-time embedded systems, however, the definition of I/O devices is very different. An embedded controller needs to communicate with a wide range of devices, namely: 1. Analog-to-digital (A-D) and digital-to-analog (D-A) converters 2. CODECs 3. Small-screen displays such as TFT, LCD, etc. 4. Antennas 5. Cameras 6. Microphones 7. Touch screens, etc. A typical embedded system is a digital camera, as shown in Fig. 13.1. As can be seen, it possesses a broad range of input-output devices such as a lens, microphone, speakers, serial interface standards, and a TFT screen.
(Fig. 13.1 A digital camera as an embedded system: buttons, motors, lens, CCD module, RS-232C/USB/1394 interfaces, SDRAM, removable storage, and power-management circuitry — buck and boost converters, charge pump, inverter, Li-ion protector, battery monitor, battery charger, and wall/USB power inputs.)
The functionality of an embedded system can be broadly classified as:
Processing — transformation of data, implemented using processors;
Storage — retention of data, implemented using memory;
Communication (also called interfacing) — transfer of data between processors and memories, implemented using buses.
Interfacing
Interfacing is a way to communicate and transfer information in either direction without ending in deadlock. In our context it is a means of effective communication in real time. This involves addressing, arbitration, and protocols.
Fig. 13.2(a) The bus structure: a master and a slave connected by control lines, address lines, and data lines
Addressing: The master sends the address over a specified set of lines, which enables just the device for which the transfer is meant.
Protocols: The literal meaning of protocol is a set of rules. Here it is a set of formal rules describing how to transfer data, especially between two devices. A simple example is the memory read and write protocol. For a read (Fig. 13.2(b)), the rules are:
The CPU must send the memory address
The read line must be enabled
The processor must wait till the memory is ready
Then accept the bits on the data lines
Fig. 13.2(b) Read protocol timing (rd'/wr, enable, addr, data; setup time tsetup, read time tread)
For a write (Fig. 13.2(c)):
The CPU must send the memory address
The write line must be enabled
The processor sends the data over the data lines
The processor must wait till the memory is ready
Fig. 13.2(c) Write protocol timing
Arbitration: When the same set of address/data/control lines is shared by different units, bus arbitration logic comes into play. Access to a bus is arbitrated by a bus master. Each node on a bus has a bus master which requests access to the bus (a bus request) when the node needs to use the bus. This is a global request sent to all nodes on the bus. The node that currently has access to the bus responds with either a bus grant or a bus busy signal, which is also globally known to all bus masters. (Fig. 13.3)
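One simple way to resolve simultaneous bus requests is a fixed-priority arbiter (illustrative only; it is not the only arbitration scheme, and real arbiters also handle fairness and bus-busy signalling):

```python
def arbitrate(requests):
    """Fixed-priority bus arbiter: among the agents currently
    asserting a bus request, grant the bus to the highest-priority
    one (lowest index); all others must wait."""
    for agent, req in enumerate(requests):
        if req:
            return agent        # bus grant goes to this agent
    return None                 # no requests: bus idle

# agents: index 0 = CPU (highest priority), index 1 = DMA controller
print(arbitrate([True, True]))    # both request -> CPU wins (0)
print(arbitrate([False, True]))   # CPU releases the bus -> DMA (1)
print(arbitrate([False, False]))  # nobody requests -> None
```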
Fig. 13.3 Bus arbitration with a DMA (direct memory access) controller, which is responsible for transferring data between an I/O device and memory without involving the CPU. It starts with a bus request to the CPU, and after the request is granted it takes over the address, data, and control buses to carry out the data transfer. After the transfer is complete, it passes control back to the CPU.
Before learning more details about each of these concepts, a concrete definition of the following terms is necessary.
Wire: A passive physical connection with the least resistance.
Bus: A group of signals (such as data, address, etc.), possibly augmented with buffers, latches, etc. A bus has a standard specification, such as the number of bits, the clock speed, etc.
Port: The set of physical wires made available so that any device which meets the specified standard can be directly plugged in. Examples are the serial, parallel, and USB ports of a PC.
Time multiplexing: Sharing a single set of wires for multiple pieces of data. It saves wires at the expense of time.
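Time multiplexing as in Fig. 13.4 — 16 data bits sent over an 8-bit line, MSB after the LSB — can be modelled as a split on the sending side and a reassembly on the receiving side:

```python
def send_16_over_8(word):
    """Time-multiplex a 16-bit word over an 8-bit line:
    two bus cycles on the same wires, MSB after the LSB."""
    lsb = word & 0xFF
    msb = (word >> 8) & 0xFF
    return [lsb, msb]

def receive_16_over_8(transfers):
    """Demultiplex: reassemble the word from the two transfers."""
    lsb, msb = transfers
    return (msb << 8) | lsb

wire = send_16_over_8(0xBEEF)
print([hex(b) for b in wire])         # ['0xef', '0xbe']
print(hex(receive_16_over_8(wire)))   # 0xbeef
```

The same idea applies to the address/data muxing on the right of the figure: the address occupies the shared wires in one cycle and the data in the next, synchronized by req.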
[Fig. 13.4: data serializing and address/data muxing through mux/demux pairs]

Fig. 13.4 Time-multiplexed data transfer. The left-hand side transmits 16 bits of data over an 8-bit line, MSB after the LSB. The transfer is synchronized with the req signal. In the example shown on the right-hand side the same set of wires carries the address followed by the data, in synchronism with the req signal. (mux stands for multiplexer.)
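The 16-over-8 multiplexing on the left of Fig. 13.4 can be sketched as a toy C model. The shared 8-bit line is just a variable here, and all names (`send16`, `recv16`, `bus8`) are illustrative, not part of any real bus API:

```c
#include <stdint.h>

/* Toy model of 16-bit data time-multiplexed over an 8-bit line:
   the sender drives the line twice, LSB first, then MSB, as in the
   figure. out[] records what appears on the line in each cycle. */
static uint8_t bus8;                 /* the shared 8-bit data line */

static void send16(uint16_t word, uint8_t out[2]) {
    bus8 = (uint8_t)(word & 0xFF);   /* first cycle: LSB on the line */
    out[0] = bus8;
    bus8 = (uint8_t)(word >> 8);     /* second cycle: MSB on the line */
    out[1] = bus8;
}

static uint16_t recv16(const uint8_t in[2]) {
    /* reassemble the word on the receiving side */
    return (uint16_t)((uint16_t)in[0] | ((uint16_t)in[1] << 8));
}
```

The same idea applies to address/data muxing on the right of the figure: the first cycle carries the address, the second the data.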
[Fig. 13.5(a): strobe protocol timing — req and data lines, events 1 to 4]

Strobe protocol:
1. Master asserts req to receive data
2. Servant puts data on bus within time taccess
3. Master receives data and deasserts req
4. Servant ready for next request
Handshake Protocol
[Fig. 13.5(b): handshake timing — master and servant connected by req, ack and data lines, events 1 to 4]

Fig. 13.5(b) Handshake Protocol
1. Master asserts req to receive data
2. Servant puts data on bus and asserts ack
3. Master receives data and deasserts req
4. Servant ready for next request
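The four handshake steps can be modelled in a few lines of C. This is a single-threaded toy, not a real bus driver: `servant_step()` stands in for the servant's hardware, and the wire names mirror the figure:

```c
#include <stdint.h>
#include <stdbool.h>

/* Toy model of the req/ack handshake of Fig. 13.5(b). */
static bool req, ack;
static uint8_t data_bus;

static void servant_step(void) {       /* servant's side of the wires */
    if (req && !ack) {                 /* 2. on req: drive data, assert ack */
        data_bus = 0x5A;               /*    0x5A is an arbitrary payload   */
        ack = true;
    } else if (!req) {
        ack = false;                   /* 4. ready for the next request */
    }
}

static uint8_t master_read(void) {
    req = true;                        /* 1. master asserts req            */
    servant_step();                    /* 2. servant puts data, asserts ack */
    uint8_t d = data_bus;              /* 3. master receives data ...      */
    req = false;                       /*    ... and deasserts req         */
    servant_step();                    /* 4. servant deasserts ack         */
    return d;
}
```

Unlike the strobe protocol, the master here waits for ack rather than assuming the data is valid after a fixed taccess.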
[Fig. 13.5(c): combined strobe/handshake timing — req, wait and data lines for the fast and slow cases]

Fast-response case:
1. Master asserts req to receive data
2. Servant puts data on bus within time taccess (wait line is unused)
3. Master receives data and deasserts req
4. Servant ready for next request

Slow-response case:
1. Master asserts req to receive data
2. Servant can't put data within taccess, asserts wait
3. Servant puts data on bus and deasserts wait
4. Master receives data and deasserts req
5. Servant ready for next request

Fig. 13.5(c) Strobe and Handshake Combined
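The master's side of the combined protocol can be sketched as follows, assuming the master can sample the wait line each cycle. The timeout constant and every name here are invented for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

/* Sketch of the combined strobe/handshake of Fig. 13.5(c): the master
   samples the data after taccess unless the servant has raised wait,
   in which case it keeps sampling until wait is deasserted (or a
   cycle budget runs out). */
enum { WAIT_LIMIT = 100 };           /* give up after this many cycles */

static bool wait_line;               /* driven by the servant */
static uint8_t data_bus2;            /* the shared data lines */

/* Returns true and stores the data if the transfer completed. */
static bool strobe_read(uint8_t *out) {
    for (int cycle = 0; cycle < WAIT_LIMIT; cycle++) {
        if (!wait_line) {            /* fast case: data valid by taccess */
            *out = data_bus2;
            return true;
        }
        /* slow case: servant asserted wait, keep sampling */
    }
    return false;                    /* timeout / bus error */
}
```

In the fast case this behaves exactly like the plain strobe protocol; the wait line only costs time when the servant is actually slow.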
LA23 to LA17
Unlatched Address bits 23:17 are used to address memory within the system. They are used along with SA19 to SA0 to address up to 16 megabytes of memory. These signals are valid when BALE is high. They are "unlatched" and do not stay valid for the entire bus cycle. Decodes of these signals should be latched on the falling edge of BALE.
AEN
Address Enable is used to degate the system microprocessor and other devices from the bus during DMA transfers. When this signal is active the system DMA controller has control of the address, data, and read/write signals. This signal should be included as part of ISA board select decodes to prevent incorrect board selects during DMA cycles.
Version 2 EE IIT, Kharagpur 11
BALE
Buffered Address Latch Enable is used to latch the LA23 to LA17 signals or decodes of these signals. Addresses are latched on the falling edge of BALE. It is forced high during DMA cycles. When used with AEN, it indicates a valid microprocessor or DMA address.
CLK
System Clock is a free running clock typically in the 8MHz to 10MHz range, although its exact frequency is not guaranteed. It is used in some ISA board applications to allow synchronization with the system microprocessor.
SD15 to SD0
System Data serves as the data bus for devices on the ISA bus. SD15 is the most significant bit and SD0 the least significant bit. SD7 to SD0 are used for transfers with 8-bit devices; SD15 to SD0 are used for transfers with 16-bit devices. A 16-bit device transferring data with an 8-bit device shall convert the transfer into two 8-bit cycles using SD7 to SD0.
I/O CH CK
I/O Channel Check may be activated by ISA boards to request that a non-maskable interrupt (NMI) be generated to the system microprocessor. It is driven active to indicate that an uncorrectable error has been detected.
I/O CH RDY
I/O Channel Ready allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.
IOR
I/O Read is driven by the owner of the bus and instructs the selected I/O device to drive read data onto the data bus.
IOW
I/O Write is driven by the owner of the bus and instructs the selected I/O device to capture the write data on the data bus.
SMEMR
System Memory Read instructs a selected memory device to drive data onto the data bus. It is active only when the memory decode is within the low 1 megabyte of memory space. SMEMR is derived from MEMR and a decode of the low 1 megabyte of memory.
SMEMW
System Memory Write instructs a selected memory device to store the data currently on the data bus. It is active only when the memory decode is within the low 1 megabyte of memory space. SMEMW is derived from MEMW and a decode of the low 1 megabyte of memory.
MEMR
Memory Read instructs a selected memory device to drive data onto the data bus. It is active on all memory read cycles.
MEMW
Memory Write instructs a selected memory device to store the data currently on the data bus. It is active on all memory write cycles.
REFRESH
Memory Refresh is driven low to indicate a memory refresh operation is in progress.
OSC
Oscillator is a clock with a 70ns period (14.31818 MHz). This signal is not synchronous with the system clock (CLK).
RESET DRV
Reset Drive is driven high to reset or initialize system logic upon power up or subsequent system reset.
TC
Terminal Count provides a pulse to signal a terminal count has been reached on a DMA channel operation.
MASTER
Master is used by an ISA board along with a DRQ line to gain ownership of the ISA bus. Upon receiving a -DACK a device can pull -MASTER low which will allow it to control the system address, data, and control lines. After MASTER is low, the device should wait one CLK period before driving the address and data lines, and two clock periods before issuing a read or write command.
MEM CS16
Memory Chip Select 16 is driven low by a memory slave device to indicate it is capable of performing a 16-bit memory data transfer. This signal is driven from a decode of the LA23 to LA17 address lines.
I/O CS16
I/O Chip Select 16 is driven low by an I/O slave device to indicate it is capable of performing a 16-bit I/O data transfer. This signal is driven from a decode of the SA15 to SA0 address lines.
0WS
Zero Wait State is driven low by a bus slave device to indicate it is capable of performing a bus cycle without inserting any additional wait states. To perform a 16-bit memory cycle without wait states, -0WS is derived from an address decode.
SBHE
System Byte High Enable is driven low to indicate a transfer of data on the high half of the data bus (D15 to D8).
Port-based I/O (parallel I/O)
- Processor has one or more N-bit ports
- Processor's software reads and writes a port just like a register

Bus-based I/O
- Processor has address, data and control ports that form a single bus
- Communication protocol is built into the processor
- A single instruction carries out the read or write protocol on the bus

Parallel I/O peripheral
- Used when the processor only supports bus-based I/O but parallel I/O is needed
- Each port on the peripheral is connected to a register within the peripheral that is read/written by the processor

[Diagram: processor and memory on a system bus, with a parallel I/O peripheral providing Ports 0 to 3]
Fig. 13.8 Parallel I/O and extended parallel I/O

Extended parallel I/O
- Used when the processor supports port-based I/O but more ports are needed
- One or more processor ports interface with a parallel I/O peripheral, extending the total number of ports available for I/O (e.g., extending 4 ports to 6 ports in the figure)

Types of bus-based I/O: memory-mapped I/O and standard I/O. The processor talks to both memory and peripherals using the same bus, and there are two ways to talk to peripherals.

Memory-mapped I/O
- Peripheral registers occupy addresses in the same address space as memory
- e.g., with a 16-bit address bus, the lower 32K addresses may correspond to memory and the upper 32K addresses to peripherals

Standard I/O (I/O-mapped I/O)
- An additional pin (M/IO) on the bus indicates whether it is a memory or a peripheral access
- e.g., with a 16-bit address bus, all 64K addresses correspond to memory when M/IO is set to 0
and all 64K addresses correspond to peripherals when M/IO is set to 1.

Memory-mapped I/O vs. standard I/O

Memory-mapped I/O
- Requires no special instructions: assembly instructions involving memory, like MOV and ADD, work with peripherals as well
- Standard I/O, by contrast, requires special instructions (e.g., IN, OUT) to move data between peripheral registers and memory

Standard I/O
- No loss of memory addresses to peripherals
- Simpler address decoding logic in peripherals is possible: when the number of peripherals is much smaller than the address space, high-order address bits can be ignored, allowing smaller and/or faster comparators

A basic memory protocol
[Fig. 13.9(a): 8051 interfaced to external memory through a 74373 address latch (D, /CS, G pins)]
Fig. 13.9(b) The timing diagram

The timing of the various signals is shown in Fig. 13.9(b). The lower byte of the address is placed on P0 and the address latch enable (ALE) signal is asserted; the higher byte of the address is placed on P2. ALE causes the 74373 to latch the lower address byte, because the P0 bus will next be used for data: P0 goes into tri-state (high-impedance) and switches internally to the data path. The RD (read) line is then enabled; the bar over the read line indicates that it is active low. The data is received from the memory on the P0 bus. A memory write cycle can be explained similarly.
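The memory-mapped style of peripheral access described earlier can be sketched in C: a peripheral register is just an address, so ordinary loads and stores (MOV-class instructions) reach it. Here a plain variable stands in for the hardware register; on real hardware the pointer would be initialised with the device's documented register address (a fixed constant such as `(volatile uint8_t *)0x8000`, which is purely illustrative):

```c
#include <stdint.h>

/* Memory-mapped I/O sketch: reading a (simulated) status register
   through a volatile pointer, exactly as ordinary memory is read. */
static volatile uint8_t fake_status_reg;        /* stands in for hardware */

static volatile uint8_t *const STATUS = &fake_status_reg;

/* Test the (assumed) READY bit in bit position 0. */
static int device_ready(void) {
    return (*STATUS & 0x01) != 0;
}
```

`volatile` matters here: it tells the compiler the location can change outside the program's control, so every access really goes to the bus.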
13.3 Conclusion
In this lesson you learnt the basics of input/output interfacing. In the previous chapter you also studied some input/output concepts, but most of those I/O units, such as the timer, watchdog circuit, PWM generator, and serial and parallel ports, were part of the microcontroller. In this lesson the basics of interfacing with external devices have been discussed. The difference between a bus and a port should be kept in mind. The ISA bus is discussed to give an idea of the various bus architectures which will be discussed in the later part of this course. You may browse the websites listed below for further knowledge.
http://esd.cs.ucr.edu/slide_index.html
http://esd.cs.ucr.edu/wres.html
www.techfest.com/hardware/bus/isa.htm
You should now be in a position to learn any microcontroller and its interfacing protocols.
13.4 Questions
1. List at least 4 differences between the I/O devices for a Real Time Embedded System (RTES) and a desktop PC.
Ans:

RTES I/O:
- Has to operate in real time; the timing requirements have to be met.
- The I/O devices need not be meant for a human user and may consist of analog interfaces, digital controllers and mixed-signal circuits.
- The power consumption of the I/O devices should be limited.
- The size of the I/O devices should be small, so they can coexist with the processor and other devices.

PC I/O:
- May take a little longer and need not satisfy the stringent timing requirements of the user.
- Encompasses a broad range. Generally the keyboard, monitor, mouse etc., which are meant for the human user, are termed I/O, but a PC can also have I/O similar to that of an RTES.
- There is virtually no strict limit on the power consumed by such I/O.
- Size is generally not a problem, as the system is not meant to be portable.

2. Draw the timing diagram of a memory read protocol for a slower memory. What additional handshaking signals are necessary?

Ans: An additional handshaking signal from the memory, namely /ready, is necessary. The microcontroller inserts wait states until the memory asserts the /ready line. The ready line in this case is sampled at the rising edge of the third clock phase. Fig. Q2 shows the timing of such an operation.
[Fig. Q2: timing — Clock (T1, T2, Twait, T4, T5), Address, /RD, /Ready and Data]

Fig. Q2 The timing diagram of a memory read from a slower memory

3. List the handshaking signals in the ISA bus for dealing with slower I/O devices.
Ans: I/O CH RDY (I/O Channel Ready) allows slower ISA boards to lengthen I/O or memory cycles by inserting wait states. This signal's normal state is active high (ready). ISA boards drive the signal inactive low (not ready) to insert wait states. Devices using this signal to insert wait states should drive it low immediately after detecting a valid address decode and an active read or write command. The signal is released high when the device is ready to complete the cycle.

4. What additional handshaking signals are necessary for bidirectional data transfer over the same set of data lines?

Ans: For an 8-bit data transfer we need at least 4 additional lines for handshaking. As shown in Fig. Q4 there are two ports: Port A acts as the 8-bit bidirectional data bus, and Port C carries the handshaking signals.

Write operation: When the data is ready, the /OBFA signal (PC7, output buffer full, active low) is made 0. The connected device acknowledges through /ACKA (PC6, active low) that it is ready to accept data. The data transfer takes place over PA0-PA7.

Read operation: When the data is ready, the external device drives the /STBA line (PC4, strobe, active low) low. The acknowledgement is returned through IBFA (input buffer full, active high). The data transfer then takes place.
Fig. Q4 The bidirectional handshaking interface

5. List the various bus standards used in industry.

Ans:
ISA Bus
The Industry Standard Architecture (ISA) bus is an open, 8-bit (PC and XT) or 16-bit (AT) asymmetrical I/O channel with numerous compatible hardware implementations.
EISA Bus
The Extended Industry Standard Architecture (EISA) bus is an open, 32-bit, asymmetrical I/O channel with numerous compatible hardware implementations. It extends the system bus, allows data transfer rates at a bandwidth of up to 33 MB per second, supports a 4 GB address space and 8 DMA channels, and is backward compatible with the Industry Standard Architecture (ISA) bus.
PCI Bus
The Peripheral Component Interconnect Local Bus (PCI) is an open, high-performance 32-bit or 64-bit synchronous bus with multiplexed address and data lines, and numerous compatible hardware implementations. The PCI bus supports a frequency of 33 MHz and a transfer rate of 132 MB per second.
Futurebus+
Futurebus+ is an open bus, designed by the IEEE 896 committee, whose architecture and interfaces are publicly documented, and which is independent of any underlying architecture. It has broad-based, cross-industry support and very high throughput (the maximum rate for the 64-bit bandwidth is 160 MB per second; for the 128-bit bandwidth, 180 MB per second). Futurebus+ supports a 64-bit address space and a set of control and status registers (CSRs) that provides all the necessary ability to enable or disable features, thus supporting multivendor interoperability.
SCSI Bus
The Small Computer Systems Interface (SCSI) bus is an ANSI standard for the interconnection of computers with each other and with disks, floppies, tapes, printers, optical disks, and scanners. The SCSI standard covers the mechanical and electrical interfaces. Data transfer rates are individually negotiated with each device attached to a given SCSI bus. For example, a 4 MB per second device and a 10 MB per second device may share a fast narrow bus. When the 4 MB per second device is using the bus, the transfer rate is 4 MB per second; when the 10 MB per second device is using the bus, the transfer rate is 10 MB per second. However, when faster devices are placed on a slower bus, their transfer rate is reduced to allow for proper operation in that slower environment. Note that the speed of the SCSI bus is a function of cable length, with slow, single-ended SCSI buses supporting a maximum cable length of 6 meters, and fast, single-ended SCSI buses supporting a maximum cable length of 3 meters.
TURBOchannel Bus
The TURBOchannel bus is a synchronous, 32-bit, asymmetrical I/O channel that can be operated at any fixed frequency in the range 12.5 MHz to 25 MHz. It is also an open bus, developed by Digital, whose architecture and interfaces are publicly documented. At 12.5 MHz, the peak data rate is 50 MB per second; at 25 MHz, 100 MB per second. The TURBOchannel is asymmetrical in that the base system processor and system memory are defined separately from the TURBOchannel architecture. The I/O operations do not directly address each other: all data is entered into system memory before being transferred to another I/O option. The design facilitates a concise and compact protocol with very high performance.
XMI Bus
The XMI bus is a 64-bit wide parallel bus that can sustain a 100 MB per second bandwidth in a single processor configuration. The bandwidth is exclusive of addressing overhead; the XMI bus can transmit 100 MB per second of data. The XMI bus implements a "pended protocol" design so that the bus does not stall between requests and transmissions of data. Several transactions can be in progress at a given time. Bus cycles not used by the requesting device are available to other devices on the bus. Arbitration and data transfers occur simultaneously, with multiplexed data and address lines. These design features are particularly significant when a combination of multiple devices has a wider bandwidth than the bus itself.
VME Bus
Digital UNIX includes a generic VME interface layer that provides customers with a consistent interface to VME devices across Alpha AXP workstation and server platforms. Currently, VME adapters are only supported on the TURBOchannel bus. To use the VME interface layer to write VMEbus device drivers, you must have the Digital UNIX TURBOchannel/VME Adapter Driver Version 2.0 software (Software Product Description 48.50.00) and its required processor and/or hardware configurations (Software Support Addendum 48.50.00-A).
Module 3
Embedded Systems I/O
Lesson 14
Timers
Instructional Objectives
After going through this lesson the student would learn about the standard peripheral devices most commonly used as single-purpose processors:
- Timer and counter basics
- Various modes of timer operation
- The internal timers of the 8051
- A programmable interval timer, the 8253
- Watchdog timer and watchdog circuit
Pre-Requisite
Digital Electronics, Microprocessors
14.1 Introduction
The Peripherals of an embedded processor can either be on the same chip as the processor or can be connected externally.
[Fig. 14.1: external interrupts into the interrupt control block; on-chip flash and RAM; CPU; oscillator; bus control; four I/O ports (P0-P3); serial port (TXD/RXD)]
Fig. 14.1 Block diagram of the basic 8051 architecture

For example, in a typical embedded processor as shown in Fig. 14.1, the timer, interrupt controller, serial port and parallel ports reside on a single chip. These dedicated units are otherwise termed single-purpose processors. They can be part of the microcontroller or can reside outside the chip, in which case they must be properly interfaced with the processor. The tasks generally carried out by such units are: timers, counters, watchdog timers, serial transmission,
analog/digital conversions
Timer
A timer is a very common and useful peripheral. It is used to generate events at specific times or to measure the duration of specific events external to the processor. It is a programmable device, i.e. the time period can be adjusted by writing specific bit patterns to registers called timer-control registers.
Counter
A counter is a more general version of the timer. It counts events presented to it in the form of pulses. Fig. 14.2(a) shows the block diagram of a simple timer. It has a 16-bit up counter which increments with each input clock pulse, so the output value Cnt represents the number of pulses since the counter was last reset to zero. An additional output Top indicates when the terminal count has been reached; it may go high for a predetermined time as set by the programmable control word inside the timer unit. The count can be loaded by the external program. Fig. 14.2(b) shows the structure of another timer, where a 2x1 multiplexer chooses between an internal clock (Clk) and an external count input (Cnt_in); the Mode bit decides the selection. With the internal clock it behaves like the timer in Fig. 14.2(a); with the external input it simply counts the occurrences of the external event. Fig. 14.2(c) shows a timer with a terminal count. This can generate an event when a particular interval of time has elapsed; the counter restarts after every terminal count.

[Fig. 14.2(a): basic timer — Clk into a 16-bit up counter with Reset, 16-bit Cnt output and Top output]
[Fig. 14.2(b): timer/counter — Clk and Cnt_in through a 2x1 mux selected by Mode]
[Fig. 14.2(c): timer with terminal count — Top fires when Cnt equals the terminal count]
[Fig. 14.3: clock waveform; counter value counting down from 5, reset and reloaded with a new count each time; output pulses]
Fig. 14.3 The timer count and output. The timer is in count-down mode: on every clock pulse the count is decremented by 1. When the count reaches zero, the output of the counter (Top) goes high for a predetermined time. The counter is then loaded with a new or the previous count value by the external program, or it can be reloaded automatically every time the count reaches zero.
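The count-down-with-auto-reload behaviour of Fig. 14.3 can be captured in a small behavioural model (the type and function names are illustrative):

```c
#include <stdint.h>
#include <stdbool.h>

/* Behavioural model of a count-down timer: each clock decrements the
   count; at zero the Top output pulses and the counter automatically
   reloads itself. */
typedef struct {
    uint16_t count;    /* current value */
    uint16_t reload;   /* value restored when the count hits zero */
} DownTimer;

/* Apply one clock pulse; returns true when Top fires (count hit zero). */
static bool timer_tick(DownTimer *t) {
    if (--t->count == 0) {
        t->count = t->reload;   /* automatic reload */
        return true;            /* Top goes high for this tick */
    }
    return false;
}
```

With a reload value of n, Top fires once every n clock pulses, which is exactly how a timer derives a periodic event from the system clock.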
MODE 0
Either timer in Mode 0 is an 8-bit counter with a divide-by-32 prescaler; in this mode the timer register is configured as a 13-bit register. As the count rolls over from all 1s to all 0s, it sets the timer interrupt flag TF1. The counted input is enabled to the timer when TR1 = 1 and either GATE = 0 or INT1 = 1. (Setting GATE = 1 allows the timer to be controlled by the external input INT1, to facilitate pulse-width measurements.)
[Timer 1 in Mode 0: OSC divided by 12, or the T1 pin, selected by C/T = 0/1, clocks the 13-bit counter; overflow sets TF1 and raises an interrupt]
TMOD (Timer Mode) register:
(MSB) GATE C/T M1 M0 | GATE C/T M1 M0 (LSB)
        Timer 1      |      Timer 0

GATE: Gating control. When set, Timer/Counter x is enabled only while the INTx pin is high and the TRx control bit is set. When cleared, Timer x is enabled whenever the TRx control bit is set.
C/T: Timer or Counter selector. Cleared for timer operation (input from the internal system clock); set for counter operation (input from the Tx input pin).

M1 M0  Operating mode
0  0   Mode 0: THx operates as an 8-bit Timer/Counter with TLx as a 5-bit prescaler.
0  1   Mode 1: 16-bit Timer/Counter; THx and TLx are cascaded; there is no prescaler.
1  0   Mode 2: 8-bit auto-reload Timer/Counter; THx holds a value which is reloaded into TLx each time it overflows.
1  1   Mode 3: (Timer 0) TL0 is an 8-bit Timer/Counter controlled by the standard Timer 0 control bits; TH0 is an 8-bit timer only, controlled by the Timer 1 control bits. (Timer 1) Timer/Counter 1 is stopped.
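Assuming the TMOD layout above (Timer 1 fields in the high nibble, Timer 0 fields in the low nibble), a control byte can be assembled with simple shifts; the helper name is invented for illustration:

```c
#include <stdint.h>

/* Pack an 8051 TMOD value: each nibble is GATE | C/T | M1 | M0,
   Timer 1 in the high nibble, Timer 0 in the low nibble. */
static uint8_t tmod_value(uint8_t gate1, uint8_t ct1, uint8_t mode1,
                          uint8_t gate0, uint8_t ct0, uint8_t mode0) {
    uint8_t hi = (uint8_t)(((gate1 & 1) << 3) | ((ct1 & 1) << 2) | (mode1 & 0x3));
    uint8_t lo = (uint8_t)(((gate0 & 1) << 3) | ((ct0 & 1) << 2) | (mode0 & 0x3));
    return (uint8_t)((hi << 4) | lo);
}
```

For example, Timer 1 in 8-bit auto-reload mode (Mode 2, a common baud-rate setup) and Timer 0 as a 16-bit timer (Mode 1), both gated off and clocked internally, gives the familiar value 0x21.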
TCON (Timer/Counter Control) register:
(MSB) TF1 TR1 TF0 TR0 IE1 IT1 IE0 IT0 (LSB)

Bit     Symbol  Name and significance
TCON.7  TF1     Timer 1 overflow flag. Set by hardware on Timer/Counter overflow; cleared by hardware when the processor vectors to the interrupt routine.
TCON.6  TR1     Timer 1 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
TCON.5  TF0     Timer 0 overflow flag. Set by hardware on Timer/Counter overflow; cleared by hardware when the processor vectors to the interrupt routine.
TCON.4  TR0     Timer 0 run control bit. Set/cleared by software to turn the Timer/Counter on/off.
TCON.3  IE1     Interrupt 1 edge flag. Set by hardware when an external interrupt edge is detected; cleared when the interrupt is processed.
TCON.2  IT1     Interrupt 1 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
TCON.1  IE0     Interrupt 0 edge flag. Set by hardware when an external interrupt edge is detected; cleared when the interrupt is processed.
TCON.0  IT0     Interrupt 0 type control bit. Set/cleared by software to specify falling-edge/low-level triggered external interrupts.
Timer/Counter Control Register (TCON)

MODE 1: Mode 1 is the same as Mode 0, except that the timer register runs with all 16 bits.
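In Mode 1 the timer counts up and overflows at 65536, so to get an overflow interrupt after a given number of counts the timer is preloaded with 65536 minus that number. A small helper sketches the split into the high and low timer bytes (the names are illustrative; the 1 µs-per-count figure assumes the classic 12-clocks-per-machine-cycle 8051 with a 12 MHz crystal):

```c
#include <stdint.h>

/* Compute the Mode 1 preload for a delay of `ticks` machine cycles:
   preload = 65536 - ticks, split into THx (high) and TLx (low). */
static void mode1_reload(uint16_t ticks, uint8_t *th, uint8_t *tl) {
    uint16_t preload = (uint16_t)(65536UL - ticks);
    *th = (uint8_t)(preload >> 8);     /* high byte -> THx */
    *tl = (uint8_t)(preload & 0xFF);   /* low byte  -> TLx */
}
```

At 1 µs per count, a 50 ms tick needs 50000 counts, i.e. a preload of 15536 = 0x3CB0 (TH = 0x3C, TL = 0xB0).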
[Timer 1 in Mode 1: OSC divided by 12, or the T1 pin, selected by C/T = 0/1, clocks the 16-bit counter; overflow sets TF1 and raises an interrupt]
Fig. 14.5

MODE 2 configures the timer register as an 8-bit counter with automatic reload.
[Timer block diagrams for Modes 2 and 3: OSC/12 or the Tx pin, selected by C/T, clocks the counter(s); in Mode 3 TL0 sets TF0 while TH0, gated by TR1, sets TF1, each raising its own interrupt]
Fig. 14.6

MODE 3: Timer 1 in Mode 3 simply holds its count. Timer 0 in Mode 3 establishes TL0 and TH0 as two separate counters.
[Fig. 14.7: 8253 pinout — D7-D0 microprocessor interface; RD, WR, A0, A1 and CS control lines; CLK, GATE and OUT for each of the three counters]
Fig. 14.7 The pin configuration of the 8253 timer

Fig. 14.8 shows the internal block diagram. There are three separate counter units controlled by a configuration register (Fig. 14.9). Each counter has two inputs, clock and gate, and one output. The clock signal drives the counting by decrementing a value preloaded into the respective counter register. The gate serves as an enable input: if the gate is held low, counting is disabled. The timing diagrams explain in detail the various modes of operation of the timer.
[Fig. 14.8: data bus buffer (D7-D0), read/write logic (RD, WR, A1, A0, CS), internal bus, power supplies (Vcc, GND), and counters #0, #1 and #2, each with its own CLK, GATE and OUT]
Fig. 14.8 The internal block diagram of the 8253

Table: the address map and the control word format
Address map:
CS  A1  A0  Selects
0   0   0   Counter 0
0   0   1   Counter 1
0   1   0   Counter 2
0   1   1   Control word register

Control word format (D7 to D0): SC1 SC0 RL1 RL0 M2 M1 M0 BCD

SC1 SC0 (select counter): 00 = counter 0, 01 = counter 1, 10 = counter 2, 11 = illegal
RL1 RL0 (read/load): 00 = counter latching operation, 01 = read/load LSB only, 10 = read/load MSB only, 11 = read/load LSB first, then MSB
M2 M1 M0 (mode): 000 = Mode 0, 001 = Mode 1, x10 = Mode 2, x11 = Mode 3, 100 = Mode 4, 101 = Mode 5
BCD: 0 = binary counter (16-bit), 1 = BCD counter (4 decades)
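Given the control-word layout above, the byte can be packed mechanically; the function name is illustrative:

```c
#include <stdint.h>

/* Pack an 8253 control word:
   D7 D6 = SC1 SC0 (counter select), D5 D4 = RL1 RL0 (read/load),
   D3 D2 D1 = mode, D0 = BCD. */
static uint8_t ctl8253(uint8_t counter, uint8_t rl,
                       uint8_t mode, uint8_t bcd) {
    return (uint8_t)(((counter & 3) << 6) |
                     ((rl & 3) << 4) |
                     ((mode & 7) << 1) |
                     (bcd & 1));
}
```

For example, `ctl8253(0, 3, 3, 0)` — counter 0, load LSB then MSB, Mode 3, binary counting — yields 0x36, the familiar square-wave initialisation value.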
Mode 0: The output goes high after the terminal count is reached. The counter stops if the gate is low (Fig. 14.10(a) and (b)). The timer count register is loaded with a count (say 6) when the WR line is driven low by the processor. The counter unit counts down with each clock pulse, and the output goes high when the register value reaches zero. If in the meantime the GATE is made low (Fig. 14.10(b)), the count is suspended at its current value (3) till the GATE is enabled again.

[Fig. 14.10(a): CLK, WR, OUT and GATE waveforms; count 6-5-4-3-2-1]
Fig. 14.10(a) Mode 0 count when GATE is high (enabled)

[Fig. 14.10(b): CLK, WR, OUT and GATE waveforms; count 6-5-4-3-3-3-2-1]
Fig. 14.10(b) Mode 0 count when GATE is temporarily low (disabled)
Mode 1: If the GATE goes low before the count is completed, the counter is suspended in that state as long as GATE remains low (Fig. 14.11(b)). Thus it works as a mono-shot.

[Fig. 14.11(a): CLK, WR, GATE (trigger) and OUT waveforms]
Fig. 14.11(a) Mode 1: the GATE goes high; the output goes low for a period depending on the count

[Fig. 14.11(b): CLK, WR, GATE (trigger) and OUT waveforms]
Fig. 14.11(b) Mode 1: the GATE pulse is disabled momentarily, causing the counter to stop
[Fig. 14.12(a): CLK, WR, GATE and OUT waveforms; count 3-2-1 repeating]
Fig. 14.12(a) Mode 2 operation when the GATE is kept high

[Fig. 14.12(b): CLK, WR, GATE and OUT waveforms; count 3-2-1, with the count held at 3 while GATE is low]

[Mode 3 waveforms: CLK and the OUT square wave for n = 5]

[Mode 4 waveforms: CLK, WR and OUT; count 4-3-2-1]

[Fig. 14.14(b): CLK, WR, GATE and OUT waveforms; count 4-3-3-2-1]
Fig. 14.14(b) Mode 4 software-triggered strobe when GATE is momentarily low

[Mode 5 waveforms: WR, GATE and OUT; count 5-4-3-2-1]
Watchdog timer
A Watchdog Timer is a circuit that automatically invokes a reset unless the system being watched sends regular hold-off signals to the Watchdog.
Watchdog Circuit
To make sure that a particular program is executing properly, a watchdog circuit is used. For instance, the program may reset a particular flip-flop periodically while the flip-flop is set by an external circuit. If the flip-flop is not reset for a long time, external hardware can detect this; it indicates that the program is not executing properly, and an exception or interrupt can be generated. A watchdog timer (WDT) has its own clock, independent of any external clock. When the WDT is enabled, a counter starts at 00 and increments by 1 until it reaches FF. When it rolls over from FF to 00, the processor is reset or an exception is generated. The only way to stop the WDT from resetting the processor or generating an exception or interrupt is to periodically reset the WDT back to 00 throughout the program. If the program gets stuck for some reason, the WDT will not be reset in time, and the WDT will then reset or interrupt the processor. An interrupt service routine can be invoked to take account of the erroneous operation of the program (getting stuck or going into an infinite loop).
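The counter behaviour described above can be modelled in a few lines of C (the names are illustrative; a real WDT is a hardware peripheral, not code):

```c
#include <stdint.h>
#include <stdbool.h>

/* Behavioural model of a watchdog: a free-running 8-bit counter
   increments from 0x00; when it rolls over past 0xFF the system is
   reset. The program must call wdt_kick() periodically to prevent it. */
static uint8_t wdt_count;

static void wdt_kick(void) { wdt_count = 0; }   /* the hold-off signal */

/* One WDT clock tick; returns true when the rollover forces a reset. */
static bool wdt_tick(void) {
    if (wdt_count == 0xFF) {
        wdt_count = 0;       /* FF -> 00 rollover */
        return true;         /* processor reset / exception here */
    }
    wdt_count++;
    return false;
}
```

A well-behaved program calls `wdt_kick()` more often than every 255 ticks; a program stuck in an infinite loop without the kick eventually triggers the reset.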
Conclusion
In this chapter you have learnt about the programmable timer/counter. In most embedded processors the timer is internal and exists on the same chip as the processor. The 8051 microcontroller has internal timers which can be programmed in various modes through the mode and control registers. An external timer chip, the 8253, has also been discussed. It has 8 data lines, 2 address lines, 1 chip-select line, and one read and one write control line. The 16-bit counts of the corresponding registers can be loaded with two consecutive write operations. Counters and timers are used for triggering, trapping and managing various real-time events. The least count of the timer depends on the clock, and the stability of the clock decides the accuracy of the timings. Timers can be used to generate specific baud-rate clocks for asynchronous serial communication, and to measure speed, frequency and analog voltages after voltage-to-frequency conversion. One important application of a timer is to generate pulse-width-modulated (PWM) waveforms: in the 8253 the GATE and clock can be used together to generate pulses of different widths. Such modulated pulses are used in electronic power control to reduce harmonics and hence distortion. You also learnt about the watchdog circuit and watchdog timer, which are used to monitor the activity of a program and the processor.
Questions
Q1. Design a circuit using the 8253 to measure the speed of a motor by counting the number of pulses in a definite period. Q2. Write pseudo code (in any assembly language) to generate a sinusoidal pulse-width-modulated waveform from the 8253 timer.
Q3. Design a scheme to read temperature from a thermister circuit using a V/F converter and Timer. Q4. What are the differences in Mode 4 and Mode 5 operation of 8253 Timer? Q5. Explain the circuit given in Fig.14.5.
Module 3
Embedded Systems I/O
Lesson 15
Interrupts
Instructional Objectives
After going through this lesson the student would learn:
- Interrupts
- Interrupt service subroutines
- Polling
- Priority resolving
- Daisy-chained interrupts
- The interrupt structure of the 8051 microcontroller
- The programmable interrupt controller
Pre-Requisite
Digital Electronics, Microprocessors
15.1 Introduction
Real-time embedded system design requires that I/O devices receive servicing in an efficient manner, so that a large share of the total system tasks can be assumed by the processor with little or no effect on throughput. The most common method of servicing such devices is the polled approach, where the processor tests each device in sequence and in effect asks each one whether it needs servicing. It is easy to see that a large portion of the main program then loops through this continuous polling cycle, and that such a method has a serious, detrimental effect on system throughput, limiting the tasks that can be assumed by the microcomputer and reducing the cost effectiveness of using such devices. A more desirable method is one that allows the microprocessor to execute its main program and stop to service peripheral devices only when it is told to do so by the device itself. In effect, the method provides an external asynchronous input that informs the processor that it should complete whatever instruction is currently being executed and fetch a new routine that will service the requesting device. Once this servicing is complete, the processor resumes exactly where it left off. This can be handled effectively by interrupts. An interrupt is a signal informing a program, or a device connected to the processor, that an event has occurred. When a processor receives an interrupt signal, it takes a specified action depending on the priority and importance of the entity generating the signal. An interrupt signal can cause a program to suspend itself temporarily and service the interrupt by branching into another program, called an interrupt service subroutine (ISR), for the device which caused the interrupt.
Types of Interrupts
Interrupts can be broadly classified as:
- Hardware interrupts: interrupts caused by the connected devices.
- Software interrupts: interrupts deliberately introduced by software instructions to generate user-defined exceptions.
- Trap:
These are interrupts used by the processor alone to detect any exception such as divide by zero Depending on the service the interrupts also can be classified as - Fixed interrupt Address of the ISR built into microprocessor, cannot be changed Either ISR stored at address or a jump to actual ISR stored if not enough bytes available - Vectored interrupt Peripheral must provide the address of the ISR Common when microprocessor has multiple peripherals connected by a system bus Compromise between fixed and vectored interrupts One interrupt pin Table in memory holding ISR addresses (maybe 256 words) Peripheral doesnt provide ISR address, but rather index into table Fewer bits are sent by the peripheral Can move ISR location without changing peripheral Maskable vs. Non-maskable interrupts Maskable: programmer can set bit that causes processor to ignore interrupt This is important when the processor is executing a time-critical code Non-maskable: a separate interrupt pin that cant be masked Typically reserved for drastic situations, like power failure requiring immediate backup of data to non-volatile memory Example: Interrupt Driven Data Transfer (Fixed Interrupt) Fig.15.1(a) shows the block diagram of a system where it is required to read data from a input port P1, modify (according to some given algorithm) and send to port P2. The input port generates data at a very slow pace. There are two ways to transfer data (a) The processor waits till the input is ready with the data and performs a read operation from P1 followed by a write operation to P2. This is called Programmed Data Transfer (b) The other option is when the input/output device is slow then the device whenever is ready interrupts the microprocessor through an Int pin as shown in Fig.15.1. The processor which may be otherwise busy in executing another program (main program here) after receiving the interrupts calls an Interrupt Service Subroutine (ISR) to accomplish the required data transfer. 
This is known as Interrupt Driven Data Transfer.
[Fig. 15.1(a): block diagram — the microcontroller (with its PC) and data memory sit on the system bus, with input port P1 at address 0x8000, output port P2 at address 0x8001, and an Int line from P1 to the processor; the main program occupies instructions at addresses 100, 101, ...]
After completing the instruction at 100, the microcontroller sees Int asserted, saves the PC's value of 100, and sets the PC to the ISR's fixed location of 16. The ISR reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001.
The ISR returns, thus restoring the PC to 100+1=101, where the processor resumes executing.

Fig. 15.1(b) Flow chart for Interrupt Service

Fig. 15.1(b) describes the sequence of actions taking place after port P1 is ready with the data.

Example: Interrupt-Driven Data Transfer (Vectored Interrupt)
[Fig. 15.2(a): vectored-interrupt version of the same system — the microcontroller (PC = 100), data memory, and ports P1/P2 share the system bus, with Int and Inta handshake lines; the timeline of events runs as follows.]
P1 receives input data in a register with address 0x8000 and asserts Int to request servicing by the microprocessor. After completing the instruction at 100, the microcontroller sees Int asserted, saves the PC's value of 100, and asserts Inta. P1 detects Inta and puts the interrupt address vector 16 on the data bus. The microcontroller jumps to the address on the bus (16). The ISR there reads data from 0x8000, modifies the data, and writes the resulting data to 0x8001. The ISR returns, thus restoring the PC to 100+1=101, where the processor resumes executing.
[Figure: CPU, oscillator, bus control, ports P0–P3, and the multiplexed address/data bus.]
Fig. 15.3 The 8051 Architecture

The 8051 has 5 interrupt sources: 2 external interrupts, 2 timer interrupts, and the serial port interrupt. These interrupts occur because of:
1. timers overflowing
2. receiving a character via the serial port
3. transmitting a character via the serial port
4. two external events
Interrupt Enables
Each interrupt source can be individually enabled or disabled by setting or clearing a bit in a Special Function Register (SFR) named IE (Interrupt Enable). This register also contains a global disable bit, which can be cleared to disable all interrupts at once.
Interrupt Priorities
Each interrupt source can also be individually programmed to one of two priority levels by setting or clearing a bit in the SFR named IP (Interrupt Priority). A low-priority interrupt can be interrupted by a high-priority interrupt, but not by another low-priority interrupt. A high-priority interrupt cannot be interrupted by any other interrupt source. If two interrupt requests of different priority levels are received simultaneously, the request of higher priority is serviced. If interrupt requests of the same priority level are received simultaneously, an internal polling sequence determines which request is serviced; thus, within each priority level there is a second priority structure determined by the polling sequence. In operation, all the interrupt flags are latched into the interrupt control system during State 5 of every machine cycle. The samples are polled during the following machine cycle. If the flag for an enabled interrupt is found to be set (1), the interrupt system generates a CALL to the appropriate location in Program Memory, unless some other condition blocks the interrupt. Several conditions can block an interrupt, among them that an interrupt of equal or higher priority level is already in progress. The hardware-generated CALL causes the contents of the Program Counter to be pushed onto the stack, and reloads the PC with the beginning address of the service routine.

Interrupt Enable (IE) and Interrupt Priority (IP) Registers
[Figure: bit layouts of the IE and IP registers — the global disable (EA) bit and individual enable/priority bits for the external interrupts, timers, and serial port.]
INT0: External Interrupt 0
INT1: External Interrupt 1
TF0: Timer 0 Interrupt
TF1: Timer 1 Interrupt
RI, TI: Serial Port Receive/Transmit Interrupt
The service routine for each interrupt begins at a fixed location (fixed-address interrupts). Only the Program Counter (PC) is automatically pushed onto the stack; the Processor Status Word (which includes the contents of the accumulator and flag register) and the other registers are not. Having only the PC automatically saved allows the programmer to decide how much time should be spent saving other registers. This enhances the interrupt response time, albeit at the expense of increasing the programmer's burden of responsibility. As a result, many interrupt functions that are typical in control applications (toggling a port pin, for example, or reloading a timer, or unloading a serial buffer) can often be completed in less time than it takes other architectures to complete them.

Interrupt Number   Vector Address   Description
0                  0003h            External 0
1                  000Bh            Timer/Counter 0
2                  0013h            External 1
3                  001Bh            Timer/Counter 1
4                  0023h            Serial Port
Simultaneously occurring interrupts are serviced in the following order:
1. External 0 Interrupt
2. Timer 0 Interrupt
3. External 1 Interrupt
4. Timer 1 Interrupt
5. Serial Interrupt
Priority Arbiter

[Fig. 15.5: priority arbitration — the microcontroller's Int/Inta lines connect to a priority arbiter, which exchanges Ireq1/Iack1 and Ireq2/Iack2 handshakes with Peripheral 1 and Peripheral 2 over the system bus.]

Fig. 15.5 The Priority Arbitration

Let us assume that the priority of the devices is Device 1 > Device 2.
1. The processor is executing its program.
2. Peripheral 1 needs servicing, so it asserts Ireq1. Peripheral 2 also needs servicing, so it asserts Ireq2.
3. The priority arbiter sees at least one Ireq input asserted, so it asserts Int.
4. The processor stops executing its program and stores its state.
5. The processor asserts Inta.
6. The priority arbiter asserts Iack1 to acknowledge Peripheral 1.
7. Peripheral 1 puts its interrupt address vector on the system bus.
8. The processor jumps to the address of the ISR read from the data bus; the ISR executes and returns (and completes the handshake with the arbiter).
Thus, in the case of simultaneous interrupts, the device with the highest priority is served first.
8. The processor jumps to the address of the ISR read from the data bus; the ISR executes and returns.
9. The flag is reset. The processor now checks for the next device that has interrupted simultaneously.

[Fig. 15.6: daisy chain arbitration — the peripherals are connected in a chain, each passing Ack_in/Ack_out and Req_out/Req_in signals to its neighbour, with the Int/Inta lines at the processor end.]

Fig. 15.6 The Daisy Chain Arbitration

In this case:
- The device nearest to the processor has the highest priority.
- Service to the subsequent stages is interrupted if the chain is broken at one place.
[Figure: a PIC sitting between the CPU and the system's RAM, ROM, and I/O devices I/O(1) through I/O(N), funnelling their interrupt requests to the processor.]
Each peripheral device or structure usually has a special program or routine that is associated with its specific functional or operational requirements; this is referred to as a service routine. The PIC, after issuing an interrupt to the CPU, must somehow input information into the CPU that can point (vector) the Program Counter to the service routine associated with the requesting device. The PIC manages eight levels of requests and has built-in features for expandability to other PICs (up to 64 levels). It is programmed by system software as an I/O peripheral. The priority modes can be changed or reconfigured dynamically at any time during main program operation.
Priority Resolver
This logic block determines the priorities of the bits set in the IRR (Interrupt Request Register). The highest priority is selected and strobed into the corresponding bit of the ISR (In-Service Register) during the INTA sequence.
Intel 8259
[Fig. 15.9: functional block diagram of the Intel 8259 — control logic, priority resolver, and the INT output.]
Fig. 15.9 The Functional Block Diagram

Table of Signals of the PIC

D[7..0]: These wires are connected to the system bus and are used by the microprocessor to write or read the internal registers of the 8259.
A0: This pin acts in conjunction with the WR/RD signals. It is used by the 8259 to decipher the various command words the microprocessor writes and the status the microprocessor wishes to read.
WR: When this write signal is asserted, the 8259 accepts the command on the data lines, i.e., the microprocessor writes to the 8259 by placing a command on the data lines and asserting this signal.
RD: When this read signal is asserted, the 8259 provides its status on the data lines, i.e., the microprocessor reads the status of the 8259 by asserting this signal and reading the data lines.
INT: This signal is asserted whenever a valid interrupt request is received by the 8259, i.e., it is used to interrupt the microprocessor.
INTA: This signal is used to enable 8259 interrupt-vector data onto the data bus by a sequence of interrupt acknowledge pulses issued by the microprocessor.
IR0–IR7: An interrupt request is raised by a peripheral device when one of these signals is asserted.
CAS[2..0]: These are cascade signals that enable multiple 8259 chips to be chained together.
SP/EN: This signal is used in conjunction with the CAS signals for cascading purposes.
Fig.15.10 shows the daisy chain connection of a number of PICs. The extreme right PIC interrupts the processor. In this figure the processor can entertain up to 24 different interrupt requests. The SP/EN signal has been connected to Vcc for the master and grounded for the slaves.
[Fig. 15.10: daisy-chained 8259 PICs sharing the 16-bit address bus, control bus, and 8-bit data bus, collecting the interrupt requests from the devices and driving the processor's INT REQ line.]
Software Interrupts
These are initiated by the program through specific instructions. On encountering such an instruction, the CPU executes an Interrupt Service Subroutine.
Conclusion
In this chapter you have learnt about interrupts and the Programmable Interrupt Controller. Different methods of interrupt servicing, such as priority arbitration and daisy chain arbitration, have been discussed. In real-time systems, interrupts are used for specific cases, and the execution times of these Interrupt Service Subroutines are almost fixed. Too many interrupts are discouraged in real time, as they may severely disrupt the services; please look at problem no. 1 in the exercise. Most embedded processors are equipped with an interrupt structure, so there is rarely a need to use a PIC. Some entry-level microcontrollers do not have an inbuilt exception handler called a trap. The trap is also an interrupt, used to handle some extreme processor conditions such as divide by zero, overflow, etc.
Questions and Answers
Q1. A computer system has three devices whose characteristics are summarized in the following table:

Device   Service Time   Interrupt Frequency   Allowable Latency
D1       150 μs         1/(800 μs)            50 μs
D2       50 μs          1/(1000 μs)           50 μs
D3       100 μs         1/(800 μs)            100 μs
Service time indicates how long it takes to run the interrupt handler for each device. The maximum time allowed to elapse between an interrupt request and the start of the interrupt handler is indicated by the allowable latency. If a program P takes 100 seconds to execute when interrupts are disabled, how long will P take to run when interrupts are enabled?

Ans: The CPU time taken to service the interrupts must be found. Let us consider Device 1: servicing it takes 150 μs (plus up to 50 μs of latency), and its interrupts occur at a frequency of 1/(800 μs), i.e., 1250 times a second. Consider a time quantum of 1 unit.
Device 1 takes (150+50)/800 = 1/4 unit.
Device 2 takes (50+50)/1000 = 1/10 unit.
Device 3 takes (100+100)/800 = 1/4 unit.
In one unit of real time, the CPU time taken by all these devices is (1/4 + 1/10 + 1/4) = 0.6 units. The CPU idle time is 0.4 units, which can be used by the program P. For 100 seconds of CPU time, the real time required will be 100/0.4 = 250 seconds.

Q.2 What is a TRAP?
Ans: The term trap denotes a programmer-initiated and expected transfer of control to a special handler routine. In many respects, a trap is nothing more than a specialized subroutine call. Many texts refer to traps as software interrupts. Traps are usually unconditional; that is, when you execute an interrupt instruction, control always transfers to the procedure associated with the trap. Since traps execute via an explicit instruction, it is easy to determine exactly which instructions in a program will invoke a trap handling routine.
Q.3 Discuss the Interrupt Acknowledge machine cycle.
Ans: For vectored interrupts the processor expects the address from the external device. Once it receives the interrupt, it starts an interrupt acknowledge cycle as shown in the figure. In the figure, TN is the last clock state of the previous instruction, immediately after which the processor checks the status of the Intr pin, which has already been driven high by the external device. The processor then starts an INTA cycle in which it brings in the interrupt vector through the data lines. If the data lines are 8 bits wide and the address required is 16 bits, there will be two I/O reads. If the interrupt vector is a number that serves as an index into a look-up table, then only 8 bits are required, and hence there will be one I/O read.
[Timing diagram: clock states TN, T1, T2, T3 of the CLK signal, with INTREQ asserted before TN, INTACK asserted during the INTA cycle, and the vector/address code driven on the data lines.]
Module 3
Embedded Systems I/O
Lesson 16
DMA
Instructional Objectives
After going through this lesson the student would learn:
- The concept of Direct Memory Access
- When and where to use DMA
- How to initiate a DMA cycle
- The different steps of DMA
- What a typical DMA controller looks like
Pre-Requisite
Digital Electronics, Microprocessors
16(I)
Introduction
Direct Memory Access (DMA) allows devices to transfer data without subjecting the processor to a heavy overhead. Otherwise, the processor would have to copy each piece of data from the source to the destination. This is typically slower than copying normal blocks of memory, since access to I/O devices over a peripheral bus is generally slower than access to normal system RAM. During such a copy the processor would be unavailable for other tasks involving processor bus access, although it could continue working on anything that does not require the bus. DMA transfers are essential for high-performance embedded systems where large chunks of data need to be transferred between the input/output devices and the primary memory.
16(II)
DMA Controller
A DMA controller is a device, usually peripheral to a CPU that is programmed to perform a sequence of data transfers on behalf of the CPU. A DMA controller can directly access memory and is used to transfer data from one memory location to another, or from an I/O device to memory and vice versa. A DMA controller manages several DMA channels, each of which can be programmed to perform a sequence of these DMA transfers. Devices, usually I/O peripherals, that acquire data that must be read (or devices that must output data and be written to) signal the DMA controller to perform a DMA transfer by asserting a hardware DMA request (DRQ) signal. A DMA request signal for each channel is routed to the DMA controller. This signal is monitored and responded to in much the same way that a processor handles interrupts. When the DMA controller sees a DMA request, it responds by performing one or many data transfers from that I/O device into system memory or vice versa. Channels must be enabled by the processor for the DMA controller to respond to DMA requests. The number of transfers performed, transfer modes used, and memory locations accessed depends on how the DMA channel is programmed. A DMA controller typically shares the system memory and I/O bus with the CPU and has both bus master and slave capability. Fig.16.1 shows the DMA controller architecture and how the DMA controller interacts with the CPU. In bus master mode, the DMA controller acquires the system bus (address, data, and control lines) from the CPU to perform the
DMA transfers. Because the CPU releases the system bus for the duration of the transfer, the process is sometimes referred to as cycle stealing. In bus slave mode, the DMA controller is accessed by the CPU, which programs the DMA controller's internal registers to set up DMA transfers. The internal registers consist of source and destination address registers and transfer count registers for each DMA channel, as well as control and status registers for initiating, monitoring, and sustaining the operation of the DMA controller.
[Fig. 16.1: DMA controller architecture — the CPU programs the controller's status and mask registers; each DMA channel X holds base/current address and base/current word count registers; DMA arbitration logic exchanges bus request/grant signals with the CPU and DRQX/DACKX signals with the devices, and asserts TC (terminal count) when a transfer completes; data moves over the PC bus.]
Fig. 16.2 Flyby DMA transfer

The second type of DMA transfer is referred to as a dual-cycle, dual-address, flow-through, or fetch-and-deposit DMA transfer. As these names imply, this type of transfer involves two memory or I/O cycles. The data being transferred is first read from the I/O device or memory into a temporary data register internal to the DMA controller. The data is then written to the memory or I/O device in the next cycle. Fig. 16.3 shows the fetch-and-deposit DMA transfer signal protocol. Although inefficient, because the DMA controller performs two cycles and thus retains the system bus longer, this type of transfer is useful for interfacing devices with different data bus sizes. For example, a DMA controller can perform two 16-bit read operations from one location followed by a 32-bit write operation to another location. A DMA controller supporting this type of transfer has two address registers per channel (source address and destination address) and bus-size registers, in addition to the usual transfer count and control registers.
Unlike the flyby operation, this type of DMA transfer is suitable for both memory-to-memory and I/O transfers.
[Fig. 16.3: fetch-and-deposit DMA transfer — the I/O device raises DMA Request; the DMA controller drives I/O Read* and Memory Write* in separate cycles, staging the data internally while driving the address and data buses.]
Fig. 16.3 Fetch-and-Deposit DMA Transfer

Single, block, and demand are the most common transfer modes. Single transfer mode transfers one data value for each DMA request assertion. This mode is the slowest method of transfer because it requires the DMA controller to arbitrate for the system bus with each transfer. This arbitration is not a major problem on a lightly loaded bus, but it can lead to latency problems when multiple devices are using the bus. Block and demand transfer modes increase system throughput by allowing the DMA controller to perform multiple DMA transfers once it has gained the bus. For block mode transfers, the DMA controller performs the entire DMA sequence, as specified by the transfer count register, at the fastest possible rate in response to a single DMA request from the I/O device. For demand mode transfers, the DMA controller performs DMA transfers at the fastest possible rate as long as the I/O device asserts its DMA request. When the I/O device deasserts this DMA request, transfers are held off.
processor whenever a channel terminates. DMA controllers also have mechanisms for automatically reprogramming a DMA channel when the DMA transfer sequence completes. These mechanisms include auto-initialization and buffer chaining. The auto-initialization feature repeats the DMA transfer sequence by reloading the DMA channel's current registers from the base registers at the end of a DMA sequence and re-enabling the channel. Buffer chaining is useful for transferring blocks of data into noncontiguous buffer areas or for handling double-buffered data acquisition. With buffer chaining, a channel interrupts the CPU and is programmed with the next address and count parameters while DMA transfers are being performed on the current buffer. Some DMA controllers minimize CPU intervention further by having a chain address register that points to a chain control table in memory. The DMA controller then loads its own channel parameters from memory. Generally, the more sophisticated the DMA controller, the less servicing the CPU has to perform. A DMA controller has one or more status registers that are read by the CPU to determine the state of each DMA channel. The status register typically indicates whether a DMA request is asserted on a channel and whether a channel has reached TC. Reading the status register often clears the terminal count information in the register, which leads to problems when multiple programs are trying to use different DMA channels.

Steps in a Typical DMA Cycle
1. The device wishing to perform DMA asserts the processor's bus request signal.
2. The processor completes the current bus cycle and then asserts the bus grant signal to the device.
3. The device then asserts the bus grant ack signal.
4. The processor senses the change in the state of the bus grant ack signal and starts listening to the data and address buses for DMA activity.
5. The DMA device performs the transfer from the source to the destination address.
During these transfers, the processor monitors the addresses on the bus and checks whether any location modified during DMA operations is cached in the processor. If the processor detects a cached address on the bus, it can take one of two actions:
- invalidate the internal cache entry for the address involved in the DMA write operation, or
- update the internal cache when a DMA write is detected.
6. Once the DMA operations have been completed, the device releases the bus by asserting the bus release signal.
7. The processor acknowledges the bus release and resumes its bus cycles from the point it left off.
16(III)
[Fig. 16.4 and Fig. 16.5: 82C37A pin-out and internal block diagram — address lines A0–A7 and A8–A15 (via the output buffer), data bus DB0–DB7 (I/O buffer), DREQ0–DREQ3 and DACK0–DACK3 device handshakes, the HRQ/HLDA bus handshake, EOP, AEN, CS, CLK, and RESET pins, the priority encoder with rotating priority logic, and the command (8), mask (4), request (4), mode (4 x 6), status (8), and temporary (8) registers, plus the 16-bit current address and current word count registers.]
Signal Description (Fig. 16.4 and Fig. 16.5)

VCC: The +5V power supply pin.
GND: Ground.
CLK: CLOCK INPUT: The clock input is used to generate the timing signals which control 82C37A operations.
CS: CHIP SELECT: Chip Select is an active low input used to enable the controller onto the data bus for CPU communications.
RESET: This is an active high input which clears the Command, Status, Request, and Temporary registers, the First/Last Flip-Flop, and the mode register counter. The Mask register is set to ignore requests. Following a Reset, the controller is in an idle cycle.
READY: This signal can be used to extend the memory read and write pulses from the 82C37A to accommodate slow memories or I/O devices.
HLDA: HOLD ACKNOWLEDGE: The active high Hold Acknowledge from the CPU indicates that it has relinquished control of the system buses.
DREQ0-DREQ3: DMA REQUEST: The DMA Request (DREQ) lines are individual asynchronous channel request inputs used by peripheral circuits to obtain DMA service. In Fixed Priority, DREQ0 has the highest priority and DREQ3 has the lowest priority. A request is generated by activating the DREQ line of a channel. DACK will acknowledge the recognition of a DREQ signal. The polarity of DREQ is programmable; RESET initializes these lines to active high. DREQ must be maintained until the corresponding DACK goes active. DREQ will not be recognized while the clock is stopped. Unused DREQ inputs should be pulled High or Low (inactive) and the corresponding mask bit set.
DB0-DB7: DATA BUS: The Data Bus lines are bidirectional three-state signals connected to the system data bus. The outputs are enabled in the Program condition during the I/O Read to output the contents of a register to the CPU. The outputs are disabled and the inputs are read during an I/O Write cycle when the CPU is programming the 82C37A control registers.
During DMA cycles, the most significant 8 bits of the address are output onto the data bus to be strobed into an external latch by ADSTB. In memory-to-memory operations, data from the memory enters the 82C37A on the data bus during the read-from-memory transfer; then, during the write-to-memory transfer, the data bus outputs write the data into the new memory location.
IOR: READ: I/O Read is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to read the control registers. In the Active cycle, it is an output control signal used by the 82C37A to access data from the peripheral during a DMA Write transfer.
IOW: WRITE: I/O Write is a bidirectional active low three-state line. In the Idle cycle, it is an input control signal used by the CPU to load information into the 82C37A. In the Active cycle, it is an output control signal used by the 82C37A to load data to the peripheral during a DMA Read transfer.
EOP: END OF PROCESS: End of Process (EOP) is an active low bidirectional signal. Information concerning the completion of DMA services is available at the bidirectional EOP pin. The 82C37A allows an external signal to terminate an active DMA service by pulling the EOP pin low. A pulse is generated by the 82C37A when terminal count (TC) for any channel is reached, except for channel 0 in memory-to-memory mode. During memory-to-memory transfers, EOP will be output when the TC for channel 1 occurs. The EOP pin is driven by an open drain transistor on-chip, and requires an external pull-up resistor to VCC. When an EOP pulse occurs, whether internally or externally generated, the 82C37A will terminate the service, and if auto-initialize is enabled, the base registers will be written to the current registers of that channel. The mask bit and TC bit in the status word will be set for the currently active channel by EOP unless the channel is programmed for auto-initialize. In that case, the mask bit remains clear.
A0-A3: ADDRESS: The four least significant address lines are bidirectional three-state signals. In the Idle cycle, they are inputs and are used by the 82C37A to address the control register to be loaded or read. In the Active cycle, they are outputs and provide the lower 4 bits of the output address.
A4-A7: ADDRESS: The four most significant address lines are three-state outputs and provide 4 bits of address. These lines are enabled only during the DMA service.
HRQ: HOLD REQUEST: The Hold Request (HRQ) output is used to request control of the system bus. When a DREQ occurs and the corresponding mask bit is clear, or a software DMA request is made, the 82C37A issues HRQ. The HLDA signal then informs the controller when access to the system buses is permitted. For stand-alone operation where the 82C37A always controls the buses, HRQ may be tied to HLDA. This will result in one S0 state before the transfer.
DACK0-DACK3: DMA ACKNOWLEDGE: DMA Acknowledge is used to notify the individual peripherals when one has been granted a DMA cycle. The sense of these lines is programmable. RESET initializes them to active low.
AEN: ADDRESS ENABLE: Address Enable enables the 8-bit latch containing the upper 8 address bits onto the system address bus. AEN can also be used to disable other system bus drivers during DMA transfers. AEN is active high.
ADSTB: ADDRESS STROBE: This is an active high signal used to control latching of the upper address byte. It will directly drive the strobe input of external transparent octal latches, such as the 82C82. During block operations, ADSTB will only be issued when the upper address byte must be updated, thus speeding operation through elimination of S1 states. ADSTB timing is referenced to the falling edge of the 82C37A clock.
MEMR: MEMORY READ: The Memory Read signal is an active low three-state output used to access data from the selected memory location during a DMA Read or a memory-to-memory transfer.
MEMW: MEMORY WRITE: The Memory Write signal is an active low three-state output used to write data to the selected memory location during a DMA Write or a memory-to-memory transfer.
NC: NO CONNECT: Pin 5 is open and should not be tested for continuity.
Functional Description
The 82C37A direct memory access controller is designed to improve the data transfer rate in systems which must transfer data from an I/O device to memory, or move a block of memory to an I/O device. It will also perform memory-to-memory block moves, or fill a block of memory with data from a single location. Operating modes are provided to handle single-byte transfers as well as discontinuous data streams, which allows the 82C37A to control data movement with software transparency. The DMA controller is a state-driven address and control signal generator, which permits data to be transferred directly from an I/O device to memory, or vice versa, without ever being stored in a temporary register. This can greatly increase the data transfer rate for sequential operations, compared with processor move or repeated string instructions. Memory-to-memory operations require temporary internal storage of the data byte between generation of the source and destination addresses, so memory-to-memory transfers take place at less than half the rate of I/O operations, but still much faster than with central processor techniques. The block diagram of the 82C37A is shown in Fig. 16.6. The timing and control block, priority block, and internal registers are the main components. The timing and control block derives internal timing from the clock input and generates external control signals. The Priority Encoder block resolves priority contention between DMA channels requesting service simultaneously.
DMA Operation
In a system, the 82C37A address and control outputs and data bus pins are basically connected in parallel with the system buses. An external latch is required for the upper address byte. While inactive, the controller's outputs are in a high impedance state. When activated by a DMA request and bus control is relinquished by the host, the 82C37A drives the buses and generates the control signals to perform the data transfer. The operation performed by activating one of the four DMA request inputs has previously been programmed into the controller via the Command, Mode, Address, and Word Count registers. For example, if a block of data is to be transferred from RAM to an I/O device, the starting address of the data is loaded into the 82C37A Current and Base Address registers for a particular channel, and the length of the block is loaded into the channel's Word Count register. The corresponding Mode register is programmed for a memory-to-I/O operation (read transfer), and various options are selected by the Command register and the other Mode register bits. The channel's mask bit is cleared to enable recognition of a DMA request (DREQ). The DREQ can be either a hardware signal or a software command. Once initiated, the block DMA transfer will proceed as the controller outputs the data address, simultaneous MEMR and IOW pulses, and selects an I/O device via the DMA acknowledge (DACK) outputs. The data byte flows directly from the RAM to the I/O device. After each byte is transferred, the address is automatically incremented (or decremented) and the word count is decremented. The operation is then repeated for the next byte. The controller stops transferring data when the Word Count register underflows, or an external EOP is applied. To further understand 82C37A operation, the states generated by each clock cycle must be considered. The DMA controller operates in two major cycles, active and idle.
After being programmed, the controller is normally idle until a DMA request occurs on an unmasked channel, or a software request is given. The 82C37A will then request control of the system busses and enter the active cycle. The active cycle is composed of several internal states, depending on what options have been selected and what type of operation has been requested. The 82C37A can assume seven separate states, each composed of one full clock period. State I (SI) is the idle state. It is entered when the 82C37A has no valid DMA requests pending, at the end of a transfer sequence, or when a Reset or Master Clear has occurred. While in SI, the DMA controller is inactive but may be in the Program Condition (being programmed by the processor). State 0 (S0) is the first state of a DMA service. The 82C37A has requested a hold but the processor has not yet returned an acknowledge. The 82C37A may still be programmed until it
has received HLDA from the CPU. An acknowledge from the CPU signals that the DMA transfer may begin. S1, S2, S3, and S4 are the working states of the DMA service. If more time is needed to complete a transfer than is available with normal timing, wait states (SW) can be inserted between S3 and S4 in normal transfers by the use of the Ready line on the 82C37A. For compressed transfers, wait states can be inserted between S2 and S4. Note that the data is transferred directly from the I/O device to memory (or vice versa) with IOR and MEMW (or MEMR and IOW) being active at the same time. The data is not read into or driven out of the 82C37A in I/O-to-memory or memory-to-I/O DMA transfers. Memory-to-memory transfers require a read-from and a write-to memory to complete each transfer. The states, which resemble the normal working states, use two-digit numbers for identification. Eight states are required for a single transfer. The first four states (S11, S12, S13, S14) are used for the read-from-memory half and the last four states (S21, S22, S23, S24) for the write-to-memory half of the transfer.
16(IV)
Conclusion
This lesson has given an overview of DMA controllers. The controllers are normally used in high-performance embedded systems where large bulks of data need to be transferred from the input to the memory. One such system is an on-board Digital Signal Processor in a mobile telephone. Besides fast digital coding and decoding, at times this processor is required to process the voice signals to improve the quality. This has to take place in real time. While the voice message is streaming in through the AD converter it needs to be transferred and windowed for filtering. DMA offers a great help here. For simpler systems DMA is not normally used. The signals and functional architecture of a very familiar DMA controller (8237) used in personal computers have been discussed. For more detailed discussions the readers are requested to visit www.intel.com or any other manufacturer's site and read the datasheet.
16(V)
Q.1. Can you use the 82C37A in embedded systems? Justify your answer.
Ans: Only in high-performance systems where the power supply constraints are not stringent. The supply voltage is 5V and the current may reach up to 16 mA, resulting in 80 mW of power consumption.
Q.2. Highlight the different modes of DMA data transfer. Which mode consumes the least power and which mode is the fastest?
Ans: Refer to text.
Q.3. Draw the architecture of the 8237 and explain the various parts.
Ans: Refer to text.
Module 3
Embedded Systems I/O
Lesson 17
USB and IrDA
Instructional Objectives
After going through this lesson the student would be able to learn the basics of:
o The Universal Serial Bus signals
o The IrDA standard
Pre-Requisite
Digital Electronics, Microprocessors
17(I)
As personal computers and other microprocessor-based embedded systems began handling photographic images, audio, video and other bulky data, the traditional communications buses were no longer able to carry the data as fast as desired. So a group of leading computer and telecom firms including IBM, Intel, Microsoft, Compaq, Digital Equipment, NEC and Northern Telecom got together and developed USB. The USB is a medium-speed serial data bus designed to carry relatively large amounts of data over relatively short cables: up to about five meters long. It can support data rates of up to 12Mb/s (megabits per second). The USB is an addressable bus system with a seven-bit address code, so it can support up to 127 different devices or nodes at once (the all-zeroes code is not a valid address). However it can have only one host. The host with its peripherals connected via the USB forms a star network. On the other hand, any device connected to the USB can have a number of other nodes connected to it in daisy-chain fashion, so it can also form the hub for a mini-star sub-network. Similarly you can have a device which purely functions as a hub for other node devices, with no separate function of its own. This expansion via hubs is possible because the USB supports a tiered star topology, as shown in Fig.17.1. Each USB hub acts as a kind of traffic cop for its part of the network, routing data from the host to its correct address and preventing bus contention clashes between devices trying to send data at the same time. On a USB hub device, the single port used to connect to the host PC, either directly or via another hub, is known as the upstream port, while the ports used for connecting other devices to the USB are known as the downstream ports. This is illustrated in Fig.17.2. USB hubs work transparently as far as the host PC and its operating system are concerned. Most hubs provide either four or seven downstream ports, or fewer if they already include a USB device of their own.
Another important feature of the USB is that it is designed to allow hot swapping, i.e., devices can be plugged into and unplugged from the bus without having to turn the power off and on again, re-boot the PC or even manually start a driver program. A new device can simply be connected to the USB, and the PC's operating system should recognize it and automatically set up the necessary driver to service it.
Fig. 17.1 The USB is a medium-speed serial bus used to transfer data between a PC and its peripherals. It uses a tiered star configuration, with expansion via hubs (either separate, or in USB devices).
Fig. 17.2 The port on a USB device or hub which connects to the PC host (either directly or via another hub) is known as the upstream port, while hub ports which connect to additional USB devices are downstream ports. Downstream ports use Type A sockets, while upstream ports use Type B sockets.
port, so if they require less than this figure for operation they can be bus powered. If they need more, they have to use their own power supply such as a plug-pack adaptor. Hubs should be able to supply up to 500mA at 5V from each downstream port, if they are not themselves bus powered. Serial data is sent along the USB in differential or push-pull mode, with opposite polarities on the two signal lines. This improves the signal-to-noise ratio (SNR) by doubling the effective signal amplitude and also allowing the cancellation of any common-mode noise induced into the cable. The data is sent in non-return-to-zero inverted (NRZI) format, with signal levels of 3.3V peak (i.e., 6V peak differential). USB cables use two different types of connectors: Type A plugs for the upstream end, and Type B plugs for the downstream end. Hence the USB ports of PCs are provided with matching Type A sockets, as are the downstream ports of hubs, while the upstream ports of USB devices (including hubs) have Type B sockets. Type A plugs and sockets are flat in shape and have the four connections in line, while Type B plugs and sockets are much squarer in shape and have two connections on either side of the centre spigot (Fig.17.3). Both types of connector are polarized so they cannot be inserted the wrong way around. Fig.17.3 shows the pin connections for both types of connector, with sockets shown and viewed from the front. Note that although USB cables having a Type A plug at each end are available, they should never be used to connect two PCs together via their USB ports. This is because a USB network can only have one host, and both would try to claim that role. In any case, the cable would also short their 5V power rails together, which could cause a damaging current to flow. USB is not designed for direct data transfer between PCs.
All normal USB connections should be made using cables with a Type A plug at one end and a Type B plug at the other, although extension cables with a Type A plug at one end and a Type A socket at the other can also be used, providing the total extended length of a cable doesn't exceed 5m. By the way, USB cables are usually easy to identify as the plugs have a distinctive symbol molded into them (Fig.17.4).
Data on the bus is sent least-significant-bit (LSB) first. Luckily all of the fine details of USB handshaking and data transfer are looked after by the driver software in the host and the firmware built into the USB controller inside each USB peripheral device and hub.
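The low-level line coding can be illustrated with a short sketch. This assumes the NRZI rule used on the USB wire (a '1' leaves the line level unchanged, a '0' toggles it); real USB additionally applies bit stuffing and drives the two data lines differentially, which the sketch omits.

```c
#include <stdint.h>
#include <stddef.h>

/* NRZI-encode a bit sequence: a '1' keeps the current line level,
   a '0' toggles it. The line is taken to idle at level 1 here.
   Illustrative sketch of the coding rule only. */
void nrzi_encode(const uint8_t *bits, size_t n, uint8_t *levels)
{
    uint8_t level = 1;               /* idle level before the packet */
    for (size_t i = 0; i < n; i++) {
        if (bits[i] == 0)
            level ^= 1;              /* 0 -> transition    */
        levels[i] = level;           /* 1 -> no transition */
    }
}
```

Encoding the SYNC pattern 00000001 this way yields an alternating level sequence ending with two equal levels, which is what gives the receiver its clock reference at the start of every packet.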
Pin connections:
Pin 1: +5V Power
Pin 2: − Data
Pin 3: + Data
Pin 4: Ground
Fig. 17.3 Pin connections for the two different types of USB socket, as viewed from the front.
Fig. 17.4 Most USB plugs have this distinctive marking symbol.
Token packets: SYNC (00000001) | PID (xxxx,xxxx) | CRC (xxxxx)
Data packets: SYNC (00000001) | PID (xxxx,xxxx) | CRC (xxxxx)
Handshake packets: SYNC (00000001) | PID (xxxx,xxxx)

Packet Identifier nibble codes:
Tokens: OUTPUT = 0001, INPUT = 1001, SET UP = 1101
Data: DATA0 = 0011, DATA1 = 1011
Handshake: ACK = 0010, NAK = 1010, STALL = 1110

Fig. 17.5 Examples of the various kinds of USB signaling and data packets.
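The CRC field of a token packet protects the 11 bits of address and endpoint with a 5-bit CRC (generator polynomial x^5 + x^2 + 1). A hedged sketch of that computation, following the algorithm described in the USB specification (register seeded with ones, input taken LSB first, result complemented before transmission):

```c
#include <stdint.h>

/* CRC5 over an 11-bit token field (ADDR + ENDP), polynomial
   x^5 + x^2 + 1. Illustrative sketch of the USB algorithm:
   the CRC register starts at all ones, bits are processed LSB
   first as they appear on the wire, and the final value is
   complemented before being appended to the packet. */
uint8_t usb_crc5(uint16_t data11)
{
    uint8_t crc = 0x1F;                    /* seed: all ones        */
    for (int i = 0; i < 11; i++) {
        uint8_t bit = (data11 >> i) & 1;   /* LSB first             */
        uint8_t top = (crc >> 4) & 1;
        crc = (uint8_t)((crc << 1) & 0x1F);
        if (bit ^ top)
            crc ^= 0x05;                   /* taps: x^2 + 1         */
    }
    return (uint8_t)((~crc) & 0x1F);       /* complement on output  */
}
```

Any single-bit error in the protected field changes the CRC, which is the property the check relies on.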
17(II)
IrDA Standard
IrDA is the abbreviation for the Infrared Data Association, a nonprofit organization for setting standards in IR serial computer connections. The transmission in an IrDA-compatible mode (sometimes called SIR, for serial IR) uses, in the simplest case, the RS232 port, a built-in standard of all compatible PCs. With a simple interface,
shortening the bit length to a maximum of 3/16 of its original length for power-saving requirements, an infrared emitting diode is driven to transmit an optical signal to the receiver. This type of transmission covers the data range up to 115.2 kbit/s, which is the maximum data rate supported by standard UARTs (Fig.17.7). The minimum demand for transmission speed for IrDA is only 9600 bit/s. All transmissions must be started at this rate to enable compatibility. Higher speeds are a matter of negotiation between the ports after establishing the link.
Fig. 17.7 One end of the overall serial link: a UART (16550/RS232) drives a pulse-shaping stage (TOIM3000 or TOIM3232) feeding the IR-output transmitter, with a matching pulse-recovery path from the IR input through the receiver (4000 series transceiver). Please browse www.irda.org for details.
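The 3/16 shortening rule translates directly into pulse timings. A small sketch (the sir_timing helper is illustrative, not part of any IrDA API) computing the bit period and the maximum pulse width for a given data rate:

```c
/* IrDA SIR pulse timing: each '0' bit is sent as an IR pulse of at
   most 3/16 of the bit time, which saves power in the emitting diode.
   Illustrative helper computing both durations in nanoseconds. */
typedef struct {
    unsigned bit_ns;    /* one bit period            */
    unsigned pulse_ns;  /* maximum IR pulse duration */
} sir_timing_t;

sir_timing_t sir_timing(unsigned bit_rate)
{
    sir_timing_t t;
    t.bit_ns   = 1000000000u / bit_rate;   /* 1 s / rate        */
    t.pulse_ns = (t.bit_ns * 3u) / 16u;    /* 3/16 of bit time  */
    return t;
}
```

At 115.2 kbit/s the bit period is about 8.68 µs, so the IR pulse lasts at most about 1.63 µs; at the mandatory 9600 bit/s starting rate it is about 19.5 µs.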
Fig. 17.8(a) A simple circuit for an infrared interface to the RS232 port (9-pin connector and MAX232 shown). The 7805 is a voltage regulator which supplies 5V to the MAX232 level converter; the MAX232 converts the signal, which is at 5V and ground, to 12V levels compatible with the RS232 standard.
Question
Q.1. From the internet, find out a microcontroller with an in-built USB port and draw its architecture.
Ans:
[Block diagram: the architecture of a typical microcontroller from Atmel with an on-chip USB controller — clock generator (XTAL1), USB hub repeater with DP/DM port pairs, AVR core with GPIO (PA[0:7], PD[0:6]), ADC (ADC[0:11]), timer/counters, reset/test pins, and on-chip voltage regulators.]

Q.2. Draw the circuit diagram for interfacing an IrDA receiver with a typical microcontroller.
Further Reference
1. www.usb.org
2. www.irda.org
Module 3
Embedded Systems I/O
Lesson 18
AD and DA Converters
Instructional Objectives
After going through this lesson the student would be able to learn about:
o Real Time Signal Processing
o The Sampling Theorem
o DA Conversion
o Different Methods of AD Conversion: Successive Approximation, Flash, Sigma-Delta
Pre-Requisite
Digital Electronics, Microprocessors
18
Introduction
The real time embedded controller is expected to process real world signals within a specified time. Most real world signals are analog in nature. Take the example of your mobile phone, whose overall architecture is shown in Fig.18.1. The Digital Signal Processor (DSP) is fed with the analog data from the microphone. It also receives the digital signals after demodulation from the RF receiver, and generates the filtered and noise-free analog signal through the speaker. All the processing is done in real time. The processing of signals in real time is termed Real Time Signal Processing, a term coined in the signal processing industry.
Fig. 18.1 The block diagram
The detailed steps of such a processing task are outlined in Fig.18.2.
Fig. 18.2 Real Time Processing of Analog Signals (Measurand → Sensor → Conditioner → Analog Processor/LPF → ADC)
Measurand is the quantity which is measured. In this case it is the analog speech signal. The sensor is a microphone; in the case of your mobile set, it is the microphone embedded in it. The conditioner can be a preamplifier or a demodulator. The Analog Processor is mostly a Low Pass Filter (LPF), primarily used to prevent aliasing, a term explained later in this chapter. Then follows the Analog to Digital Converter, which has a number of stages to convert an analog signal into digital form. The Digital Signal Processing is carried out by a system with a processor. The processed signal is then converted back into an analog signal by the Digital to Analog Converter, which finally sends the output to the real world through another Low Pass Filter. The functional layout of the ADC and DAC is depicted in Fig.18.3.
Fig. 18.3 Functional layout of the ADC (x(t) → Sampler → xs(t) → Quantizer → xq(n) → Coder → b-bit code [xb(n)]) and the DAC (b-bit code [yb(n)] → Decoder → Sample/hold → y(n))
The DA Converter
In theory, the simplest method for digital-to-analog conversion is to pull the samples from memory and convert them into an impulse train.

Fig. 18.4(a) The analog equivalent of digital words (impulse train)
Fig. 18.4(b) The analog voltage after zero-order hold
Fig. 18.4(c) The reconstructed analog signal after filtering

A digital word (8-bit or 16-bit) can be converted to its analog equivalent by weighted averaging. Fig. 18.5(a) shows the weighted averaging method for a 3-bit converter. A switch connects an input either to a common voltage V or to a common ground. Only switches currently connected to the voltage source contribute current to the inverting input summing node. The output voltage is given by the expression drawn below the circuit diagram; SX = 1 if switch X connects to V, SX = 0 if it connects to ground. There are eight possible combinations of connections for the three switches, and these are indicated in the columns of the table to the right of the diagram. Each combination is associated with a decimal integer as shown. The inputs are weighted in a 4:2:1 relationship, so that the sequence of values of 4S3 + 2S2 + S1 forms a binary number representation. The magnitude of Vo varies in units (steps) of (Rf/4R)V from 0 to 7 steps. This circuit provides a simplified Digital to Analog Converter (DAC). The digital input controls the switches, and the amplifier provides the analog output.
Fig. 18.5(a) The binary-weighted resistor method. The three switches S3, S2 and S1 feed the summing amplifier through resistors R, 2R and 4R respectively, with Rf in the feedback path; the table alongside lists the eight switch combinations (S3 S2 S1 = 000 … 111) against the decimal values 0 … 7. The output is:

Vo = −Rf (S3·V/R + S2·V/2R + S1·V/4R) = −(Rf/4R)·V·(4S3 + 2S2 + S1)

[Circuit detail: an alternative arrangement in which each switch feeds the summing node through a 2R branch of a ladder network.]
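The output expression of the 3-bit weighted DAC can be checked numerically. This is a sketch assuming ideal components; dac3_output is an illustrative helper evaluating Vo = −(Rf/4R)·V·(4S3 + 2S2 + S1):

```c
/* Output of the 3-bit binary-weighted DAC: the switch states form a
   binary code 4*S3 + 2*S2 + S1, and the op-amp sums the weighted
   currents, giving steps of (Rf/4R)*V. Illustrative model only;
   component values are parameters. */
double dac3_output(int s3, int s2, int s1, double rf, double r, double v)
{
    int code = 4 * s3 + 2 * s2 + s1;        /* 0 .. 7                */
    return -(rf / (4.0 * r)) * v * code;    /* inverting summing amp */
}
```

With Rf = R and V = 1 V the full-scale code 111 gives −7/4 V, confirming the 4:2:1 weighting.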
The AD Converter
The ADC consists of a sampler, quantizer and a coder. Each of them is explained below.
Sampler
The sampler in the simplest form is a semiconductor switch, as shown below. It is followed by a hold circuit, which is a capacitor with a very low leakage path.
[Figure: the sampler. The analog signal passes through the switch under a control signal, producing the sampled signal; the plots show the analog signal and the sampled signal after the capacitor, over a few milliseconds.]
Quantizer
The hold circuit tries to maintain a constant voltage till the next switching. The quantizer is responsible for converting this voltage to a binary number. The number of bits in the binary number decides the approximation and accuracy. The sample-and-hold output can assume any real number in a given range. However, because of the finite number of bits (say N), the possible levels in the digital domain are 0 to 2^N − 1, which corresponds to a voltage range of 0 to V volts.
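The mapping from a held voltage to one of the 2^N levels can be sketched as follows. This is an illustrative model of an ideal quantizer (rounding to the nearest code and clamping at the range ends), not the circuit's actual implementation:

```c
#include <stdint.h>

/* Ideal N-bit quantizer: map a sample-and-hold voltage in [0, vmax]
   to the nearest of the codes 0 .. 2^N - 1. Rounding to the nearest
   code keeps the quantization error within half a step. */
uint32_t quantize(double v, double vmax, int nbits)
{
    uint32_t levels = (1u << nbits) - 1u;     /* top code, 2^N - 1  */
    double step = vmax / (double)levels;      /* volts per code     */
    if (v <= 0.0)  return 0;                  /* clamp below range  */
    if (v >= vmax) return levels;             /* clamp above range  */
    return (uint32_t)(v / step + 0.5);        /* round to nearest   */
}
```

Doubling the number of bits squares the number of levels, which is why each extra bit roughly halves the quantization error.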
Coder
This is an optional device which is used after the conversion is complete. In microprocessor-based systems the Coder is responsible for packing several samples and transmitting them onwards, either in a synchronous or in an asynchronous manner. For example, in TI DSK kits you will find AD converters with CODECs interfaced to McBSP ports (short for Multichannel Buffered Serial Ports). Several 16-bit sampled values are packed into a frame and transmitted to the processor or to the memory by Direct Memory Access (DMA). The Coder is responsible for controlling the ADC and transferring the data quickly for processing. Sometimes the Codec is responsible for compressing several samples together and transmitting them. In your desktop computer you will find audio interfaces which can digitize and record your voice and store it in .wav format. Basically this is AD conversion followed by coding. The wav format is the Pulse-Code-Modulated (PCM) format of the original digital voice samples.
another way, there are 11.1 samples taken over each complete cycle of the sinusoid. This situation is more complicated than the previous case, because the analog signal cannot be reconstructed by simply drawing straight lines between the data points. Do these samples properly represent the analog signal? The answer is yes, because no other sinusoid, or combination of sinusoids, will produce this pattern of samples (within the reasonable constraints listed below). These samples correspond to only one analog signal, and therefore the analog signal can be exactly reconstructed. Again, an instance of proper sampling. In (c), the situation is made more difficult by increasing the sine wave's frequency to 0.31 of the sampling rate. This results in only 3.2 samples per sine wave cycle. Here the samples are so sparse that they don't even appear to follow the general trend of the analog signal. Do these samples properly represent the analog waveform? Again, the answer is yes, and for exactly the same reason. The samples are a unique representation of the analog signal. All of the information needed to reconstruct the continuous waveform is contained in the digital data. Obviously, it must be more sophisticated than just drawing straight lines between the data points. As strange as it seems, this is proper sampling according to our definition. In (d), the analog frequency is pushed even higher to 0.95 of the sampling rate, with a mere 1.05 samples per sine wave cycle. Do these samples properly represent the data? No, they don't! The samples represent a different sine wave from the one contained in the analog signal. In particular, the original sine wave of 0.95 frequency misrepresents itself as a sine wave of 0.05 frequency in the digital signal. This phenomenon of sinusoids changing frequency during sampling is called aliasing. Just as a criminal might take on an assumed name or identity (an alias), the sinusoid assumes another frequency that is not its own. 
Since the digital data is no longer uniquely related to a particular analog signal, an unambiguous reconstruction is impossible. There is nothing in the sampled data to suggest that the original analog signal had a frequency of 0.95 rather than 0.05. The sine wave has hidden its true identity completely; the perfect crime has been committed! According to our definition, this is an example of improper sampling. This line of reasoning leads to a milestone in DSP, the sampling theorem. Frequently this is called the Shannon sampling theorem, or the Nyquist Sampling theorem, after the authors of 1940s papers on the topic. The sampling theorem indicates that a continuous signal can be properly sampled, only if it does not contain frequency components above one-half of the sampling rate. For instance, a sampling rate of 2,000 samples/second requires the analog signal to be composed of frequencies below 1000 cycles/second. If frequencies above this limit are present in the signal, they will be aliased to frequencies between 0 and 1000 cycles/second, combining with whatever information that was legitimately there.
[Figure: a sinusoid sampled at four different rates — (a) analog frequency = 0.0 (i.e., DC), (b) analog frequency = 0.09 of the sampling rate, (c) analog frequency = 0.31 of the sampling rate, (d) analog frequency = 0.95 of the sampling rate. Each panel plots amplitude against time (or sample number).]
Methods of AD Conversion
The analog voltage samples are converted to digital equivalent at the quantizer. There are various ways to convert the analog values to the nearest finite length digital word. Some of these methods are explained below.
[Figure: a comparator-based conversion stage built around uA741 op-amps with a DAC in the feedback path, together with the eight 3-bit output codes 000 … 111.]
Flash Converter
Making all the comparisons between the digital states and the analog signal concurrently makes for a fast conversion cycle. A resistive voltage divider (see figure) can provide all the digital reference states required. There are eight reference values (including zero) for the three-bit converter illustrated. Note that the voltage reference states are offset so that they are midway between reference step values. The analog signal is compared concurrently with each reference state; therefore a separate comparator is required for each comparison. Digital logic then combines the several comparator outputs to determine the appropriate binary code to present.
[Figure: the flash-converter resistive divider. The ladder uses 3R/2 at the top and R/2 at the bottom with R elements in between, so the tap voltages are offset to mid-step values (… 2.5Vo/8, 1.5Vo/8, 0.5Vo/8); each tap feeds one comparator against the analog input, and the comparator outputs are decoded to the 3-bit codes 111 … 000.]
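The behavior of the comparator bank can be modeled in a few lines. This is an illustrative behavioral sketch of a 3-bit flash converter with mid-step references, not a circuit description: counting the comparators whose reference lies below the input yields the thermometer code, and that count is the binary result.

```c
/* Behavioral model of a 3-bit flash ADC. Seven comparators compare
   the input against references offset to mid-step values
   ((k + 0.5) * vref / 8); the number of comparators reporting
   "input above reference" is the thermometer count, which equals
   the binary output code. */
int flash3_convert(double vin, double vref)
{
    int ones = 0;
    for (int k = 0; k < 7; k++) {                 /* 2^3 - 1 comparators */
        double level = (k + 0.5) * vref / 8.0;    /* mid-step reference  */
        if (vin > level)
            ones++;                               /* thermometer code    */
    }
    return ones;                                  /* binary code 0 .. 7  */
}
```

All comparisons happen concurrently in hardware, which is what makes the flash architecture fast but expensive: an n-bit converter needs 2^n − 1 comparators.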
Sigma-Delta () AD converters
The analog side of a sigma-delta converter (a 1-bit ADC) is very simple. The digital side, which is what makes the sigma-delta ADC inexpensive to produce, is more complex. It performs filtering and decimation. The concepts of over-sampling, noise shaping, digital filtering, and decimation are used to make a sigma-delta ADC.
Over-sampling
First, consider the frequency-domain transfer function of a traditional multi-bit ADC with a sinewave input signal. This input is sampled at a frequency Fs. According to Nyquist theory, Fs must be at least twice the bandwidth of the input signal. When observing the result of an FFT analysis on the digital output, we see a single tone and lots of random noise extending from DC to Fs/2 (Fig.18.13). Known as quantization noise, this effect results from the following consideration: the ADC input is a continuous signal with an infinite number of possible states, but the digital output is a discrete function, whose number of different states is determined by the converter's
resolution. So, the conversion from analog to digital loses some information and introduces some distortion into the signal. The magnitude of this error is random, with values up to ±1/2 LSB.
Fig. 18.14 FFT diagram of a multi-bit ADC with a sampling frequency kFS and effect of Digital Filter on Noise Bandwidth
Noise Shaping
The sigma-delta modulator includes a difference amplifier, an integrator, and a comparator, with a feedback loop that contains a 1-bit DAC. (This DAC is simply a switch that connects the negative input of the difference amplifier to a positive or a negative reference voltage.) The purpose of the feedback DAC is to maintain the average output of the integrator near the comparator's reference level. The density of "ones" at the modulator output is proportional to the input signal. For an increasing input the comparator generates a greater number of "ones," and vice versa for a decreasing input. By summing the error voltage, the integrator acts as a lowpass filter to the input signal and a highpass filter to the quantization noise. Thus, most of the quantization noise is pushed into higher frequencies. Oversampling has changed not the total noise power, but its distribution. If we apply a digital filter to the noise-shaped delta-sigma modulator output, it removes more noise than does simple oversampling (Fig.18.16).
Fig. 18.15 Block diagram of the 1-bit Sigma-Delta converter: the signal input X1 and the 1-bit DAC feedback meet at the difference amplifier, whose output feeds the integrator and then the comparator (1-bit ADC); the comparator's bit stream goes to the digital filter.
Fig. 18.16 The Effect of Integrator and Digital Filter on the Spectrum
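The modulator loop of Fig. 18.15 can be simulated behaviorally. This is a sketch under idealized assumptions (perfect integrator, feedback levels of ±vref); it returns the density of ones, which should track the input level as the text describes:

```c
#include <stddef.h>

/* Behavioral first-order sigma-delta modulator with a DC input:
   difference amplifier, integrator, comparator, and a 1-bit DAC
   feeding back +vref or -vref. Returns the fraction of ones in the
   output stream, which tracks the input level. Idealized sketch. */
double sdm_ones_density(double vin, double vref, size_t nsamples)
{
    double integ = 0.0;      /* integrator state      */
    double fb = -1.0;        /* last 1-bit DAC output */
    size_t ones = 0;
    for (size_t i = 0; i < nsamples; i++) {
        integ += vin - fb * vref;          /* difference + integrate */
        int bit = (integ >= 0.0) ? 1 : 0;  /* comparator             */
        fb = bit ? 1.0 : -1.0;             /* 1-bit DAC feedback     */
        ones += bit;
    }
    return (double)ones / (double)nsamples;
}
```

For a DC input at half of the positive reference, the ones density settles near 0.75, i.e. (vin/vref + 1)/2, exactly the proportionality the text states.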
[Fig. 18.17: Analog input → Delta-Sigma modulator → 1-bit data stream → digital low-pass filter → output data.]
Digital Filtering
The output of the sigma-delta modulator is a 1-bit data stream at the sampling rate, which can be in the megahertz range. The purpose of the digital-and-decimation filter (Fig.18.17) is to extract information from this data stream and reduce the data rate to a more useful value. In a sigma-delta ADC, the digital filter averages the 1-bit data stream, improves the ADC resolution, and removes quantization noise that is outside the band of interest. It determines the signal bandwidth, settling time, and stop-band rejection.
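The filter-and-decimate step can be illustrated with the simplest possible filter, a block average. Practical sigma-delta parts use sinc/CIC filters instead, so this is only a sketch of the idea:

```c
#include <stddef.h>

/* Averaging decimator: reduce a 1-bit stream (values 0/1) to
   multi-bit samples by averaging blocks of 'osr' bits, where osr
   is the oversampling ratio. Each output sample carries more
   resolution than any single input bit, at 1/osr the data rate. */
size_t decimate_average(const int *bits, size_t n, size_t osr, double *out)
{
    size_t m = n / osr;                    /* output sample count  */
    for (size_t j = 0; j < m; j++) {
        int sum = 0;
        for (size_t k = 0; k < osr; k++)
            sum += bits[j * osr + k];      /* accumulate one block */
        out[j] = (double)sum / (double)osr;
    }
    return m;
}
```

Averaging osr bits trades rate for resolution, which is exactly what lets a 1-bit modulator deliver a high-resolution result.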
Conclusion
In this chapter you have learnt about the basics of Real Time Signal Processing and of DA and AD conversion methods. Some microcontrollers are already equipped with DA and AD converters on the same chip. Generally the real world signals are broadband. For instance, a triangular wave, though periodic, has frequency components extending to infinity. Therefore an anti-aliasing filter is always desirable before AD conversion; it limits the signal bandwidth and hence permits a finite sampling frequency. The question-answer session shall discuss the quantization error, the specifications of the AD and DA converters, and the errors at the various stages of real time signal processing. The details of interfacing shall be discussed in the next lesson. The AD and DA converters fall under mixed-signal VLSI circuits: the digital and analog circuits coexist on the same chip. This poses design difficulties for VLSI engineers when embedding fast and high-resolution AD converters along with the processors. Sigma-Delta ADCs are the most complex and hence rarely found embedded on microcontrollers.
Question Answers
Q1. What are the errors at different stages in a Real Time Signal Processing system? Elaborate on the quantization error.
Q2. What are the various specifications of a D-A converter?
Ans: No. of bits (8-bits, 16-bits etc.), Settling Time, Power Supply range, Power Consumption, Various Temperature ratings, Packaging
Q3. What are the various specifications of an A-D converter?
Ans: No. of bits (8-bits, 16-bits etc), No. of channels, Conversion Time, Power Supply range, Power Consumption, Various Temperature ratings, Packaging
Q4. How would you construct a second-order Delta-Sigma AD Converter?
Module 3
Embedded Systems I/O
Lesson 19
Analog Interfacing
Instructional Objectives
After going through this lesson the student would be able to:
o Know the interfacing of analog signals to microcontrollers/microprocessors
o Generate analog signals
o Design AD and DA interfaces
o Learn various methods of acquiring and generating analog data
Pre-Requisite
Digital Electronics, Microprocessors
19(I)
Introduction
Fig.19.1 shows a typical sensor network. You will find a number of sensors and actuators connected to a common bus to share information and derive a collective decision. This is a complex embedded system; a digital camera falls under such a system. Only the analog signals are shown here. The last lesson discussed the AD and DA conversion methods in detail. This chapter shall discuss built-in and standalone AD-DA converters and their interfacing.
Fig. 19.2 The Analog-Digital-Analog signal path with real time processing
19(II)
Fig.19.3 shows the block diagram of the AD converter built into the 80196 embedded processor. The details of the subsystems are given as follows:
Fig. 19.3 The block diagram of the internal AD converter (analog inputs, VREF, ANGND, control logic with status, the AD_COMMAND, AD_TIME and AD_TEST registers, and the EPA or PTS command trigger)
Analog Inputs: There are 12 input channels which are multiplexed with Port P0 and Port P1 of the processor.
ANGND: It is the analog ground, which is connected separately to the circuit from where the analog voltage is brought into the processor.
Vref: It is the reference voltage, which decides the range of the input voltage. By making it negative, bipolar inputs can be used.
AD_RESULT
For an A/D conversion, the high byte contains the eight MSBs from the conversion, while the low byte contains the two LSBs from a 10-bit conversion (undefined for an 8-bit conversion), indicates which A/D channel was used, and indicates whether the channel is idle. For a
threshold-detection, calculate the value for the successive approximation register and write that value to the high byte of AD_RESULT. Clear the low byte or leave it in its default state.
AD_TEST (A/D Conversion Test): This register specifies adjustments for zero-offset errors.
AD_TIME (A/D Conversion Time): This register defines the sample window time and the conversion time for each bit.
INT_MASK (Interrupt Mask): The AD bit in this register enables or disables the A/D interrupt. Set the AD bit to enable the interrupt request.
INT_PEND (Interrupt Pending): The AD bit in this register, when set, indicates that an A/D interrupt request is pending.
the analog input, performing a binary search for the reference voltage that most closely matches the input. One-half of the full-scale reference voltage is tested first. This corresponds to a 10-bit result where the most-significant bit is zero and all other bits are ones (0111111111). If the analog input was less than the test voltage, bit 10 of the SAR is left at zero, and a new test voltage of one-quarter of full scale (0011111111) is tried. If the analog input was greater than the test voltage, bit 9 of the SAR is set. Bit 8 is then cleared for the next test (0101111111). This binary search continues until 10 (or 8) tests have occurred, at which time the valid conversion result resides in the AD_RESULT register, where it can be read by software. The result is equal to the ratio of the input voltage divided by the analog supply voltage. If the ratio is 1.00, the result will be all ones.
The following A/D converter parameters are programmable:
o conversion input: input channel
o zero-offset adjustment: no adjustment, plus 2.5 mV, minus 2.5 mV, or minus 5.0 mV
o conversion times: sample window time and conversion time for each bit
o operating mode: 8- or 10-bit conversion, or 8-bit high or low threshold detection
o conversion trigger: immediate or EPA starts
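The binary search described above is easy to state in code. This is a behavioral sketch with an ideal internal DAC, not the 80196's actual hardware: each of the n tests keeps a bit only if the trial DAC voltage does not exceed the input, so the final code is the truncated ratio of input to reference.

```c
#include <stdint.h>

/* Behavioral successive-approximation conversion: test each bit from
   the MSB down against an ideal internal DAC. 'nbits' tests give an
   n-bit result, and the result equals the ratio of the input voltage
   to the reference voltage, truncated to n bits. */
uint16_t sar_convert(double vin, double vref, int nbits)
{
    uint16_t code = 0;
    for (int bit = nbits - 1; bit >= 0; bit--) {
        uint16_t trial = code | (uint16_t)(1u << bit);   /* try this bit */
        double vdac = vref * (double)trial / (double)(1u << nbits);
        if (vin >= vdac)
            code = trial;        /* keep the bit: DAC does not exceed vin */
    }
    return code;
}
```

Ten iterations thus pin the input down to one part in 1024, matching the 10-bit mode of the on-chip converter.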
19(III)
[Figure: a standalone 8-bit, 8-channel A/D converter in a 28-pin package. Eight analog inputs (IN0 … IN7) pass through multiplexing analog switches selected by a 3-bit address (ADD A, ADD B, ADD C) via a switch tree; an address latch enable (ALE) strobes the address latch and decoder. The 8-bit A/D control and timing block produces the 8-bit outputs (2^-1 MSB … 2^-8 LSB), an end-of-conversion interrupt, and uses VREF(+) and VREF(−). A truth table maps the address-line levels (L/H) to the selected channel.]
The Converter
This 8-bit converter is partitioned into 3 major sections: the 256R ladder network, the successive approximation register, and the comparator. The converter's digital outputs are positive true. The
256R ladder network approach (Fig.19.6) was chosen over the conventional R/2R ladder because of its inherent monotonicity, which guarantees no missing digital codes. Monotonicity is particularly important in closed-loop feedback control systems. A non-monotonic relationship can cause oscillations that would be catastrophic for the system. Additionally, the 256R network does not cause load variations on the reference voltage.
Fig. 19.6 The 256R ladder network

The bottom and top resistors of the ladder network in Fig. 19.6 are not the same value as the remainder of the network. The difference in these resistors makes the output characteristic symmetrical about the zero and full-scale points of the transfer curve. The first output transition occurs when the analog signal has reached +1/2 LSB, and succeeding output transitions occur every 1 LSB later, up to full scale. The successive approximation register (SAR) performs 8 iterations to approximate the input voltage; for any SAR-type converter, n iterations are required for an n-bit converter. Fig. 19.7 shows a typical example for a 3-bit converter. The A/D converter's SAR is reset on the positive edge of the start-conversion (SC) pulse, and the conversion begins on the falling edge of that pulse. A conversion in progress will be interrupted by receipt of a new start-conversion pulse. Continuous conversion may be accomplished by tying the end-of-conversion (EOC) output to the SC input; if used in this mode, an external start-conversion pulse should be applied after power-up. End-of-conversion will go low between 0 and 8 clock pulses after the rising edge of start conversion. The most important section of the A/D converter is the comparator: it is this section which is responsible for the ultimate accuracy of the entire converter.
(Fig. 19.7: transfer characteristics of a 3-bit A/D converter: the ideal curve with its +/- 1/2 LSB quantization error, and a curve showing the total unadjusted error (zero error = -1/4 LSB), with VIN plotted as a fraction of full scale (0/8 to 7/8) against the output code 000 to 111.)
19(IV)
The DAC0808 is an 8-bit monolithic digital-to-analog converter (DAC). Fig.19.9 shows the architecture and pin diagram of such a chip.
(Figure: DAC0808 block diagram and 16-pin package: digital inputs A1 (MSB) through A8 (LSB) drive current switches feeding an R-2R ladder; a bias circuit, an NPN current-source pair and a reference current amplifier set the output current I0; the remaining pins are VCC, VEE, VREF(+), VREF(-), COMPEN and GND.)
Fig. 19.9 The DAC0808 Signals

The pins are labeled A1 through A8, but note that A1 is the most significant bit and A8 the least significant bit (the opposite of the normal convention). The D/A converter produces an output current rather than an output voltage; an op-amp converts the current to a voltage. The output current from pin 4 ranges from 0 (when all the inputs are 0) to Imax*255/256 (when all the inputs are 1). The current Imax is determined by the current into pin 14 (which is at 0 volts). Since 8 bits are used, the maximum value is Imax*255/256. The output of the D/A converter takes some time to settle, so there should be a small delay before sending the next data to the DAC. However, this delay is very small compared with the conversion time of an A/D converter and therefore does not matter in most real-time signal-processing platforms. Fig. 19.10 shows a typical interface.
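The current-to-code relation above can be checked numerically (an illustrative sketch only; the 2 mA value for Imax is an assumed example, roughly a 10 V reference through a 5 kOhm resistor, and `dac0808_iout_ma` is a made-up helper name):

```c
/* Illustrative model of the DAC0808 transfer function:
 * Iout = Imax * code / 256, so the full-scale code 255
 * gives Imax * 255/256, never quite Imax itself. */
double dac0808_iout_ma(unsigned char code, double imax_ma)
{
    return imax_ma * code / 256.0;
}
```

With an assumed Imax of 2 mA, code 128 gives 1.0 mA (half scale) and code 255 gives about 1.992 mA, one LSB-current short of Imax.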
(In Fig. 19.10 the supplies are VCC = 5 V and VEE = -15 V.)
Fig. 19.10 Typical connection of DAC0808

The LF351 is an operational amplifier used as a current-to-proportional-voltage converter. The 8 digital inputs at A8-A1 are converted into a proportional current at pin 4 of the DAC. The reference voltage (10 V) is supplied at pin 14, with pin 15 grounded through a resistance. A capacitor is connected across the compensation pin 16 and the negative supply to bypass high-frequency noise.

Important specifications:
- Relative accuracy: 0.19% error
- Settling time: 150 ns
- Output current slew rate: 8 mA/µs
- Power supply voltage range: 4.5 V to 18 V
- Power consumption: 33 mW @ 5 V
19(V)
Conclusion
In this lesson you learnt about the following:
- The internal A/D converters of the 80196 family of processors
- The external microprocessor-compatible ADC0809 converter
- A typical 8-bit D/A converter

Both ADCs use the successive approximation technique. Flash ADCs are complex and lead to large VLSI circuits that are unsuitable for coexistence with a processor on the same chip; sigma-delta converters need a very high sampling rate.
Questions and Answers
Q.1. What are the possible errors in a system as shown in Fig. 19.2?
Ans:
Stage 1, Signal amplification and conditioning: this can also amplify the noise.
Stage 2, Anti-aliasing filter: some useful information, such as transients in the real system, cannot be captured.
Stage 3, Sample and hold: leakage, and electromagnetic interference due to switching.
Stage 4, Analog-to-digital converter: quantization error due to finite bit length.
Stage 5, Digital processing and data manipulation in a processor: numerical round-off errors due to finite word length, and the delay caused by the algorithm.
Stage 6, Processed digital values temporarily stored in a latch before D-A conversion: error in reconstruction due to the zero-order approximation.

Q.2. Why is it necessary to separate the digital ground from the analog ground in a typical ADC?
Ans: Digital circuit noise can get into the analog signal path if separate grounding systems are not used for the digital and analog parts. Digital grounds are invariably noisier than analog grounds because of the switching noise generated in digital chips when they change state. For large current transients, PCB trace inductances cause voltage drops between various ground points on the board (ground bounce). Ground bounce translates into varying voltage levels on signal lines. For digital lines this is not a problem unless it crosses a logic threshold; for analog lines it is simply noise added to the signals.
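The quantization error named in Stage 4 can be made concrete with a short sketch (assuming an ideal n-bit converter; the 5 V reference in the note below is only an example, and the function names are made up):

```c
/* Quantization step and worst-case error of an ideal n-bit ADC:
 * one LSB spans Vref / 2^n, and an ideal converter's error is
 * bounded by +/- 1/2 LSB. */
double adc_lsb_volts(double vref, int bits)
{
    return vref / (double)(1u << bits);
}

double adc_max_quant_error(double vref, int bits)
{
    return adc_lsb_volts(vref, bits) / 2.0;   /* +/- 1/2 LSB bound */
}
```

For a 5 V reference, an 8-bit converter has an LSB of about 19.5 mV (error bound about +/- 9.8 mV), while a 10-bit converter tightens this to about 4.88 mV and +/- 2.44 mV.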
Module 4
Design of Embedded Processors
Lesson 20
Field Programmable Gate Arrays and Applications
Instructional Objectives
After going through this lesson the student will be able to:
- Define a field programmable gate array (FPGA)
- Distinguish between an FPGA and a stored-memory processor
- List and explain the principle of operation of the various functional units within an FPGA
- Compare the architecture and performance specifications of various commercially available FPGAs
- Describe the steps in using an FPGA in an embedded system
Introduction
An FPGA is a device that contains a matrix of reconfigurable gate array logic circuitry. When an FPGA is configured, the internal circuitry is connected in a way that creates a hardware implementation of the software application. Unlike processors, FPGAs use dedicated hardware for processing logic and do not have an operating system. FPGAs are truly parallel in nature, so different processing operations do not have to compete for the same resources. As a result, the performance of one part of the application is not affected when additional processing is added. Also, multiple control loops can run on a single FPGA device at different rates. FPGA-based control systems can enforce critical interlock logic and can be designed to prevent I/O forcing by an operator. However, unlike hard-wired printed circuit board (PCB) designs, which have fixed hardware resources, FPGA-based systems can literally rewire their internal circuitry to allow reconfiguration after the control system is deployed to the field. FPGA devices deliver the performance and reliability of dedicated hardware circuitry. A single FPGA can replace thousands of discrete components by incorporating millions of logic gates in a single integrated circuit (IC) chip. The internal resources of an FPGA chip consist of a matrix of configurable logic blocks (CLBs) surrounded by a periphery of I/O blocks, as shown in Fig. 20.1. Signals are routed within the FPGA matrix by programmable interconnect switches and wire routes.
PROGRAMMABLE INTERCONNECT
I/O BLOCKS
LOGIC BLOCKS
Fig. 20.1 Internal Structure of FPGA

In an FPGA, logic blocks are implemented using multiple levels of low fan-in gates, which gives a more compact design compared with an implementation in two-level AND-OR logic. An FPGA provides its user a way to configure: 1. the interconnection between the logic blocks, and 2. the function of each logic block. A logic block of an FPGA can be configured in such a way that it can provide functionality as simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:
1. Transistor pairs
2. Combinational gates like basic NAND gates or XOR gates
3. n-input lookup tables
4. Multiplexers
5. Wide fan-in AND-OR structures
Routing in FPGAs consists of wire segments of varying lengths which can be interconnected via electrically programmable switches. The density of logic blocks in an FPGA depends on the length and number of wire segments used for routing. The number of segments used for interconnection is typically a tradeoff between the density of logic blocks and the amount of area used up for routing. A simplified version of the FPGA internal architecture with routing is shown in Fig. 20.2.
Evolution of FPGA
In the world of digital electronic systems, there are three basic kinds of devices: memory, microprocessors, and logic. Memory devices store random information such as the contents of a
spreadsheet or database. Microprocessors execute software instructions to perform a wide variety of tasks, such as running a word-processing program or a video game. Logic devices provide specific functions, including device-to-device interfacing, data communication, signal processing, data display, timing and control operations, and almost every other function a system must perform. The first type of user-programmable chip that could implement logic circuits was the Programmable Read-Only Memory (PROM), in which address lines can be used as logic circuit inputs and data lines as outputs. Logic functions, however, rarely require more than a few product terms, and a PROM contains a full decoder for its address inputs. PROMs are thus an inefficient architecture for realizing logic circuits, and so are rarely used in practice for that purpose. The device that came as a replacement for the PROM is the Programmable Logic Array (PLA). Logically, a PLA is a circuit that allows Boolean functions to be implemented in sum-of-products form. The typical implementation consists of input buffers for all inputs, a programmable AND-matrix followed by a programmable OR-matrix, and output buffers. The input buffers provide both the original and the inverted value of each PLA input. The input lines run horizontally into the AND matrix, while the so-called product-term lines run vertically; the size of the AND matrix is therefore twice the number of inputs times the number of product terms. When PLAs were introduced in the early 1970s by Philips, their main drawbacks were that they were expensive to manufacture and offered somewhat poor speed-performance. Both disadvantages were due to the two levels of configurable logic: programmable logic planes were difficult to manufacture and introduced significant propagation delays. To overcome these weaknesses, Programmable Array Logic (PAL) devices were developed.
PALs provide only a single level of programmability, consisting of a programmable wired AND plane that feeds fixed OR-gates. PALs usually contain flip-flops connected to the OR-gate outputs so that sequential circuits can be realized. These are often referred to as Simple Programmable Logic Devices (SPLDs). Fig. 20.3 shows a simplified structure of PLA and PAL.
(Fig. 20.3: simplified structures of a PLA and a PAL, each taking its inputs through an AND plane into an OR plane that produces the outputs.)
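The PLA's two programmable planes described above can be modeled as bit-masks (a minimal sketch; the plane contents in `example_f` implement an arbitrary made-up function, not any particular device):

```c
#include <stdint.h>

/* Illustrative sum-of-products evaluation in the PLA style: each
 * product term ANDs a chosen set of true and complemented inputs,
 * and each output ORs a chosen set of product terms. */
typedef struct {
    uint8_t use_true;   /* mask of inputs ANDed in true form         */
    uint8_t use_comp;   /* mask of inputs ANDed in complemented form */
} product_term;

int eval_term(product_term t, uint8_t inputs)
{
    /* Every selected true input must be 1 and every selected
     * complemented input must be 0 for the AND term to be 1. */
    return ((inputs & t.use_true) == t.use_true) &&
           ((~inputs & t.use_comp) == t.use_comp);
}

int eval_output(const product_term *terms, int nterms,
                uint8_t or_mask, uint8_t inputs)
{
    for (int i = 0; i < nterms; i++)
        if ((or_mask & (1u << i)) && eval_term(terms[i], inputs))
            return 1;   /* OR plane: any selected term being 1 suffices */
    return 0;
}

/* Example function f(a,b,c) = a.b' + c, with a in bit 0, b in bit 1
 * and c in bit 2 of the packed input byte (a made-up example). */
int example_f(uint8_t inputs)
{
    product_term terms[2] = {
        { 0x01, 0x02 },   /* a AND (NOT b) */
        { 0x04, 0x00 }    /* c             */
    };
    return eval_output(terms, 2, 0x03, inputs);
}
```

Programming a PLA amounts to choosing the `use_true`/`use_comp` masks (the AND plane) and the `or_mask` (the OR plane), exactly the two configurable levels the text blames for the PLA's speed penalty.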
With the advancement of technology, it has become possible to produce devices with higher capacities than SPLDs. As chip densities increased, it was natural for the PLD manufacturers to evolve their products into larger (logically, though not necessarily physically) parts called Complex Programmable Logic Devices (CPLDs). For most practical purposes, CPLDs can be thought of as multiple PLDs (plus some programmable interconnect) in a single chip. The larger size of a CPLD allows the designer to implement either more logic equations or a more complicated design.
Fig. 20.4 Internal structure of a CPLD

Fig. 20.4 contains a block diagram of a hypothetical CPLD. Each of the four logic blocks shown there is the equivalent of one PLD. However, in an actual CPLD there may be more (or fewer) than four logic blocks. These logic blocks are themselves comprised of macrocells and interconnect wiring, just like an ordinary PLD. Unlike the programmable interconnect within a PLD, the switch matrix within a CPLD may or may not be fully connected. In other words, some of the theoretically possible connections between logic block outputs and inputs may not actually be supported within a given CPLD. The effect of this is most often to make 100% utilization of the macrocells very difficult to achieve. Some hardware designs simply won't fit within a given CPLD, even though there are sufficient logic gates and flip-flops available. Because CPLDs can hold larger designs than PLDs, their potential uses are more varied. They are still sometimes used for simple applications like address decoding, but more often contain high-performance control logic or complex finite state machines. At the high end (in terms of numbers of gates), there is also a lot of overlap in potential applications with FPGAs. Traditionally, CPLDs have been chosen over FPGAs whenever high-performance logic is required: because of its less flexible internal architecture, the delay through a CPLD (measured in nanoseconds) is more predictable and usually shorter. The development of the FPGA was distinct from the SPLD/CPLD evolution just described. This is apparent from the architecture of the FPGA shown in Fig. 20.1. FPGAs offer the highest amount of logic density, the most features, and the highest performance. The largest FPGA now shipping, part of the Xilinx Virtex line of devices, provides eight million "system gates" (the relative density of logic).
These advanced devices also offer features such as built-in hardwired processors (such as the IBM Power PC), substantial amounts of memory, clock management systems, and support for many of the latest, very fast device-to-device signaling technologies. FPGAs are used in a wide variety of applications ranging from data processing and storage, to instrumentation, telecommunications, and digital signal processing. The value of programmable logic has always been its ability to shorten development cycles for electronic equipment manufacturers and help them get their product to market faster. As PLD (Programmable Logic Device) suppliers continue to integrate more functions inside their devices, reduce costs, and increase the availability of time-saving IP cores, programmable logic is certain to expand its popularity with digital designers.
Symmetrical arrays
This architecture consists of logic elements (called CLBs) arranged in the rows and columns of a matrix, with interconnect laid out between them, as shown in Fig. 20.2. This symmetrical matrix is surrounded by I/O blocks which connect it to the outside world. Each CLB consists of an n-input lookup table and a pair of programmable flip-flops. I/O blocks also control functions such as tri-state control and output transition speed. Interconnects provide routing paths; direct interconnects between adjacent logic elements have smaller delay than the general-purpose interconnect.
Hierarchical PLDs
This architecture is designed in a hierarchical manner, with the top level containing only logic blocks and interconnects. Each logic block contains a number of logic modules, and each logic module has combinatorial as well as sequential functional elements. Each of these functional elements is controlled by the programmed memory. Communication between logic blocks is achieved by programmable interconnect arrays. Input/output blocks surround this scheme of logic blocks and interconnects. This type of architecture is shown in Fig. 20.6.
FPGAs are classified by programming technology as antifuse-programmed, SRAM-programmed, or EEPROM-programmed.
SRAM Based
The major advantage of SRAM-based devices is that they are infinitely re-programmable: they can be soldered into the system and have their function changed quickly by merely changing the contents of a PROM. They therefore have simple development mechanics. They can also be changed in the field by uploading new application code, a feature attractive to designers. This does, however, come at a price: the interconnect element has high impedance and capacitance, and consumes much more area than other technologies. Hence wires are very expensive and slow, and the FPGA architect is therefore forced to make large, inefficient logic modules (typically a look-up table, or LUT). The other disadvantages are that they need to be reprogrammed each time power is applied, need an external memory to store the program, and require a large area. Fig. 20.8 shows two applications of SRAM cells: controlling the gate nodes of pass-transistor switches, and controlling the select lines of multiplexers that drive logic block inputs. The figure gives an example of the connection of one logic block (represented by the AND gate in the upper left corner) to another through two pass-transistor switches and then a multiplexer, all controlled by SRAM cells. Whether an FPGA uses pass-transistors or multiplexers or both depends on the particular product.
Antifuse Based
The antifuse-based cell gives the highest-density interconnect by being a true cross point. The designer thus has a much larger number of interconnects, so logic modules can be smaller and more efficient, and place-and-route software has a much easier time. These devices, however, are only one-time programmable and therefore have to be thrown away every time a change is made in the design. The antifuse has inherently low capacitance and resistance, such that the fastest parts are all antifuse based. The disadvantage of the antifuse is the requirement to integrate the fabrication of the antifuses into the IC process, which means the process will always lag the SRAM process in scaling. Antifuses are suitable for FPGAs because they can be built using modified CMOS technology. As an example, Actel's antifuse structure is depicted in Fig. 20.9. The figure shows that an antifuse is positioned between two interconnect wires and physically consists of three sandwiched layers: the top and bottom layers are conductors, and the middle layer is an insulator. When unprogrammed, the insulator isolates the top and bottom layers; when programmed, the insulator changes to become a low-resistance link. It uses Poly-Si and n+ diffusion as conductors and ONO as an insulator, but other antifuses rely on metal for conductors, with amorphous silicon as the middle layer.
EEPROM Based
The EEPROM/FLASH cell in FPGAs can be used in two ways: as a control device, as in an SRAM cell, or as a directly programmable switch. When used as a switch it can be very efficient as interconnect and can be reprogrammed at the same time. It is also non-volatile, so it does not require an extra PROM for loading. It does, however, have its drawbacks: the EEPROM process is complicated and therefore also lags SRAM technology.
Fig. 20.10 Transistor-pair tiles in the Crosspoint FPGA

The second type of logic block is RAM logic, which can be used to implement random access memory. Plessey FPGA: the basic building block here is the 2-input NAND gate; these gates are connected to one another to implement the desired function.
(Figure: Plessey logic block: a latch and an 8-to-2 multiplexer fed by 8 interconnect lines, with the configuration held in RAM.)
Both Crosspoint and Plessey use fine-grain logic blocks. Fine-grain logic blocks have the advantage of a high percentage usage of the logic blocks, but they require a large number of wire segments and programmable switches, which occupy a lot of area. Actel logic block: if the inputs of a multiplexer are connected to a constant or to a signal, it can be used to implement different logic functions. For example, a 2-input multiplexer with inputs a and b and select c will implement the function ac' + bc. If b = 0 then it will implement ac', and if a = 0 it will implement bc.
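The idea of a multiplexer as a universal building block can be sketched as follows (an illustrative model; the helper names are made up):

```c
/* 2-to-1 multiplexer: out = a when sel = 0, b when sel = 1,
 * i.e. out = a.sel' + b.sel. Tying its inputs to constants or
 * signals turns the same mux into different logic functions. */
int mux2(int a, int b, int sel)
{
    return sel ? b : a;
}

int not_gate(int sel)        { return mux2(1, 0, sel); } /* sel'        */
int and_gate(int b, int sel) { return mux2(0, b, sel); } /* b AND sel   */
int or_gate(int b, int sel)  { return mux2(b, 1, sel); } /* b OR sel    */
```

This is exactly the text's observation in miniature: with b tied to 0 the mux computes a.sel', and with a tied to 0 it computes b.sel.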
Fig. 20.12 Actel logic block

Typically an Actel logic block consists of a number of multiplexers and logic gates.
(Fig. 20.13: a logic block built around a 5-input look-up table (inputs A to E) whose output feeds multiplexers and a set/reset element to produce the outputs X and Y.)
A k-input logic function is implemented using a 2^k x 1 SRAM. The number of different possible functions for a k-input LUT is 2^(2^k). The advantage of such an architecture is that it supports the implementation of very many logic functions; the disadvantage is the unusually large number of memory cells required to implement such a logic block when the number of inputs is large. Fig. 20.13 shows a 5-input LUT-based implementation of a logic block. LUT-based design provides better logic-block utilization. A k-input LUT-based logic block can be implemented in a number of different ways, with tradeoffs between performance and logic density.
An n-LUT can be viewed as a direct implementation of a function truth table: each latch holds the value of the function for one input combination. For example, the 2-LUT shown below implements the 2-input AND and OR functions.
Example: 2-LUT

Inputs   AND   OR
  00      0     0
  01      0     1
  10      0     1
  11      1     1
a combination of FPGA and CPLD technologies. FLEX 8000 is SRAM-based and features a four-input LUT as its basic logic block. Logic capacity ranges from about 4000 gates to more than 15,000 for the 8000 series. The overall architecture of FLEX 8000 is illustrated in Fig. 20.14.
Fig. 20.14 Architecture of Altera FLEX 8000 FPGAs

The basic logic block, called a Logic Element (LE), contains a four-input LUT, a flip-flop, and special-purpose carry circuitry for arithmetic circuits. The LE also includes cascade circuitry that allows for efficient implementation of wide AND functions. Details of the LE are illustrated in Fig. 20.15.
Fig. 20.15 Altera FLEX 8000 Logic Element (LE).
In the FLEX 8000, LEs are grouped into sets of 8, called Logic Array Blocks (LABs, a term borrowed from Altera's CPLDs). As shown in Fig. 20.16, each LAB contains local interconnect, and each local wire can connect any LE to any other LE within the same LAB. The local interconnect also connects to the FLEX 8000's global interconnect, called FastTrack. All FastTrack horizontal wires are identical, so interconnect delays in the FLEX 8000 are more predictable than in FPGAs that employ many smaller length segments, because there are fewer programmable switches in the longer path.
System Design
At this stage the designer has to decide what portion of the functionality has to be implemented on the FPGA and how to integrate that functionality with the rest of the system.
Design Description
The designer describes the design functionality either by using schematic editors or by using one of the various Hardware Description Languages (HDLs), such as Verilog or VHDL.
Synthesis
Once the design has been defined, CAD tools are used to implement it on a given FPGA. Synthesis includes generic optimization, slack optimization and power optimization, followed by placement and routing. Implementation includes partition, place and route. The output of the design-implementation phase is a bit-stream file.
Design Verification
The bit-stream file is fed to a simulator, which simulates the design functionality and reports errors in the desired behavior of the design. Timing tools are used to determine the maximum clock frequency of the design. The design is then loaded onto the target FPGA device and testing is done in the real environment.
Fig. 20.17 Programmable logic design process

Typically, the design entry step is followed or interspersed with periods of functional simulation. That's where a simulator is used to execute the design and confirm that the correct outputs are produced for a given set of test inputs. Although problems with the size or timing of the hardware may still crop up later, the designer can at least be sure that his logic is functionally correct before going on to the next stage of development. Compilation only begins after a functionally correct representation of the hardware exists. This hardware compilation consists of two distinct steps. First, an intermediate representation of the hardware design is produced. This step is called synthesis, and the result is a representation called a netlist. The netlist is device independent, so its contents do not depend on the particulars of the FPGA or CPLD; it is usually stored in a standard format called the Electronic Design Interchange Format (EDIF). The second step in the translation process is called place & route. This step involves mapping the logical structures described in the netlist onto actual macrocells, interconnections, and input and output pins. This process is similar to the equivalent step in the development of a printed circuit board, and it may likewise allow for either automatic or manual layout optimizations. The result of the place & route process is a bitstream. This name is used generically, despite the fact that each CPLD or FPGA (or family) has its own, usually proprietary, bitstream format. Suffice it to say that the bitstream is the binary data that must be loaded into the FPGA or CPLD to cause that chip to execute a particular hardware design. Increasingly there are also debuggers available that at least allow for single-stepping the hardware design as it executes in the programmable logic device.
But those only complement a simulation environment that is able to use some of the information generated during the place & route step to provide gate-level simulation. Obviously, this type of integration of device-specific information into a generic simulator requires a good working relationship between the chip and simulation tool vendors.
Things to Ponder
Q.1. Define the following acronyms as they apply to digital logic circuits: ASIC, PAL, PLA, PLD, CPLD, FPGA.
Q.2. How does the granularity of the logic block influence the performance of an FPGA?
Q.3. Why would anyone use programmable logic devices (PLD, PAL, PLA, CPLD, FPGA, etc.) in place of traditional "hard-wired" logic such as NAND, NOR, AND, and OR gates? Are there any applications where hard-wired logic would do a better job than a programmable device?
Q.4. Some programmable logic devices (and PROM memory devices as well) use tiny fuses which are intentionally "blown" in specific patterns to represent the desired program. Programming a device by blowing tiny fuses inside it carries certain advantages and disadvantages; describe what some of these are.
Q.5. Use one 4 x 8 x 4 PLA to implement the functions:
F1(w, x, y, z) = wx'y'z + wx'yz' + wxy'
F2(w, x, y, z) = wx'y + x'y'z
Module 4
Design of Embedded Processors
Lesson 21
Introduction to Hardware Description Languages - I
Instructional Objectives
At the end of the lesson the student should be able to:
- Describe a digital IC design flow and explain its various abstraction levels
- Explain the need for a hardware description language in the IC design flow
- Model simple hardware devices at various levels of abstraction using Verilog (gate/switch/behavioral)
- Write Verilog code meeting the prescribed requirement at a specified level
HDL is an abbreviation of Hardware Description Language. Any digital system can be represented at the register transfer level (RTL), and HDLs are used to describe this RTL. Verilog is one such HDL; it is a general-purpose language that is easy to learn and use, and its syntax is similar to that of C. The idea is to specify how the data flows between registers and how the design processes the data. To define RTL, hierarchical design concepts play a very significant role. Hierarchical design methodology facilitates the digital design flow with several levels of abstraction. Verilog HDL can utilize these levels of abstraction to produce a simplified and efficient representation of the RTL description of any digital design. For example, an HDL might describe the layout of the wires, resistors and transistors on an Integrated Circuit (IC) chip, i.e., the switch level; or it may describe the design at a higher level in terms of logical gates and flip-flops in a digital system, i.e., the gate level. Verilog supports all of these levels.
Bottom-Up Design
The traditional method of electronic design is bottom-up (designing from transistors and moving to a higher level of gates and, finally, the system). But with the increase in design complexity traditional bottom-up designs have to give way to new structural, hierarchical design methods.
Top-Down Design
For HDL representation it is convenient and efficient to adopt the top-down design style. A true top-down design allows early testing, fabrication-technology independence and a structured system design, and offers many other advantages. But it is very difficult to follow a pure top-down design; because of this, most designs are a mix of both methods, implementing some key elements of both design styles.
To follow the hierarchical design concepts briefly mentioned above one has to describe the design in terms of entities called MODULES.
Modules
A module is the basic building block in Verilog. It can be an element or a collection of lower-level design blocks. Typically, elements are grouped into modules to provide common functionality that is used in many places of the design through its port interface, while hiding the internal implementation.
1.1.4 Abstraction Levels
- Behavioral level
- Register-transfer level
- Gate level
- Switch level
Register-Transfer Level
Designs using the register-transfer level specify the characteristics of a circuit by operations and the transfer of data between the registers. An explicit clock is used. RTL design contains exact timing: operations are scheduled to occur at certain times. The modern definition of RTL code is "any code that is synthesizable is called RTL code".
Gate Level
At the gate level the characteristics of a system are described by logical links and their timing properties. All signals are discrete and can only take definite logical values ('0', '1', 'X', 'Z'). The usable operations are predefined logic primitives (AND, OR, NOT, etc. gates). It must be noted here that using gate-level modeling may not be a good idea in logic design: gate-level code is generated by tools such as synthesis tools in the form of netlists, which are used for gate-level simulation and for the backend.
Switch Level
This is the lowest level of abstraction. A module can be implemented in terms of switches, storage nodes and interconnection between them. However, as has been mentioned earlier, one can mix and match all the levels of abstraction in a design. RTL is frequently used for Verilog description that is a combination of behavioral and dataflow while being acceptable for synthesis.
Instances
A module provides a template from which one can create objects. When a module is invoked, Verilog creates a unique object from the template, each with its own name, variables, parameters and I/O interface. These are known as instances.
The following describes a typical design flow for the description of a digital design, for both ASIC and FPGA realizations.
Design stages and the tools used at each stage:
- Specification: word processor like Word, KWriter, AbiWord, Open Office
- High-level design: word processor like Word, KWriter, AbiWord; for drawing waveforms, tools like Waveformer or Testbencher, or Word, Open Office
- Micro design: word processor like Word, KWriter, AbiWord; for drawing waveforms, tools like Waveformer or Testbencher or Word; for FSMs, StateCAD or some similar tool, Open Office
- RTL coding: Vim, Emacs, conTEXT, HDL TurboWriter
- Simulation: ModelSim, VCS, Verilog-XL, VeriWell, Finsim, iVerilog, VeriDOS
- Synthesis: Design Compiler, FPGA Compiler, Synplify, Leonardo Spectrum (downloadable for free from FPGA vendors like Altera and Xilinx)
- Place and route: for FPGAs, use the FPGA vendor's P&R tool; ASIC flows require expensive P&R tools like Apollo; students can use LASI or Magic
- Post-silicon validation: for both ASIC and FPGA, the chip needs to be tested in the real environment; board design and device drivers need to be in place
Specification
This is the stage at which we define the important parameters of the system to be designed. For example, when designing a counter one has to decide its bit-size, whether it should have a synchronous reset, whether the enable should be active high, etc.
RTL Coding
In RTL coding, the micro design is converted into Verilog/VHDL code using the synthesizable constructs of the language. Normally the Vim editor is used; ConTEXT, Nedit and Emacs are other choices.
Simulation
Simulation is the process of verifying the functional characteristics of models at any level of abstraction. We use simulators to simulate hardware models. To check that the RTL code meets the functional requirements of the specification, we must see whether all the RTL blocks are functionally correct. To achieve this we need to write a testbench, which generates the clock, the reset and the required test vectors. A sample testbench for a counter is shown below. Normally, 60-70% of design time is spent in verification of the design.
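As an illustration, a minimal testbench for a 4-bit up counter might look like the following sketch. The module and signal names here are illustrative assumptions, not taken verbatim from the original text; the DUT port order (clk, reset, enable, count) is assumed.

```verilog
// Hypothetical testbench sketch for a 4-bit up counter with
// synchronous, active-high reset and an enable input.
module counter_tb;
  reg        clk, reset, enable;
  wire [3:0] count;

  // Device Under Test (assumed port order: clk, reset, enable, count)
  counter dut (clk, reset, enable, count);

  // Clock generator: 10 time-unit period
  initial clk = 0;
  always #5 clk = ~clk;

  // Stimulus: apply reset, then let the counter run
  initial begin
    reset = 1; enable = 0;
    #20 reset = 0;      // release reset
    #10 enable = 1;     // start counting
    #100 $finish;
  end

  // Monitor the outputs
  initial
    $monitor($time, " reset=%b enable=%b count=%d", reset, enable, count);
endmodule
```

A self-checking version would replace the $monitor with compare logic that computes the expected count and flags mismatches.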
We use the waveform output from the simulator to check that the DUT (Device Under Test) is functionally correct. Most simulators come with a waveform viewer. As designs become complex, we write self-checking testbenches, where the testbench applies the test vectors and compares the output of the DUT with the expected value. There is another kind of simulation, called timing simulation, which is done after synthesis or after P&R (Place and Route). Here we include the gate delays and wire delays and see whether the DUT works at the rated clock speed. This is also called SDF simulation or gate-level simulation.
Synthesis
Synthesis is the process in which a synthesis tool like Design Compiler takes the RTL (in Verilog or VHDL), the target technology, and constraints as inputs and maps the RTL to the target technology primitives. After mapping the RTL to gates, the synthesis tool also does a minimal amount of timing analysis to see whether the mapped design meets the timing requirements. (An important point to note is that synthesis tools are not aware of wire delays; they know only gate delays.) After synthesis there are a couple of things that are normally done before passing the netlist to the backend (Place and Route):
Verification: Check if the RTL to gate mapping is correct. Scan insertion: Insert the scan chain in the case of ASIC.
Place and Route (P&R): the gate-level netlist is placed and routed, and the resulting layout is sent to the foundry for fabricating the ASIC. Normally the P&R tools are used to output the SDF file, which is back-annotated, along with the gate-level netlist from P&R, into a static timing analysis tool like PrimeTime to do timing analysis.
1.2
1.2.1 Lexical Conventions
The basic lexical conventions used by Verilog HDL are similar to those in the C programming language. Verilog HDL is a case-sensitive language, and all keywords are in lowercase.
1.2.2
Data Types
The Verilog language has two primary data types:
- Nets - represent structural connections between components.
- Registers - represent variables used to store data.
Every signal has a data type associated with it. Data types are either explicitly declared with a declaration in the Verilog code, or implicitly declared with no declaration but used to connect structural building blocks in the code. Implicit declarations are always of net type "wire" and only one bit wide.
Types of Net
Each net type has functionality that is used to model different types of hardware (such as PMOS, NMOS, CMOS, etc.). This is tabulated as follows:

Net Data Type     Functionality
wire, tri         Interconnecting wire - no special resolution function
wor, trior        Wired outputs OR together (models ECL)
wand, triand      Wired outputs AND together (models open-collector)
tri0, tri1        Net pulls down or pulls up when not driven
supply0, supply1  Net has a constant logic 0 or logic 1 (supply strength)
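To illustrate the resolution behavior, here is a small sketch (not from the original text; the module name is assumed) in which two drivers on a wand net are resolved by ANDing them, modeling open-collector outputs tied together:

```verilog
// Hypothetical sketch: two drivers on a wand net resolve by AND.
module wand_demo;
  wand w;          // wired-AND net
  reg  a, b;

  assign w = a;    // first driver
  assign w = b;    // second driver; w resolves to a & b

  initial begin
    a = 1; b = 0;
    #1 $display("a=%b b=%b w=%b", a, b, w);  // w resolves to 0 here
    b = 1;
    #1 $display("a=%b b=%b w=%b", a, b, w);  // w resolves to 1 here
  end
endmodule
```

With a wor net in place of the wand, the same two drivers would instead resolve to a | b.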
Registers store the last value assigned to them until another assignment statement changes their value. Registers represent data storage constructs. Register arrays are called memories.
Register data types are used as variables in procedural blocks. A register data type is required if a signal is assigned a value within a procedural block. Procedural blocks begin with the keywords initial or always.
Some common data types are listed in the following table:

Data Type   Functionality
reg         Unsigned variable
integer     Signed variable - 32 bits
time        Unsigned integer - 64 bits
real        Double precision floating point variable
1.2.3
Apart from these there are vectors, integer, real & time register data types.
Some examples are as follows:

Integer
integer counter;      // general purpose variable used as a counter
initial counter = -1; // a negative one is stored in the counter

Real
real delta;           // define a real variable called delta
initial begin
  delta = 4e10;       // delta is assigned in scientific notation
  delta = 2.13;       // delta is assigned the value 2.13
end
integer i;            // define an integer i
initial i = delta;    // i gets the value 2 (rounded value of 2.13)

Time
time save_sim_time;   // define a time variable save_sim_time
initial save_sim_time = $time; // save the current simulation time
// n.b. $time is invoked to get the current simulation time

Arrays
integer count[0:7];          // an array of 8 count variables
reg [4:0] port_id[0:7];      // array of 8 port_ids, each 5 bits wide
integer matrix[4:0][0:255];  // two dimensional array of integers
1.2.4
Memories
Memories are modeled simply as a one-dimensional array of registers. Each element of the array is known as a word and is addressed by a single array index.
reg membit[0:1023];        // memory membit with 1K 1-bit words
reg [7:0] membyte[0:1023]; // memory membyte with 1K 8-bit words
membyte[511]               // fetches the 1-byte word whose address is 511
Strings
A string is a sequence of characters enclosed by double quotes and all contained on a single line. Strings used as operands in expressions and assignments are treated as a sequence of eight-bit ASCII values, with one eight-bit ASCII value representing one character. To declare a variable to store a string, declare a register large enough to hold the maximum number of characters the variable will hold. Note that no extra bits are required to hold a termination character; Verilog does not store a string termination character. Strings can be manipulated using the standard operators. When a variable is larger than required to hold a value being assigned, Verilog pads the contents on the left with zeros after the assignment. This is consistent with the padding that occurs during assignment of non-string values. Certain characters can be used in strings only when preceded by an introductory character called an escape character. The following table lists these characters with the escape sequence that represents them:

Escape sequence   Character
\n                Newline
\t                Tab
\\                Backslash (\)
\"                Double quote (")
\ddd              Character specified by 1-3 octal digits
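For instance, a register that holds a string of up to 12 characters needs 8*12 = 96 bits. The following sketch is illustrative (names assumed, not from the original text):

```verilog
// Hypothetical sketch: storing and printing strings in a register.
module string_demo;
  reg [8*12:1] message;   // room for 12 eight-bit ASCII characters

  initial begin
    message = "Hello";          // shorter value is zero-padded on the left
    $display("%s", message);    // print the stored string
    message = "Hello\tWorld";   // escape sequence \t embeds a tab
    $display("%s", message);
  end
endmodule
```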
Modules
Modules are the building blocks of Verilog designs. You create design hierarchy by instantiating modules in other modules. An instance of a module can be called in another, higher-level module.
Ports
Ports allow communication between a module and its environment. All but the top-level modules in a hierarchy have ports. Ports can be associated by order or by name. You declare ports to be input, output or inout. The port declaration syntax is:
input [range_val:range_var] list_of_identifiers;
output [range_val:range_var] list_of_identifiers;
inout [range_val:range_var] list_of_identifiers;
Schematic
1.2.5
Width matching: it is legal to connect internal and external ports of different sizes, but beware - synthesis tools could report problems.
Unconnected ports: unconnected ports are allowed; an unconnected port is indicated by a ",".
The net data types are used to connect structure. A net data type is required if a signal can be driven by a structural connection.
Example Implicit
dff u0 ( q,,clk,d,rst,pre); // Here second port is not connected
Example Explicit
dff u0 (.q (q_out), .q_bar (), .clk (clk_in), .d (d_in), .rst (rst_in), .pre (pre_in)); // Here second port is not connected
1.3
At this level of abstraction the system modeling is done at the gate level, i.e., the properties of the gates etc. to be used in the behavioral description of the system are defined. These definitions are known as primitives. Verilog has built-in primitives for gates, transmission gates, switches, buffers etc. These primitives are instantiated like modules, except that they are predefined in Verilog and do not need a module definition. The two basic families of gates are the and/or gates and the buf/not gates.
1.3.1
Gate Primitives
And/Or Gates: these have one scalar output and multiple scalar inputs. The output of the gate is evaluated as soon as an input changes.

wire OUT, IN1, IN2;
// basic gate instantiations
and  a1(OUT, IN1, IN2);
nand na1(OUT, IN1, IN2);
or   or1(OUT, IN1, IN2);
nor  nor1(OUT, IN1, IN2);
xor  x1(OUT, IN1, IN2);
xnor nx1(OUT, IN1, IN2);
// more than two inputs: 3-input nand gate
nand na1_3inp(OUT, IN1, IN2, IN3);
// gate instantiation without an instance name
and (OUT, IN1, IN2); // legal gate instantiation

Buf/Not Gates: these gates, however, have one scalar input and one or more scalar outputs.
// basic gate instantiations for bufif
bufif1 b1(out, in, ctrl);
bufif0 b0(out, in, ctrl);
// basic gate instantiations for notif
notif1 n1(out, in, ctrl);
notif0 n0(out, in, ctrl);
Array of instantiations
wire [7:0] OUT, IN1, IN2; // basic gate instantiations nand n_gate[7:0](OUT, IN1, IN2);
Gate-level multiplexer
A multiplexer is a very efficient basic logic design element.

// 4-to-1 multiplexer
module mux4_to_1(out, i0, i1, i2, i3, s1, s0);
// port declarations
output out;
input i0, i1, i2, i3;
input s1, s0;
// internal wire declarations
wire s1n, s0n;
wire y0, y1, y2, y3;
// gate instantiations
// create s1n and s0n signals
not (s1n, s1);
not (s0n, s0);
// 3-input and gates instantiated
and (y0, i0, s1n, s0n);
and (y1, i1, s1n, s0);
and (y2, i2, s1, s0n);
and (y3, i3, s1, s0);
// 4-input or gate instantiated
or (out, y0, y1, y2, y3);
endmodule
1.3.2
In real circuits, logic gates have delays associated with them. Verilog provides the mechanism to associate delays with gates:
- Rise, Fall and Turn-off delays
- Min, Typical and Max delays
Rise Delay
The rise delay is associated with a gate output transition to 1 from another value (0,x,z).
Fall Delay
The fall delay is associated with a gate output transition to 0 from another value (1,x,z).
Turn-off Delay: the turn-off delay is associated with a gate output transition to z from another value (0, 1, x).
Min Value: the min value is the minimum delay value that the gate is expected to have.
Typ Value: the typ value is the typical delay value that the gate is expected to have.
Max Value: the max value is the maximum delay value that the gate is expected to have.
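These delays can be attached to a gate instantiation as in the following sketch (the delay values and instance names are illustrative only; this fragment assumes an enclosing module with the nets declared):

```verilog
// Hypothetical sketch of gate delay specification.
// One value: all transitions delayed by 5.
and #(5) a1(out1, i1, i2);
// Two values: rise delay 4, fall delay 6.
and #(4, 6) a2(out2, i1, i2);
// Three values: rise 3, fall 4, turn-off 5 (for tri-state gates).
bufif0 #(3, 4, 5) b1(out3, in, ctrl);
// Min:typ:max values for each of rise and fall.
nand #(2:3:4, 3:4:5) n1(out4, i1, i2);
```

Which of the min/typ/max values is used during simulation is selected by a simulator option.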
1.4 1.4.1
Verilog behavioral code resides inside procedural blocks; there is an exception, however - some behavioral code can also exist outside procedural blocks. We shall see this in detail as we make progress. There are two types of procedural blocks in Verilog:
initial : initial blocks execute only once, starting at time zero.
always : always blocks loop to execute over and over again; in other words, as the name suggests, they execute always.
Example - initial
module initial_example();
reg clk, reset, enable, data;
initial begin
  clk = 0;
  reset = 0;
  enable = 0;
  data = 0;
end
endmodule

In the above example, the initial block execution starts at time 0; without waiting for any event, it simply executes all the statements between begin and end.

Example - always
module always_example();
reg clk, reset, enable, q_in, data;
always @ (posedge clk)
  if (reset) begin
    data <= 0;
  end else if (enable) begin
    data <= q_in;
  end
endmodule

In an always block, when the trigger event (here, the positive edge of clock) occurs, the code between begin and end is executed; the always block then waits for the next posedge of clock. This process of waiting and executing on an event is repeated until the simulation stops.
1.4.2
1.4.3
If a procedural block contains more than one statement, those statements must be enclosed within
- a sequential begin - end block, or
- a parallel fork - join block.

Example - "begin-end"
module initial_begin_end();
reg clk, reset, enable, data;
initial begin
  #1  clk = 0;
  #10 reset = 0;
  #5  enable = 0;
  #3  data = 0;
end
endmodule

Begin-end: clk gets 0 after 1 time unit, reset after 11 time units, enable after 16 time units, data after 19 time units. All the statements are executed sequentially, so the delays accumulate.

Example - "fork-join"
module initial_fork_join();
reg clk, reset, enable, data;
initial fork
  #1  clk = 0;
  #10 reset = 0;
  #5  enable = 0;
  #3  data = 0;
join
endmodule

Fork-join: all the statements start together at time 0, so clk gets 0 at time 1, data at time 3, enable at time 5 and reset at time 10.
1.4.4
The begin - end keywords:
- Group several statements together.
- Cause the statements to be evaluated sequentially (one at a time).
  - Any timing within the sequential group is relative to the previous statement.
  - Delays in the sequence accumulate (each delay is added to the previous delay).
  - The block finishes after the last statement in the block.
1.4.5
The fork - join keywords:
- Group several statements together.
- Cause the statements to be evaluated in parallel (all at the same time).
  - Timing within the parallel group is absolute to the beginning of the group.
  - The block finishes after the last statement completes (the statement with the highest delay; it can even be the first statement in the block).

Example - Parallel
module parallel();
reg a;
initial fork
  #10 a = 0;
  #11 a = 1;
  #12 a = 0;
  #13 a = 1;
  #14 $finish;
join
endmodule

Example - Mixing "begin-end" and "fork-join"
module fork_join();
reg clk, reset, enable, data;
initial begin
  $display("Starting simulation");
  fork : FORK_VAL
    #1 clk = 0;
    #5 reset = 0;
    #5 enable = 0;
    #2 data = 0;
  join
  $display("Terminating simulation");
  #10 $finish;
end
endmodule
1.4.6
Blocking assignments are executed in the order they are coded; hence they are sequential. Since they block the execution of the next statement until the current statement is executed, they are called blocking assignments. Assignments are made with the "=" symbol. Example: a = b;
Nonblocking assignments are executed in parallel. Since the execution of the next statement is not blocked by the execution of the current statement, they are called nonblocking assignments. Assignments are made with the "<=" symbol. Example: a <= b;

Example - blocking and nonblocking
module blocking_nonblocking();
reg a, b, c, d;
// blocking assignments
initial begin
  #10 a = 0;
  #11 a = 1;
  #12 a = 0;
  #13 a = 1;
end
// nonblocking assignments
initial begin
  #10 b <= 0;
  #11 b <= 1;
  #12 b <= 0;
  #13 b <= 1;
end
initial begin
  c = #10 0;
  c = #11 1;
  c = #12 0;
  c = #13 1;
end
initial begin
  d <= #10 0;
  d <= #11 1;
  d <= #12 0;
  d <= #13 1;
end
initial begin
  $monitor("TIME = %t A = %b B = %b C = %b D = %b", $time, a, b, c, d);
  #50 $finish(1);
end
endmodule
1.4.7
The if - else statement controls the execution of other statements. In a programming language like C, if - else controls the flow of the program. When more than one statement needs to be executed under an if condition, we need to use begin and end, as seen in earlier examples.
Syntax: if
if (condition)
  statements;
Syntax: if-else
if (condition)
  statements;
else
  statements;
1.4.8
Syntax: nested if-else-if
if (condition)
  statements;
else if (condition)
  statements;
................
else
  statements;

Example - simple if
module simple_if();
reg latch;
wire enable, din;
always @ (enable or din)
  if (enable) begin
    latch <= din;
  end
endmodule

Example - if-else
module if_else();
reg dff;
wire clk, din, reset;
always @ (posedge clk)
  if (reset) begin
    dff <= 0;
  end else begin
    dff <= din;
  end
endmodule

Example - nested if-else-if
module nested_if();
reg [3:0] counter;
wire clk, reset, enable, up_en, down_en;
always @ (posedge clk)
  // if reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  // if the counter is enabled and up-count is the mode
  end else if (enable == 1'b1 && up_en == 1'b1) begin
    counter <= counter + 1'b1;
  // if the counter is enabled and down-count is the mode
  end else if (enable == 1'b1 && down_en == 1'b1) begin
    counter <= counter - 1'b1;
  // if counting is disabled
  end else begin
    counter <= counter; // redundant code
  end
endmodule
Parallel if-else
In the above example, the condition (enable == 1'b1 && up_en == 1'b1) is given the highest priority and the condition (enable == 1'b1 && down_en == 1'b1) is given the lowest priority. We normally don't include reset checking in the priority, as it does not fall in the combinational logic at the input of the flip-flop, as shown in the figure below.
So when we need priority logic, we use nested if-else statements. On the other hand, if we don't want to implement priority logic, and we know that only one input is active at a time (i.e. all inputs are mutually exclusive), then we can write the code as shown below. It is a known fact that a priority implementation takes more logic than a parallel implementation. So if you know the inputs are mutually exclusive, you can code the logic with parallel ifs.

module parallel_if();
reg [3:0] counter;
wire clk, reset, enable, up_en, down_en;
always @ (posedge clk)
  // if reset is asserted (active low)
  if (reset == 1'b0) begin
    counter <= 4'b0000;
  end else begin
    // if the counter is enabled and up-count is the mode
    if (enable == 1'b1 && up_en == 1'b1) begin
      counter <= counter + 1'b1;
    end
    // if the counter is enabled and down-count is the mode
    if (enable == 1'b1 && down_en == 1'b1) begin
      counter <= counter - 1'b1;
    end
  end
endmodule
1.4.9
The case statement compares an expression with a series of cases and executes the statement or statement group associated with the first matching case. The case statement supports single or multiple statements; group multiple statements using the begin and end keywords. The syntax of a case statement looks as shown below.
case (<expression>)
  <case1> : <statement>
  <case2> : <statement>
  default : <statement>
endcase
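As an illustrative sketch (the module name is assumed, not from the original text), a 4-to-1 multiplexer described behaviorally with a case statement:

```verilog
// Hypothetical sketch: behavioral 4:1 mux using a case statement.
module mux4_case(out, i0, i1, i2, i3, sel);
output out;
reg out;
input i0, i1, i2, i3;
input [1:0] sel;

always @ (i0 or i1 or i2 or i3 or sel)
  case (sel)
    2'b00   : out = i0;
    2'b01   : out = i1;
    2'b10   : out = i2;
    2'b11   : out = i3;
    default : out = 1'bx; // catches x or z values on sel
  endcase
endmodule
```

Compare this with the gate-level mux4_to_1 of the previous section: the case statement expresses the same selection logic far more compactly.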
The forever statement
The forever loop executes continually; the loop never ends. Normally we use the forever statement in initial blocks.
syntax: forever <statement>
One should be very careful in using a forever statement: if no timing construct is present inside it, simulation could hang.
The repeat statement
The repeat loop executes a statement a fixed <number> of times.
syntax: repeat (<number>) <statement>
The while loop statement
The while loop executes as long as an <expression> evaluates as true. This is the same as in any other programming language.
syntax: while (<expression>) <statement>
The for loop statement
The for loop is the same as the for loop used in any other programming language.
- Executes an <initial assignment> once at the start of the loop.
- Executes the loop as long as an <expression> evaluates as true.
- Executes a <step assignment> at the end of each pass through the loop.
syntax: for (<initial assignment>; <expression>; <step assignment>) <statement>
Note: Verilog does not have the ++ operator as in the case of the C language.
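The four loop constructs described above can be sketched together as follows (an illustrative fragment; the module and signal names are assumptions, not from the original text):

```verilog
// Hypothetical sketch exercising Verilog's loop statements.
module loops_demo;
  reg clk;
  integer i;
  reg [7:0] mem [0:7];

  // forever: a free-running clock. Note the timing control (#5),
  // without which the simulation would hang.
  initial begin
    clk = 0;
    forever #5 clk = ~clk;
  end

  initial begin
    // repeat: execute a statement a fixed number of times
    repeat (4) @ (posedge clk);   // wait for 4 clock edges

    // for: note i = i + 1, since Verilog has no ++ operator
    for (i = 0; i < 8; i = i + 1)
      mem[i] = i;

    // while: loop as long as the expression is true
    i = 0;
    while (i < 8) begin
      $display("mem[%0d] = %d", i, mem[i]);
      i = i + 1;
    end
    $finish;
  end
endmodule
```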
1.5
1.5.1 Verilog provides the ability to design at the MOS-transistor level; however, with the increase in complexity of circuits, design at this level is becoming tough. Moreover, Verilog provides only digital design capability, with drive strengths associated with the signals; there is still no analog capability. As a matter of fact, in Verilog transistors are only used as switches.
MOS switches
// MOS switch keywords
nmos
pmos
The keyword nmos is used to model an NMOS transistor, while pmos is used for PMOS transistors.
Instantiation of NMOS and PMOS switches
nmos n1(out, data, control); // instantiate an NMOS switch
pmos p1(out, data, control); // instantiate a PMOS switch
CMOS switches
Instantiation of a CMOS switch.
cmos c1(out, data, ncontrol, pcontrol); // instantiate a cmos switch
The ncontrol and pcontrol signals are normally complements of each other.
Bidirectional switches
These switches allow signal flow in both directions and are defined by the keywords tran, tranif0, and tranif1.
Instantiation
tran t1(inout1, inout2);          // instance name t1 is optional
tranif0(inout1, inout2, control); // instance name is not specified
tranif1(inout1, inout2, control); // instance name is not specified
1.5.2
1.5.3
// define a nor gate, my_nor
module my_nor(out, a, b);
output out;
input a, b;
// internal wires
wire c;
// set up power and ground lines
supply1 pwr; // power is connected to Vdd
supply0 gnd; // connected to Vss
// instantiate pmos switches
pmos (c, pwr, b);
pmos (out, c, a);
// instantiate nmos switches
nmos (out, gnd, a);
nmos (out, gnd, b);
endmodule

Stimulus to test the NOR gate
// stimulus to test the gate
module stimulus;
reg A, B;
wire OUT;
// instantiate the my_nor module
my_nor n1(OUT, A, B);
// apply stimulus
initial begin
  // test all possible combinations
  A = 1'b0; B = 1'b0;
  #5 A = 1'b0; B = 1'b1;
  #5 A = 1'b1; B = 1'b0;
  #5 A = 1'b1; B = 1'b1;
end
// check results
initial
  $monitor($time, " OUT = %b, A = %b, B = %b", OUT, A, B);
endmodule
1.6 1.6.1
i) A 2-input xor gate can be built from my_and, my_or and my_not gates. Construct an xor module in Verilog that realises the logic function z = xy' + x'y. Inputs are x and y, and z is the output. Write a stimulus module that exercises all four combinations of x and y.
ii) The logic diagram for an RS latch with delay is shown.
Write the verilog description for the RS latch, including delays of 1 unit when instantiating the nor gates. Write the stimulus module for the RS latch using the following table and verify the outputs.
Set  Reset  Qn+1
0    0      qn
0    1      0
1    0      1
1    1      ?
iii) Design a 2-input multiplexer using bufif0 and bufif1 gates as shown below
The delay specifications for gates b1 and b2 are as follows:

Min  Typ  Max
1    2    3
3    4    5
5    6    7
Module 4
Design of Embedded Processors
Lesson 22
Introduction to Hardware Description Languages - II
Instructional Objectives
At the end of the lesson the student should be able to:
- Call a task and a function in Verilog code and distinguish between them
- Plan and write test benches for Verilog code such that it can be simulated to check the desired results and also test the source code
- Explain what User Defined Primitives are, classify them and use them in code
2.1 2.1.1
Tasks are used in all programming languages, where they are generally known as procedures or subroutines. Many lines of code are enclosed within task ... endtask brackets. Data is passed to the task, the processing is done, and the result is returned to the main program. Tasks have to be specifically called, with data in and out, rather than just wired in to the general netlist. Included in the main body of code, they can be called many times, reducing code repetition.
Tasks are defined in the module in which they are used. It is also possible to define a task in a separate file and use the compile directive `include to include the task in the file which instantiates it. Tasks can include timing delays such as posedge, negedge, # delay and wait. Tasks can have any number of inputs and outputs. The variables declared within a task are local to that task. The order of declaration within the task defines how the variables passed to the task by the caller are used. A task can take, drive and source global variables when no local variables are used. When local variables are used, it assigns the outputs only at the end of task execution. One task can call another task or function. A task can be used for modeling both combinational and sequential logic. A task must be specifically called with a statement; it cannot be used within an expression as a function can.
Syntax
A task begins with the keyword task and ends with the keyword endtask. Inputs and outputs are declared after the keyword task. Local variables are declared after the input and output declarations.
Example - Simple Task
module simple_task();
task convert;
input [7:0] temp_in;
output [7:0] temp_out;
begin
  temp_out = (9/5) * (temp_in + 32);
end
endtask
endmodule

Example - Task using Global Variables
module task_global();
reg [7:0] temp_in;
reg [7:0] temp_out;
task convert;
begin
  temp_out = (9/5) * (temp_in + 32);
end
endtask
always @ (temp_in)
begin
  convert;
end
endmodule
Calling a task
Let us assume that the task in the first example is stored in a file called mytask.v. The advantage of coding the task in a separate file is that it can then be used in multiple modules.

module task_calling(temp_a, temp_b, temp_c, temp_d);
input [7:0] temp_a, temp_c;
output [7:0] temp_b, temp_d;
reg [7:0] temp_b, temp_d;
`include "mytask.v"
always @ (temp_a)
begin
  convert(temp_a, temp_b);
end
always @ (temp_c)
begin
  convert(temp_c, temp_d);
end
endmodule
Example - Automatic (re-entrant) task
// Module that contains an automatic re-entrant task.
// There are two clocks; clk2 runs at twice the frequency of clk
// and is synchronous with it.
module top;
reg clk, clk2;
reg [15:0] cd_xor, ef_xor; // variables in module top
reg [15:0] c, d, e, f;     // variables in module top
parameter delay = 5;

task automatic bitwise_xor;
output [15:0] ab_xor; // output from the task
input  [15:0] a, b;   // inputs to the task
begin
  #delay ab_xor = a ^ b;
end
endtask

// These two always blocks call the bitwise_xor task concurrently
// at each positive edge of the clocks; since the task is re-entrant,
// the concurrent calls will work correctly.
always @ (posedge clk)
  bitwise_xor(ef_xor, e, f);
always @ (posedge clk2) // twice the frequency of the previous clock
  bitwise_xor(cd_xor, c, d);
endmodule
2.1.2
Function
A function is very much similar to a task, with a few differences: e.g., a function cannot drive more than one output, nor can it contain delays.
Functions are defined in the module in which they are used. It is possible to define a function in a separate file and use the compile directive `include to include the function in the file which instantiates it. A function cannot include timing delays such as posedge, negedge or # delay; this means that a function executes in "zero" time. A function can have any number of inputs but only one output. The variables declared within a function are local to that function. The order of declaration within the function defines how the variables passed to it by the caller are used. A function can take, drive and source global variables when no local variables are used. When local variables are used, it basically assigns the output only at the end of function execution. Functions can be used for modeling combinational logic. A function can call other functions, but cannot call a task.
Syntax
A function begins with the keyword function and ends with the keyword endfunction.
Example - Simple Function
module simple_function();
function myfunction;
input a, b, c, d;
begin
  myfunction = ((a + b) + (c - d));
end
endfunction
endmodule

Example - Calling a Function
module function_calling(a, b, c, d, e, f);
input a, b, c, d, e;
output f;
wire f;
`include "myfunction.v"
assign f = (myfunction(a, b, c, d)) ? e : 0;
endmodule
Constant function
A constant function is a regular Verilog function that is used to reference complex values; it can be used instead of constants.
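A common use, shown here as an illustrative sketch (the module, parameter and function names are assumptions, not from the original text), is computing a derived constant such as an address width at elaboration time:

```verilog
// Hypothetical sketch: a constant function used to size a port.
module ram (addr, data);
  parameter RAM_DEPTH = 256;
  // constant function call: the width is computed at elaboration time
  input [clogb2(RAM_DEPTH) - 1 : 0] addr;
  output [7:0] data;

  // ceiling of log2, written as a regular Verilog function
  function integer clogb2;
    input integer depth;
    begin
      clogb2 = 0;
      depth = depth - 1;
      while (depth > 0) begin
        clogb2 = clogb2 + 1;
        depth = depth >> 1;
      end
    end
  endfunction

  assign data = 8'h00; // placeholder body
endmodule
```

Here clogb2(256) evaluates to 8, so addr is declared as an 8-bit port without hard-coding the width.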
Signed function
These functions allow the use of signed operations on function return values.
module top;
// signed function declaration
// returns a 64-bit signed value
function signed [63:0] compute_signed;
input [63:0] vector;
// ... function body ...
endfunction
// call to the signed function from a higher module
initial
  if (compute_signed(vector) < -3) begin
    // ...
  end
endmodule
2.1.3
Introduction
System tasks and functions are used to generate inputs and to check outputs during simulation. Their names begin with a dollar sign ($). The synthesis tools parse and ignore system functions, and hence they can be included even in synthesizable models.
Syntax
$strobe ("format_string", par_1, par_2, ...);
$monitor ("format_string", par_1, par_2, ...);
$displayb (as above, but defaults to binary);
$strobeh (as above, but defaults to hex);
$monitoro (as above, but defaults to octal);
$scope, $showscopes
$scope(hierarchy_name) sets the current hierarchical scope to hierarchy_name. $showscopes(n) lists all modules, tasks and block names in (and below, if n is set to 1) the current scope.
$random
$random generates a random integer every time it is called. If the sequence is to be repeatable, give $random a numerical argument (a seed) the first time it is invoked; otherwise the seed is derived from the computer clock.
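For example (an illustrative sketch; the module name and seed value are assumptions), $random is often masked down to the required width when generating test vectors:

```verilog
// Hypothetical sketch: generating repeatable random stimulus.
module random_demo;
  reg [3:0] vec;
  integer seed;

  initial begin
    seed = 10;                     // fixed seed => repeatable sequence
    repeat (5) begin
      vec = $random(seed) & 4'hF;  // keep only the low 4 bits
      $display("vec = %d", vec);
      #1;
    end
  end
endmodule
```

Running the simulation twice with the same seed produces the same sequence of values.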
Syntax
$dumpfile("filename.dmp") specifies the file that will hold the dump.
$dumpvars dumps all variables in the design.
$dumpvars(1, top) dumps all the variables in module top, but not in the modules instantiated in top.
$dumpvars(2, top) dumps all the variables in module top and 1 level below.
$dumpvars(n, top) dumps all the variables in module top and n-1 levels below.
$dumpvars(0, top) dumps all the variables in module top and all levels below.
$dumpon initiates the dump.
$dumpoff stops the dump.
$fopen opens an output file and gives the opened file a handle for use by the other commands. $fclose closes the file and lets other programs access it. $fdisplay and $fwrite write formatted data to a file whenever they are executed; they are the same except that $fdisplay inserts a new line after every execution and $fwrite does not. $fstrobe also writes to a file when executed, but it waits until all other operations in the time step are complete before writing. Thus
initial begin
  #1 a = 1; b = 0;
  $fstrobe(hand1, a, b);
  b = 1;
end
will write 1 1 for a and b. $fmonitor writes to a file whenever any one of its arguments changes.
Syntax
handle1 = $fopen("filenam1.suffix");
handle2 = $fopen("filenam2.suffix");
$fstrobe(handle1, format, variable list);  // strobe data into filenam1.suffix
$fdisplay(handle2, format, variable list); // write data into filenam2.suffix, followed by a new line
$fwrite(handle2, format, variable list);   // write data into filenam2.suffix, all on one line;
                                           // put \n in the format string where a new line is desired
2.2 2.2.1
Test benches are codes written in HDL to test the design blocks. A testbench is also known as a stimulus, because the coding is such that a stimulus is applied to the designed block and its functionality is tested by checking the results. For writing a testbench it is important to have the design specifications of the "design under test" (DUT). The specifications need to be understood clearly and a test plan made accordingly. The test plan basically documents the testbench architecture and the test scenarios (test cases) in detail.
Example - Counter
Consider a simple 4-bit up counter, which increments its count whenever enable is high and resets to zero when reset is asserted high. Reset is synchronous with the clock.
Code for the Counter
// Function: 4-bit up counter
module counter (clk, reset, enable, count);
input clk, reset, enable;
output [3:0] count;
reg [3:0] count;
always @ (posedge clk)
  if (reset == 1'b1) begin
    count <= 0;
  end else if (enable == 1'b1) begin
    count <= count + 1;
  end
endmodule
2.2.2
Test Plan
We will write a self-checking test bench, but we will do this in steps to help you understand the concept of writing automated test benches. Our testbench environment will look like the one shown in the figure.
The DUT is instantiated in the testbench, which contains a clock generator, a reset generator, an enable logic generator and compare logic. The compare logic calculates the expected count value of the counter and compares the counter output with the calculated value.
2.2.3
Test Cases
Reset Test: we can start with reset deasserted, then assert reset for a few clock ticks and deassert it, and see whether the counter sets its output to zero.
Enable Test: assert/deassert enable after reset is applied.
Random Test: random assertion/deassertion of enable and reset.
The first way is to simply instantiate the design block (DUT) and write the code such that it directly drives the signals in the design block. In this case the stimulus block itself is the top-level block. In the second style, a dummy module acts as the top-level module and both the design (DUT) and the stimulus blocks are instantiated within it. Generally, in the stimulus block the inputs to the DUT are declared as reg and the outputs from the DUT are declared as wire. An important point is that there is no port list for the test bench. An example of the stimulus block is given below. Note that the initial block below is used to set the various inputs of the DUT to a predefined logic state.
  reset = 0;
  enable = 0;
end

always #5 clk = !clk;

initial begin
  $dumpfile("counter.vcd");
  $dumpvars;
end

initial begin
  $display("\t\ttime,\tclk,\treset,\tenable,\tcount");
  $monitor("%d,\t%b,\t%b,\t%b,\t%d", $time, clk, reset, enable, count);
end

initial #100 $finish;

//Rest of testbench code after this line
endmodule

$dumpfile specifies the file that the simulator will use to store the waveform, which can later be viewed with a waveform viewer. (Please refer to the tools section for freeware viewers.) $dumpvars instructs the simulator to start dumping all the signals to "counter.vcd". $display prints text or variables to stdout (the screen); \t inserts a tab, and the syntax is the same as printf. $monitor is a bit different: it keeps track of changes to the variables in its list (clk, reset, enable, count) and, whenever any one of them changes, prints their values in the specified radix. $finish terminates the simulation after #100 time units (note that all the initial and always blocks start execution at time 0).
event reset_done_trigger;

initial begin
  forever begin
    @ (reset_trigger);
    @ (negedge clk);
    reset = 1;
    @ (negedge clk);
    reset = 0;
    -> reset_done_trigger;
  end
end
Syntax
A UDP begins with the keyword primitive and ends with the keyword endprimitive. UDPs must be defined outside the main module definition. This code shows how the input/output ports and the primitive are declared.

primitive udp_syntax (
a, // Port a
b, // Port b
c, // Port c
d  // Port d
);
output a;
input b, c, d;

// UDP function code here

endprimitive

Note:
A UDP can have only one output and up to 10 inputs. The output port must be the first port, followed by one or more input ports. All UDP ports are scalar, i.e. vector ports are not allowed. UDPs cannot have bidirectional ports.
Body
The functionality of a primitive (combinational or sequential) is described inside a table, which ends with the reserved word endtable (as shown in the code below). For sequential UDPs, an initial statement can be used to assign an initial value to the output.

// This code shows what a UDP body looks like
primitive udp_body (
a, // Port a
b, // Port b
c  // Port c
);
output a;
input b, c;
// UDP function code here
// A = B | C;
table
// B C : A
   ? 1 : 1;
   1 ? : 1;
   0 0 : 0;
endtable
endprimitive

Note: A UDP cannot use 'z' in the input table; it uses x instead.
2.3.2
Combinational UDPs
In combinational UDPs, the output is determined as a function of the current input. Whenever an input changes value, the UDP is evaluated and one of the state table rows is matched. The output state is set to the value indicated by that row. Let us consider the previously mentioned UDP.
Sequential UDPs
Sequential UDPs differ from combinational UDPs in the following ways:
The output of a sequential UDP is always defined as a reg.
An initial statement can be used to initialize the output of a sequential UDP.
The format of a state table entry is somewhat different. There are three sections in a state table entry: inputs, current state and next state, separated by a colon (:). The input specification of the state table can be in terms of input levels or edge transitions. The current state is the current value of the output register. The next state is computed from the inputs and the current state, and becomes the new value of the output register. All possible combinations of inputs must be specified to avoid unknown output.
Level-sensitive UDPs

// define a level sensitive latch by using a UDP
primitive latch (q, d, clock, clear);

// declarations
output q;
reg q; // q declared as reg to create internal storage
input d, clock, clear;

// sequential UDP initialization
// only one initial statement allowed
initial q = 0; // initialize output to value 0

// state table
table
// d clock clear : q : q+ ;   (q+ is the new output value)
   ?   ?     1   : ? : 0 ; // clear condition
   1   1     0   : ? : 1 ; // latch q = data = 1
   0   1     0   : ? : 0 ; // latch q = data = 0
   ?   0     0   : ? : - ; // no change when clock = 0
endtable
endprimitive
Edge-sensitive UDPs

// define an edge sensitive sequential UDP
primitive edge_dff (output reg q = 0,
                    input d, clock, clear);

// state table
table
// d  clock  clear : q : q+ ;
   ?   ?      1    : ? : 0 ; // output = 0 if clear = 1
   ?   ?     (10)  : ? : - ; // ignore negative transition of clear
   1  (10)    0    : ? : 1 ; // latch data on negative transition
   0  (10)    0    : ? : 0 ; //   of clock
   ?  (1x)    0    : ? : - ; // hold q if clock transitions to unknown state
   ?  (0?)    0    : ? : - ; // ignore positive transitions of clock
   ?  (x1)    0    : ? : - ; // ignore positive transitions of clock
  (??)  ?     0    : ? : - ; // ignore any change in d if clock is steady
endtable
endprimitive
2. Timing
i) a. Consider the negative edge-triggered D flip-flop with asynchronous reset shown below. Write the Verilog description for the module D-FF and describe the path delays using parallel connection.
b. Modify the above if all the path delays are 5.
ii) Assume that a six-delay specification is to be given for all the path delays, and that all path delays are equal. In the specify block define parameters t_01=4, t_10=5, t_0z=7, t_z1=2, t_z0=8. Using the previous D-FF, write the six delay specifications for all the paths.
3. UDP
i. Define a positive edge-triggered D flip-flop with clear as a UDP. Signal clear is active low.
ii. Define a level-sensitive latch with a preset signal. Inputs are d, clock and preset; output is q. If clock = 0, then q = d. If clock = 1 or x, then q is unchanged. If preset = 1, then q = 1. If preset = 0, then q is decided by the clock and d signals. If preset = x, then q = x.
iii. Define a negative edge-triggered JK flip-flop, jk_ff, with asynchronous preset and clear as a UDP. q = 1 when preset = 1 and q = 0 when clear = 1.
Module 4
Design of Embedded Processors
Lesson 23
Introduction to Hardware Description Languages-III
Instructional Objectives
At the end of the lesson the student should be able to:
Interface Verilog code to C and C++ using the Programming Language Interface
Synthesize a Verilog code and generate a netlist for layout
Verify the generated code, and carry out optimization and debugging
Classify the various types of flows in verification
3.1 3.1.1
PLI (Programming Language Interface) is a facility to invoke C or C++ functions from Verilog code. The function invoked in Verilog code is called a system call. Examples of built-in system calls are $display, $stop, $random. PLI allows the user to create custom system calls, something that Verilog syntax alone does not allow. Typical applications include:
Power analysis
Code coverage tools
Modifying the Verilog simulation data structure, for more accurate delays
Custom output displays
Co-simulation
Design debug utilities
Simulation analysis
C-model interface to accelerate simulation
Testbench modeling
To support the above applications of PLI, the C code must have access to the internal data structure of the Verilog simulator. To facilitate this, the Verilog PLI provides what are called acc routines or access routines.
How does it Work?
Write the functions in C/C++ code. Compile them to generate a shared library (*.dll on Windows, *.so on UNIX); simulators like VCS also allow static linking. Use these functions in the Verilog code (mostly in the Verilog testbench).
Depending on the simulator, pass the C/C++ function details to the simulator during compilation of the Verilog code (this is called linking; refer to the simulator user guide to understand how it is done). Once linked, just run the simulation like any other Verilog simulation.
During execution of the Verilog code by the simulator, whenever the simulator encounters a user-defined system task (one that starts with $), execution control is passed to the PLI routine (the C/C++ function).

Example - Hello World

Define a function hello( ) which, when called, prints "Hello World". This example does not use any of the PLI standard functions (ACC, TF and VPI). For exact linking details, the simulator manuals must be referred to; each simulator implements its own strategy for linking with C/C++ functions.
C Code
#include <stdio.h>

void hello ()
{
    printf ("\nHello World\n");
}
Verilog Code
module hello_pli ();
initial begin
  $hello;
  #10 $finish;
end
endmodule
3.1.2
Running a Simulation
Once linking is done, simulation is run as a normal simulation with a slight modification to the command-line options. These modifications tell the simulator that PLI routines are being used (e.g. ModelSim needs to be told on the command line which shared objects to load).

Writing a PLI Application (counter example)

Write the DUT reference model and checker in C and link them to the Verilog testbench. The requirements for writing a C model using PLI are:
A means of calling the C model whenever there is a change in the input signals (which could be of wire or reg types).
A means to get, from inside the C code, the value of the changed signals or of any other signals in the Verilog code.
A means to drive a value on any signal inside the Verilog code from the C code.
There is a set of routines (functions) provided by the Verilog PLI which satisfies the above requirements.
3.1.3
This can be well understood in the context of the above counter logic. The objective is to design the PLI function $counter_monitor and use it to check the response of the designed counter. The problem can be addressed in the following steps:
Implement the counter logic in C.
Implement the checker logic in C.
Terminate the simulation whenever the checker fails.
This is represented in the block diagram in figure 23.2.
The acc_vcl_add routine monitors a list of signals and, whenever any of the monitored signals changes, calls a user-defined function (called the consumer C routine). The VCL routine has four arguments:
Handle to the monitored object
Consumer C routine to call when the object value changes
String to be passed to the consumer C routine
Predefined VCL flags: vcl_verilog_logic for logic monitoring, vcl_verilog_strength for strength monitoring
C Code Basic
The desired C function is counter_monitor, which is called from the Verilog testbench. As in any other C code, header files specific to the application are included; here the include file provides the acc routines. The access routine acc_initialize initializes the environment for access routines and must be called from the C-language application before any other access routine is invoked. Before exiting a C-language application that calls access routines, it is necessary to close the access routine environment by calling acc_close at the end of the program.

#include <stdio.h>
#include "acc_user.h"

typedef char *string;

handle clk;
handle reset;
handle enable;
handle dut_count;
int count;

void counter_monitor()
{
    acc_initialize();
    clk       = acc_handle_tfarg(1);
    reset     = acc_handle_tfarg(2);
    enable    = acc_handle_tfarg(3);
    dut_count = acc_handle_tfarg(4);
    acc_vcl_add(clk, counter, null, vcl_verilog_logic);
    acc_close();
}

void counter()
{
    printf("Clock changed state\n");
}

Handles are used for accessing the Verilog objects. A handle is a predefined data type that points to a specific object in the design hierarchy. Each handle conveys to the access routines information about a unique instance of an accessible object: its type and how and where the data pertaining to it can be obtained. The information identifying a specific object
can be passed from the Verilog code as a parameter to the function $counter_monitor. These parameters can be accessed in the C program with the acc_handle_tfarg( ) routine. For instance, clk = acc_handle_tfarg(1) makes clk a handle to the first parameter passed. After all the other handles are similarly assigned, clk can be added to the list of signals to be monitored using the routine acc_vcl_add(clk, counter, null, vcl_verilog_logic). Here clk is the handle and counter is the user function to execute when clk changes.
Verilog Code
Below is the code of a simple testbench for the counter example. If the object being passed is an instance, it should be passed inside double quotes; since here all the objects are nets or regs, there is no need for double quotes.

module counter_tb();
reg enable;
reg reset;
reg clk_reg;
wire clk;
wire [3:0] count;

initial begin
  clk_reg = 0;
  reset = 0;
  enable = 0;
  $display("Asserting reset");
  #10 reset = 1;
  #10 reset = 0;
  $display("Asserting Enable");
  #10 enable = 1;
  #20 enable = 0;
  $display("Terminating Simulator");
  #10 $finish;
end

always #5 clk_reg = !clk_reg;

assign clk = clk_reg;

initial begin
  $counter_monitor(counter_tb.clk, counter_tb.reset, counter_tb.enable, counter_tb.count);
end

counter U (
  .clk (clk),
  .reset (reset),
  .enable (enable),
  .count (count)
);

endmodule
Access Routines
Access routines are C programming language routines that provide procedural access to information within Verilog. Access routines perform one of two operations: Extract information pertaining to an object from the internal data representation. Write information pertaining to an object into the internal data representation.
acc_user.h : all data structures related to access routines
acc_initialize( ) : initializes variables and sets up the environment
main body : the user-defined application
acc_close( ) : undoes the actions taken by acc_initialize( )
Utility Routines
Interaction between the Verilog tool and the user's routines is handled by a set of programs supplied with the Verilog toolset. The library functions defined in PLI 1.0 perform a wide variety of operations on the parameters passed to the system call and are used, for example, to synchronize with the simulation or to implement conditional program breakpoints.
3.2 3.2.1
Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level netlist representation. Logic synthesis uses standard cell libraries which consist of simple cells, such as basic logic gates like and, or and nor, or macro cells, such as adders, muxes, memories and flip-flops. The standard cells put together form the technology library. Normally, a technology library is identified by its minimum feature size (0.18u, 90nm). The circuit description is written in a hardware description language (HDL) such as Verilog. Design constraints such as timing, area, testability and power are considered during synthesis. A typical design flow with a large example is given in the last example of this lesson.
3.2.2
For large designs, manual conversion of the behavioral description to a gate-level representation is highly error-prone. Before the development of modern, sophisticated synthesis tools, designers could never be sure that the design constraints would be met after fabrication. Moreover, a significant part of the design cycle was consumed in converting the high-level design into its gate-level representation, and if the gate-level design did not meet the requirements, the turnaround time for redesigning the blocks was very high. Each designer implemented design blocks in his own way, and with little consistency between design cycles the overall design contained redundant logic even when the individual blocks were optimized. Finally, timing, area and power dissipation were fabrication-process specific, so with a change of process the entire design methodology had to change.

Automated logic synthesis has solved these problems. The high-level design is less prone to human error because designs are described at a higher level of abstraction. High-level design is done without much attention to the constraints; the tool takes care of the constraints and sees to it that they are met. The designer can very easily go back, redesign and synthesize once again if some aspect is found unaddressed, so the turnaround time has also fallen considerably. Automated logic synthesis tools synthesize the design as a whole, and thus an overall design optimization is achieved. Logic synthesis also allows technology-independent design: the tools convert the design into gates using cells from the standard cell library provided by the vendor. Design reuse is possible for technology-independent designs; if the technology changes, the tool is capable of mapping the design accordingly.
Constructs Not Supported in Synthesis

Construct : Notes
initial : Only in testbenches
events : Events make more sense for syncing testbench components
real : Real data type not supported
time : Time data type not supported
force and release : force and release of data types not supported
assign and deassign : assign and deassign of reg data types is not supported, but assign on wire data types is supported
3.2.3
Construct Type : Notes

ports (input, inout, output) : Use inout only at the IO level; this makes the design more generic
parameters : supported
module definition : supported
signals and variables : supported
instantiation : supported
function and tasks : supported
always, if, then, else, case, casex, casez : initial is not supported
begin, end, named blocks, disable : Disabling of named blocks is allowed
assign : Delay information is ignored
disable : Disabling of named blocks is supported
for, while, forever : While and forever loops must contain @(posedge clk) or @(negedge clk)
3.2.4
Operator Type : Operator : Description

Arithmetic : * : Multiply
Arithmetic : / : Divide
Arithmetic : + : Add
Arithmetic : - : Subtract
Arithmetic : % : Modulus
Arithmetic : + : Unary plus
Arithmetic : - : Unary minus
Logical : ! : Logical negation
Logical : && : Logical and
Logical : || : Logical or
Relational : > : Greater than
Relational : < : Less than
Relational : >= : Greater than or equal
Relational : <= : Less than or equal
Equality : == : Equality
Equality : != : Inequality
Reduction : ~ : Bitwise negation
Reduction : ~& : nand
Reduction : | : or
Reduction : ~| : nor
Reduction : ^ : xor
Reduction : ^~ : xnor
Shift : >> : Right shift
Shift : << : Left shift
Concatenation : {} : Concatenation
Conditional : ?: : Conditional
Keyword : Description

input, inout, output : Use inout only at the IO level; this makes the design more generic
signals and variables : Vectors are allowed
gate instantiation : e.g. nand (out, a, b); it is a bad idea to code RTL this way
function and tasks : Timing constructs are ignored
always, if, then, else, case, casex, casez : initial is not supported
begin, end, named blocks, disable : Disabling of named blocks is allowed
assign : Delay information is ignored
for, while, forever : While and forever loops must contain @(posedge clk) or @(negedge clk)
3.2.5
Translation
The RTL description is converted by the logic synthesis tool into an unoptimized, intermediate, internal representation. The tool understands the basic primitives and operators in the Verilog RTL description, but ignores the design constraints at this stage.
Logic optimization
The logic is optimized to remove redundant logic, generating an optimized internal representation.
Technology library
The technology library contains standard library cells which are used during synthesis to replace the behavioral description with actual circuit components. These are the basic building blocks. The physical layout of these cells is done first and the area is estimated; modeling techniques are then used to estimate the power and timing characteristics. The library includes the following:
Functionality of the cells
Area of the different cell layouts
Timing information about the various cells
Power information of the various cells
The synthesis tools use these cells to implement the design.

// Library cells for abc_100 technology
VNAND // 2-input nand gate
VAND  // 2-input and gate
VNOR  // 2-input nor gate
VOR   // 2-input or gate
VNOT  // not gate
VBUF  // buffer
Design constraints
Any circuit must satisfy at least three constraints, viz. area, power and timing. Optimization demands a compromise among these three constraints. Apart from these, operating conditions such as temperature also contribute to synthesis complexity.
Logic synthesis
The logic synthesis tool takes in the RTL design and, with the help of the technology library and in keeping with the design constraints, generates an optimized gate-level description.
Functional verification
Identical stimulus is run with the original RTL and with the synthesized gate-level description of the design, and the outputs are compared for matches.

module stimulus;
reg [3:0] A, B;
wire A_GT_B, A_LT_B, A_EQ_B;

// instantiate the magnitude comparator
magnitude_comparator MC (A_GT_B, A_LT_B, A_EQ_B, A, B);

initial
  $monitor($time, " A=%b, B=%b, A_GT_B=%b, A_LT_B=%b, A_EQ_B=%b",
           A, B, A_GT_B, A_LT_B, A_EQ_B);

// stimulate the magnitude comparator

endmodule
3.3 3.3.1
Traditional verification in general follows these steps:
1. To verify a design, a design specification must first be set. This requires analysis of architectural trade-offs and is usually done by simulating various architectural models of the design.
2. Based on this specification a functional test plan is created, which forms the framework for verification. Based on this plan, various test vectors are applied to the DUT (design under test), written in Verilog. Functional test environments are needed to apply these test vectors.
3. The DUT is then simulated using traditional software simulators.
4. The output is then analyzed and checked against the expected results. This can be done manually using waveform viewers and debugging tools, or automatically by verification tools. If the output matches the expected results, verification is complete.
5. Optionally, additional steps can be taken to decrease the risk of a future design respin. These include hardware acceleration, hardware emulation and assertion-based verification.
Functional verification
When the specifications for a design are ready, a functional test plan is created based on them. This is the fundamental framework of functional verification. Based on this test plan, test vectors are selected and given as input to the design under test (DUT). The DUT is simulated to compare its output with the desired results. If the observed results match the expected values, this part of verification is over.
3.3.2
Formal Verification
A formal verification tool proves properties of a design by exploring as much of its behaviour as possible. All input changes must, however, conform to the constraints for the behaviour to be valid; assertions on the interfaces act as constraints for the formal tool. The tool then tries to prove the assertions in the RTL code false. If the constraints are too tight, the tool will not explore all possible behaviours and may wrongly report the design as correct. Both the formal and the semi-formal methodologies have come into prominence with the increasing complexity of designs.
3.3.3
Semi-formal verification combines the traditional verification flow using test vectors with the power and thoroughness of formal verification:
Semi-formal methods supplement simulation with test vectors.
Embedded assertion checks define the properties targeted by the formal methods.
Embedded assertion checks define the input constraints.
Semi-formal methods exhaustively explore a limited space around the states reached by simulation, thus maximizing the effect of simulation; the exploration is limited to a certain region around the state reached by simulation.
3.3.4
Equivalence checking
After the logic synthesis and place-and-route tools create the gate-level netlist and the physical implementation of the RTL design, respectively, it is necessary to check that their functionality matches the original RTL design. This is where equivalence checking comes in. It is an application of formal verification: it ensures that the gate-level or physical netlist has the same functionality as the Verilog RTL that was simulated. A logical model of both the RTL and the gate-level representation is constructed, and it is mathematically proved that their functionalities are the same.
3.4 3.4.1
i) Write a user-defined system task, $count_and_gates, which counts the number of and-gate primitives in a module instance. The hierarchical module instance name is the input to the task. Use this task to count the number of and gates in a 4-to-1 multiplexer.
3.4.2
i) A 1-bit full subtractor has three inputs x, y, z (previous borrow) and two outputs D (difference) and B (borrow). The logic equations for D and B are as follows:
D = x'y'z + x'yz' + xy'z' + xyz
B = x'y + x'z + yz
Write the Verilog RTL description for the full subtractor. Synthesize it using any technology library available. Apply identical stimulus to the RTL and the gate-level netlist and compare the outputs.
ii) Design a 3-to-8 decoder using a Verilog RTL description. A 3-bit input a[2:0] is provided to the decoder, and its output is out[7:0]: the output bit indexed by a[2:0] gets the value 1, the other bits are 0. Synthesize the decoder using any technology library available to you, optimizing for smallest area. Apply identical stimulus to the RTL and the gate-level netlist and compare the outputs.
iii) Write the Verilog RTL description for a 4-bit binary counter with a synchronous, active-high reset. (Hint: use an always block with the @(posedge clock) statement.) Synthesize the counter using any technology library available to you, optimizing for smallest area. Apply identical stimulus to the RTL and the gate-level netlist and compare the outputs.
Module 5
Embedded Communications
Lesson 24
Parallel Data Communication
Instructional Objectives
After going through this lesson the student would be able to:
Explain why a parallel interface is needed in an embedded system
List the names of common parallel bus standards along with their important features
Distinguish between the GPIB and other parallel data communication standards
Describe how data communication takes place between the controller, talker and listener devices connected via a GPIB interface
Questions
1. Parallel data communication is preferred when the following conditions are satisfied (state true or false for each):
i) the distance between the devices is small
ii) the volume of traffic is small
iii) the required data rate is high
2. The IEEE 488 standard was originally developed by ______.
3. The devices connected in a GPIB system are classified into the following types of categories: ______.
4. Each device connected in a GPIB system has an n-bit address, where n = ______.
Ans. D; T T T; T F F; F T T; T F T C C
Thus GPIB has several versions and makes which reflect the same thing, courtesy of the various developments in its history.

GPIB Electrical and Mechanical Specifications:
The bus comprises a 24-wire cable with both male and female connectors at each end to facilitate connectivity in a daisy-chain network topology.
Standard TTL-level signals are assumed for the ACTIVE, INACTIVE and TRANSITION states, both for control and communication.
Specified transfer rate: 1 megabyte per second.
Cable length: twenty meters between the controller and one device, or two meters between two devices.
Device fanout: the number of instruments may range from eight to ten.

Classification of instruments or devices (as they are called in the standard) connected through this bus system:
TALKER: designated to send data to other instruments, e.g. tape readers, data recorders, digital voltmeters, digital oscilloscopes etc.
LISTENER: designated to receive data from other instruments or controllers, e.g. printers, display devices, programmable power supplies, programmable signal generators etc.
CONTROLLER: the decision maker for designating an instrument as either a TALKER or a LISTENER. Usually this role is carried out by a computer.
All the talkers, listeners and the controller are connected to each other via the following three system buses (also see A TYPICAL SEQUENCE of DATA FLOW):
Bidirectional data bus
Bus management lines
Handshake lines

The eight BI-DIRECTIONAL DATA LINES have the following functionalities. They are used to transfer data, addresses, commands and status information in the form of bytes.
DATA: transferred as bytes, with the reception of each data byte being duly acknowledged.
ADDRESSES: instruments intended for use on a GPIB usually have switches which allow selection of the 5-bit address the instrument will assume on the bus. Addresses are characterized as:
o TALK ADDRESSES
o LISTEN ADDRESSES
CONTROL and COMMAND: bytes containing information for orienting the devices to perform functions like listen, talk etc. These commands can be referred to as the CONTROL WORDs necessary for establishing efficient communication between the controller and the other classes of devices. The various commands are (also see the COMMAND TABLE):
o UNIVERSAL commands
o UNLISTEN commands
o UNTALK commands
o SECONDARY commands
Note: the commands are sent by the controller to the instruments.

The five BUS MANAGEMENT LINES are:
o ATN (Attention)
o IFC (Interface Clear)
o SRQ (Service Request)
o REN (Remote Enable)
o EOI (End or Identify)
The three HANDSHAKE LINES coordinate the transfer of data bytes on the data bus. Their functions are:
o DAV: Data Valid
o NRFD: Not Ready For Data
o NDAC: Not Data Accepted
Note: the handshake signals are necessary to facilitate transmission at different bandwidths (data rates).
The SEQUENCE of events pertaining to the actual communication is as follows:
o Power on: the controller takes control of the buses and sends out the IFC signal to set all instruments on the bus to a known state.
o The controller starts performing the desired series of measurements or tests.
o The controller asserts the ATN line low and starts sending the command address codes to the talkers and the listeners.

The CONTROL WORD Structure:
The control words are given in brief in the Command Table:

The Command Table
COMMAND             CONTROL WORD
Ignored             X1111111
Listen Command      X01 + 5 LSBs (actual address)
Talk Command        X10 + 5 LSBs (actual address)
Universal Command   X000 + 4 LSBs (16 commands)
Unlisten Command    X0111111
Untalk Command      X1011111
Secondary Commands  X11 + 5 LSBs (actual address)

Note: All the command control words are activated only if the ATN line is asserted low; otherwise they are in a disabled state. X here represents the don't-care condition. + here means that the indicated number of LSBs follows.

The following are the most important features:
The Universal commands go to all the listeners and talkers.
The Untalk and Unlisten commands turn the indicated device on or off.
In addition to all the above tasks, the controller checks the SRQ line in the context of a SERVICE REQUEST.
On finding the SRQ line LOW, the controller polls each device on the bus either serially (one by one) or in parallel.
o It then determines the source of the SRQ and asserts the ATN line low.
o It then sends the relevant information or command to all the listeners and the talkers, depending on the data utility.
The controller then asserts the ATN line high, and data is transferred directly from the TALKER to the LISTENERS using a double-handshake signal sequence.

Some information about DAV, NRFD and NDAC is given below. All are OPEN-COLLECTOR lines.
o A listener can hold NRFD low to indicate that it is not ready for data.
o A listener can hold NDAC low to indicate that it has not yet accepted a data byte.
An instance of the above two points can be cited as follows:
o All listeners release the NRFD line, indicating that they are ready to receive data.
o The talker asserts DAV low to indicate that valid data is on the bus.
o All the addressed listeners then pull NRFD low and start accepting the data, the NDAC line being asserted high.
o The talker, on sensing the NDAC line going high, unasserts the corresponding DAV signal. The listeners pull NDAC low again, and the sequence is repeated until the talker has sent all the data bytes it has to send.
o The data transfer rate depends on the rate at which the slowest listener can accept the data.
o On completion of the data transfer the talker pulls the EOI line of the management group low to indicate transfer completion.
o Finally, the controller takes control of the data bus and sends Untalk and Unlisten commands to all the talkers and the listeners, and continues executing its pre-specified internal instructions.
current standard PCI Super allows up to 800 Mbps on a 64-bit bus. It supports automatic detection of devices via a 64-byte configuration register, which makes it easy to interface plug-and-play devices in a system.
3. IEEE-796 (Multibus): Originally introduced by Intel as a means of connecting multiple processors on the system board, this bus is no longer very popular. It works with 16-bit data and 24-bit address buses.
4. VME Bus (Euro-standard): Introduced for the same purpose as the Intel Multibus, it works with a 24-bit address and 8/16/32-bit data buses.
5. SCSI Bus (Small Computer System Interface): This standard was originally designed for use with Apple Macintosh computers and was then popularized by the workstation vendors. Its main purpose is to interface peripherals like hard disks, CD-ROM drives and similar relatively slow peripherals which use a data rate of less than 100 Mbps. The following varieties of SCSI are currently implemented:
SCSI-1: Uses an 8-bit bus and supports data rates of 4 MBps.
SCSI-2: Same as SCSI-1, but uses a 50-pin connector instead of a 25-pin connector, and supports multiple devices. This is what most people mean when they refer to plain SCSI.
Wide SCSI: Uses a wider cable (168 cable lines to 68 pins) to support 16-bit transfers.
Fast SCSI: Uses an 8-bit bus, but doubles the clock rate to support data rates of 10 MBps.
Fast Wide SCSI: Uses a 16-bit bus and supports data rates of 20 MBps.
Ultra SCSI: Uses an 8-bit bus and supports data rates of 20 MBps.
SCSI-3: Uses a 16-bit bus and supports data rates of 40 MBps. Also called Ultra Wide SCSI.
Ultra2 SCSI: Uses an 8-bit bus and supports data rates of 40 MBps.
Wide Ultra2 SCSI: Uses a 16-bit bus and supports data rates of 80 MBps.
However, for the kind of applications targeted by GPIB, it now faces very strong competition from the recently introduced high-speed serial bus standards. Currently there are four major candidates for future bus systems in Test & Measurement: The Universal Serial Bus (USB) is now very popular. The current implementation provides transfer rates of up to 12 Mbit/s. From that viewpoint there is no speed enhancement over GPIB; in fact, it is a drawback. USB II is an enhanced USB bus capable of transferring up to 480 Mbit/s. It is backward compatible with USB. The IEC SC65C Working Group 3 (which also developed the IEC 625.1 and IEC 625.2 standards) is planning to work on this.
IEEE 1394 (FireWire) is now available with transfer rates of up to 400 Mbit/s. A specification to simulate GPIB was developed by a working group inside the IEEE 1394 Trade Association; it is called IICP (Industrial and Instrumentation Control Protocol). Ethernet and related networks use the TCP/IP protocol, and transfer rates of up to 1 Gbit/s are possible. For simulating GPIB, a specification called VXI-11, introduced by the VXI plug&play alliance, exists.
Module 5
Embedded Communications
Lesson 25
Serial Data Communication
Instructional Objectives
After going through this lesson the student would be able to Distinguish between serial and parallel data communication Explain why a communication protocol is needed Distinguish between the RS-232 and other serial communication standards Describe how serial communication can be used to interconnect two remote computers using the telephone line
Serial data communication strategies and standards are used in situations where the number of lines that can be spared for communication is limited. This is the primary mode of transfer in long-distance communication. It is also common in embedded systems, where various subsystems share the communication channel and speed is not a very critical issue. Standards incorporate both the software and hardware aspects of the system, while buses mainly define the cable characteristics for the same communication type. Serial data communication is the most common low-level protocol for communication between two or more devices. Normally, one device is a computer, while the other can be a modem, a printer, another computer, or a scientific instrument such as an oscilloscope or a function generator. As the name suggests, the serial port sends and receives bytes of information in a serial fashion - one bit at a time - rather than a character at a time as in the other modes of communication. These bytes are transmitted using either a binary (numerical) format or a text format.
All data communication systems follow some specific set of standards defined for their communication capabilities, so that the systems are not vendor-specific and, for each system, the user has the freedom of selecting a device and interface of his own choice of make and range. The most common serial communication protocols can be studied under the following categories: asynchronous, synchronous and bit-synchronous communication standards.
This protocol allows bits of information to be transmitted between two devices at an arbitrary point in time. The protocol defines that the data - more precisely, a character - is sent as a frame, which in turn is a collection of bits. The start of a frame is identified by a START bit, and a STOP bit identifies the end of the data frame; thus, the START and STOP bits are part of the frame being sent or received. The protocol assumes that both the transmitter and the receiver are configured in the same way, i.e., they follow the same definitions for the start, stop and actual data bits. Both devices need to communicate at an agreed-upon data rate (baud rate), such as 19,200 or 115,200 bits per second. This protocol has been in use for over 15 years; it is used to connect PC peripherals such as modems, and its applications include the classic Internet dial-up modem systems. Asynchronous systems allow a number of variations, including the number of bits in a character (5, 6, 7 or 8), the number of stop bits used (1, 1.5 or 2) and an optional parity bit. Today the most common configuration has 8-bit characters, with 1 stop bit and no parity; this is frequently abbreviated as 8-N-1. A single 8-bit character therefore consists of 10 bits on the line: one start bit, eight data bits and one stop bit (as shown in the figure below). The most important observation here is that the individual characters are framed (unlike in the other serial communication standards) and NO CLOCK signal is communicated between the two ends.
[Figure: frame format for asynchronous serial data - start bit, data bits, stop bit]
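The framing overhead just described can be made concrete with a small sketch (function names are illustrative, not part of any standard API): building the 10-bit line image of one 8-N-1 character and computing the effective character throughput at a given baud rate.

```python
def frame_8n1(byte):
    """Return the line bits for one 8-N-1 character: one start bit (0),
    eight data bits (LSB transmitted first), one stop bit (1)."""
    assert 0 <= byte <= 0xFF
    data = [(byte >> i) & 1 for i in range(8)]   # LSB first on the wire
    return [0] + data + [1]                      # 10 bits per character

def chars_per_second(baud, bits_per_frame=10):
    """Effective character rate: the 2 framing bits cost 20% of the line."""
    return baud // bits_per_frame

bits = frame_8n1(ord('A'))           # 'A' = 0x41
print(bits)                          # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(chars_per_second(115200))      # 11520 characters per second
```

Note how a 115,200-baud link carries only 11,520 characters per second: the start and stop bits consume two of every ten line bits.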
The serial port interface for connecting two devices is specified by the TIA/EIA-232C standard published by the Telecommunications Industry Association (TIA) and the Electronic Industries Alliance (EIA); both the physical and electrical characteristics of the interface are detailed in these publications. RS-232, RS-422, RS-423 and RS-485 are each a Recommended Standard (RS-XXX) of the EIA for asynchronous serial communication, and have more recently been rebranded as EIA-232, EIA-422, EIA-423 and EIA-485. Although more advanced standards for serial communication, such as USB and FireWire, are being popularized these days to fill the gap for high-speed, relatively short-run, heavy-data-handling applications, the above four still satisfy the needs of the high-speed and longer-run applications found most often in industrial settings for plant-wide security and equipment networking. RS-232, 423, 422 and 485 specify the communication system characteristics of the hardware, such as voltage levels, terminating resistances and cable lengths. The standards, however, say nothing about the software protocol, or about how data is framed, addressed, checked for errors or interpreted.
THE RS-232
This is the original serial port interface standard; its name stands for Recommended Standard Number 232 (more formally, EIA Recommended Standard 232). It is the oldest and most popular serial communication standard, first introduced in 1962 to help ensure connectivity and compatibility across manufacturers for simple serial data communications.
Applications
Peripheral connectivity for PCs (the PC COM port hardware), which can range beyond modems and printers to many different handheld devices and modern scientific instruments.
All the various characteristics and definitions pertaining to this standard can be summarized according to:
o The maximum bit transfer rate and cable length.
o The communication technique: names, electrical characteristics and functions of signals.
o The mechanical connections and pin assignments.
The Standard Maximum Bit Transfer Rate, Signal Voltages and Cable Length
RS-232's capabilities range from the original slow data rate of up to 20 kbps to over 1 Mbps for some modern applications. RS-232 is mainly intended for short cable runs, or local data transfers, in a range of up to 50 feet maximum; the usable length also depends on the baud rate.
It is a robust interface with speeds of up to 115,200 baud. It can withstand a short circuit between any two pins, and can handle signal voltages as high (or low) as +/-15 volts.
Signals can be in either an active state or an inactive state. RS-232 is an active-LOW voltage-driven interface where:
ACTIVE STATE: An active state corresponds to the binary value 1. An active signal state can also be indicated as logic 1, on, true, or a mark.
INACTIVE STATE: An inactive signal state is indicated as logic 0, off, false, or a space.
For data signals, the "true" state occurs when the received signal voltage is more negative than -3 volts, while the "false" state occurs for voltages more positive than +3 volts. For control signals the polarity is reversed: the "true" state occurs when the received signal voltage is more positive than +3 volts, while the "false" state occurs for voltages more negative than -3 volts.
[Figure: RS-232 signal voltage levels - signal state 1 below -3 V, signal state 0 above +3 V, with the -3 V to +3 V band undefined]
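The two voltage conventions above (negative logic for data, positive logic for control) can be sketched as a pair of decode functions. This is an illustrative model only; the function names are not from any real driver API.

```python
def decode_data(voltage):
    """RS-232 data lines use negative logic: mark (1) below -3 V."""
    if voltage < -3:
        return 1          # mark / logic 1 / "true" state for data
    if voltage > 3:
        return 0          # space / logic 0
    return None           # -3 V..+3 V: undefined transition region

def decode_control(voltage):
    """Control lines (RTS, CTS, DTR, ...) use positive logic."""
    if voltage > 3:
        return True       # control signal asserted
    if voltage < -3:
        return False
    return None

print(decode_data(-12))    # 1: a -12 V data line carries a mark
print(decode_control(9))   # True: e.g. DTR asserted at +9 V
```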
A factor that limits the distance of reliable data transfer over RS-232 is the signaling technique it uses. The interface is single-ended, meaning that communication occurs over a SINGLE WIRE referenced to GROUND, the ground wire serving as the return path. Over that single wire, marks and spaces are created. While this is quite adequate for slower applications, it is not suitable for faster or longer ones.
[Figure: RS-232 single-ended signaling - data flows from the transmitter (Tx) to the receiver (Rx) over a single wire referenced to ground]
Disadvantage
Being a single-ended system, it is more susceptible to induced noise, ground loops and ground shifts (a ground at one end not at the same potential as at the other end of the cable), e.g. in applications in the proximity of heavy electrical installations and machinery. These vulnerabilities become serious at very high data rates, and for such applications a different standard, such as RS-422, is required; these standards are explained below.
RS-422

The Standard

Communication Technique
The strength of this standard lies in its ability to tolerate ground voltage differences between sender and receiver. Ground voltage differences can occur in electrically noisy environments where heavy electrical machinery is operating. The key here is the differential data communication technique, also referred to as balanced-differential signaling. The driver uses two wires over which the signal is transmitted. Each wire is driven while floating separate from ground, i.e., neither is grounded, and in this respect the system differs from single-ended systems. Correspondingly, the receiver has two inputs, each floating above ground and electrically balanced with the other when no data is being transmitted. Data on the line causes a deliberate electrical imbalance, which is recognized and amplified by the receiver. Common-mode signals, such as electrical noise induced on the lines by machinery or radio transmissions, are for the most part cancelled by the receiver. That is because the induced noise is identical on each wire: the receiver inverts the signal on one wire to place it out of phase with the other, causing a subtraction that results in zero difference. Thus, noise picked up by long data lines is eliminated at the receiver and does not interfere with the data transfer. Also, because the line is balanced and separate from ground, there are no problems associated with ground shifts or ground loops.
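The cancellation argument above can be checked numerically with a toy model (the +/-2 V drive level and all names are illustrative): identical common-mode noise on both wires drops out of the receiver's subtraction.

```python
def transmit_differential(bit, noise=0.0):
    """Drive the two wires with opposite polarities; the same induced
    noise voltage couples identically onto both wires."""
    level = 2.0 if bit else -2.0
    wire_a = +level + noise
    wire_b = -level + noise
    return wire_a, wire_b

def receive_differential(wire_a, wire_b):
    """The receiver only sees the difference, so common-mode noise cancels."""
    return 1 if (wire_a - wire_b) > 0 else 0

# Even with 10 V of common-mode noise the data is recovered intact:
a, b = transmit_differential(1, noise=10.0)
print(receive_differential(a, b))   # 1
a, b = transmit_differential(0, noise=10.0)
print(receive_differential(a, b))   # 0
```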
[Figure: RS-422 differential signaling - unidirectional, half duplex, multi-drop]

To avoid any ambiguity in understanding the RS-422 and RS-423 standards, it may be mentioned that RS-423 is a counterpart of RS-422 designed to tolerate ground voltage differences between the sender and the receiver for the more advanced version of RS-232, that is, RS-232C. Unlike RS-232, an RS-422 driver can service up to 10 receivers on the same line (bus). This is often referred to as a half-duplex, single-source, multi-drop network (not to be confused with the multi-point networks associated with RS-485); this will be explained further in conjunction with RS-485.
Like RS-232, however, RS-422 still provides only half-duplex, one-way data communication over a two-wire line. If bi-directional or full-duplex operation is desired, another set of driver, receiver(s) and two-wire line is needed, in which case RS-485 is worth considering.
Applications
This standard fits well in process control applications in which instructions are sent out to many actuators or responders, and in electrically noisy environments where heavy electrical machinery is operating and ground voltage differences can occur.
RS-485
This is an improved RS-422 with the capability of connecting a number of devices (transceivers) on one serial bus to form a network.
The Standard Maximum Bit Transfer Rate, Signal Voltages and Cable Length
Such a network can have a "daisy chain" topology, where each device is connected to two other devices except for the devices at the ends. Only one device may drive data onto the bus at a time; the standard does not specify the rules for deciding who transmits and when on such a network - that is left entirely to the system designer. Variable data rates are available for this standard; the standard maximum data rate is 10 Mbps, although some manufacturers offer up to double that, i.e., around 20 Mbps, at the expense of cable length. It can connect up to 32 drivers and receivers in fully differential mode, similar to RS-422.
Communication Technique
EIA Recommended Standard 485 is designed to provide bi-directional, half-duplex, multi-point data communication over a single two-wire bus. Like RS-232 and RS-422, full-duplex operation is possible using a four-wire, two-bus network, but the RS-485 transceiver ICs must have separate transmit and receive pins to accomplish this. RS-485 has the same distance and data rate specifications as RS-422 and uses differential signaling but, unlike RS-422, allows multiple drivers on the same bus. As depicted in the figure below, each node on the bus can include both a driver and a receiver, forming a multi-point network. The driver at each node remains in a disabled, high-impedance state until called upon to transmit; this is different from RS-422 drivers, where there is only one driver and it is always enabled and cannot be disabled. With automatic repeaters and tri-state drivers, the 32-node limit can be greatly exceeded. In fact, the ANSI-based SCSI-2 and SCSI-3 bus specifications use RS-485 for the physical (hardware) layer.
[Figure: RS-485 multi-point network - each node's driver (D) and receiver (R) share the two-wire bus; each driver has an Enable input and is enabled only when transmitting]
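The tri-state discipline described above can be sketched as a toy bus model (class and function names are illustrative): every node's driver is high-impedance (disabled) by default, so exactly one node drives the shared pair at a time, and simultaneous drivers constitute bus contention.

```python
class Node:
    """One RS-485 node: a driver that is tri-stated unless enabled."""
    def __init__(self, name):
        self.name = name
        self.enabled = False      # high-impedance by default
        self.level = None

    def drive(self, bit):
        self.enabled = True
        self.level = bit

    def release(self):
        self.enabled = False
        self.level = None

def bus_state(nodes):
    """Return the level on the shared pair, or None when the bus is idle.
    More than one enabled driver is a contention fault."""
    drivers = [n for n in nodes if n.enabled]
    if len(drivers) > 1:
        raise RuntimeError("bus contention: %s" % [n.name for n in drivers])
    return drivers[0].level if drivers else None

a, b, c = Node("A"), Node("B"), Node("C")
a.drive(1)
print(bus_state([a, b, c]))   # 1: every other receiver sees A's bit
a.release()
print(bus_state([a, b, c]))   # None: bus idle, any node may now drive
```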
Advantages
Among all the asynchronous standards mentioned above, this standard offers the maximum data rate. Apart from that, special hardware for avoiding bus contention, a higher receiver input impedance and lower driver load impedances are its other assets.
The various standards at a glance:

                                    RS-232           RS-422             RS-485
Signaling Technique                 Single-Ended     Differential       Differential
                                                     (Balanced)         (Balanced)
Drivers and Receivers on Bus        1 Driver         1 Driver           32 Drivers
                                    1 Receiver       10 Receivers       32 Receivers
Maximum Cable Length                50 feet          4000 feet          4000 feet
Original Standard Maximum
  Data Rate                         20 kbps          10 Mbps down to    10 Mbps down to
                                                     100 kbps           100 kbps
Minimum Loaded Driver
  Output Voltage Levels             +/-5.0 V         +/-2.0 V           +/-1.5 V
Driver Load Impedance (ohms)        3 k to 7 k       100                54
Receiver Input Resistance (ohms)    3 k to 7 k       4 k or greater     12 k or greater
Serial Port Pin and Signal Assignments (DB9 male connector)

Pin   Label   Signal Name           Signal Type
1     CD      Carrier Detect        Control
2     RD      Received Data         Data
3     TD      Transmitted Data      Data
4     DTR     Data Terminal Ready   Control
5     GND     Signal Ground         Ground
6     DSR     Data Set Ready        Control
7     RTS     Request to Send       Control
8     CTS     Clear to Send         Control
9     RI      Ring Indicator        Control
(Refer to the RS-232 standard for a description of the signals and pin assignments used with a 25-pin connector.) Because RS-232 mainly involves connecting a DTE to a DCE, the pin assignments are defined such that straight-through cabling is used: pin 1 is connected to pin 1, pin 2 to pin 2, and so on. A DTE-to-DCE serial connection using the Transmit Data (TD) and Receive Data (RD) pins is shown below.
[Figure: DTE-to-DCE straight-through connection - TD (pin 3) to pin 3, RD (pin 2) to pin 2]
Connecting two DTEs or two DCEs with a straight serial cable would connect the TD pins of the two devices to each other, and likewise the RD pins. Therefore, to connect two like devices, a null modem cable has to be used. As shown below, a null modem cable crosses the transmit and receive lines in the cable.
[Figure: DTE-to-DTE null modem connection - each device's TD (pin 3) is crossed over to the other's RD (pin 2)]
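The two cable types can be summarized as pin maps. This sketch covers only the three data/ground pins of the DB9 connector; real null modem cables also cross or loop back the control lines in ways that vary by cable, so those are deliberately omitted.

```python
# DB9 pin maps: near-end pin -> far-end pin (data and ground only).
STRAIGHT_THROUGH = {3: 3, 2: 2, 5: 5}   # DTE <-> DCE: TD-TD, RD-RD, GND-GND
NULL_MODEM       = {3: 2, 2: 3, 5: 5}   # DTE <-> DTE: TD and RD crossed

def far_end_pin(cable, near_pin):
    """Which far-end pin does a given near-end pin reach through the cable?"""
    return cable[near_pin]

print(far_end_pin(NULL_MODEM, 3))        # 2: this DTE's TD feeds the other's RD
print(far_end_pin(STRAIGHT_THROUGH, 3))  # 3: straight cable keeps pins aligned
```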
Serial ports carry two types of signals: data signals and control signals. To support these signal types, as well as the signal ground, the RS-232 standard defines a 25-pin connection; however, most PCs and UNIX platforms use a 9-pin connection. In fact, only three pins are required for serial port communication: one for receiving data, one for transmitting data, and one for the signal ground. Throughout this discussion the computer is considered a DTE, while peripheral devices such as modems and printers are considered DCEs. Note that many scientific instruments function as DTEs. The term "data set" is synonymous with "modem" or "device", while the term "data terminal" is synonymous with "computer".
PC-to-PC Communication in Detail
The schematic for a connection between the PC UART port and the modem serial port is shown below:
[Figure: PC UART to modem connection - TxD and RxD lines]
Note: The serial port pin and signal assignments are with respect to the DTE. For example, data is transmitted from the TD pin of the DTE to the RD pin of the DCE.
The control pins include RTS and CTS, DTR and DSR, CD, and RI.
PROBLEM: Suppose one PC needs to send data to another computer located far away. The actual data is in parallel form and needs to be converted into its serial counterpart; this is done by a parallel-in serial-out shift register at the transmitter and a serial-in parallel-out shift register at the receiver. It must also be ensured that the transmitter does not send data at a rate faster than the receiver can accept it; this is done by introducing handshaking signals or circuitry in conjunction with the actual system. For short distances, devices like the UART (Universal Asynchronous Receiver Transmitter, e.g. the INS8250 from National Semiconductor) and the USART (Universal Synchronous Asynchronous Receiver Transmitter, e.g. the 8251A from Intel) incorporate the essential circuitry for handling this serial communication with handshaking. For long distances, telephone (switched) lines are more practical because of their pre-availability. ONE COMPLICATION: the telephone line BANDWIDTH is only 300-3000 Hz.
REMEDY: Convert the digital signal to audio tones. The device, which is used to do this conversion and vice-versa, is known as a MODEM.
[Figure: two DTEs (terminals) connected through DCEs (modems) over a telephone line]
Both the main microcomputer and the end device (or time-shared device) can be referred to as terminals. Whenever a terminal is switched on, it first performs a self-diagnostic test; if it finds its integrity fully justified, it asserts the DTR (data terminal ready) signal. When the modem senses this, it understands that the terminal is ready, and replies to the terminal by asserting the DSR (data set ready) signal. The direction of each of these signals is of prime importance here and must be remembered to fully understand the procedure. If the terminal actually has data to convey to the end terminal, it asserts the RTS (request to send) signal back to the modem, and in turn the modem asserts the CD (carrier detect) signal to the terminal, indicating that it has established a connection. The modem may, however, not yet be ready to transmit the actual data over the telephone line, perhaps because its buffer is saturated, among other reasons. When the modem is fully ready to send the data along the telephone line, it asserts the CTS (clear to send) signal back to the terminal. The terminal then starts sending serial data to the modem. When the terminal runs out of data, it unasserts the RTS signal, indicating to the modem that it has no more data to send; the modem in turn unasserts its CTS signal and stops transmitting. Initialization and handshaking proceed the same way at the other end. It must therefore be noted that a very important aspect of data communication is the definition of the handshaking signals for transferring serial data to and from the modem.
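The handshake sequence described above can be reduced to the order in which the signals change, as in this simplified, hypothetical replay (real modems interleave these events with carrier negotiation, and the function and event names are illustrative):

```python
def dialup_session(data_bytes):
    """Replay the DTR/DSR/RTS/CD/CTS sequence as an ordered event log."""
    events = []
    events.append("DTR asserted")    # terminal passed its self-test
    events.append("DSR asserted")    # modem replies: it is ready
    events.append("RTS asserted")    # terminal has data to send
    events.append("CD asserted")     # modem reports an established connection
    events.append("CTS asserted")    # modem ready to accept data
    for b in data_bytes:
        events.append("send 0x%02X" % b)
    events.append("RTS dropped")     # terminal has no more data
    events.append("CTS dropped")     # modem stops transmitting
    return events

log = dialup_session([0x48, 0x69])
print(log[0], "...", log[-1])        # DTR asserted ... CTS dropped
```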
Current loops
Current loops are a standard widely used in process automation. The 20 mA current loop is widely used for transmitting serial communication data to programmable process-control devices. Another widely used standard is the 4-20 mA current loop, used for transmitting analogue measurement signals between a sensor and a measurement device.
In digital communications, the 20 mA current loop is a standard: the transmitter sources 20 mA and the receiver sinks 20 mA. Current loops often use opto-couplers. Here it is the current that matters, not the voltage. For measurement purposes, a small resistance, say of value 1 k, is connected in series with the receiver/transmitter and the current meter; the current flowing into the receiver indicates the scaled data actually entering it. The data transmitted through this kind of interface is usually a standard RS-232 signal, simply converted to current pulses. Current on and off the transmission line depends on how the RS-232 circuit distinguishes between the values of current and in what way it interprets the logic state thus obtained.
The 4-20 mA current loop interface is the standard for almost all process control instruments. It works as follows: the sensor is connected to process-control equipment, which supplies a voltage to the loop where the sensor is connected and reads the amount of current it draws. The typical supply voltage for this arrangement is around 12-24 volts, applied through a resistor; the measurement is the voltage drop across that resistor, converted to its current counterpart. The current loop is designed so that a sensor draws 4 mA of current at its minimum value and 20 mA at its maximum value. Because the sensor always passes at least 4 mA and there is usually a voltage drop of several volts across the sensor, many sensor types can be powered from the loop current alone.
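The 4-20 mA scaling above is a simple linear map: 4 mA corresponds to the sensor's minimum reading and 20 mA to its maximum, so the 16 mA span carries the measurement. A minimal sketch (the function name and the fault threshold are illustrative assumptions):

```python
def loop_to_value(current_ma, lo, hi):
    """Map a 4-20 mA loop current onto the sensor range [lo, hi].
    Currents well below 4 mA usually indicate a broken loop, since a
    healthy sensor always draws at least 4 mA."""
    if current_ma < 3.8:                      # illustrative fault threshold
        raise ValueError("loop fault: %.1f mA" % current_ma)
    span = hi - lo
    return lo + (current_ma - 4.0) / 16.0 * span

# A 0-100 degC temperature transmitter reading 12 mA is at mid-scale:
print(loop_to_value(12.0, 0.0, 100.0))   # 50.0
```

The live-zero at 4 mA is the reason a broken wire is distinguishable from a legitimate minimum reading: 0 mA can never be valid data.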
Lesson 26
Network Communication
Instructional Objectives
After going through this lesson the student would be able to Describe the need and importance of networking in an embedded system List the commonly adopted network communication standards and explain their basic features Distinguish between the CAN Bus, Field Bus and other network communication standards for embedded applications Choose a particular network standard to suit an application
Network Communication
The role of networking in present-day data communication hardly needs any elaboration. The situation is similar in the case of embedded systems, particularly those distributed over a larger geographical region - the so-called distributed embedded systems. Unfortunately, the most common network standard, namely Ethernet, is not suitable for such distributed systems, especially when there are real-time constraints to be satisfied. This is due to the lack of any service-time guarantee in the Ethernet standard. On the other hand, alternatives like Token Ring, which do provide a service-time guarantee, are not very suitable because they require a ring topology, which is not very convenient to implement in an industrial environment. Industry therefore proposed a standard called Token Bus (approved as the IEEE 802.4 specification) to cater to such requirements; however, the standard became too complex and inefficient as a result. Subsequently, different manufacturers have come up with their own standards, which are being implemented in specific applications. In this lesson we learn about three such standards, namely:
o I2C Bus
o Field Bus
o CAN Bus
We discuss about the last one in a little more detail because it is slowly emerging as one of the most popular networking standards for many embedded applications, like Home Appliances, Automobiles, Ships, Vending Machines, Medical Equipment, small-scale industries etc.
I2C Bus

[Figure: I2C bus data transfer - start bit, address with read/write bit, 8 data bits, acknowledge bit]

The acknowledge bit is used by the receiver of the data to indicate successful reception. The original specification for this standard was quite modest, namely 100 kbps with 7-bit addressing. Recent specifications have raised the data rate to 3.4 Mbps with 10-bit addressing.
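The transfer format just outlined (start condition, 7-bit address plus a read/write bit, data bytes each acknowledged by the receiver, stop condition) can be modeled as a frame list. This is an illustrative sketch of the on-bus sequence, not a driver for any real I2C controller; the function name and the 0x50 example address are assumptions.

```python
def i2c_write_frames(address, payload):
    """Model the sequence of an I2C master write with 7-bit addressing:
    START, address byte (R/W bit = 0 for write), then each data byte
    followed by the receiver's acknowledge, and finally STOP."""
    assert 0 <= address <= 0x7F              # 7-bit addressing
    frames = ["START"]
    frames.append((address << 1) | 0)        # address in upper 7 bits, W bit 0
    frames.append("ACK")                     # addressed slave acknowledges
    for byte in payload:
        frames.append(byte)
        frames.append("ACK")                 # receiver ACKs every data byte
    frames.append("STOP")
    return frames

# Writing one byte to a device at (hypothetical) address 0x50:
print(i2c_write_frames(0x50, [0xAB]))
# ['START', 160, 'ACK', 171, 'ACK', 'STOP']
```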
Field Bus

Initiatives such as the Interoperable Systems Project (ISP), from manufacturers under the leadership of Siemens, Fisher-Rosemount and Yokogawa, and its counterpart WorldFIP, mainly supported by Honeywell, sought to establish a de-facto fieldbus standard by introducing their products onto the market. Both organisations merged into the Fieldbus Foundation (FF), which strives to get a single world standard worked out. Industrial applications range from pulp and paper production and wastewater treatment right through to power station technology. PROFIBUS operations are processed by standard telegrams passing between master and slave using predefined channels called communication relations. Data is stored as objects, which can be addressed in the object directory via an index. PROFIBUS specifies an RS-485 interface with a baud rate of 9.6 kbit/s over a cable length of 1200 m and up to 500 kbit/s over a cable length of 200 m. Telegrams consist of the communication relations of the target device and the PROFIBUS partner address, as well as the indices of the object to be addressed, along with any data. With the exception of broadcasts, all telegrams are answered with a positive or negative acknowledgement; this ensures rapid recognition of faulty or non-existent stations. The transmission technology (physical layer) of PROFIBUS-PA can be characterized as follows:
o Digital, synchronous bit data transmission.
o Data rate 31.25 kbit/s.
o Manchester coding.
o Signal transmission and remote power supply over twisted two-wire cabling (screened/unscreened).
o Remote power supply DC voltage 9 V...32 V.
o Signal AC voltage 0.75 Vpp...1 Vpp (send voltage).
o Line and tree topology.
o Up to 1.9 km total cabling.
o Up to 32 members per cable segment.
o Expandable with a maximum of four repeaters.
The FOUNDATION fieldbus model is based on the IEC Open Systems Interconnect (OSI) layered communication model.
Fieldbus has additional advantages over 4-20 mA because many devices can connect to a single wire pair resulting in significant savings in wiring costs.
Communication stack
The communications stack comprises OSI Layers 2 and 7. The FOUNDATION fieldbus does not use OSI layers 3, 4, 5 and 6 because the functions of these layers are not needed; instead, the Fieldbus Access Sublayer (FAS) is used to map layer 7 directly onto layer 2. Layer 2, the Data Link Layer (DLL), controls transmission of messages onto the fieldbus. The DLL manages access to the fieldbus through a deterministic, centralised bus scheduler called the Link Active Scheduler (LAS). A fieldbus may have multiple Link Masters; if the current LAS fails, one of the Link Masters becomes the LAS and the operation of the FOUNDATION fieldbus continues. The FOUNDATION fieldbus is designed to "fail operational". The DLL is a subset of the emerging ISA/IEC DLL standards committee work. The Fieldbus Message Specification (FMS) is modeled after the OSI layer 7 Application Layer. FMS provides the communications services needed by the User Layer for remote access of data across the fieldbus network.
User Layer
The User Layer is not defined by the OSI model. However, for the first time, the FOUNDATION fieldbus specification defines a complete user layer based on function blocks. Function blocks provide the elements necessary for manufacturers to construct interoperable instruments and controllers.
Device descriptions
Each fieldbus device is described by a device description (DD) written in a special programming language known as Device Description Language (DDL). The DD can be thought of as a "driver" for the device. The DD provides all the information needed for a control system or host to interpret communications coming from the device, including configuration and diagnostic information. Any control system or host can communicate with a device if it "knows" the DD for the device. The host device uses an interpreter called Device Description Services (DDS) to read the DD for the device. New FOUNDATION fieldbus devices can be added to the fieldbus at any time by simply connecting the device to the fieldbus wire, provided the control system or host can read the identification of the fieldbus device, including the DD identifier, over the fieldbus. Once the DD identifier is known, the host reads the DD from a CD-ROM and supplies it to DDS for interpretation.
The completion of the technical specifications for an interoperable fieldbus system is a major milestone in the history of automation. The FOUNDATION fieldbus specification was developed by a consortium of instrument and control system manufacturers that represent over 90% of the instrumentation and control systems provided to end-users worldwide. The specifications will allow many manufacturers to deliver a wide range of interoperable fieldbus devices. These devices will usher in the next major technology transition in process and manufacturing automation.
CAN Bus

Main Features
CAN can link up to 2032 devices (assuming one node with one identifier) on a single network. However, owing to the practical limitations of the hardware (transceivers), it may only link up to 110 nodes (with the Philips 82C250) on a single network. It offers high-speed communication rates of up to 1 Mbit/s, thus facilitating real-time control. It embodies unique error confinement and error detection features, making it trustworthy and adaptable in noise-critical environments.
CAN Versions
Originally, Bosch provided the specification. The modern counterpart is Version 2.0 of this specification, which is divided into two parts: Version 2.0A, or Standard CAN, using 11-bit identifiers; and Version 2.0B, or Extended CAN, using 29-bit identifiers. The main aspect of these versions is the format of the MESSAGE FRAME, the main difference being the IDENTIFIER LENGTH.
CAN Standards
There are two ISO standards for CAN, which differ in their physical layer descriptions. ISO 11898 handles high-speed applications up to 1 Mbit/s; ISO 11519 has an upper limit of 125 kbit/s.
[Figure: CAN Version 2.0A (standard) message frame - Idle, SOF, Arbitration Field (11-bit Identifier, RTR), Control Field (r1, r0, DLC), Data Field, CRC Field, ACK, EOF, Intermission, Idle]

[Figure: CAN Version 2.0B (extended) message frame - Idle, SOF, Arbitration Field (11-bit Identifier, SRR, IDE, 18-bit Identifier, RTR), Control Field (r1, r0, DLC), Data Field, CRC Field, ACK, EOF, Intermission, Idle]
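One mechanism that keeps CAN receivers synchronized across these frame fields is bit stuffing: after any run of five identical bits, the transmitter inserts one bit of the opposite polarity, which receivers strip back out. A minimal sketch of the transmit side (the function name is illustrative):

```python
def bit_stuff(bits):
    """Insert a complementary stuff bit after every run of five equal
    bits, as a CAN transmitter does between SOF and the CRC field."""
    out = []
    run_bit, run_len = None, 0
    for b in bits:
        out.append(b)
        if b == run_bit:
            run_len += 1
        else:
            run_bit, run_len = b, 1
        if run_len == 5:                 # five identical bits in a row:
            out.append(1 - b)            # insert the complementary bit,
            run_bit, run_len = 1 - b, 1  # which starts a new run itself
    return out

print(bit_stuff([0, 0, 0, 0, 0, 0]))   # [0, 0, 0, 0, 0, 1, 0]
```

The guaranteed edge at least every six bit times is what lets each node resynchronize its bit clock without a separate clock line.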
Lesson 27
Wireless Communication
Instructional Objectives
After going through this lesson the student would be able to Describe the benefits and issues in wireless communication Distinguish between WLAN, WPAN and their different implementations like Ricochet, HiperLAN, HomeRF and Bluetooth Choose a particular wireless communication standard to suit an application
Wireless Communication
Third-generation wireless technologies are being developed to enable personal, high-speed interactive connectivity to wide area networks (WANs). The IEEE 802.11x wireless technologies find themselves with an increasing presence in corporate and academic office spaces, buildings and campuses, and are making slow but steady inroads into public areas such as airports and coffee bars. WAN, LAN and PAN technologies enable device connectivity to infrastructure-based services, either through a campus or a corporate backbone intranet. The other end of the coverage spectrum is occupied by short-range embedded wireless connectivity technologies that allow devices to communicate with each other directly, without the need for an established infrastructure. At this end of the coverage spectrum, wireless technologies like Ricochet and Bluetooth offer the benefits of RF-based connectivity: omni-directionality and the elimination of the line-of-sight requirement. The embedded connectivity space resembles a communication bubble that follows people around and empowers them to connect their personal devices with other devices that enter the bubble. Connectivity in this bubble is spontaneous and ephemeral, and can involve several devices of diverse computing capabilities, unlike wireless LAN solutions, which are designed for communication between devices with sufficient computing power and battery capacity. The table below shows a short comparison of the various technologies in the wireless arena.
In this lesson we look at the most commonly adopted of the different wireless technologies mentioned above.
WLANs-IEEE 802.11X
This is the most prominent technology standard for WLANs (Wireless Local Area Networks). It comprises a PHY (Physical Layer) and a MAC (Medium Access Control) layer. The original standard allows specific carrier frequencies in the 2.4 GHz range with data rates of 1 or 2 Mbps. Further enhancements to the same technology have led to the modern-day protocol known as 802.11b, which provides a basic data rate of 11 Mbps and a fall-back rate of 5.5 Mbps. All these technologies operate in the internationally available 2.4 GHz ISM band. Both the IEEE 802.11 and 802.11b standards are capable of providing communication between a number of terminals as an ad hoc network using peer-to-peer mode (see figures at the end), as a client/server wireless configuration (see figures at the end), or as a more complicated distributed network (see figures at the end). All these networks require wireless cards (PCMCIA, Personal Computer Memory Card International Association, cards) and wireless LAN access points. There are two transmission types for these technologies: Frequency Hopping Spread Spectrum (FHSS) and Direct Sequence Spread Spectrum (DSSS). Whereas FHSS is primarily used for low-power, low-range applications, DSSS is popular for Ethernet-like data rates. In the ad hoc network mode, as there is no central controller, the wireless access cards use the CSMA/CA (Carrier Sense Multiple Access with Collision Avoidance) protocol to resolve shared access to the channel. In the client/server configuration, many PCs and laptops, physically close to each other (20 to 500 meters), can be linked to a central hub (known as the access point) that serves as a bridge between them and the wired network. The wireless access cards provide the interface between the PCs and the antenna, while the access point serves as the wireless LAN hub.
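The contention-resolution idea behind CSMA/CA can be sketched in a few lines of Python. The function name, the retry limit and the initial contention window of 8 slots are illustrative assumptions, not values taken from the standard:

```python
import random

def csma_ca_attempt(channel_busy, backoff_slots, max_retries=5, cw_min=8):
    """Sketch of CSMA/CA: sense the channel before transmitting, and when it
    is busy, back off for a random number of slots drawn from a contention
    window that doubles after every failed attempt."""
    cw = cw_min
    for attempt in range(max_retries):
        if not channel_busy():
            return attempt              # channel idle: transmit now
        backoff_slots(random.randrange(cw))  # busy: wait a random backoff
        cw *= 2                         # widen the window for the next try
    return None                         # gave up after max_retries
```

The doubling window is what keeps many contending stations from colliding repeatedly: the more often a station loses, the more it spreads out its retries.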
The access point is typically mounted near the ceiling and can support 115-250 users, receiving, buffering and transmitting data between the WLAN and the wired network. Access points can be programmed to select one of the hopping sequences, and the PCMCIA cards tune in to the corresponding sequence. The WLAN bridge can also be implemented using line-of-sight directional antennas. Handover and roaming can be supported across the various access points. Encryption is also supported, using the optional shared-key RC4 (Ron's Code 4 or Rivest's Cipher 4) algorithm.
[Figure: a wireless LAN in which a Palm Pilot, a PDA and a station connect through an access point to the wired network]
WPANs-802.15X
WPANs (Wireless Personal Area Networks) are short-range wireless networks. The various WPAN protocols and their interfaces have been, and are being, standardized by the IEEE 802.15 WG (WPAN Working Group). There are four divisions of this standardization.
Ricochet
This provides secure mobile access to the desktop from outside an office. The service is provided by Metricom, a commercial Internet Service Provider (ISP), and was primarily available at airports and in selected areas. The Ricochet network is a wide-area wireless network using a spread-spectrum packet-switching technique and Metricom's patented frequency-hopping, checkerboard architecture. The network operates within the license-free 902-928 MHz ISM band. A Ricochet wireless micro-cellular data network (MCDN) is shown in the figure below.
[Figure: a Ricochet MCDN with microcell radios on streetlights or other utility poles, a modem radio attached to a computer, and routers, name servers and gateways linking the network to wired services]
Ricochet provides immediate, dependable, and secure connections without the cost and complexities of land-based phone lines, dial-up connections, or cellular modems. The Ricochet modem's features are its 28,800 bps data rate and 24-hour access. The Ricochet wireless network is based on frequency-hopping, spread-spectrum packet radio technology, with transmissions randomly hopping every two-fifths of a second over 162 channels.
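The hopping behaviour described above is easy to illustrate. The sketch below generates a pseudo-random hop sequence over 162 channels; a real Ricochet radio derives its sequence from Metricom's patented scheme, whereas here a seeded PRNG merely stands in for it:

```python
import random

def hop_sequence(n_hops, n_channels=162, seed=42):
    """Illustrative pseudo-random hop sequence over 162 channels.
    The seed stands in for whatever shared state the real network uses
    so that sender and receiver hop in lockstep."""
    rng = random.Random(seed)
    return [rng.randrange(n_channels) for _ in range(n_hops)]

# Each hop lasts 2/5 s, so a 10-second transmission spans 25 hops.
seq = hop_sequence(25)
```

Because both ends seed the generator identically, they visit the same channels in the same order, which is the essential property of any frequency-hopping scheme.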
HomeRF
This technology comes under ad hoc networking, spanning an area such as a home, an office building or a warehouse floor. A specification for wireless communications in the home, called the Shared Wireless Access Protocol (SWAP), has been developed. Some common targeted applications are: access to a public network telephone (isochronous multimedia) and the Internet (data); entertainment networks (cable television, digital audio and video with IEEE 1394); transfer and sharing of data and resources (printer, Internet connection, etc.); and home control and automation.
Advantages of HomeRF
In HomeRF, the same connection can be shared for both voice and data among the devices at the same time. The technology provides a platform for a broad range of interoperable consumer devices, enabling wireless digital communication between PCs and consumer electronic devices anywhere in and around the home.
[Figure: a HomeRF network in which a main PC with a phone connection links a cell phone, microwave oven, fridge, data pad, television, handheld communicator, clock, pager and other PCs]
Typical characteristics
Uses the 2.4 GHz ISM band
Data rates: 2 Mbps and 1 Mbps
Range: 50 m
Mobility: 10 m/s
Topology: packet-oriented
Supports both centralized (infrastructure) and ad hoc (infrastructure-less) communication
Supports simultaneous voice and data transmissions
Provides six audio connections at 32 kbps with 20 ms latency
Maximum data throughput: 1.2 Mbps
Supports a low-power paging mode
Provides QoS to voice-only devices and best effort to data-only devices
HiperLAN
"HiperLAN" or "High-performance LAN" has been designed specifically for an ad-hoc environment.
Can support both multimedia data and asynchronous data at rates as high as 23.5 Mbps
Employs the 5.15 GHz and 17.1 GHz frequency bands
Range: 50 m
Mobility: 10 m/s
Topology: packet-oriented
Supports both centralized and ad hoc communication
Supports 25 audio connections at 32 kbps with 10 ms latency, a video connection of 2 Mbps with 100 ms latency, and a data rate of 13.4 Mbps; it supports MPEG and other state-of-the-art real-time digital audio and video standards
HiperLANs are available in two types:
o TYPE 1: has a distributed MAC with QoS provisions and is based on GMSK (Gaussian minimum shift keying)
o TYPE 2: has a centralized, scheduled MAC and is based on OFDM
Objectives of HiperLAN
Provide QoS to build multiservice networks
Provide strong security
Support handoff when moving between local-area and wide-area coverage
Increased throughput
Ease of use, deployment, and maintenance
Affordability and scalability
A typical HiperLAN system is shown in the figure below:
[Figure: a HiperLAN system in which several access points (APs) connect wireless terminals to a fixed network]
A Bluetooth Connection
Bluetooth provides many options to the user. For instance, Bluetooth radio technology built into both a cellular telephone and a laptop replaces the cable used today to connect the two. Printers, desktops, FAX machines, keyboards, joysticks and virtually any other digital device can be networked by the Bluetooth system. Bluetooth also provides a universal bridge to existing data networks and a mechanism for forming small, private ad hoc groups of connected devices away from fixed network architectures. Bluetooth wireless communication operates in the 2.4 GHz range. There are certain regulations related to RF communication in the 2.4 GHz spectrum which device developers must follow. This is important for an organized use of the spectrum, because it is globally unlicensed and therefore bound by the specific regulations put forth by various countries in their respective territories. For Bluetooth communication, the RF spectrum has been divided into 79 channels, with bandwidth limited to 1 MHz per channel. Frequency-hopping spread-spectrum communication must be used, and proper mechanisms for interference anticipation and removal should be in place. This is essential because the 2.4 GHz spectrum is unlicensed and hence more vulnerable to signal congestion, as an increasing number of new users try to communicate within the band.
The two different communication topologies of Bluetooth PANs are the piconet and the scatternet. They are described briefly below.
The Piconet
[Figure: a piconet within a proximity sphere, showing the master linked to active slaves (AS1-AS4, including sniff and hold modes) by continuous lines, to parked slaves (PS1, PS2) by dashed lines, with standby devices (SB1, SB2) outside the piconet]
A piconet consists of a single master and all the slaves in its proximity that are communicating with it. A slave may be in active, sniff, hold or park mode at any instant of time. There can be up to seven active slaves, and any number of parked slaves and standby devices, in the vicinity of the master. The figure above shows a typical piconet as two spheres. The inner sphere comprises the piconet: the ellipses represent the slaves and the box represents the master. Thus, there is only one master and several slaves. Slave names starting with 'A' represent the active slaves; these are linked to the master with continuous lines, meaning ACTIVE. Slave names starting with 'P' represent the parked slaves; dashed lines connect them to the master, meaning that the connection is not continuous but the devices are still in the piconet, i.e., PARKED. The remaining slaves, with names starting with 'S', are in STANDBY; these are outside the piconet but inside the proximity sphere.
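The active/parked bookkeeping of a piconet can be modelled in a few lines. This toy Python class (names and API invented for illustration) enforces the limit of seven active slaves and parks any further device that tries to join:

```python
class Piconet:
    """Toy model of a Bluetooth piconet: one master, at most seven active
    slaves, and any number of parked slaves (sketch; mode names follow
    the text above)."""
    MAX_ACTIVE = 7

    def __init__(self, master):
        self.master = master
        self.active, self.parked = [], []

    def join(self, slave):
        """Admit a slave as active if a slot is free, otherwise park it."""
        if len(self.active) < self.MAX_ACTIVE:
            self.active.append(slave)
            return "active"
        self.parked.append(slave)
        return "parked"

    def park(self, slave):
        """Move an active slave to the parked list, freeing an active slot."""
        self.active.remove(slave)
        self.parked.append(slave)
```

Parking an active slave frees a slot, which is exactly how a real master rotates more than seven devices through a piconet.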
The Scatternet
[Figure: a scatternet formed by two overlapping piconets A and B, each with its own master, active slaves (AS A1-A3, AS B1-B4) and parked slaves (PS A1, PS B1, PS B2)]
A scatternet is formed when two or more piconets fall within each other's proximity; more precisely, when two or more piconets at least partially overlap in time and space. Within a scatternet, a slave can participate in multiple piconets by establishing connections with, and synchronizing to, different masters in its proximity. A single device may act as master in one piconet and at the same time as slave in another. A practical example of a scatternet arises in mobile communication, in which devices frequently move in and out of the proximity of other devices. The figure above shows a typical scatternet.
Bluetooth Specifications
Typical Bluetooth specifications have been characterized in the table below.
[Figure: the Bluetooth core protocol stack, with L2CAP and Audio above LMP, and the Baseband layer at the bottom]
Bluetooth Core Protocols
A brief description follows. The Service Discovery Protocol (SDP) provides the means for an application to discover which services are provided by, or available through, a Bluetooth device; it also allows applications to determine the characteristics of those services. The Logical Link Control and Adaptation Layer Protocol (L2CAP) supports higher-level protocol multiplexing, packet segmentation and reassembly, and the conveying of QoS (Quality of Service) information. The link managers on either side use the Link Manager Protocol (LMP) for link set-up and control. The baseband and link control layer enables the physical RF link between the Bluetooth units forming a piconet. It provides two different packet types, SCO and ACL, which can be transmitted in a multiplexed manner over the same RF link. Different master/slave pairs of the same piconet can use different link types, and the link type may change arbitrarily during a session. Each link type supports up to sixteen different packet types; four of these are control packets common to both SCO and ACL links. Both link types use a TDD scheme for full-duplex transmission. The SCO link is symmetric and typically supports time-bounded voice traffic; SCO packets are transmitted over reserved intervals. Once the connection is established, both master and slave units may send SCO packet types, allowing both voice and data transmission, with only the data portion being retransmitted when corrupted.
Operational States
[Figure: the operational states of a Bluetooth device: Standby, Inquiry, Inquiry Scan, Page and Page Scan, with Master Response, Slave Response and Inquiry Response sub-states leading to the Connection state]
State Descriptions
STANDBY: This is the default state, and the lowest power-consuming one too; only the Bluetooth clock operates, in low-power mode.
INQUIRY: In this state a device seeks out and learns the identities of other devices in its proximity. The other devices must have their Inquiry Scan state enabled if they want to entertain the query.
PAGE: In this state the master of a piconet invites other devices to join. To entertain this request, the invitee must have its Page Scan state enabled.
A device may bypass the Inquiry state if the identity of the device it wants to page is already known (see the figure above). The figure also indicates that any member of a piconet, not necessarily the master, may still perform INQUIRY and PAGE operations for additional devices, thus paving the way for a scatternet.
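The legal movements among these operational states can be captured as a small transition table. The Python sketch below is a simplified reading of the figure (the response sub-states are folded into the main states, and the exact transition set is an illustrative assumption), including the shortcut from STANDBY straight to PAGE for a device that already knows its peer:

```python
# Allowed transitions among the operational states (simplified sketch).
TRANSITIONS = {
    "STANDBY":      {"INQUIRY", "PAGE", "INQUIRY_SCAN", "PAGE_SCAN"},
    "INQUIRY":      {"STANDBY", "PAGE"},        # may page a discovered device
    "PAGE":         {"STANDBY", "CONNECTION"},  # paging succeeds or is abandoned
    "INQUIRY_SCAN": {"STANDBY", "PAGE_SCAN"},
    "PAGE_SCAN":    {"STANDBY", "CONNECTION"},
    "CONNECTION":   {"STANDBY"},
}

def step(state, target):
    """Move to `target` if the transition table allows it."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

# A device that already knows its peer's identity bypasses INQUIRY:
s = step("STANDBY", "PAGE")
s = step(s, "CONNECTION")
```

Encoding the diagram as a table makes the bypass rule explicit: PAGE is reachable directly from STANDBY, whereas CONNECTION is not.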
Module 6
Embedded System Software
Lesson 28
Introduction to Real-Time Systems
1. Introduction
Commercial usage of computers dates back a little more than fifty years. This brief period can roughly be divided into the mainframe, PC, and post-PC eras of computing. The mainframe era was marked by expensive computers that were quite unaffordable to individuals; each computer served a large number of users. The PC era saw the emergence of desktops that could easily be afforded and used by individual users. The post-PC era is seeing the emergence of small and portable computers, and of computers embedded in everyday applications, making an individual interact with several computers every day. Real-time and embedded computing applications in the first two eras were rather rare, restricted to a few specialized applications such as space and defense. In the post-PC era, the use of computer systems based on real-time and embedded technologies has already touched every facet of our life and is still growing at a pace never seen before. While embedded processing and Internet-enabled devices have now captured everyone's imagination, they are just a small fraction of the applications that have been made possible by real-time systems. If we casually look around us, we can discover many of them, often camouflaged inside simple-looking devices. If we observe carefully, we can notice that several gadgets and applications which have today become indispensable to our everyday life are in fact based on embedded real-time systems. For example, we have ubiquitous consumer products such as digital cameras, cell phones, microwave ovens, camcorders, and video game sets; telecommunication products and applications such as set-top boxes, cable modems, voice over IP (VoIP), and video conferencing applications; and office products such as fax machines, laser printers, and security systems. Besides, we encounter real-time systems in hospitals in the form of medical instrumentation equipment and imaging systems.
There are also a large number of devices and gadgets based on real-time systems which, though we normally do not use them directly, are nevertheless important to our daily life. A few examples are Internet routers, base stations in cellular systems, industrial plant automation systems, and industrial robots. It can easily be inferred from the above discussion that in recent times real-time computers have become ubiquitous and have permeated a large number of application areas. At present, the
computers used in real-time applications vastly outnumber the computers that are being used in conventional applications. According to an estimate [3], 70% of all processors manufactured world-wide are deployed in real-time embedded applications. While it is already true that an overwhelming majority of all processors being manufactured are getting deployed in real-time applications, what is more remarkable is the unmistakable trend of steady rise in the fraction of all processors manufactured world-wide finding their way to real-time applications. Some of the reasons attributable to the phenomenal growth in the use of real-time systems in the recent years are the manifold reductions in the size and the cost of the computers, coupled with the magical improvements to their performance. The availability of computers at rapidly falling prices, reduced weight, rapidly shrinking sizes, and their increasing processing power have together contributed to the present scenario. Applications which not too far back were considered prohibitively expensive to automate can now be affordably automated. For instance, when microprocessors cost several tens of thousands of rupees, they were considered to be too expensive to be put inside a washing machine; but when they cost only a few hundred rupees, their use makes commercial sense. The rapid growth of applications deploying real-time technologies has been matched by the evolutionary growth of the underlying technologies supporting the development of real-time systems. In this book, we discuss some of the core technologies used in developing real-time systems. However, we restrict ourselves to software issues only and keep hardware discussions to the bare minimum. The software issues that we address are quite expansive in the sense that besides the operating system and program development issues, we discuss the networking and database issues. In this chapter, we restrict ourselves to some introductory and fundamental issues. 
In the next three chapters, we discuss some core theories underlying the development of practical real-time and embedded systems. In the subsequent chapter, we discuss some important features of commercial real-time operating systems. After that, we shift our attention to realtime communication technologies and databases.
books are displayed by the software. In this example, the events "issue of query book command" and "display of results" are logically ordered in terms of which event follows the other, but no quantitative expression of time is required. Clearly, this example behavior is devoid of any real-time considerations. We are now in a position to define what a real-time system is: a system is called a real-time system when we need a quantitative expression of time (i.e., real time) to describe the behavior of the system. Remember that in this definition it is implicit that all quantitative time measurements are carried out using a physical clock. A chemical plant, part of whose behavior is described as "when the temperature of the reaction chamber attains a certain predetermined value, say 250°C, the system automatically switches off the heater within, say, 30 milliseconds", is clearly a real-time system. Our examples so far were restricted to descriptions of the partial behavior of systems. The complete behavior of a system can be described by listing its responses to various external stimuli. It may be noted that not all the clauses in the description of the behavior of a real-time system need involve quantitative measures of time. That is, large parts of the description of the behavior of a system may not have any quantitative expressions of time at all, and the system may still qualify as real-time. On the other hand, any system whose behavior can be completely described without using any quantitative expression of time is of course not a real-time system.
Each time the plant conditions are sampled, the automation system should decide on the exact instantaneous corrective actions required, such as changing the pressure, temperature, or chemical concentration, and carry out these actions within certain predefined time bounds. Typically, the time bounds in such a chemical plant control application range from a few microseconds to several milliseconds.
Example 2: Automated Car Assembly Plant
An automated car assembly plant is an example of a plant automation system. In an automated car assembly plant, the work product (a partially assembled car) moves on a conveyor belt (see Fig. 28.1). By the side of the conveyor belt, several workstations are placed. Each workstation performs some specific work on the work product, such as fitting the engine, fitting a door, fitting a wheel, or spray painting, as it moves along the conveyor belt. An empty chassis is introduced near the first workstation, and a fully assembled car comes out after the work product goes past all the workstations. At each workstation, a sensor senses the arrival of the next partially assembled product, and as soon as it is sensed, the workstation begins to perform its work on it. The time constraint imposed on the workstation computer is that the workstation must complete its work before the work product moves away to the next workstation. The time bounds involved here are typically of the order of a few hundreds of milliseconds.
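The workstation's timing constraint reduces to a simple inequality: the work time must not exceed the interval during which the product is within reach, which is the station spacing divided by the belt speed. A minimal sketch, with all parameter values purely illustrative:

```python
def workstation_ok(work_time_ms, belt_speed_m_s, station_spacing_m):
    """A workstation meets its deadline iff its work time is at most the
    dwell time of the product at the station: spacing / belt speed."""
    dwell_ms = 1000.0 * station_spacing_m / belt_speed_m_s
    return work_time_ms <= dwell_ms
```

For example, with stations 2 m apart and the belt moving at 0.1 m/s, each workstation has 20 s of dwell time, so a 300 ms fitting operation comfortably meets its deadline while a 25 s one does not.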
[Figure: a chassis moving along a conveyor belt past "fit engine", "fit door" and "fit wheel" workstations]
Fig. 28.1 Schematic Representation of an Automated Car Assembly Plant
Example 3: Supervisory Control And Data Acquisition (SCADA)
SCADA systems are a category of distributed control systems used in many industries. A SCADA system helps monitor and control a large number of distributed events of interest. In SCADA systems, sensors are scattered at various geographic locations to collect raw data (called events of interest). These data are then processed and stored in a real-time database. The database models (or reflects) the current state of the environment and is updated frequently, so as to remain a realistic model of the up-to-date state of the environment. An example of a SCADA application is an Energy Management System (EMS). An EMS helps carry out load balancing in an electrical energy distribution network: it senses the energy consumption at the distribution points, computes the load across the different phases of the power supply, and helps dynamically balance the load. Another example of a SCADA system is one that monitors and controls traffic in a computer network: depending on the sensed load in different segments of the network, the SCADA system makes the routers change their traffic routing policy dynamically. The time constraint in such a SCADA
application is that the sensors must sense the system state at regular intervals (say every few milliseconds) and the same must be processed before the next state is sensed.
1.2.2. Medical
A few examples of medical applications of real-time systems are: robots, MRI scanners, radiation therapy equipment, bedside monitors, and computerized axial tomography (CAT).
Example 4: Robot Used in Recovery of Displaced Radioactive Material
Robots have become very popular nowadays and are being used in a wide variety of medical applications. The application we discuss here is a robot used in retrieving displaced radioactive materials. Radioactive materials such as cobalt and radium are used for the treatment of cancer. At times during treatment, the radioactive cobalt (or radium) gets dislocated and falls down. Since human beings cannot come near radioactive material, a robot is used to restore it: the robot walks into the room containing the radioactive material, picks it up, and restores it to its proper position. The robot has to sense its environment frequently and, based on this information, plan its path. The real-time constraint on the path-planning task is that unless the robot plans the path fast enough after an obstacle is detected, it may collide with the obstacle. The time constraints involved here are of the order of a few milliseconds.
Example 6: Multi-Point Fuel Injection (MPFI) System
An MPFI system is an automotive engine control system. A conceptual diagram of a car embedding an MPFI system is shown in Fig. 28.2. An MPFI is a real-time system that controls the rate of fuel injection and allows the engine to operate at its optimal efficiency. In older models of cars, a mechanical device called the carburetor was used to control the fuel injection rate to the engine; it was the responsibility of the carburetor to vary the fuel injection rate depending on the current speed of the vehicle and the desired acceleration. Careful experiments have suggested that for optimal energy output, the required fuel injection rate is highly nonlinear with respect to the vehicle speed and acceleration. Experimental results also show that precise fuel injection through multiple points is more effective than single-point injection. In MPFI engines, the precise fuel injection rate at each injection point is determined by a computer. An MPFI system injects fuel into the individual cylinders, resulting in better power balance among the cylinders as well as higher output from each one, along with faster throttle response. The processor primarily controls the ignition timing and the quantity of fuel to be injected; the latter is achieved by controlling the duration for which the injector valve is open, popularly known as the pulse width. The actions of the processor are determined by data gleaned from sensors located all over the engine. These sensors constantly monitor the ambient temperature, the engine coolant temperature, the exhaust temperature, emission gas contents, engine rpm (speed), vehicle road speed, crankshaft position, camshaft position, etc. An MPFI engine with even an 8-bit computer does a much better job of determining an accurate fuel injection rate for given values of speed and acceleration than a carburetor-based system.
An MPFI system not only makes a vehicle more fuel efficient, it also minimizes pollution by reducing partial combustion.
[Figure: Multi-Point Fuel Injection (MPFI) System]
call details for billing purposes, and hand-off of calls as the mobile moves. Call hand-off is required when a mobile moves away from a base station. As a mobile moves away, its received signal strength (RSS) falls at the base station. The base station monitors this and as soon as the RSS falls below a certain threshold value, it hands-off the details of the on-going call of the mobile to the base station of the cell to which the mobile has moved. The hand-off must be completed within a sufficiently small predefined time interval so that the user does not feel any temporary disruption of service during the hand-off. Typically call hand-off is required to be achieved within a few milliseconds.
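The hand-off decision described above is essentially a threshold test on the monitored RSS. The sketch below uses an illustrative threshold of -95 dBm; actual thresholds are network-specific:

```python
def needs_handoff(rss_dbm, threshold_dbm=-95):
    """Hand-off trigger: the received signal strength at the serving base
    station has dropped below the threshold (value illustrative)."""
    return rss_dbm < threshold_dbm

def monitor(samples, threshold_dbm=-95):
    """Return the index of the first RSS sample that triggers a hand-off,
    or None if the signal stays acceptable throughout."""
    for i, rss in enumerate(samples):
        if needs_handoff(rss, threshold_dbm):
            return i
    return None
```

The real-time requirement is not in this test itself but in what follows it: once the trigger fires, the call details must be handed to the new base station within a few milliseconds.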
1.2.6. Aerospace
A few important uses of real-time systems in aerospace applications are: avionics, flight simulation, airline cabin management systems, satellite tracking systems, and computers on-board aircraft.
Example 8: Computer On-board an Aircraft
In many modern aircraft, the pilot can select an autopilot option. As soon as the pilot switches to autopilot mode, an on-board computer takes over all controls of the aircraft, including navigation, take-off, and landing. In autopilot mode, the computer periodically samples the velocity and acceleration of the aircraft. From the sampled data, the on-board computer computes the X, Y, and Z coordinates of the current aircraft position and compares them with the pre-specified track data. Before the next sample values are obtained, it computes the deviation from the specified track values and takes any corrective actions that may be necessary. In this case, the sampling of the various parameters and their processing need to be completed within a few microseconds.
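The track-following computation amounts to comparing the computed (X, Y, Z) position against the pre-specified track point and acting when the deviation exceeds a tolerance. A minimal sketch, with the tolerance value purely illustrative:

```python
import math

def deviation(current, track):
    """Euclidean deviation of the current (X, Y, Z) position from the
    pre-specified track point."""
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(current, track)))

def correction_needed(current, track, tolerance=10.0):
    """True when the aircraft has drifted beyond the allowed tolerance
    (same units as the coordinates; the value 10.0 is illustrative)."""
    return deviation(current, track) > tolerance
```

The hard part in the real system is not this arithmetic but the deadline: the comparison and any corrective action must finish before the next samples arrive.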
Example 10: Cell Phones
Cell phones are possibly the fastest growing segment of consumer electronics. A cell phone carries out a number of tasks simultaneously at any point of time. These include: converting the electrical signals generated by the microphone to digital form by deploying digital signal processing (DSP) techniques, converting received digital signals back to voice output, and sampling incoming base station signals in the control channel. A cell phone must respond to the communications received from the base station within certain specified time bounds. For example, a base station might command a cell phone to switch the on-going communication to a specific frequency; the cell phone must comply with such commands within a few milliseconds.
discussed applications. For example, even if the results are produced just after 20 seconds, nothing untoward is going to happen - this may not be the case with the other discussed applications.
before they can be used by the actuator. This is termed output conditioning. Similarly, input conditioning is required to be carried out on sensor signals before they can be accepted by the computer. For example, analog signals generated by a photo-voltaic cell are normally in the millivolt range and need to be conditioned before they can be processed by a computer. The following are some important types of conditioning carried out on raw signals generated by sensors and on digital signals generated by computers:
1. Voltage Amplification: Voltage amplification is normally required to match the full-scale sensor voltage output with the full-scale voltage input of the computer's interface. For example, a sensor might produce voltage in the millivolt range, whereas the input interface of a computer may require the input signal level to be of the order of a volt.
2. Voltage Level Shifting: Voltage level shifting is often required to align the voltage level generated by a sensor with that acceptable to the computer. For example, a sensor may produce voltage in the range -0.5 to +0.5 volt, whereas the input interface of the computer may accept voltage only in the range 0 to 1 volt. In this case, the sensor voltage must undergo level shifting before it can be used by the computer.
3. Frequency Range Shifting and Filtering: Frequency range shifting is often used to reduce the noise components in a signal. Many types of noise occur in narrow bands, and the signal must be shifted away from the noise bands so that the noise can be filtered out.
4. Signal Mode Conversion: A type of signal mode conversion frequently carried out during signal conditioning involves changing direct current into alternating current and vice versa. Another type of signal mode conversion that is frequently used is the conversion of analog signals to a constant-amplitude pulse train such that the pulse rate or pulse width is proportional to the voltage level.
Conversion of analog signals to a pulse train is often necessary for input to systems such as transformer coupled circuits that do not pass direct current.
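Voltage amplification and level shifting together form a single linear map from the sensor's range onto the computer's input range. Using the -0.5 to +0.5 V and 0 to 1 V figures from the example above (defaults chosen to match that example; the function itself is an illustrative sketch):

```python
def condition(v_sensor, in_lo=-0.5, in_hi=0.5, out_lo=0.0, out_hi=1.0):
    """Linear signal conditioning: map the sensor's full-scale range
    [in_lo, in_hi] onto the computer's input range [out_lo, out_hi]."""
    gain = (out_hi - out_lo) / (in_hi - in_lo)   # voltage amplification
    return out_lo + gain * (v_sensor - in_lo)    # voltage level shifting
```

With these ranges the gain is 1, so the map reduces to a pure level shift of +0.5 V; a millivolt-range sensor would instead get a large gain from the same formula.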
[Figure: an output interface consisting of a D/A register feeding a D/A converter]
Fig. 28.4 An Output Interface
Interface Unit: Normally, commands from the CPU are delivered to the actuator through an output interface. An output interface converts the stored voltage into analog form and then outputs it to the actuator circuitry. This, of course, requires the generated value to be written to a register (see Fig. 28.4). To produce an analog output, the CPU selects a data register of the output interface and writes the necessary data to it. The two main functional blocks of an output interface are shown in Fig. 28.4. The interface takes care of the buffering and handshake-control aspects. Analog-to-digital conversion is frequently deployed in an input interface; similarly, digital-to-analog conversion is frequently used in an output interface. In the following, we discuss the important steps of analog-to-digital conversion (ADC).
Analog to Digital Conversion: Digital computers cannot process analog signals; therefore, analog signals need to be converted to digital form. This can be done using circuitry whose block diagram is shown in Fig. 28.7, through the following two main steps:
1. Sample the analog signal (shown in Fig. 28.5) at regular intervals. This sampling can be done by a capacitor circuit that stores the voltage levels; the stored voltage levels can then be made discrete. After sampling the analog signal, a step waveform as shown in Fig. 28.6 is obtained.
2. Convert each stored value to a binary number using an analog to digital converter (ADC), as shown in Fig. 28.7, and store the digital value in a register.
Voltage
Fig. 28.7 Conversion of an Analog Signal to a 16-bit Binary Number

Digital to analog conversion can be carried out through a complementary set of operations. We leave it as an exercise for the reader to work out the details of the circuitry that can perform digital to analog conversion (DAC).
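The two steps above — sampling at regular intervals, then quantizing each sample to an n-bit code — can be sketched in software. This is only a behavioral model of what the hardware of Fig. 28.7 does; the signal, sampling period, and reference voltage below are illustrative assumptions.

```python
import math

def adc_convert(v, v_ref=1.0, bits=16):
    """Quantize a sampled voltage in 0..v_ref to an n-bit binary code
    (truncation to the level below, clamped to the valid code range)."""
    levels = 2 ** bits
    code = int(v / v_ref * (levels - 1))
    return max(0, min(levels - 1, code))

def sample(signal, period, n):
    """Sample an analog signal (modeled as a function of time) at
    regular intervals, producing the step-waveform values of Fig. 28.6."""
    return [signal(k * period) for k in range(n)]

# Illustrative: a 1 Hz sine offset into 0..1 V, sampled every 100 ms,
# then converted to 16-bit codes as in Fig. 28.7.
samples = sample(lambda t: 0.5 + 0.5 * math.sin(2 * math.pi * t), 0.1, 5)
codes = [adc_convert(v) for v in samples]
```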
[Fig. 28.8: Sensors and actuators connect the real-time computer to its environment]
Fig. 28.8 A Schematic Representation of an Embedded Real-Time System

3. Embedded: A vast majority of real-time systems are embedded in nature [3]. An embedded computer system is physically embedded in its environment and often controls it. Fig. 28.8 shows a schematic representation of an embedded system. As shown in Fig. 28.8, the sensors of the real-time computer collect data from the environment and pass them on to the real-time computer for processing. The computer, in turn, passes information (processed data) to the actuators to carry out the necessary work on the environment, thereby controlling some characteristics of the environment. Several examples of embedded systems were discussed in Section 1.2. An example of an embedded system that we shall often refer to is the Multi-Point Fuel Injection (MPFI) system discussed in Example 6 of Sec. 1.2.

4. Safety-Criticality: For traditional non-real-time systems, safety and reliability are independent issues. However, in many real-time systems these two issues are intricately bound together, making them safety-critical. Note that a safe system is one that does not cause any damage even when it fails. A reliable system, on the other hand, is one that can operate for long durations of time without exhibiting any failures. A safety-critical system is required to be highly reliable, since any failure of the system can cause extensive damage. We elaborate on this issue in Section 1.5.

5. Concurrency: A real-time system usually needs to respond to several independent events within very short and strict time bounds. For instance, consider a chemical plant automation system (see Example 1 of Sec. 1.2), which monitors the progress of a chemical reaction and controls the rate of reaction by changing different parameters of the reaction, such as pressure, temperature, and chemical concentration. These parameters are sensed using sensors fixed in the chemical reaction chamber.
These sensors may generate data asynchronously at different rates. Therefore, the real-time system must process data from all the sensors concurrently; otherwise signals may be lost and the system may malfunction. Such systems can be considered non-deterministic, since the behavior of the system depends on the exact timing of its inputs. A non-deterministic computation is one in which two runs using the same set of input data can produce two distinct sets of output data.

6. Distributed and Feedback Structure: In many real-time systems, the different components of the system are naturally distributed across widely spread geographic locations. In such systems, the different events of interest arise at geographically separate locations. These events may therefore have to be handled locally, and responses produced to them locally, to prevent overloading the underlying communication network. For this reason, the sensors and the actuators may be located at the places where the events are generated. An example of such a system is a petroleum refinery plant distributed over a large geographic area. At each data source, it makes good design sense to process the data locally before passing it on to a central processor.

Many distributed as well as centralized real-time systems have a feedback structure, as shown in Fig. 28.9. In these systems, the sensors usually sense the environment periodically. The sensed data about the environment is processed to determine the necessary corrective actions. The results of the processing are used to carry out the corrective actions on the environment through the actuators, which in turn again cause a change in the required characteristics of the controlled environment, and so on.
[Fig. 28.9: A feedback loop — the computation acts on the environment through actuators, and the environment's response is sensed back into the computation]

Fig. 28.9 Feedback Structure of Real-Time Systems

7. Task Criticality: Task criticality is a measure of the cost of failure of a task. Task criticality is determined by examining how critical the results produced by the task are to the proper functioning of the system. A real-time system may have tasks of very different criticalities. It is therefore natural to expect that the criticalities of the different tasks must be taken into consideration when designing for fault-tolerance. The higher the criticality of a task, the more reliable it should be made. Further, in the event of a failure of a highly critical task, immediate failure detection and recovery are important. However, it should be realized that task priority is a different concept: task criticality does not solely determine the task priority or the order in which the various tasks are to be executed (these issues shall be elaborated in later chapters).

8. Custom Hardware: A real-time system is often implemented on custom hardware that is specifically designed and developed for the purpose. For example, a cell phone does not use a traditional microprocessor. Cell phones use processors which are tiny, supporting only those processing capabilities that are really necessary for cell phone operation, and which are specifically designed to be power-efficient to conserve battery life. The capabilities of the processor used in a cell phone are substantially different from those of a general-purpose processor. Another example is the embedded processor in an MPFI car. In this case, the processor used need not be a powerful general-purpose processor such as a Pentium or an Athlon. Some of the most powerful computers used in MPFI engines are 16- or 32-bit processors running at approximately 40 MHz. However, unlike a conventional PC, a processor used in these car engines does not deal with processing frills such as screen-savers or a dozen different applications running at the same time. All that the processor in an MPFI system needs to do is compute the fuel injection rate that is most efficient for a given speed and acceleration.

9. Reactive: Real-time systems are often reactive. A reactive system is one in which an ongoing interaction between the computer and the environment is maintained. Ordinary systems compute functions on the input data to generate the output data (see Fig. 28.10(a)). In other words, traditional systems compute the output data as some function f of the input data: output data = f(input data). For example, if some data I1 is given as the input, the system computes the result O1 = f(I1). To elaborate this concept, consider a library automation software. When its query-book function is invoked and "Real-Time Systems" is entered as the input book name, the software displays: Author name: R. Mall, Rack Number: 001, Number of Copies: 1.
[Fig. 28.10: (a) A traditional system transforms input data into output data; (b) a reactive system starts from some initial parameters and maintains an ongoing interaction with its environment]
In contrast to the traditional computation of the output as a simple function of the input data, real-time systems do not produce any final output data, but instead enter into an ongoing interaction with their environment. In each interaction step, the results computed are used to carry out some actions on the environment. The reaction of the environment is sampled and fed back to the system. Therefore the computations in a real-time system can be considered non-terminating. This reactive nature of real-time systems is shown schematically in Fig. 28.10(b).

10. Stability: Under overload conditions, real-time systems need to continue to meet the deadlines of the most critical tasks, though the deadlines of non-critical tasks may not be met. This is in contrast to the requirement of fairness for traditional systems, even under overload conditions.

11. Exception Handling: Many real-time systems work round-the-clock and often operate without human operators. For example, consider a small automated chemical plant that is set up to work non-stop. When there are no human operators, taking corrective actions on a failure becomes difficult. Even if no corrective action can be taken immediately, it is desirable that a failure does not result in catastrophic situations. A failure should be detected, and the system should continue to operate in a gracefully degraded mode rather than shutting off abruptly.
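The non-terminating sense-compute-actuate cycle of a reactive system (point 9) can be pictured as a loop. The toy below is only a sketch: the setpoint, the proportional correction, and the bounded step count (a real controller loops forever) are all illustrative assumptions.

```python
def reactive_loop(sense, compute, actuate, steps):
    """A reactive system as a (conceptually non-terminating)
    sense -> compute -> actuate cycle. `steps` bounds the loop only so
    that this sketch terminates."""
    state = None
    for _ in range(steps):
        reading = sense()                 # sample the environment
        state = compute(reading, state)   # decide the corrective action
        actuate(state)                    # act on the environment
    return state

# Toy environment: a value driven toward a setpoint of 10 by
# proportional corrections, mimicking the feedback structure of Fig. 28.9.
value = [0.0]
sense = lambda: value[0]
compute = lambda reading, _state: 0.5 * (10.0 - reading)
def actuate(correction):
    value[0] += correction

reactive_loop(sense, compute, actuate, steps=20)
```

Note that the loop produces no "final output"; its effect is the trajectory of the environment, which converges toward the setpoint here.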
can only be ensured through increased reliability. It should now be clear why safety-critical systems need to be highly reliable. Just to give an example of the level of reliability required of safety-critical systems, consider the following. In a fly-by-wire aircraft, most of the vital parts are controlled by a computer, and any failure of the controlling computer is clearly not acceptable. The standard reliability requirement for such aircraft is at most 1 failure per 10^9 flying hours (that is, over a hundred thousand years of continuous flying!). We examine how a highly reliable system can be developed in the next section.
Error Avoidance: For achieving high reliability, every possibility of occurrence of errors should be minimized during product development. This can be achieved by a variety of means: using well-founded software engineering practices, using sound design methodologies, adopting suitable CASE tools, and so on.

Error Detection and Removal: In spite of using the best available error avoidance techniques, many errors still manage to creep into the code. These errors need to be detected and removed, which can be achieved to a large extent by conducting thorough reviews and testing. Once errors are detected, they can be easily fixed.

Fault-Tolerance: No matter how meticulously error avoidance and error detection techniques are used, it is virtually impossible to make a practical software system entirely error-free; a few errors persist even after thorough reviews and testing. Errors cause failures — that is, failures are manifestations of the errors latent in the system. Therefore, to achieve high reliability even when errors are present, the system should be able to tolerate the faults and still compute correct results. This is called fault-tolerance. Fault-tolerance can be achieved by carefully incorporating redundancy.

[Fig. 28.11: Three redundant copies C1, C2, C3 of the same component feed a voting unit, which outputs the majority result]
It is relatively simple to design hardware equipment to be fault-tolerant. The following are two methods that are popularly used to achieve hardware fault-tolerance:
Built-In Self Test (BIST): In BIST, the system periodically performs self-tests of its components. Upon detection of a failure, the system automatically reconfigures itself by switching out the faulty component and switching in one of the redundant good components.

Triple Modular Redundancy (TMR): In TMR, as the name suggests, three redundant copies of each critical component are made to run concurrently (see Fig. 28.11). Observe that in Fig. 28.11, C1, C2, and C3 are redundant copies of the same critical component. The system performs voting on the results produced by the redundant components to select the majority result. TMR can tolerate the occurrence of only a single failure at any time. (Can you answer why a TMR scheme can effectively tolerate only a single component failure?) An assumption implicit in the TMR technique is that, at any time, only one of the three redundant components can produce erroneous results. The majority result after voting would be erroneous if two or more components failed simultaneously (more precisely, before a repair could be carried out). In situations where two or more components are likely to fail (or produce erroneous results) simultaneously, greater amounts of redundancy are required: a little thought shows that at least 2n+1 redundant components are required to tolerate the simultaneous failure of n components.
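The voting step of TMR can be sketched as a majority vote over the redundant components' outputs. This is a software model of the voting unit of Fig. 28.11, written generally enough to show the 2n+1 rule; the error type raised on a tie is an illustrative choice.

```python
from collections import Counter

def majority_vote(results):
    """Select the majority result from redundant component outputs
    (TMR when len(results) == 3). With 2n+1 copies, up to n erroneous
    results are masked; with more failures than that, no strict
    majority may exist and the vote itself fails."""
    value, count = Counter(results).most_common(1)[0]
    if count <= len(results) // 2:
        raise RuntimeError("no majority - too many simultaneous failures")
    return value

# One faulty copy out of three is masked:
print(majority_vote([42, 42, 41]))      # -> 42
# Five copies (n = 2) mask two simultaneous failures:
print(majority_vote([7, 7, 7, 1, 2]))   # -> 7
```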
As compared to hardware, software fault-tolerance is much harder to achieve. To investigate the reason behind this, let us first discuss the techniques currently being used to achieve software fault-tolerance. We do this in the following subsection.
to statistical correlation of failures. Statistical correlation of failures means that even though individual teams worked in isolation to develop the different versions of a software component, the different versions still fail for identical reasons; in other words, the different versions of a component show similar failure patterns. This does not mean that the modules developed by independent programmers happen to contain identical errors by coincidence. The reason is not far to seek: programmers commit errors in those parts of a problem which they perceive to be difficult, and what is difficult to one team is usually difficult to all teams. So, identical errors remain in the most complex and least understood parts of a software component.

Recovery Blocks: In the recovery block scheme, the redundant components are called try blocks. Each try block computes the same end result as the others, but is intentionally written using a different algorithm compared to the other try blocks. In N-version programming, the different versions of a component are written by different teams of programmers, whereas in the recovery block approach different algorithms are used in the different try blocks. Also, in contrast to the N-version programming approach, where the redundant copies are run concurrently, in the recovery block approach they are run one after another (as shown in Fig. 28.12). The results produced by a try block are subjected to an acceptance test (see Fig. 28.12). If the test fails, then the next try block is tried. This is repeated in sequence until the result produced by a try block successfully passes the acceptance test. Note that in Fig. 28.12 we have shown separate acceptance tests for the different try blocks to emphasize that the tests are applied to the try blocks one after the other, though in practice the same test may be applied to each try block.

Legend: TB: try block
[Fig. 28.12: The input is given to try blocks TB1, TB2, TB3, TB4 in turn; each result is subjected to an acceptance test — on failure the next try block is run, on success its result is output, and if all try blocks fail an exception is raised]
Fig. 28.12 A Software Fault-Tolerance Scheme Using Recovery Blocks

As was the case with N-version programming, the recovery block approach also does not achieve much success in providing effective fault-tolerance. The reason behind this is again statistical correlation of failures: different try blocks fail for identical reasons, as was explained for the N-version programming approach. Besides, this approach suffers from a further limitation: it can only be used if the task deadlines are much larger than the task computation times (i.e. the tasks have large laxity), since the different try blocks are executed one after the other when failures occur. The recovery block approach poses special difficulty when used with real-time tasks with very short slack times (i.e. short deadlines and considerable execution times), since as the try blocks are tried out one after the other, deadlines may be missed. In such cases, the later try blocks usually contain only skeletal code that produces approximate results and therefore takes much less time to compute than the first try block.

[Fig. 28.13: As the computation progresses, the state is subjected to an acceptance test at each checkpoint; on success the state is saved, and on failure the computation rolls back to the last checkpoint]

Fig. 28.13 Checkpointing and Rollback Recovery

Checkpointing and Rollback Recovery: Checkpointing and rollback recovery is another popular technique to achieve fault-tolerance. In this technique, as the computation proceeds, the system state is tested each time some meaningful progress in the computation is made. Immediately after a state-check test succeeds, the state of the system is backed up on stable storage (see Fig. 28.13). If the next test does not succeed, the system can be rolled back to the last checkpointed state, and from that checkpointed state a fresh computation can be initiated. This technique is especially useful if there is a chance that the system state may be corrupted as the computation proceeds, for example through data corruption or processor failure.
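The checkpoint-test-rollback cycle can be sketched as follows. This is only a toy model under stated assumptions: in-memory deep copies stand in for stable storage, a failed step is simply discarded rather than retried, and the corrupting step simulates state corruption.

```python
import copy

def run_with_checkpoints(initial_state, steps, acceptance_test):
    """Checkpointing and rollback recovery, sketched: after each unit of
    progress the resulting state is subjected to an acceptance test; on
    success it is checkpointed (deep-copied, standing in for stable
    storage), and on failure the last good checkpoint is kept, which is
    an implicit rollback."""
    checkpoint = copy.deepcopy(initial_state)
    for step in steps:
        candidate = step(copy.deepcopy(checkpoint))
        if acceptance_test(candidate):
            checkpoint = candidate   # commit to "stable storage"
        # else: candidate discarded -> rolled back to `checkpoint`
    return checkpoint

# Toy computation: the second step corrupts the state (a None entry);
# rollback masks the corruption and the final result omits it.
steps = [
    lambda s: s + [1],
    lambda s: s + [None],   # simulated corruption
    lambda s: s + [2],
]
acceptable = lambda s: all(x is not None for x in s)
result = run_with_checkpoints([], steps, acceptable)  # -> [1, 2]
```

As the text notes, this scheme helps when the state may get corrupted in a detectable way; it cannot mask failures that the acceptance test itself does not catch.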
An example of a system having hard real-time tasks is a robot. The robot cyclically carries out a number of activities, including communication with the host system, logging all completed activities, sensing the environment to detect any obstacles present, tracking the objects of interest, path planning, and effecting the next move. Now suppose the robot suddenly encounters an obstacle. The robot must detect it and, as soon as possible, try to avoid colliding with it. If it fails to respond quickly (i.e. the concerned tasks are not completed before the required time bound), then it will collide with the obstacle, and the robot will be considered to have failed. Therefore, detecting an obstacle and reacting to it are hard real-time tasks.

Another application having hard real-time tasks is an anti-missile system, which consists of the following critical activities (tasks): it must first detect each incoming missile, properly position the anti-missile gun, and then fire to destroy the incoming missile before that missile can do any damage. All these tasks are hard real-time in nature, and the anti-missile system would be considered to have failed if any of its tasks fails to complete before the corresponding deadline.

Applications having hard real-time tasks are typically safety-critical (can you think of an example of a hard real-time system that is not safety-critical?1). This means that any failure of a real-time task, including its failure to meet the associated deadline, would result in severe consequences. This makes hard real-time tasks extremely critical. The criticality of a task can range from extremely critical to not so critical; task criticality is therefore a different dimension from the hard or soft characterization of a task. Criticality of a task is a measure of the cost of a failure: the higher the cost of failure, the more critical the task.
For hard real-time tasks in practical systems, the time bounds usually range from several microseconds to a few milliseconds. It may be noted that a hard real-time task need not be completed within the shortest time possible; it is merely required that the task complete within the specified time bound. In other words, there is no reward for completing a hard real-time task much ahead of its deadline. This is an important observation, and it will play a central role in our discussion of task scheduling in the next two chapters.
Some computer games have hard real-time tasks; these are not safety-critical though. Whenever a timing constraint is not met, the game may fail, but the failure may at best be a mild irritant to the user.
Fig. 28.14 Utility of the Result of a Firm Real-Time Task with Time

Firm real-time tasks typically abound in multimedia applications. The following are two examples of firm real-time tasks:
Video conferencing: In a video conferencing application, video frames and the accompanying audio are converted into packets and transmitted to the receiver over a network. Some packets may get delayed at different nodes during transit on a packet-switched network due to congestion, resulting in varying queuing delays for packets traveling along different routes. Even when packets traverse the same route, some packets can take much more time than others due to the specific transmission strategy used at the nodes. When a certain frame is being played, if some preceding frame arrives late at the receiver, that frame is of no use and is discarded. For this reason, when a frame is delayed by more than, say, one second, it is simply discarded at the receiver end without any processing being carried out on it.

Satellite-based tracking of enemy movements: Consider a satellite that takes pictures of enemy territory and beams them to a ground station computer frame by frame. The ground computer processes each frame to find the positional difference of the objects of interest with respect to their positions in the previous frame, in order to determine the movements of the enemy. When the ground computer is overloaded, a new image may be received even before an older image is taken up for processing. In this case, the older image is not of much use; it may be discarded and the recently received image processed instead.
For firm real-time tasks, the associated time bounds typically range from a few milliseconds to several hundred milliseconds.
[Fig. 28.15: The utility of a soft real-time task's result is maximal at response time 0 and decreases gradually as the response time grows]

Fig. 28.15 Utility of the Results Produced by a Soft Real-Time Task as a Function of Time
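The hard/firm/soft distinction can be summarized as three utility-versus-response-time shapes. The curve shapes below follow Figs. 28.14 and 28.15 qualitatively; the particular values (unit utility, a linear decay rate) are illustrative assumptions, not part of the definitions.

```python
def hard_utility(t, deadline):
    """Hard task: full value up to the deadline; missing it is a system
    failure, modeled here as negatively infinite utility."""
    return 1.0 if t <= deadline else float("-inf")

def firm_utility(t, deadline):
    """Firm task (Fig. 28.14): a late result is worthless but not
    harmful - its utility simply drops to zero."""
    return 1.0 if t <= deadline else 0.0

def soft_utility(t, deadline, decay=0.01):
    """Soft task (Fig. 28.15): utility falls off gradually past the
    deadline (an assumed linear decay, floored at zero)."""
    return 1.0 if t <= deadline else max(0.0, 1.0 - decay * (t - deadline))
```

For example, a result 50 time units past its deadline is fatal for a hard task, worthless for a firm task, and still worth half its value for a soft task with this decay rate.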
1.8. Exercises
1. State whether you consider the following statements to be TRUE or FALSE. Justify your answer in each case.
a. A hard real-time application is made up of only hard real-time tasks.
b. Every safety-critical real-time system has a fail-safe state.
c. A deadline constraint between two stimuli can be considered to be a behavioral constraint on the environment of the system.
d. Hardware fault-tolerance techniques can easily be adapted to provide software fault-tolerance.
e. A good algorithm for scheduling hard real-time tasks must try to complete each task in the shortest time possible.
f. All hard real-time systems are safety-critical in nature.
g. Performance constraints on a real-time system ensure that the environment of the system is well-behaved.
h. Soft real-time tasks are those which do not have any time bounds associated with them.
i. Minimization of average task response times is the objective of any good hard real-time task-scheduling algorithm.
j. It should be the goal of any good real-time operating system to complete every hard real-time task as far ahead of its deadline as possible.
2. What do you understand by the term real-time? How is the concept of real-time different from the traditional notion of time? Explain your answer using a suitable example.
3. Using a block diagram, show the important hardware components of a real-time system and their interactions. Explain the roles of the different components.
4. In a real-time system, raw sensor signals need to be preprocessed before they can be used by a computer. Why is it necessary to preprocess the raw sensor signals? Explain the different types of preprocessing that are normally carried out on sensor signals to make them suitable for direct use by a computer.
5. Identify the key differences between hard real-time, soft real-time, and firm real-time systems. Give at least one example of real-time tasks corresponding to each of these three categories.
Identify the timing constraints in your tasks and justify why the tasks should be categorized into the categories you have indicated.
6. Give an example of a soft real-time task and a non-real-time task. Explain the key differences between the characteristics of these two types of tasks.
7. Draw a schematic model showing the important components of a typical hard real-time system.
8. Explain the working of the input interface using a suitable schematic diagram.
9. Explain, using a suitable circuit diagram, how analog to digital conversion (ADC) is achieved in an input interface.
10. Explain the checkpointing and rollback recovery scheme to provide fault-tolerant real-time computing. Explain the types of faults it can help tolerate and the faults it cannot tolerate. Explain the situations in which this technique is useful.
11. Answer the following questions concerning fault-tolerance of real-time systems.
a. Explain why hardware fault-tolerance is easier to achieve compared to software fault-tolerance.
b. Explain the main techniques available to achieve hardware fault-tolerance.
12. What are the main techniques available to achieve software fault-tolerance? What are the shortcomings of these techniques?
13. What do you understand by the fail-safe state of a system? Safety-critical real-time systems do not have a fail-safe state. What is the implication of this?
14. Is it possible to have an extremely safe but unreliable system? If your answer is affirmative, then give an example of such a system. If you answer in the negative, then justify why it is not possible for such a system to exist.
15. What is a safety-critical system? Give a few practical examples of safety-critical hard real-time systems. Are all hard real-time systems safety-critical? If not, give at least one example of a hard real-time system that is not safety-critical.
16. Explain, with the help of a schematic diagram, how the recovery block scheme can be used to achieve fault-tolerance of real-time tasks. What are the shortcomings of this scheme? Explain situations where it can satisfactorily be used and situations where it cannot be used.
17. Identify and represent the timing constraints in the following air-defense system by means of an extended state machine diagram. Classify each constraint as either a performance or a behavioral constraint. Every incoming missile must be detected within 0.2 seconds of its entering the radar coverage area. The intercept missile should be engaged within 5 seconds of detection of the target missile. The intercept missile should be fired 0.1 seconds after its engagement, but no later than 1 second after it.
18. Represent a washing machine having the following specification by means of an extended state machine diagram. The washing machine waits for the start switch to be pressed. After the user presses the start switch, the machine fills the wash tub with either hot or cold water, depending upon the setting of the Hot Wash switch. The water filling continues until the high level is sensed.
The machine starts the agitation motor and continues agitating the wash tub until either the preset timer expires or the user presses the stop switch. After the agitation stops, the machine waits for the user to press the start Drying switch. After the user presses the start Drying switch, the machine starts the hot air blower and continues blowing hot air into the drying chamber until either the user presses the stop switch or the preset timer expires.
19. Represent the timing constraints in a collision avoidance task in an air surveillance system as an extended finite state machine (EFSM) diagram. The collision avoidance task consists of the following activities.
a. The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 mSec of receipt of the signal.
b. The track record is transmitted to the data processor within 1 mSec after the track record is determined.
c. A subtask on the data processor correlates the received track record with the track records of other targets that come close, to detect a potential collision that might occur within the next 500 mSec.
d. If a collision is anticipated, then the corrective action is determined within 10 mSec by another subtask running on the data processor.
e. The corrective action is transmitted to the track correction task within 25 mSec.
20. Consider the following (partial) specification of a real-time system: The velocity of a spacecraft must be sampled by a computer on board the spacecraft at least once every second (the sampling event is denoted by S). After sampling the velocity, the current position is computed (denoted by event C) within 100 mSec. Concurrently, the
expected position of the spacecraft is retrieved from the database within 200 mSec (denoted by event R). Using these data, the deviation from the normal course of the spacecraft must be determined within 100 mSec (denoted by event D), and corrective velocity adjustments must be carried out before a new velocity value is sampled (the velocity adjustment event is denoted by A). Calculated positions must be transmitted to the earth station at least once every minute (the position transmission event is denoted by T). Identify the different timing constraints in the system. Classify these into either performance or behavioral constraints. Construct an EFSM to model the system.
21. Construct the EFSM model of a telephone system whose (partial) behavior is described below. After the receiver handset is lifted, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialed within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialing of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.
Module 6
Embedded System Software
Lesson 29
Real-Time Task Scheduling Part 1
[Fig. 29.1: A timeline starting at 0 — the instance Ti(1) arrives at φ; its relative deadline is d, its absolute deadline is φ + d; the next instance Ti(2) arrives at φ + pi]
Fig. 29.1 Relative and Absolute Deadlines of a Task

Relative Deadline versus Absolute Deadline: The absolute deadline of a task is the absolute time value (counted from time 0) by which the results from the task are expected. Thus, the absolute deadline is the interval of time between time 0 and the actual instant at which the deadline occurs, as measured by some physical clock. The relative deadline, on the other hand, is the time interval between the start of the task and the instant at which the deadline occurs; in other words, the relative deadline is the time interval between the arrival of a task and the corresponding deadline. The difference between relative and absolute deadlines is illustrated in Fig. 29.1. It can be observed from Fig. 29.1 that the relative deadline of the task instance Ti(1) is d, whereas its absolute deadline is φ + d.

Response Time: The response time of a task is the time it takes (as measured from the task arrival time) for the task to produce its results. As already remarked, task instances get generated
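The relationship between the two deadlines reduces to one line of arithmetic; the numbers below are illustrative values, not drawn from Fig. 29.1.

```python
def absolute_deadline(arrival_time, relative_deadline):
    """Absolute deadline of a task instance, measured from time 0:
    the instance's arrival time plus its relative deadline."""
    return arrival_time + relative_deadline

# An instance arriving at phi = 2000 ms with relative deadline d = 50 ms
# has its absolute deadline at phi + d = 2050 ms.
print(absolute_deadline(2000, 50))  # -> 2050
```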
due to occurrence of events. These events may be internal to the system, such as clock interrupts, or external to the system such as a robot encountering an obstacle. The response time is the time duration from the occurrence of the event generating the task to the time the task produces its results. For hard real-time tasks, as long as all their deadlines are met, there is no special advantage of completing the tasks early. However, for soft real-time tasks, average response time of tasks is an important metric to measure the performance of a scheduler. A scheduler for soft realtime tasks should try to execute the tasks in an order that minimizes the average response time of tasks. Task Precedence: A task is said to precede another task, if the first task must complete before the second task can start. When a task Ti precedes another task Tj, then each instance of Ti precedes the corresponding instance of Tj. That is, if T1 precedes T2, then T1(1) precedes T2(1), T1(2) precedes T2(2), and so on. A precedence order defines a partial order among tasks. Recollect from a first course on discrete mathematics that a partial order relation is reflexive, antisymmetric, and transitive. An example partial ordering among tasks is shown in Fig. 29.2. Here T1 precedes T2, but we cannot relate T1 with either T3 or T4. We shall later use task precedence relation to develop appropriate task scheduling algorithms. T2 T1
Fig. 29.2 Precedence Relation among Tasks
Data Sharing: Tasks often need to share their results. When one task needs to use the results produced by another task, clearly the producing task must complete before the consuming task can start. In fact, a precedence relation between two tasks sometimes implies data sharing between them (e.g. the first task passing some results to the second task). However, this is not always true. A task may be required to precede another even when there is no data sharing. For example, in a chemical plant it may be required that the reaction chamber be filled with water before chemicals are introduced. In this case, the task handling the filling of the reaction chamber with water must complete before the task handling the introduction of the chemicals is activated. It is therefore not appropriate to represent data sharing using the precedence relation. Further, data sharing may occur not only when one task precedes the other, but also among truly concurrent tasks and overlapping tasks. In other words, data sharing among tasks does not necessarily impose any particular ordering among tasks. Therefore, the data sharing relation among tasks needs to be represented using a different symbol. We shall represent data sharing between two tasks using a dashed arrow. In the example of data sharing among tasks represented in Fig. 29.2, T2 uses the results of T3, but T2 and T3 may execute concurrently. T2 may even
start executing first; after some time it may receive some data from T3, continue its execution, and so on.
Fig. 29.3 Track Correction Task (2000 mSec; pi; ei; di) of a Rocket
To illustrate the above notation for representing real-time periodic tasks, let us consider the track correction task typically found in rocket control software. Assume the following characteristics of the track correction task. The track correction task starts 2000 milliseconds after the launch of the rocket, and recurs periodically every 50 milliseconds thereafter. Each instance of the task requires a processing time of 8 milliseconds and its relative deadline is 50 milliseconds. Recall that the phase of a task is defined by the occurrence time of the first instance of the task. Therefore, the phase of this task is 2000 milliseconds. This task can formally be represented as (2000 mSec, 50 mSec, 8 mSec, 50 mSec). This task is pictorially shown in Fig. 29.3. When the deadline of a task equals its period (i.e. pi = di), we can omit the fourth element of the tuple. In this case, we can represent the task as Ti = (2000 mSec, 50 mSec, 8 mSec). This would automatically mean pi = di = 50 mSec. Similarly, when φi = 0, it can be omitted when no confusion arises. So, Ti = (100 mSec, 20 mSec) would indicate a task with φi = 0, pi = 100 mSec, ei = 20 mSec, and di = 100 mSec. Whenever there is any scope for confusion, we shall explicitly write out the parameters, e.g. Ti = (pi = 50 mSec, ei = 8 mSec, di = 40 mSec).
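The four-tuple notation above can be captured in a small sketch. This is only an illustration; the class and method names below are our own, not from the text:

```python
from dataclasses import dataclass

# A minimal sketch of the four-tuple task notation (phi_i, p_i, e_i, d_i).
@dataclass
class PeriodicTask:
    phase: int            # phi_i: arrival time of the first instance (mSec)
    period: int           # p_i (mSec)
    exec_time: int        # e_i (mSec)
    deadline: int = None  # d_i (mSec); an omitted fourth element means d_i = p_i

    def __post_init__(self):
        if self.deadline is None:
            self.deadline = self.period

    def arrival(self, k: int) -> int:
        """Arrival time of the k-th instance Ti(k), k = 1, 2, ..."""
        return self.phase + (k - 1) * self.period

    def absolute_deadline(self, k: int) -> int:
        """Absolute deadline of Ti(k): its arrival time plus the relative deadline."""
        return self.arrival(k) + self.deadline

# The rocket track-correction task (2000 mSec, 50 mSec, 8 mSec):
track = PeriodicTask(phase=2000, period=50, exec_time=8)
```

For instance, `track.arrival(1)` is 2000 mSec (the phase), and the absolute deadline of the first instance is 2000 + 50 = 2050 mSec, matching the relative-versus-absolute deadline discussion above.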
A vast majority of the tasks present in a typical real-time system are periodic. The reason is that many activities carried out by real-time systems are periodic in nature, for example monitoring certain conditions, polling information from sensors at regular intervals, and carrying out certain actions at regular intervals (such as driving actuators). We shall consider examples of such tasks found in a typical chemical plant. In a chemical plant several temperature monitors, pressure monitors, and chemical concentration monitors periodically sample the current temperature, pressure, and chemical concentration values, which are then communicated to the plant controller. The instances of the temperature, pressure, and chemical concentration monitoring tasks normally get generated through the interrupts received from a periodic timer. These inputs are used to compute corrective actions required to maintain the chemical reaction at a certain rate. The corrective actions are then carried out through actuators.
Sporadic Task: A sporadic task is one that recurs at random instants. A sporadic task Ti can be represented by a three-tuple: Ti = (ei, gi, di), where ei is the worst case execution time of an instance of the task, gi denotes the minimum separation between two consecutive instances of the task, and di is the relative deadline. The minimum separation gi between two consecutive instances implies that once an instance of a sporadic task occurs, the next instance cannot occur before gi time units have elapsed. That is, gi restricts the rate at which sporadic tasks can arise. As done for periodic tasks, we shall use the convention that the first instance of a sporadic task Ti is denoted by Ti(1) and the successive instances by Ti(2), Ti(3), etc. Many sporadic tasks, such as emergency message arrivals, are highly critical in nature. For example, in a robot, a task that gets generated to handle an obstacle that suddenly appears is a sporadic task.
In a factory, the task that handles fire conditions is a sporadic task. The time of occurrence of these tasks cannot be predicted. The criticality of sporadic tasks varies from highly critical to moderately critical. For example, an I/O device interrupt, or a DMA interrupt, is moderately critical. However, a task handling the reporting of fire conditions is highly critical.
Aperiodic Task: An aperiodic task is in many ways similar to a sporadic task. An aperiodic task can arise at random instants. However, in case of an aperiodic task, the minimum separation gi between two consecutive instances can be 0. That is, two or more instances of an aperiodic task might occur at the same time instant. Also, the deadline for an aperiodic task is expressed either as an average value or statistically. Aperiodic tasks are generally soft real-time tasks. It is easy to realize why aperiodic tasks need to be soft real-time tasks. Aperiodic tasks can recur in quick succession. It therefore becomes very difficult to meet the deadlines of all instances of an aperiodic task. When several aperiodic tasks recur in quick succession, there is a bunching of the task instances, and it might lead to a few deadline misses. As already discussed, soft real-time tasks can tolerate a few deadline misses. An example of an aperiodic task is a logging task in a distributed system. The logging task can be started by different tasks running on different nodes. The logging requests from different tasks may arrive at the logger almost at the same time, or the requests may be spaced out in time. Other examples of aperiodic tasks include operator requests, keyboard presses, mouse movements, etc. In fact, all interactive commands issued by users are handled by aperiodic tasks.
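The minimum-separation property gi that distinguishes a sporadic task from an aperiodic one can be sketched as follows. The class and method names below are hypothetical, chosen only for illustration:

```python
# A sketch of the minimum-separation property of a sporadic task Ti = (ei, gi, di).
class SporadicTask:
    def __init__(self, e: float, g: float, d: float):
        self.e = e                # worst case execution time of an instance
        self.g = g                # minimum separation between consecutive instances
        self.d = d                # relative deadline
        self.last_arrival = None

    def accept(self, t: float) -> bool:
        """Check that an instance arriving at time t respects the minimum
        separation g from the previously accepted instance."""
        if self.last_arrival is not None and t - self.last_arrival < self.g:
            return False
        self.last_arrival = t
        return True

# e.g. an obstacle-handling task that cannot recur within 100 time units
obstacle = SporadicTask(e=5, g=100, d=50)
```

With g = 0 the check above always passes, which is precisely the aperiodic case: instances may bunch up arbitrarily, and deadlines can then only be met on a best-effort (soft real-time) basis.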
Preemptive Scheduler: A preemptive scheduler is one which, when a higher priority task arrives, suspends any lower priority task that may be executing and takes up the higher priority task for execution. Thus, in a preemptive scheduler, it cannot be the case that a higher priority task is ready and waiting for execution while a lower priority task is executing. A preempted lower priority task can resume its execution only when no higher priority task is ready.
Utilization: The processor utilization (or simply utilization) of a task is the average time for which it executes per unit time interval. In notation: for a periodic task Ti, the utilization ui = ei/pi, where ei is the execution time and pi is the period of Ti. For a set of n periodic tasks {Ti}, the total utilization due to all tasks is U = Σi=1..n (ei/pi). It is the objective of any good scheduling algorithm to feasibly schedule even those task sets that have very high utilization, i.e. utilization approaching 1. Of course, on a uniprocessor it is not possible to schedule task sets having utilization more than 1.
Jitter: Jitter is the deviation of a periodic task from its strict periodic behavior. The arrival time jitter is the deviation of the task from arriving at the precise periodic time of arrival. It may be caused by imprecise clocks, or other factors such as network congestion. Similarly, completion time jitter is the deviation of the completion of a task from precise periodic points. The completion time jitter may be caused by the specific scheduling algorithm employed, which takes up a task for scheduling as per convenience and the load at an instant, rather than scheduling at strict time instants. Jitter is undesirable for some applications.
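The utilization formula can be computed directly. A minimal sketch (the function name is ours):

```python
# Total utilization of a set of periodic tasks, each given as an (e_i, p_i)
# pair: U = sum over all tasks of e_i / p_i.
def utilization(tasks):
    return sum(e / p for e, p in tasks)

# Two tasks: e=10 every 20 msec (u=0.5) and e=20 every 50 msec (u=0.4);
# U = 0.9 <= 1, so the set is not ruled out on a uniprocessor.
U = utilization([(10, 20), (20, 50)])
```

Note that U ≤ 1 is only a necessary condition on a uniprocessor; whether a particular scheduler can actually meet all deadlines is the subject of the schedulability tests discussed later.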
Important members of the clock-driven class of schedulers that we discuss in this text are table-driven and cyclic schedulers. Clock-driven schedulers are simple and efficient; therefore, they are frequently used in embedded applications. We investigate these two schedulers in some detail in Sec. 2.5. Important examples of event-driven schedulers are Earliest Deadline First (EDF) and Rate Monotonic Analysis (RMA). Event-driven schedulers are more sophisticated than clock-driven schedulers, and are usually both more proficient and more flexible. They are more proficient because they can feasibly schedule some task sets which clock-driven schedulers cannot. They are more flexible because they can feasibly schedule sporadic and aperiodic tasks in addition to periodic tasks, whereas clock-driven schedulers can satisfactorily handle only periodic tasks. Event-driven scheduling of real-time tasks in a uniprocessor environment was a subject of intense research during the early 1970s, leading to the publication of a large number of research results. Of these, the following two popular algorithms are the essence of all those results: Earliest Deadline First (EDF) and Rate Monotonic Analysis (RMA). If we understand these two schedulers well, we get a good grip on real-time task scheduling on uniprocessors. Several variations of these two basic algorithms exist. Another classification of real-time task scheduling algorithms can be made based on the type of task acceptance test that a scheduler carries out before it takes up a task for scheduling. The acceptance test is used to decide whether a newly arrived task would at all be taken up for scheduling or be rejected.
Based on the task acceptance test used, there are two broad categories of task schedulers: planning-based and best effort. In planning-based schedulers, when a task arrives the scheduler first determines whether the task can meet its deadline if it is taken up for execution. If not, it is rejected. If the task can meet its deadline and does not cause other already scheduled tasks to miss their respective deadlines, then the task is accepted for scheduling. Otherwise, it is rejected. In best effort schedulers, no acceptance test is applied. All tasks that arrive are taken up for scheduling, and a best effort is made to meet their deadlines. But no guarantee is given as to whether a task's deadline would be met. A third classification of real-time task scheduling algorithms is based on the target platform on which the tasks are to run. The different classes of scheduling algorithms according to this scheme are: uniprocessor, multiprocessor, and distributed. Uniprocessor scheduling algorithms are possibly the simplest of the three classes. In contrast to uniprocessor algorithms, in multiprocessor and distributed scheduling algorithms a decision first has to be made regarding which task needs to run on which processor, and then these tasks are scheduled. In contrast to multiprocessors, the processors in a distributed system do not possess shared memory. Also, in contrast to multiprocessors, there is no global up-to-date state information available in distributed systems. This makes scheduling algorithms that assume central state information of all tasks and processors to exist unsuitable for use in distributed systems. Further, in distributed systems, the communication among tasks is through message passing. Communication through message passing is costly. This means that a scheduling algorithm should not incur too much communication overhead. So, carefully designed distributed algorithms are normally considered suitable for use in a distributed system.
In the following sections, we study the different classes of schedulers in more detail.
However, tasks often do have non-zero phase. It would be interesting to determine what the major cycle would be when tasks have non-zero phase. The result of an investigation into this issue is given as Theorem 1 below.
1.5.2. Theorem 1
The major cycle of a set of tasks ST = {T1, T2, …, Tn} is LCM({p1, p2, …, pn}) even when the tasks have arbitrary phasing.
Proof: As per our definition of a major cycle, even when tasks have non-zero phasing, task instances would repeat the same way in each major cycle. Let us consider an example in which the occurrences of a task Ti in a major cycle are as shown in Fig. 29.4. As shown in the example of Fig. 29.4, there are k−1 occurrences of the task Ti during a major cycle. The first occurrence of Ti starts φ time units after the start of the major cycle. The major cycle ends x time units after the last (i.e. (k−1)th) occurrence of the task Ti in the major cycle. Of course, this must be the same in each major cycle.
Fig. 29.4 Major Cycle When a Task Ti has Non-Zero Phasing
Assume that the size of each major cycle is M. Then, from an inspection of Fig. 29.4, for the task to repeat identically in each major cycle:
M = (k−1)·pi + φ + x … (2.1)
Now, for the task Ti to have identical occurrence times in each major cycle, φ + x must equal pi (see Fig. 29.4). Substituting this in Expr. 2.1, we get:
M = (k−1)·pi + pi = k·pi … (2.2)
So, the major cycle M contains an integral multiple of pi. This argument holds for each task in the task set irrespective of its phase. Therefore, M = LCM({p1, p2, …, pn}).
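By the theorem, the major cycle can be computed as the LCM of the task periods regardless of phasing. A minimal sketch (the function name is ours):

```python
from math import gcd
from functools import reduce

# Major cycle M = LCM of all task periods; by the theorem above this holds
# even for tasks with arbitrary (non-zero) phasing.
def major_cycle(periods):
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

M = major_cycle([4, 5, 20])  # LCM(4, 5, 20) = 20
```

For instance, periods of 100, 80, and 150 mSec give a major cycle of 1200 mSec, so a pre-computed schedule need only be stored for that long.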
temperature controller periodically samples the temperature of a room and maintains it at a preset value. Such temperature controllers are embedded in typical computer-controlled air conditioners.
Fig. 29.5 Major and Minor Cycles in a Cyclic Scheduler A cyclic scheduler repeats a pre-computed schedule. The pre-computed schedule needs to be stored only for one major cycle. Each task in the task set to be scheduled repeats identically in every major cycle. The major cycle is divided into one or more minor cycles (see Fig. 29.5). Each minor cycle is also sometimes called a frame. In the example shown in Fig. 29.5, the major cycle has been divided into four minor cycles (frames). The scheduling points of a cyclic scheduler occur at frame boundaries. This means that a task can start executing only at the beginning of a frame. The frame boundaries are defined through the interrupts generated by a periodic timer. Each task is assigned to run in one or more frames. The assignment of tasks to frames is stored in a schedule table. An example schedule table is shown in Figure 29.6.
Frame Number    f1    f2    f3    f4
Task Number     T3    T1    T3    T4
Fig. 29.6 An Example Schedule Table for a Cyclic Scheduler
The size of the frame to be used by the scheduler is an important design parameter and needs to be chosen very carefully. A selected frame size should satisfy the following three constraints.
1. Minimum Context Switching: This constraint is imposed to minimize the number of context switches occurring during task execution. The simplest interpretation of this constraint is that a task instance must complete running within its assigned frame. Unless a task completes within its allocated frame, the task might have to be suspended and restarted in a later frame. This would require a context switch involving some processing overhead. To avoid unnecessary context switches, the selected frame size should be larger than the execution time of each task, so that when a task starts at a frame boundary it is able to complete within the same frame. Formally, we can state this constraint as: max({ei}) < F, where ei is the execution time of task Ti, and F is the frame size. Note that this constraint imposes a lower bound on the frame size, i.e., the frame size F must not be smaller than max({ei}).
2. Minimization of Table Size: This constraint requires that the number of entries in the schedule table should be minimum, in order to minimize the storage requirement of the schedule table. Remember that cyclic schedulers are used in small embedded applications with a very small storage capacity. So, this constraint is important to the commercial success of a product. The number of entries to be stored in the schedule table can be minimized when the minor cycle squarely divides the major cycle. When the minor cycle squarely divides the major cycle, the major cycle contains an integral number of minor cycles (no fractional minor cycles). Unless the minor cycle squarely divides the major cycle, storing the schedule for one major cycle would not be sufficient, as the schedules in the major cycle would not repeat, and this would make the size of the schedule table large. We can formulate this constraint as: ⌊M/F⌋ = M/F … (2.3) In other words, if the floor of M/F equals M/F, then the major cycle contains an integral number of frames.
Fig. 29.7 Satisfaction of a Task Deadline
3. Satisfaction of Task Deadline: This third constraint on frame size is necessary to meet the task deadlines. This constraint imposes that between the arrival of a task and its deadline, there must exist at least one full frame. The constraint is necessary because otherwise a task might miss its deadline: by the time it could be taken up for scheduling, the deadline may already be imminent. Consider this: a task can only be taken up for scheduling at the start of a frame. If not even one full frame exists between the arrival and deadline of a task, a situation as shown in Fig. 29.7 might arise. In this case, the task arrives sometime after the kth frame has started. Obviously it cannot be taken up for scheduling in the kth frame and can only be taken up in the (k+1)th frame. But then it may be too late to meet its deadline, since the execution time of a task can be up to the size of a full frame. This might result in the task missing its deadline, since the task might complete only at the end of the (k+1)th frame, much after the deadline d has passed. We therefore need a full frame to exist between the arrival of a task and its deadline, as shown in Fig. 29.8, so that task deadlines can be met.
Fig. 29.8 A Full Frame Exists Between the Arrival and Deadline of a Task
More formally, this constraint can be formulated as follows: Suppose a task arrives t time units after the start of the last frame (see Fig. 29.8). Then, assuming that a single frame is sufficient to complete the task, the task can complete before its deadline iff (2F − t) ≤ di, or 2F ≤ (di + t). … (2.4)
Remember that the value of t might vary from one instance of the task to another. The worst case scenario (where the task is most likely to miss its deadline) occurs for the task instance having the minimum value of t, such that t > 0. This is the worst case scenario since, under it, the task would have to wait the longest before its execution can start. It should be clear that if a task arrives just after a frame has started, then the task would have to wait for the full duration of the current frame before it can be taken up for execution. If a task misses its deadline at all, it would certainly be under such a situation. In other words, the worst case scenario for a task to meet its deadline occurs for the instance that has the minimum separation from the start of a frame. Determining the minimum separation value (i.e. min(t)) for a task among all its instances helps in determining a feasible frame size. We show by Theorem 2 that min(t) is equal to gcd(F, pi). Consequently, this constraint can be written as: for every Ti, 2F − gcd(F, pi) ≤ di … (2.5)
Note that this constraint defines an upper bound on the frame size for a task Ti, i.e., if the frame size is any larger than this bound, then tasks might miss their deadlines. Expr. 2.5 defines the bound from the consideration of one task only. Considering all tasks, the frame size must satisfy F ≤ min((gcd(F, pi) + di)/2) over all tasks Ti.
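The per-task deadline check of Expr. 2.5 can be sketched directly (the function name is ours, chosen for illustration):

```python
from math import gcd

# Expr. 2.5: a frame size F suits a task with period p and relative
# deadline d iff 2F - gcd(F, p) <= d.
def meets_deadline_constraint(F: int, p: int, d: int) -> bool:
    return 2 * F - gcd(F, p) <= d

# For a task with p = 4, d = 4: F = 2 gives 4 - 2 <= 4 (holds),
# but F = 5 gives 10 - 1 <= 4 (fails)
```

This is the check that is applied repeatedly, task by task and candidate frame size by candidate frame size, in the worked examples that follow.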
1.5.4. Theorem 2
The minimum separation of the task arrival from the corresponding frame start time (min(t)), considering all instances of a task Ti, is equal to gcd(F, pi).
Proof: Let g = gcd(F, pi), where gcd is the function determining the greatest common divisor of its arguments. It follows from the definition of gcd that g must squarely divide each of F and pi. Let Ti be a task with zero phasing. Now, assume that this theorem is violated for certain integers m and n, such that Ti(n) occurs in the mth frame and the difference between the start time of the mth frame and the nth task arrival time is less than g. That is, 0 < (m·F − n·pi) < g. Dividing this expression throughout by g, we get: 0 < (m·F/g − n·pi/g) < 1 … (2.6)
However, F/g and pi/g are both integers because g is gcd(F, pi). Therefore, we can write F/g = I1 and pi/g = I2 for some integral values I1 and I2. Substituting these in Expr. 2.6, we get 0 < m·I1 − n·I2 < 1. Since m·I1 and n·I2 are both integers, their difference cannot be a fractional value lying between 0 and 1. Therefore, this expression can never be satisfied. It can therefore be concluded that the minimum time between a frame boundary and the arrival of the corresponding instance of Ti cannot be less than gcd(F, pi).
For a given task set it is possible that more than one frame size satisfies all the three constraints. In such cases, it is better to choose the shortest frame size. This is because the schedulability of a task set increases as more frames become available over a major cycle. It should however be remembered that the mere fact that a suitable frame size can be determined does not mean that a feasible schedule would be found. It may so happen that there are not enough frames available in a major cycle to be assigned to all the task instances. We now illustrate how an appropriate frame size can be selected for cyclic schedulers through a few examples.
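Theorem 2 can also be checked empirically by enumerating the arrival offsets of a zero-phase task relative to the frame boundaries. The sketch below is illustrative only (the function name is ours), and it considers offsets t > 0 as in the theorem:

```python
# Enumerate the offsets t = (k * p) mod F of the arrivals of a zero-phase
# task with period p relative to frame boundaries of size F, and return
# the minimum positive offset observed.
def min_separation(F: int, p: int, n_instances: int = 1000) -> int:
    offsets = [(k * p) % F for k in range(n_instances)]
    positive = [t for t in offsets if t > 0]
    return min(positive) if positive else 0

# e.g. F = 4, p = 10: arrivals fall 0, 2, 0, 2, ... time units after a
# frame boundary, so the minimum positive separation is 2 = gcd(4, 10)
```

The enumeration agrees with the theorem: the minimum positive offset equals gcd(F, pi), which is the value substituted for t in Expr. 2.5.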
1.5.5. Examples
Example 1: A cyclic scheduler is to be used to run the following set of periodic tasks on a uniprocessor: T1 = (e1=1, p1=4), T2 = (e2=1.5, p2=5), T3 = (e3=1, p3=20), T4 = (e4=2, p4=20). Select an appropriate frame size.
Solution: For the given task set, an appropriate frame size is one that satisfies all the three required constraints. In the following, we determine a suitable frame size F which satisfies all the three required constraints.
Constraint 1: Let F be an appropriate frame size; then max({ei}) ≤ F. From this constraint, we get F ≥ 1.5.
Constraint 2: The major cycle M for the given task set is given by M = LCM(4, 5, 20) = 20. M should be an integral multiple of the frame size F, i.e., M mod F = 0. This consideration implies that F can take on the values 2, 4, 5, 10, 20. A frame size of 1 has been ruled out since it would violate constraint 1.
Constraint 3: To satisfy this constraint, we need to check whether a selected frame size F satisfies the inequality 2F − gcd(F, pi) ≤ di for each task Ti.
Let us first try frame size 2.
For F = 2 and task T1: 2×2 − gcd(2, 4) ≤ 4, i.e. 4 − 2 ≤ 4. Therefore, for T1 the inequality is satisfied.
For F = 2 and task T2: 2×2 − gcd(2, 5) ≤ 5, i.e. 4 − 1 ≤ 5. Therefore, for T2 the inequality is satisfied.
For F = 2 and task T3: 2×2 − gcd(2, 20) ≤ 20, i.e. 4 − 2 ≤ 20. Therefore, for T3 the inequality is satisfied.
For F = 2 and task T4: 2×2 − gcd(2, 20) ≤ 20, i.e. 4 − 2 ≤ 20. For T4 the inequality is satisfied.
Thus, constraint 3 is satisfied by all tasks for frame size 2. So, frame size 2 satisfies all the three constraints. Hence, 2 is a feasible frame size.
Let us try frame size 4.
For F = 4 and task T1: 2×4 − gcd(4, 4) ≤ 4, i.e. 8 − 4 ≤ 4. Therefore, for T1 the inequality is satisfied.
For F = 4 and task T2: 2×4 − gcd(4, 5) ≤ 5, i.e. 8 − 1 ≤ 5. For T2 the inequality is not satisfied. Therefore, we need not look any further. Clearly, F = 4 is not a suitable frame size.
Let us now try frame size 5, to check if that is also feasible. For F = 5 and task T1, we have 2×5 − gcd(5, 4) ≤ 4, i.e. 10 − 1 ≤ 4. The inequality is not satisfied for T1. We need not look any further. Clearly, F = 5 is not a suitable frame size.
Let us now try frame size 10. For F = 10 and task T1, we have 2×10 − gcd(10, 4) ≤ 4, i.e. 20 − 2 ≤ 4. The inequality is not satisfied for T1. We need not look any further. Clearly, F = 10 is not a suitable frame size.
Let us try if 20 is a feasible frame size. For F = 20 and task T1, we have 2×20 − gcd(20, 4) ≤ 4, i.e. 40 − 4 ≤ 4. The inequality is not satisfied. Therefore, F = 20 is also not suitable.
So, only the frame size 2 is suitable for scheduling.
Even though for Example 1 we could successfully find a suitable frame size that satisfies all the three constraints, it is quite probable that a suitable frame size may not exist for many problems. In such cases, to find a feasible frame size we might have to split the task (or a few tasks) that is (are) causing violation of the constraints into smaller sub-tasks that can be scheduled in different frames.
Example 2: Consider the following set of periodic real-time tasks to be scheduled by a cyclic scheduler: T1 = (e1=1, p1=4), T2 = (e2=2, p2=5), T3 = (e3=5, p3=20). Determine a suitable frame size for the task set.
Solution: Using the first constraint, we have F ≥ 5. Using the second constraint, we have the major cycle M = LCM(4, 5, 20) = 20.
So, the permissible values of F are 5, 10 and 20. Checking for a frame size that satisfies the third constraint, we find that no value of F is suitable. To overcome this problem, we need to split the task that is making the task set not
schedulable. It is easy to observe that the task T3 has the largest execution time and, due to constraint 1, makes the feasible frame sizes quite large. We try splitting T3 into two or three tasks. After splitting T3 into three tasks, we have: T3.1 = (20, 1, 20), T3.2 = (20, 2, 20), T3.3 = (20, 2, 20). The possible values of F under the first two constraints are now 2 and 4. We can check that after splitting the tasks, F = 2 satisfies the third constraint as well and therefore becomes a feasible frame size (F = 4 still fails constraint 3 for T2, since 2×4 − gcd(4, 5) = 7 > 5). It is very difficult to come up with a clear set of guidelines to identify the exact task that is to be split, and the parts into which it needs to be split. Therefore, this needs to be done by trial and error. Further, as the number of tasks to be scheduled increases, this method of trial and error becomes impractical since each task needs to be checked separately. However, when the task set consists of only a few tasks, we can easily apply this technique to find a feasible frame size for a set of tasks otherwise not schedulable by a cyclic scheduler.
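The three constraints together amount to a small search over the divisors of the major cycle. The sketch below is ours (function name included), and it uses F ≥ max(ei) for constraint 1; applied to the task sets above it reproduces the conclusions of the two examples:

```python
from math import gcd
from functools import reduce

# Tasks are (e, p, d) triples. Candidate frame sizes are the divisors of
# the major cycle M = LCM of periods that are at least max(e_i); each
# candidate is then checked against the deadline constraint (Expr. 2.5).
def feasible_frame_sizes(tasks):
    M = reduce(lambda a, b: a * b // gcd(a, b), [p for _, p, _ in tasks])
    e_max = max(e for e, _, _ in tasks)
    return [F for F in range(1, M + 1)
            if M % F == 0 and F >= e_max                           # constraints 1, 2
            and all(2 * F - gcd(F, p) <= d for _, p, d in tasks)]  # constraint 3

print(feasible_frame_sizes([(1, 4, 4), (1.5, 5, 5), (1, 20, 20), (2, 20, 20)]))  # Example 1: [2]
print(feasible_frame_sizes([(1, 4, 4), (2, 5, 5), (5, 20, 20)]))                 # Example 2: []
# After splitting T3 into sub-tasks with execution times 1, 2, and 2:
print(feasible_frame_sizes([(1, 4, 4), (2, 5, 5), (1, 20, 20), (2, 20, 20), (2, 20, 20)]))  # [2]
```

Remember that finding a feasible frame size does not by itself guarantee that a feasible task-to-frame assignment exists; that still has to be constructed.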
and if required the sporadic tasks have already been subjected to an acceptance test and only those which have passed the test are available for scheduling.
cyclic-scheduler() {
    current-task T = Schedule-Table[k];
    k = k + 1;
    k = k mod N;                 // N is the total number of tasks in the schedule table
    dispatch-current-task(T);
    schedule-sporadic-tasks();   // current task T completed early;
                                 // sporadic tasks can be taken up
    schedule-aperiodic-tasks();  // at the end of the frame, the running task
                                 // is pre-empted if not complete
    idle();                      // no task to run; idle
}
The cyclic scheduler routine cyclic-scheduler () is activated at the end of every frame by a periodic timer. If the current task is not complete by the end of the frame, then it is suspended and the task to be run in the next frame is dispatched by invoking the routine cyclic-scheduler(). If the task scheduled in a frame completes early, then any existing sporadic or aperiodic task is taken up for execution.
1.6. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. Average response time is an important performance metric for real-time operating systems handling the running of hard real-time tasks.
b. Unlike table-driven schedulers, cyclic schedulers do not need to store a pre-computed schedule.
c. The minimum period for which a table-driven scheduler scheduling n periodic tasks needs to pre-store the schedule is given by max{p1, p2, …, pn}, where pi is the period of the task Ti.
d. A cyclic scheduler is more proficient than a pure table-driven scheduler for scheduling a set of hard real-time tasks.
e. A suitable figure of merit to compare the performance of different hard real-time task scheduling algorithms can be the average task response times resulting from each algorithm.
f. Cyclic schedulers are more proficient than table-driven schedulers.
g. While using a cyclic scheduler to schedule a set of real-time tasks on a uniprocessor, when a suitable frame size satisfying all the three required constraints has been found, it is guaranteed that the task set would be feasibly scheduled by the cyclic scheduler.
h. When more than one frame size satisfies all the constraints on frame size while scheduling a set of hard real-time periodic tasks using a cyclic scheduler, the largest of these frame sizes should be chosen.
i. In table-driven scheduling of three periodic tasks T1, T2, T3, the scheduling table must have schedules for all tasks drawn up to the time interval [0, max(p1, p2, p3)], where pi is the period of the task Ti.
j. When a set of hard real-time periodic tasks is being scheduled using a cyclic scheduler, if a certain frame size is found to be not suitable, then any frame size smaller than this would also not be suitable for scheduling the tasks.
k. When a set of hard real-time periodic tasks is being scheduled using a cyclic scheduler, if a candidate frame size exceeds the execution time of every task and squarely divides the major cycle, then it would be a suitable frame size to schedule the given set of tasks.
l. Finding an optimal schedule for a set of independent periodic hard real-time tasks without any resource-sharing constraints under static priority conditions is an NP-complete problem.
2. Real-time tasks are normally classified into periodic, aperiodic, and sporadic real-time tasks.
a. What are the basic criteria based on which a real-time task can be determined to belong to one of the three categories?
b. Identify some characteristics that are unique to each of the three categories of tasks.
c. Give examples of tasks in practical systems which belong to each of the three categories.
3. What do you understand by an optimal scheduling algorithm? Is it true that the time complexity of an optimal scheduling algorithm for scheduling a set of real-time tasks in a uniprocessor is prohibitively expensive to be of any practical use? Explain your answer.
4. Suppose a set of three periodic tasks is to be scheduled using a cyclic scheduler on a uniprocessor. Assume that the CPU utilization due to the three tasks is less than 1. Also, assume that for each of the three tasks, the deadline equals the respective period. Suppose that we are able to find an appropriate frame size (without having to split any of the tasks) that satisfies the three constraints of minimization of context switches, minimization of schedule table size, and satisfaction of deadlines. Does this imply that it is possible to assert that we can feasibly schedule the three tasks using the cyclic scheduler? If you answer affirmatively, then prove your answer. If you answer negatively, then show an example involving three tasks that disproves the assertion.
5. Consider a real-time system which consists of three tasks T1, T2, and T3, which have been characterized in the following table.
Task    Phase (msec)    Period (msec)
T1      20              20
T2      40              50
T3      70              80

If the tasks are to be scheduled using a table-driven scheduler, what is the length of time for which the schedules have to be stored in the pre-computed schedule table of the scheduler?
6. A cyclic real-time scheduler is to be used to schedule three periodic tasks T1, T2, and T3 with the following characteristics:

Task    Phase (msec)    Execution Time (msec)    Relative Deadline (msec)    Period (msec)
T1      0               20                       100                         100
T2      0               20                       80                          80
T3      0               30                       150                         150
Suggest a suitable frame size that can be used. Show all intermediate steps in your calculations.
7. Consider the following set of three independent real-time periodic tasks.

Task    Start Time (msec)    Processing Time (msec)    Period (msec)    Deadline (msec)
T1      20                   25                        150              100
T2      40                   10                        50               30
T3      60                   50                        200              150

Suppose a cyclic scheduler is to be used to schedule the task set. What is the major cycle of the task set? Suggest a suitable frame size and provide a feasible schedule (task-to-frame assignment for a major cycle) for the task set.
Module 6
Embedded System Software
Lesson 30
Real-Time Task Scheduling Part 2
of the background tasks in every unit of time is 1 − Σ(i=1..n) ei/pi. Hence, Expr. 2.7 follows easily. We now illustrate the applicability of Expr. 2.7 through the following three simple examples.
1.3. Examples
Example 1: Consider a real-time system in which tasks are scheduled using foreground-background scheduling. There is only one periodic foreground task Tf: (φf = 0, pf = 100 msec, ef = 50 msec, df = 100 msec), and the background task is TB = (eB = 1000 msec). Compute the completion time of the background task.
Solution: Using Expr. 2.7 to compute the task completion time, we have:

ctB = 1000 / (1 − 50/100) = 2000 msec

So, the background task TB would take 2000 milliseconds to complete.
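The computation in Expr. 2.7 can be sketched as a small Python function (the function name and the (e, p) task representation are illustrative, not from the text):

```python
def background_completion_time(e_b, foreground):
    """Expr. 2.7: ct_B = e_B / (1 - sum(e_i / p_i)).

    e_b        -- execution time of the background task
    foreground -- list of (e_i, p_i) pairs for the periodic foreground tasks
    """
    utilization = sum(e / p for e, p in foreground)
    if utilization >= 1:
        raise ValueError("foreground tasks leave no CPU time for background work")
    return e_b / (1 - utilization)

# Example 1: T_f = (e_f = 50 msec, p_f = 100 msec), e_B = 1000 msec
print(background_completion_time(1000, [(50, 100)]))  # 2000.0 msec
```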
Example 2: In a simple priority-driven preemptive scheduler, two periodic tasks T1 and T2 and a background task are scheduled. The periodic task T1 has the highest priority and executes once every 20 milliseconds and requires 10 milliseconds of execution time each time. T2 requires 20 milliseconds of processing every 50 milliseconds. T3 is a background task and requires 100 milliseconds to complete. Assuming that all the tasks start at time 0, determine the time at which T3 will complete.
Solution: The total utilization due to the foreground tasks is: Σ(i=1..2) ei/pi = 10/20 + 20/50 = 0.9. This implies that the fraction of time remaining for the background task to execute is: 1 − Σ(i=1..2) ei/pi = 0.1. That is, the background task gets 1 millisecond out of every 10 milliseconds. Thus, the background task would take 100/0.1 = 1000 milliseconds to complete.

Example 3: Suppose in Example 1, an overhead of 1 msec on account of every context switch is to be taken into account. Compute the completion time of TB.
Fig. 30.1 Task Schedule for Example 3 (alternating foreground and background executions over 0 to 100 msec, with context-switching times shown shaded)

Solution: The very first time the foreground task runs (at time 0), it incurs a context-switching overhead of 1 msec. This has been shown as a shaded rectangle in Fig. 30.1. Subsequently, each time the foreground task runs, it preempts the background task and incurs one context switch. On completion of each instance of the foreground task, the background task runs and incurs another context switch. With this observation, to simplify our computation of the actual completion time of TB, we can imagine that the execution time of every foreground task instance is increased by two context-switch times (one due to itself and the other due to the background task running after it completes). Thus, the net effect of context switches can be imagined to be causing the execution time of the foreground task to increase by two context-switch times, i.e. to 52 milliseconds from 50 milliseconds. This has pictorially been shown in Fig. 30.1. Now, using Expr. 2.7, we get the time required by the background task to complete:

1000 / (1 − 52/100) ≈ 2083.3 milliseconds

In the following two sections, we examine two important event-driven schedulers: EDF (Earliest Deadline First) and RMA (Rate Monotonic Algorithm). EDF is the optimal dynamic-priority real-time task scheduling algorithm and RMA is the optimal static-priority real-time task scheduling algorithm.
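The adjustment made in Example 3 can be sketched by inflating each foreground execution time by two context-switch times before applying Expr. 2.7 (the function name is illustrative):

```python
def completion_with_context_switch(e_b, foreground, cs):
    """Background completion time when each context switch costs cs msec.

    Each foreground instance is charged two extra context switches: one when
    it preempts the background task and one when the background task resumes
    after it completes.
    """
    inflated = [(e + 2 * cs, p) for e, p in foreground]
    utilization = sum(e / p for e, p in inflated)
    return e_b / (1 - utilization)

# Example 3: e_f = 50, p_f = 100, 1 msec per context switch
print(round(completion_with_context_switch(1000, [(50, 100)], 1), 1))  # 2083.3
```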
where ui is the average utilization due to the task Ti and n is the total number of tasks in the task set. Expr. 3.2 is both a necessary and a sufficient condition for a set of tasks to be EDF schedulable. EDF has been proven to be an optimal uniprocessor scheduling algorithm. This means that if a set of tasks is not schedulable under EDF, then no other scheduling algorithm can feasibly schedule this task set. In the simple schedulability test for EDF (Expr. 3.2), we assumed that the period of each task is the same as its deadline. However, in practical problems the period of a task may at times be different from its deadline. In such cases, the schedulability test needs to be changed. If pi > di, then each task needs ei amount of computing time every min(pi, di) duration of time. Therefore, we can rewrite Expr. 3.2 as:

Σ(i=1..n) ei / min(pi, di) ≤ 1    (3.3/2.9)

However, if pi < di, it is possible that a set of tasks is EDF schedulable even when the task set fails to meet Expr. 3.3. Therefore, Expr. 3.3 is conservative when pi < di: it is not a necessary condition, but only a sufficient condition for a given task set to be EDF schedulable.

Example 4: Consider the following three periodic real-time tasks to be scheduled using EDF on a uniprocessor: T1 = (e1=10, p1=20), T2 = (e2=5, p2=50), T3 = (e3=10, p3=35). Determine whether the task set is schedulable.

Solution: The total utilization due to the three tasks is given by:

Σ(i=1..3) ei/pi = 10/20 + 5/50 + 10/35 ≈ 0.89

This is less than 1. Therefore, the task set is EDF schedulable. Though EDF is a simple as well as an optimal algorithm, it has a few shortcomings which render it almost unusable in practical applications. The main problems with EDF are discussed in Sec. 3.4.3. Next, we discuss the concept of task priority in EDF and then discuss how EDF can be practically implemented.
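The test of Expr. 3.3 can be sketched as follows (the function name and the (e, p, d) task representation are illustrative):

```python
def edf_schedulable(tasks):
    """Expr. 3.3: sum(e_i / min(p_i, d_i)) <= 1.

    tasks -- list of (e_i, p_i, d_i) tuples.
    Necessary and sufficient when d_i = p_i for every task; when p_i < d_i
    the test is only sufficient, so a False result is not conclusive.
    """
    return sum(e / min(p, d) for e, p, d in tasks) <= 1

# Example 4 (deadlines equal to periods): utilization ~0.89
print(edf_schedulable([(10, 20, 20), (5, 50, 50), (10, 35, 35)]))  # True
```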
queue would contain the absolute deadline of the task. At every preemption point, the entire queue would be scanned from the beginning to determine the task having the shortest deadline. However, this implementation would be very inefficient. Let us analyze the complexity of this scheme. Each task insertion would be achieved in O(1) or constant time, but task selection (to run next) and its deletion would require O(n) time, where n is the number of tasks in the queue.

A more efficient implementation of EDF would be as follows. EDF can be implemented by maintaining all ready tasks in a sorted priority queue. A sorted priority queue can efficiently be implemented by using a heap data structure. In the priority queue, the tasks are always kept sorted according to the proximity of their deadlines. When a task arrives, a record for it can be inserted into the heap in O(log2 n) time, where n is the total number of tasks in the priority queue. At every scheduling point, the next task to be run can be found at the top of the heap. When a task is taken up for scheduling, it needs to be removed from the priority queue; restoring the heap after this removal takes O(log2 n) time.

A still more efficient implementation of EDF can be achieved as follows, under the assumption that the number of distinct deadlines that tasks in an application can have is restricted. In this approach, whenever a task arrives, its absolute deadline is computed from its release time and its relative deadline. A separate FIFO queue is maintained for each distinct relative deadline that tasks can have. The scheduler inserts a newly arrived task at the end of the corresponding relative-deadline queue. Clearly, the tasks in each queue are ordered according to their absolute deadlines. To find the task with the earliest absolute deadline, the scheduler only needs to examine the tasks at the heads of the FIFO queues. If the number of FIFO queues maintained by the scheduler is Q, then the order of searching would be O(Q), which is effectively constant since Q is a small fixed number. The time to insert a task would also be O(1).
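The heap-based ready queue described above can be sketched in Python using the standard heapq module; the class name and method names are illustrative, not from the text:

```python
import heapq

class EDFReadyQueue:
    """EDF ready queue as a min-heap keyed on absolute deadline."""

    def __init__(self):
        self._heap = []

    def add(self, release_time, relative_deadline, task_id):
        # O(log n) insertion; absolute deadline = release time + relative deadline.
        heapq.heappush(self._heap, (release_time + relative_deadline, task_id))

    def next_task(self):
        # O(log n) removal of the task with the earliest absolute deadline.
        deadline, task_id = heapq.heappop(self._heap)
        return task_id

q = EDFReadyQueue()
q.add(0, 50, "T2")
q.add(0, 20, "T1")
q.add(0, 35, "T3")
print(q.next_task())  # T1 -- earliest absolute deadline (20)
```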
Resource Sharing Problem: When EDF is used to schedule a set of real-time tasks, unacceptably high overheads might have to be incurred to support resource sharing among the tasks without making tasks miss their respective deadlines. We examine this issue in some detail in the next lesson.

Efficient Implementation Problem: The efficient implementation that we discussed in Sec. 3.4.2 is often not practicable, as it is difficult to restrict the number of tasks with distinct deadlines to a reasonable number. The implementation that achieves O(1) overhead assumes that the number of relative deadlines is restricted, which may be unacceptable in some situations. For a more flexible EDF algorithm, we need to keep the tasks ordered in terms of their deadlines using a priority queue. Whenever a task arrives, it is inserted into the priority queue. The complexity of insertion of an element into a priority queue is of the order of log2 n, where n is the number of tasks to be scheduled. This represents a high runtime overhead, since most real-time tasks are periodic with small periods and strict deadlines.
Fig. 30.2 Priority assignment: (a) priority versus rate, (b) priority versus period
worst-case execution times and periods of the tasks. A pertinent question at this point is how a system developer can determine the worst-case execution time of a task even before the system is developed. The worst-case execution times are usually determined experimentally or through simulation studies. The following are some important criteria that can be used to check the schedulability of a set of tasks under RMA.
Fig. 30.3 Achievable Utilization with the Number of Tasks under RMA

Evaluation of Expr. 3.4 as n → ∞ involves an indeterminate expression of the type ∞ × 0. By applying L'Hospital's rule, we can verify that the right-hand side of the expression evaluates to loge 2 ≈ 0.693. From the above computations, it is clear that the maximum CPU utilization that can be achieved under RMA is 1. This is achieved when there is only a single task in the system. As the number of tasks increases, the achievable CPU utilization falls and, as n → ∞, the achievable utilization stabilizes at loge 2, which is approximately 0.693. This is pictorially shown in Fig. 30.3. We now illustrate the applicability of the RMA schedulability criteria through a few examples.
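The behaviour of the bound in Expr. 3.4 can be checked numerically (a minimal sketch; the function name is illustrative):

```python
import math

def liu_layland_bound(n):
    """Expr. 3.4 utilization bound under RMA: n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)

for n in (1, 2, 3, 10, 100):
    print(n, round(liu_layland_bound(n), 3))
# The bound falls from 1.0 (n = 1) toward log_e 2 as n grows.
print(round(math.log(2), 3))  # 0.693
```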
1.5.2. Examples
Example 5: Check whether the following set of periodic real-time tasks is schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3 = (e3=60, p3=200).

Solution: Let us first compute the total CPU utilization due to the three given tasks:

Σ(i=1..3) ui = 20/100 + 30/150 + 60/200 = 0.7

This is less than 1; therefore the necessary condition for schedulability of the tasks is satisfied. Now checking for the sufficiency condition, the task set is schedulable under RMA if Liu and Layland's condition given by Expr. 3.4 is satisfied. Checking for satisfaction of Expr. 3.4, the maximum achievable utilization is given by:

3(2^(1/3) − 1) = 0.78

The total utilization has already been found to be 0.7. Substituting these in Liu and Layland's criterion:

Σ(i=1..3) ui ≤ 3(2^(1/3) − 1)

we get 0.7 < 0.78. Expr. 3.4, a sufficient condition for RMA schedulability, is satisfied. Therefore, the task set is RMA-schedulable.

Example 6: Check whether the following set of three periodic real-time tasks is schedulable under RMA on a uniprocessor: T1 = (e1=20, p1=100), T2 = (e2=30, p2=150), T3 = (e3=90, p3=200).

Solution: Let us first compute the total CPU utilization due to the given task set:

Σ(i=1..3) ui = 20/100 + 30/150 + 90/200 = 0.85
Now checking the Liu and Layland criterion, Σ(i=1..3) ui ≤ 0.78: since 0.85 > 0.78, the condition is not satisfied and the task set fails the Liu and Layland test. The Liu and Layland test (Expr. 3.4) is pessimistic in the following sense. If a task set passes the Liu and Layland test, then it is guaranteed to be RMA schedulable. On the other hand, even if a task set fails the Liu and Layland test, it may still be RMA schedulable. It follows that even when a task set fails Liu and Layland's test, we should not conclude that it is not schedulable under RMA. We need to test further to check if the task set is RMA schedulable. A test that can be performed to check whether a task set is RMA schedulable when it fails the Liu and Layland test is Lehoczky's test. Lehoczky's test has been expressed as Theorem 3.
1.5.3. Theorem 3
A set of periodic real-time tasks is RMA schedulable under any task phasing, iff all the tasks meet their respective first deadlines under zero phasing.
Fig. 30.4 Worst-Case Response Time for a Task Occurs When It Is in Phase with Its Higher Priority Tasks: (a) T1 in phase with T2, (b) T1 has a 20 msec phase with respect to T2

A formal proof of this theorem is beyond the scope of this discussion. However, we provide an intuitive reasoning as to why Theorem 3 must be true. First let us try to understand the following fact.
The worst-case response time for a task occurs when it is in phase with its higher priority tasks.

To see why this statement must be true, consider the following. Under RMA, whenever a higher priority task is ready, the lower priority tasks cannot execute and have to wait. This implies that a lower priority task will have to wait for the entire duration of execution of each instance of a higher priority task that arises during its own execution. More instances of a higher priority task occur during an instance of a lower priority task when the two are in phase than when they are out of phase. This has been illustrated through a simple example in Fig. 30.4. In Fig. 30.4(a), a higher priority task T1 = (10, 30) is in phase with a lower priority task T2 = (60, 120), and the response time of T2 is 90 msec. However, in Fig. 30.4(b), when T1 has a 20 msec phase, the response time of T2 becomes 80 msec. Therefore, if a task meets its first deadline under zero phasing, then it will meet all its deadlines.

Example 7: Check whether the task set of Example 6 is actually schedulable under RMA.

Solution: Though the results of Liu and Layland's test were negative as per Example 6, we can apply the Lehoczky test and observe the following. For the task T1: e1 < p1 holds, since 20 msec < 100 msec. Therefore, it would meet its first deadline (it does not have any higher priority tasks).

Fig. 30.5 Schedulability check for T3: executions of T1, T2, and T3 up to T3's first deadline (200 msec)
For the task T2: T1 is its higher priority task and, considering zero phasing, it would occur once before the deadline of T2. Therefore, (e1 + e2) < p2 holds, since 20 + 30 = 50 msec < 150 msec. Therefore, T2 meets its first deadline. For the task T3: (2e1 + 2e2 + e3) < p3 holds, since 2 × 20 + 2 × 30 + 90 = 190 msec < 200 msec. We have considered 2e1 and 2e2 since T1 and T2 each occur twice within the first deadline of T3. Therefore, T3 meets its first deadline. So, the given task set is schedulable under RMA. The schedulability test for T3 has pictorially been shown in Fig. 30.5. Since all the tasks meet their first deadlines under zero phasing, they are RMA schedulable according to Lehoczky's results.
Fig. 30.6 Instances of T1 over a single instance of Ti

Let us now try to derive a formal expression for this important result of Lehoczky. Let {T1, T2, …, Ti} be the set of tasks to be scheduled. Let us also assume that the tasks have been ordered in descending order of their priority; that is, the task priorities are related as pr(T1) > pr(T2) > … > pr(Ti), where pr(Ti) denotes the priority of the task Ti. Observe that the task T1 has the highest priority and the task Ti has the least priority. This priority ordering can be assumed without any loss of generality, since the required priority ordering among an arbitrary collection of tasks can always be achieved by a simple renaming of the tasks. Consider that the task Ti arrives at the time instant 0, and consider the example shown in Fig. 30.6. During the first instance of the task Ti, three instances of the task T1 have occurred. Each time T1 occurs, Ti has to wait, since T1 has higher priority than Ti. Let us now determine the exact number of times that T1 occurs within a single instance of Ti. This is given by ⌈pi / p1⌉. Since T1's execution time is e1, the total execution time required by task T1 before the deadline of Ti is ⌈pi / p1⌉ × e1. This expression can easily be generalized to consider the execution times of all tasks having higher priority than Ti (i.e. T1, T2, …, Ti−1). Therefore, the time for which Ti will have to wait due to all its higher priority tasks can be expressed as:

Σ(k=1..i−1) ⌈pi / pk⌉ × ek    (3.5/2.11)

Expr. 3.5 gives the total time required to execute Ti's higher priority tasks, for which Ti would have to wait. So, the task Ti would meet its first deadline iff:

ei + Σ(k=1..i−1) ⌈pi / pk⌉ × ek ≤ pi    (3.6/2.12)

That is, if the sum of the execution times of all higher priority task instances occurring before Ti's first deadline, together with the execution time of the task itself, is less than its period pi, then Ti would complete before its first deadline. Note that in Expr. 3.6 we have implicitly assumed that the
task periods equal their respective deadlines, i.e. pi = di. If pi < di, then Expr. 3.6 would need to be modified as follows:

ei + Σ(k=1..i−1) ⌈di / pk⌉ × ek ≤ di    (3.7/2.13)

Note that even if Expr. 3.7 is not satisfied, there is some possibility that the task set may still be schedulable. This might happen because in Expr. 3.7 we have considered zero phasing among all the tasks, which is the worst case. In a given problem, some tasks may have non-zero phasing. Therefore, even when a task set narrowly fails to meet Expr. 3.7, there is some chance that it may in fact be schedulable under RMA. To understand why this is so, consider a task set where one particular task Ti fails Expr. 3.7, making the task set not schedulable. The task misses its deadline when it is in phase with all its higher priority tasks. However, when the task has non-zero phasing with at least some of its higher priority tasks, it might actually meet its first deadline, contrary to the negative result of Expr. 3.7. Let us now consider two examples to illustrate the applicability of Lehoczky's results.

Example 8: Consider the following set of three periodic real-time tasks: T1 = (10, 20), T2 = (15, 60), T3 = (20, 120), to be run on a uniprocessor. Determine whether the task set is schedulable under RMA.

Solution: First let us try the sufficiency test for RMA schedulability. By Expr. 3.4 (the Liu and Layland test), the task set is schedulable if Σui ≤ 0.78.

Σui = 10/20 + 15/60 + 20/120 ≈ 0.92

This is greater than 0.78. Therefore, the given task set fails the Liu and Layland test. Since Expr. 3.4 is only a sufficient condition, we need to test further. Let us now try Lehoczky's test. The tasks T1, T2, T3 are already ordered in decreasing order of their priorities.

Testing for task T1: Since e1 (10 msec) is less than d1 (20 msec), T1 would meet its first deadline.

Testing for task T2: 15 + ⌈60/20⌉ × 10 ≤ 60, or 15 + 30 = 45 ≤ 60 msec. The condition is satisfied. Therefore, T2 would meet its first deadline.
Testing for task T3: 20 + ⌈120/20⌉ × 10 + ⌈120/60⌉ × 15 = 20 + 60 + 30 = 110 msec. This is less than T3's deadline of 120 msec. Therefore, T3 would meet its first deadline. Since all three tasks meet their respective first deadlines, the task set is RMA schedulable according to Lehoczky's results.

Example 9: RMA is used to schedule a set of periodic hard real-time tasks in a system. Is it possible in this system that a higher priority task misses its deadline whereas a lower priority task meets its deadlines? If your answer is negative, prove your denial. If your answer is affirmative, give an example involving two or three tasks scheduled using RMA where the lower priority task meets all its deadlines whereas the higher priority task misses its deadline.

Solution: Yes, it is possible that under RMA a higher priority task misses its deadline whereas a lower priority task meets its deadline. We show this by constructing an example. Consider the following task set: T1 = (e1=15, p1=20), T2 = (e2=6, p2=35), T3 = (e3=3, p3=100). For the given task set, it is easy to observe that pr(T1) > pr(T2) > pr(T3); that is, T1, T2, T3 are ordered in decreasing order of their priorities.
For this task set, T3 meets its deadline according to Lehoczky's test, since:

e3 + ⌈p3/p2⌉ × e2 + ⌈p3/p1⌉ × e1 = 3 + (⌈100/35⌉ × 6) + (⌈100/20⌉ × 15) = 3 + (3 × 6) + (5 × 15) = 96 ≤ 100 msec

But T2 does not meet its deadline, since:

e2 + ⌈p2/p1⌉ × e1 = 6 + (⌈35/20⌉ × 15) = 6 + (2 × 15) = 36 msec

This is greater than the deadline of T2 (35 msec). As a consequence of the results of Example 9, by observing that the lowest priority task of a given task set meets its first deadline, we cannot conclude that the entire task set is RMA schedulable. On the contrary, it is necessary to check each task individually to see whether it meets its first deadline under zero phasing. If one finds that the lowest priority task meets its deadline and concludes that the entire task set would be feasibly scheduled under RMA, one is likely to be mistaken.
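Lehoczky's first-deadline check (Exprs. 3.5 and 3.6) can be sketched as follows; the function names and the (e, p) task representation are illustrative, and the task list is assumed sorted in decreasing priority (increasing period):

```python
import math

def meets_first_deadline(i, tasks):
    """Expr. 3.6: e_i + sum(ceil(p_i / p_k) * e_k for k < i) <= p_i.

    tasks -- list of (e, p) pairs in decreasing order of priority.
    """
    e_i, p_i = tasks[i]
    demand = e_i + sum(math.ceil(p_i / p_k) * e_k for e_k, p_k in tasks[:i])
    return demand <= p_i

def lehoczky_schedulable(tasks):
    # Every task must meet its first deadline under zero phasing.
    return all(meets_first_deadline(i, tasks) for i in range(len(tasks)))

# Example 8: T1=(10,20), T2=(15,60), T3=(20,120) -- schedulable
print(lehoczky_schedulable([(10, 20), (15, 60), (20, 120)]))  # True

# Example 9: T1=(15,20), T2=(6,35), T3=(3,100) -- T2 misses, T3 meets
ex9 = [(15, 20), (6, 35), (3, 100)]
print(meets_first_deadline(1, ex9))  # False
print(meets_first_deadline(2, ex9))  # True
```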
1.5.5. Theorem 4
For a set of harmonically related tasks HS = {Ti}, the RMA schedulability criterion is given by Σ(i=1..n) ui ≤ 1.

Proof: Let T1, T2, …, Tn be the tasks in the given task set. Let us further assume that the tasks have been arranged in increasing order of their periods; that is, for any i and j, pi < pj whenever i < j. If this relationship is not satisfied, then a simple renaming of the tasks can achieve it. Now, according to Expr. 3.6, a task Ti meets its deadline if ei + Σ(k=1..i−1) ⌈pi / pk⌉ × ek ≤ pi.
However, since the task set is harmonically related, pi can be written as m × pk for some integer m. Therefore, ⌈pi / pk⌉ = pi / pk. Using this, Expr. 3.6 can be written as:

ei + Σ(k=1..i−1) (pi / pk) × ek ≤ pi

For Ti = Tn, we can write:

en + Σ(k=1..n−1) (pn / pk) × ek ≤ pn

Dividing both sides of this expression by pn, we get the required result. Hence, the task set would be schedulable iff Σ(k=1..n) ek/pk ≤ 1, i.e. Σ(i=1..n) ui ≤ 1.
Fig. 30.7 Multi-Level Feedback Queue

The disadvantages of RMA include the following: it is very difficult to support aperiodic and sporadic tasks under RMA; further, RMA is not optimal when task periods and deadlines differ.
Thus, T2 will miss its first deadline. Hence, the given task set cannot be feasibly scheduled under RMA. Now let us check the schedulability using DMA. Under DMA, the priority ordering of the tasks is as follows: pr(T2) > pr(T1) > pr(T3).
Checking for T2: 15 msec < 20 msec. Hence, T2 will meet its first deadline.
Checking for T1: (15 + 10) < 35. Hence, T1 will meet its first deadline.
Checking for T3: (20 + 30 + 40) < 200. Therefore, T3 will meet its deadline.
Therefore, the given task set is schedulable under DMA but not under RMA.
The condition is satisfied; therefore T1 meets its first deadline.
Checking for task T2: (2 × 22) + 32 < 150. The condition is satisfied; therefore T2 meets its first deadline.
Checking for task T3: (2 × 22) + (2 × 32) + 90 < 200. The condition is satisfied; therefore T3 meets its first deadline.
Therefore, the task set can be feasibly scheduled under RMA even when context-switching overhead is taken into consideration.
ei + bti + Σ(k=1..i−1) ⌈pi / pk⌉ × ek ≤ pi    (3.9/2.16)

We have so far implicitly assumed that a task undergoes at most a single self-suspension. However, if a task undergoes multiple self-suspensions, then Expr. 3.9 derived above would need to be changed. We leave this as an exercise for the reader.

Example 14: Consider the following set of periodic real-time tasks: T1 = (e1=10, p1=50), T2 = (e2=25, p2=150), T3 = (e3=50, p3=200) [all in msec]. Assume that the self-suspension times of T1, T2, and T3 are 3 msec, 3 msec, and 5 msec, respectively. Determine whether the tasks would meet their respective deadlines if scheduled using RMA.

Solution: The tasks are already ordered in descending order of their priorities. By using the generalized Lehoczky condition given by Expr. 3.9, we get:
For T1 to be schedulable: (10 + 3) < 50. Therefore, T1 would meet its first deadline.
For T2 to be schedulable: (25 + 6 + 10 × 3) < 150. Therefore, T2 meets its first deadline.
For T3 to be schedulable: (50 + 11 + (10 × 4 + 25 × 2)) < 200. This inequality is also satisfied. Therefore, T3 would also meet its first deadline.
It can therefore be concluded that the given task set is schedulable under RMA even when self-suspension of tasks is considered.
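The generalized check of Expr. 3.9 can be sketched similarly; here bt_i is taken, as in Example 14, to be the task's own self-suspension time plus the self-suspension times of all its higher priority tasks (the function name and task representation are illustrative):

```python
import math

def meets_first_deadline_with_suspension(i, tasks, suspensions):
    """Expr. 3.9: e_i + bt_i + sum(ceil(p_i / p_k) * e_k for k < i) <= p_i.

    tasks       -- list of (e, p) pairs in decreasing order of priority
    suspensions -- self-suspension time of each task (one suspension per task)
    """
    e_i, p_i = tasks[i]
    bt_i = sum(suspensions[: i + 1])  # own suspension + higher-priority suspensions
    interference = sum(math.ceil(p_i / p_k) * e_k for e_k, p_k in tasks[:i])
    return e_i + bt_i + interference <= p_i

# Example 14: T1=(10,50), T2=(25,150), T3=(50,200); suspensions 3, 3, 5 msec
tasks = [(10, 50), (25, 150), (50, 200)]
susp = [3, 3, 5]
print(all(meets_first_deadline_with_suspension(i, tasks, susp)
          for i in range(3)))  # True
```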
1.10. Exercises
1. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper bound on achievable utilization improves as the number of tasks in the system being developed increases.
b. If a set of periodic real-time tasks fails Lehoczky's test, then it can safely be concluded that this task set cannot be feasibly scheduled under RMA.
c. A time-sliced round-robin scheduler uses preemptive scheduling.
d. RMA is an optimal static priority scheduling algorithm to schedule a set of periodic real-time tasks on a non-preemptive operating system.
e. Self-suspension of tasks impacts the worst-case response times of the individual tasks much more adversely when preemption of tasks is supported by the operating system compared to the case when preemption is not supported.
f. When a set of periodic real-time tasks is being scheduled using RMA, it cannot be the case that a lower priority task meets its deadline whereas some higher priority task does not.
g. The EDF (Earliest Deadline First) algorithm possesses good transient overload handling capability.
h. A time-sliced round-robin scheduler is an example of a non-preemptive scheduler.
i. The EDF algorithm is an optimal algorithm for scheduling hard real-time tasks on a uniprocessor when the task set is a mixture of periodic and aperiodic tasks.
j. In a non-preemptable operating system employing RMA scheduling for a set of real-time periodic tasks, self-suspension of a higher priority task (due to I/O etc.) may increase the response time of a lower priority task.
k. The worst-case response time for a task occurs when it is out of phase with its higher priority tasks.
l. Good real-time task scheduling algorithms ensure fairness to real-time tasks while scheduling.

2. State whether the following assertions are True or False. Write one or two sentences to justify your choice in each case.
a. The EDF algorithm is optimal for scheduling real-time tasks on a uniprocessor in a non-preemptive environment.
b. When RMA is used to schedule a set of hard real-time periodic tasks in a uniprocessor environment, if the processor becomes overloaded any time during system execution due to overrun by the lowest priority task, it would be very difficult to predict which task would miss its deadline.
c. While scheduling a set of real-time periodic tasks whose task periods are harmonically related, the upper bound on the achievable CPU utilization is the same for both EDF and RMA algorithms.
d. In a non-preemptive event-driven task scheduler, scheduling decisions are made only at the arrival and completion of tasks.
e. The following is the correct arrangement of the three major classes of real-time scheduling algorithms in ascending order of their run-time overheads: static priority preemptive scheduling algorithms, table-driven algorithms, dynamic priority algorithms.
f. While scheduling a set of independent hard real-time periodic tasks on a uniprocessor, RMA can be as proficient as EDF under some constraints on the task set.
g. RMA should be preferred over the time-sliced round-robin algorithm for scheduling a set of soft real-time tasks on a uniprocessor.
h. Under RMA, the achievable utilization of a set of hard real-time periodic tasks would drop when task periods are multiples of each other, compared to the case when they are not.
i. RMA scheduling of a set of real-time periodic tasks using the Liu and Layland criterion might produce infeasible schedules when the task periods are different from the task deadlines.

3. What do you understand by the scheduling point of a task scheduling algorithm? How are the scheduling points determined in (i) clock-driven, (ii) event-driven, and (iii) hybrid schedulers? How will your definition of scheduling points for the three classes of schedulers change when (a) self-suspension of tasks, and (b) context-switching overheads of tasks are taken into account?

4. What do you understand by the jitter associated with a periodic task? How are these jitters caused?

5. Is the EDF algorithm used for scheduling real-time tasks a dynamic priority scheduling algorithm? Does EDF compute any priority value of tasks at any time? If you answer affirmatively, then explain when the priority is computed and how it is computed. If you answer in the negative, then explain the concept of priority in EDF.

6. What is the sufficient condition for EDF schedulability of a set of periodic tasks whose period and deadline are different? Construct an example involving a set of three periodic tasks whose periods differ from their respective deadlines such that the task set fails the sufficient condition and yet is EDF schedulable. Verify your answer. Show all your intermediate steps.

7. A preemptive static priority real-time task scheduler is used to schedule two periodic tasks T1 and T2 with the following characteristics:

Task    Phase (msec)    Execution Time (msec)    Relative Deadline (msec)    Period (msec)
T1      0               10                       20                          20
T2      0               20                       50                          50

Assume that T1 has higher priority than T2. A background task arrives at time 0 and would require 1000 msec to complete. Compute the completion time of the background task, assuming that context switching takes no more than 0.5 msec.

8. Assume that a preemptive priority-based system consists of three periodic foreground tasks T1, T2, and T3 with the following characteristics:

Task    Phase (msec)    Execution Time (msec)    Relative Deadline (msec)    Period (msec)
T1      0               20                       100                         100
T2      0               30                       150                         150
T3      0               30                       300                         300
T1 has higher priority than T2, and T2 has higher priority than T3. A background task Tb arrives at time 0 and would require 2000 msec to complete. Compute the completion time of the background task Tb, assuming that context switching takes no more than 1 msec.
9. Consider the following set of four independent real-time periodic tasks.

Task    Start Time (msec)    Processing Time (msec)    Period (msec)
T1      20                   25                        150
T2      40                   10                        50
T3      20                   15                        50
T4      60                   50                        200
Assume that task T3 is more critical than task T2. Check whether the task set can be feasibly scheduled using RMA.

10. What is the worst-case response time of the background task of a system in which the background task requires 1000 msec to complete? There are two foreground tasks: the higher priority foreground task executes once every 100 msec and each time requires 25 msec to complete; the lower priority foreground task executes once every 50 msec and requires 15 msec to complete. Context switching requires no more than 1 msec.

11. Construct an example involving more than one hard real-time periodic task whose aggregate processor utilization is 1, and which is yet schedulable under RMA.

12. Determine whether the following set of periodic tasks is schedulable on a uniprocessor using DMA (Deadline Monotonic Algorithm). Show all intermediate steps in your computation.

Task    Start Time (msec)    Processing Time (msec)    Period (msec)    Deadline (msec)
T1      20                   25                        150              140
T2      60                   10                        60               40
T3      40                   20                        200              120
T4      25                   10                        80               25
13. Consider the following set of three independent real-time periodic tasks:

   Task   Start Time (msec)   Processing Time (msec)   Period (msec)   Deadline (msec)
   T1     20                  25                        150             100
   T2     60                  10                        50              30
   T3     40                  50                        200             150

   Determine whether the task set is schedulable on a uniprocessor using EDF. Show all intermediate steps in your computation.
14. Determine whether the following set of periodic real-time tasks is schedulable on a uniprocessor using RMA. Show the intermediate steps in your computation. Is RMA optimal when the task deadlines differ from the task periods?
   Task T1 T2 T3 T4

15. Construct an example involving two periodic real-time tasks that can be feasibly scheduled by both RMA and EDF, but for which the schedule generated by RMA differs from that generated by EDF. Draw the two schedules on a time line and highlight how the two schedules differ. Consider the two tasks such that for each task: (a) the period is the same as the deadline; (b) the period is different from the deadline.
16. Can multiprocessor real-time task scheduling algorithms be used satisfactorily in distributed systems? Explain the basic difference between the characteristics of a real-time task scheduling algorithm for multiprocessors and a real-time task scheduling algorithm for applications running on distributed systems.
17. Construct an example involving a set of hard real-time periodic tasks that are not schedulable under RMA but can be feasibly scheduled by DMA. Verify your answer, showing all intermediate steps.
18. Three hard real-time periodic tasks T1 = (50, 100, 100), T2 = (70, 200, 200), and T3 = (60, 400, 400) [time in msec] are to be scheduled on a uniprocessor using RMA. Can the task set be feasibly scheduled? If a context switch overhead of 1 msec is taken into account, determine the schedulability.
19. Consider the following set of three real-time periodic tasks:

   Task   Start Time (msec)   Processing Time (msec)   Period (msec)   Deadline (msec)
   T1     20                  25                        150             100
   T2     40                  10                        50              50
   T3     60                  50                        200             200

   a. Check whether the three given tasks are schedulable under RMA. Show all intermediate steps in your computation.
   b. Assuming that each context switch incurs an overhead of 1 msec, determine whether the tasks are schedulable under RMA. Also, determine the average context switching overhead per unit of task execution.
   c. Assume that T1, T2, and T3 self-suspend for 10 msec, 20 msec, and 15 msec respectively. Determine whether the task set remains schedulable under RMA. The context switching overhead of 1 msec should be considered in your result. You can assume that each task undergoes self-suspension only once during each of its executions.
   d. Assuming that T1 and T2 are assigned the same priority value, determine the additional delay in response time that T2 would incur compared to the case when they are assigned distinct priorities. Ignore the self-suspension times and the context switch overhead for this part of the question.
Module 6
Embedded System Software
Lesson 31
Concepts in Real-Time Operating Systems
1. Introduction
In the last three lessons, we discussed the important real-time task scheduling techniques. We highlighted that timely production of results, in accordance with a physical clock, is vital to the satisfactory operation of a real-time system. We also pointed out that real-time operating systems are primarily responsible for ensuring that every real-time task meets its timeliness requirements. A real-time operating system in turn achieves this by using appropriate task scheduling techniques. Normally, real-time operating systems give programmers the flexibility to select an appropriate scheduling policy from among several supported policies. Deployment of an appropriate task scheduling technique out of the supported techniques is therefore an important concern for every real-time programmer. To be able to determine the suitability of a scheduling algorithm for a given problem, a thorough understanding of the characteristics of the various real-time task scheduling algorithms is important. We therefore had a rather elaborate discussion on real-time task scheduling techniques and certain related issues, such as sharing of critical resources and handling task dependencies. In this lesson, we examine the important features that a real-time operating system is expected to support. We start by discussing the time services provided by real-time operating systems, since accurate and high-precision clocks are very important to the successful operation of any real-time application. Next, we point out the important features that a real-time operating system needs to support. Finally, we discuss the issues that would arise if we attempted to use a general purpose operating system such as UNIX or Windows in real-time applications.
system clock should have a sufficiently fine resolution to support the necessary time services. However, designers of real-time operating systems find it very difficult to support very fine resolution system clocks. With current technology, the resolution of hardware clocks is usually finer than a nanosecond (contemporary processor speeds exceed 3 GHz), but the clock resolution made available by modern real-time operating systems to programmers is of the order of several milliseconds or worse. Let us first investigate why real-time operating system designers find it difficult to maintain system clocks with sufficiently fine resolution. We then examine the various time services that are built on the system clock and made available to real-time programmers. The hardware clock periodically generates interrupts (often called time service interrupts). After each clock interrupt, the kernel updates the software clock and also performs certain other work (explained in Sec. 4.1.1). A thread can get the current time reading of the system clock by invoking a system call supported by the operating system (such as the POSIX clock_gettime()). The finer the resolution of the clock, the more frequent the time service interrupts need to be, and the larger the amount of processor time the kernel spends in responding to these interrupts. This overhead places a limit on how fine a system clock resolution a computer can support. Another issue that caps the resolution of the system clock is that the response time of the clock_gettime() system call is not deterministic. In fact, every system call (or, for that matter, a function call) has some associated jitter. The jitter arises because interrupts have higher priority than system calls: when an interrupt occurs, the processing of a system call is stalled.
Also, the preemption time of system calls can vary because many operating systems disable interrupts while processing a system call. The variation in the response time (jitter) introduces an error in the accuracy of the time value that the calling thread gets from the kernel. Remember that jitter was defined as the difference between the worst-case response time and the best-case response time (see Sec. 2.3.1). In commercially available operating systems, the jitter associated with system calls can be several milliseconds. A software clock resolution finer than this error is therefore not meaningful. We now examine the different activities that are carried out by a handler routine after a clock interrupt occurs. Subsequently, we discuss how sufficiently fine resolution can be provided in the presence of jitter in function calls.
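The POSIX time services mentioned above are exposed in C as clock_gettime() and clock_getres(); on Unix platforms, Python's time module wraps the same calls, so the idea can be demonstrated directly (a sketch; the actual resolution and jitter figures printed depend entirely on the platform):

```python
import time

# Read the OS-reported resolution of the monotonic clock and two
# successive readings of the clock itself (POSIX clock_getres /
# clock_gettime; available on Unix platforms).
res = time.clock_getres(time.CLOCK_MONOTONIC)
t1 = time.clock_gettime(time.CLOCK_MONOTONIC)
t2 = time.clock_gettime(time.CLOCK_MONOTONIC)

print(f"reported resolution: {res} s")
print(f"elapsed between two calls: {t2 - t1:.9f} s")
# The reported resolution may be as fine as 1 ns, but the jitter of the
# call itself limits the accuracy actually usable by an application.
```

The gap between the advertised resolution and the call-to-call variation is exactly the jitter issue discussed above.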
Fig. 31.1 Structure of a Timer Queue (timers ordered by expiration time, each with its handler)
Clock resolution denotes the time granularity provided by the clock of a computer. It corresponds to the duration of time that elapses between two successive clock ticks.
Each time a clock interrupt occurs, besides incrementing the software clock, the handler routine carries out the following activities:

Process timer events: Real-time operating systems maintain either per-process timer queues or a single system-wide timer queue. The structure of such a timer queue has been shown in Fig. 31.1. A timer queue contains all timers arranged in order of their expiration times. Each timer is associated with a handler routine, the function that should be invoked when the timer expires. At each clock interrupt, the kernel checks the timer data structures in the timer queue to see if any timer event has occurred. If it finds that a timer event has occurred, it queues the corresponding handler routine in the ready queue.

Update ready list: Since the occurrence of the last clock event, some tasks might have arrived or become ready due to the fulfillment of conditions they were waiting for. The tasks in the wait queue are checked, and those found to have become ready are moved to the ready queue. If a task having higher priority than the currently running task is found to have become ready, the currently running task is preempted and the scheduler is invoked.

Update execution budget: At each clock interrupt, the scheduler decrements the time slice (budget) remaining for the executing task. If the remaining budget becomes zero and the task is not complete, the task is preempted and the scheduler is invoked to select another task to run.
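The per-tick processing of the timer queue of Fig. 31.1 can be sketched as follows (illustrative names; a real kernel would hold handler routines rather than strings, and would move them to the ready queue instead of returning them):

```python
import heapq

class TimerQueue:
    """Timers kept ordered by expiration time, as in Fig. 31.1."""
    def __init__(self):
        self._heap = []          # entries: (expiry_tick, seq, handler)
        self._seq = 0            # tie-breaker for equal expiry times

    def add(self, expiry_tick, handler):
        heapq.heappush(self._heap, (expiry_tick, self._seq, handler))
        self._seq += 1

    def tick(self, now):
        """Called from the clock-interrupt handler: collect the handler
        of every timer that has expired by time `now`."""
        fired = []
        while self._heap and self._heap[0][0] <= now:
            _, _, handler = heapq.heappop(self._heap)
            fired.append(handler)   # in a kernel: enqueue on the ready queue
        return fired

tq = TimerQueue()
tq.add(5, "poll_sensor")
tq.add(3, "check_deadline")
print(tq.tick(4))    # ['check_deadline']
print(tq.tick(10))   # ['poll_sensor']
```

Keeping the queue ordered by expiration time means each tick only inspects the front of the queue, so the work done inside the clock-interrupt handler stays small.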
1.1.3. Timers
We had pointed out that timer service is a vital service that is provided to applications by all real-time operating systems. Real-time operating systems normally support two main types of timers: periodic timers and aperiodic (or one shot) timers. We now discuss some basic concepts about these two types of timers.
Periodic Timers: Periodic timers are used mainly for sampling events at regular intervals or for performing some activities periodically. Once a periodic timer is set, each time it expires the corresponding handler routine is invoked and the timer is reinserted into the timer queue. For example, a periodic timer may be set to 100 msec and its handler set to poll the temperature sensor every 100 msec.

Aperiodic (or One Shot) Timers: These timers are set to expire only once. Watchdog timers are popular examples of one shot timers.

    f() {
        wd_start(t1, exception-handler);
        /* ... body of f(), expected to complete within t1 ... */
        wd_tickle();
    }

Fig. 31.2 Example Use of a Watchdog Timer
Watchdog timers are used extensively in real-time programs to detect when a task misses its deadline, and then to initiate exception handling upon the deadline miss. An example use of a watchdog timer has been illustrated in Fig. 31.2. In Fig. 31.2, a watchdog timer is set at the start of a certain critical function f() through a wd_start(t1) call. The wd_start(t1) call sets the watchdog timer to expire by t1, the specified deadline measured from the start of the task. If the function f() does not complete even after t1 time units have elapsed, the watchdog timer fires, indicating that the task deadline must have been missed, and the exception handling procedure is initiated. In case the task completes before the watchdog timer expires (i.e. the task completes within its deadline), the watchdog timer is reset using a wd_tickle() call.
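The wd_start()/wd_tickle() pattern of Fig. 31.2 can be imitated with an ordinary one-shot timer. The sketch below uses Python's threading.Timer in place of a kernel watchdog; wd_start and wd_tickle are the lesson's names, while the Watchdog class and critical_function are purely illustrative:

```python
import threading

class Watchdog:
    """One-shot watchdog: fires the exception handler unless
    wd_tickle() is called before the deadline expires."""
    def __init__(self):
        self._timer = None
        self.fired = False

    def wd_start(self, deadline_s, exception_handler):
        def _expire():
            self.fired = True
            exception_handler()
        self._timer = threading.Timer(deadline_s, _expire)
        self._timer.start()

    def wd_tickle(self):
        """Task finished within its deadline: cancel the watchdog."""
        self._timer.cancel()

def critical_function(wd, work_s):
    wd.wd_start(0.2, lambda: print("deadline missed!"))
    threading.Event().wait(work_s)   # stand-in for the real work of f()
    wd.wd_tickle()

wd = Watchdog()
critical_function(wd, 0.05)   # completes well within its 0.2 s deadline
print(wd.fired)               # False: watchdog was cancelled in time
```

A real RTOS watchdog would typically be a hardware counter or a kernel timer, but the control flow (arm at entry, cancel at normal exit, handler on expiry) is the same.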
Real-Time Priority Levels: A real-time operating system must support static priority levels. A priority level supported by an operating system is called static when, once the programmer assigns a priority value to a task, the operating system does not change it by itself. Static priority levels are also called real-time priority levels. This is because, as we discuss in Section 4.3, all traditional operating systems dynamically change the priority levels of tasks from the programmer-assigned values to maximize system throughput. Such priority levels that are changed by the operating system dynamically are obviously not static priorities.

Fast Task Preemption: For successful operation of a real-time application, whenever a high priority critical task arrives, an executing low priority task should be made to instantly yield the CPU to it. The time duration for which a higher priority task waits before it is allowed to execute is quantitatively expressed as the corresponding task preemption time. Contemporary real-time operating systems have task preemption times of the order of a few microseconds, whereas in traditional operating systems the worst-case task preemption time is usually of the order of a second. We discuss in the next section that this significantly large latency is caused by a non-preemptive kernel. It goes without saying that a real-time operating system needs to have a preemptive kernel and task preemption times of the order of a few microseconds.

Predictable and Fast Interrupt Latency: Interrupt latency is defined as the time delay between the occurrence of an interrupt and the running of the corresponding ISR (Interrupt Service Routine). In real-time operating systems, interrupt latency must be bounded, and the bound is expected to be less than a few microseconds. Low interrupt latency is achieved by performing the bulk of the ISR's activities in a deferred procedure call (DPC).
A DPC is essentially a task that performs most of the ISR activity and is executed later at a certain priority value. Further, support for nested interrupts is usually desired. That is, a real-time operating system should not only be preemptive while executing kernel routines, but should also be preemptive during interrupt servicing. This is especially important for hard real-time applications with sub-microsecond timing requirements.

Support for Resource Sharing Among Real-Time Tasks: If real-time tasks are allowed to share critical resources among themselves using the traditional resource sharing techniques, then the response times of tasks can become unbounded, leading to deadline misses. This is one compelling reason why every commercial real-time operating system should, at the minimum, provide the basic priority inheritance mechanism. Support for the priority ceiling protocol (PCP) is also desirable if large and moderate sized applications are to be supported.

Requirements on Memory Management: As far as general-purpose operating systems are concerned, it is rare to find one that does not support virtual memory and memory protection. However, embedded real-time operating systems almost never support these features; only those meant for large and complex applications do. Real-time operating systems for large and medium sized applications are expected to provide virtual memory support, not only to meet the memory demands of the heavy-weight tasks of the application, but also to let memory-demanding non-real-time applications, such as text editors and e-mail software, run on the same platform. Virtual memory reduces the average memory access time, but degrades the worst-case memory access time. The penalty of using virtual memory is the overhead associated with storing the address translation table and performing the virtual to physical address translations.
Moreover, fetching pages from the secondary memory on demand incurs significant latency. Therefore, operating systems supporting virtual memory must provide the real-time
applications with some means of controlling paging, such as memory locking. Memory locking prevents a page from being swapped out from memory to hard disk. In the absence of a memory locking feature, the memory access times of even critical real-time tasks can show large jitter, as the access time greatly depends on whether the required page is in physical memory or has been swapped out.

Memory protection is another important issue that needs to be carefully considered. Lack of memory protection among tasks leads to a single address space for all tasks. Arguments for having only a single address space include simplicity, saving memory bits, and light-weight system calls. For small embedded applications, the overhead of a few kilobytes of memory per process can be unacceptable. However, when no memory protection is provided by the operating system, the cost of developing and testing a program becomes very high as the complexity of the application increases. Maintenance cost also increases, as any change in one module requires retesting the entire system.

Embedded real-time operating systems usually do not support virtual memory; instead, they create physically contiguous blocks of memory for an application upon request. However, memory fragmentation is a potential problem for a system that does not support virtual memory. Also, memory protection is difficult to support in a non-virtual memory management system. For this reason, in many embedded systems the kernel and the user processes execute in the same address space, i.e. there is no memory protection. Hence, a system call and a function call within an application are indistinguishable, which makes debugging applications difficult, since a runaway pointer can corrupt the operating system code and make the system freeze.
Additional Requirements for Embedded Real-Time Operating Systems: Embedded applications usually have constraints on cost, size, and power consumption. Embedded real-time operating systems should be capable of diskless operation, since disks are often either too bulky to use or increase the cost of deployment. Further, embedded operating systems should minimize the total power consumption of the system. Embedded operating systems usually reside in ROM. For certain applications requiring faster response, it may be necessary to run the real-time operating system from RAM: since the access time of RAM is lower than that of ROM, this results in faster execution. Irrespective of whether ROM or RAM is used, memory ICs are expensive; it is therefore desirable for a real-time operating system for embedded applications to have as small a footprint (memory usage) as possible. Since embedded products are typically manufactured at large scale, every rupee saved on memory and other hardware requirements translates into millions in profit.
The two most troublesome problems that a real-time programmer faces while using Unix for real-time applications are the non-preemptive Unix kernel and the dynamically changing priorities of tasks.
Fig. 31.3 Invocation of an Operating System Service through a System Call

At the risk of digressing from the focus of this discussion, let us understand an important operating systems concept. Certain operations, such as handling devices, creating processes, and file operations, need to be done in kernel mode only. That is, application programs are prevented from carrying out these operations directly, and need to request the operating system (through a system call) to carry out the required operation. This restriction enables the kernel to enforce discipline among different programs in accessing these objects. If such operations were not performed in kernel mode, different application programs might interfere with each other's operation. An example of an operating system where all operations were performed in user mode is the once popular operating system DOS (though DOS is nearly obsolete now). In DOS, application programs are free to carry out any operation in user mode 2, including crashing the system by deleting the system files. The instability this can bring about is clearly unacceptable in a real-time environment, and is hardly acceptable in general applications as well.
2 In fact, in DOS there is only one mode of operation; kernel mode and user mode are indistinguishable.
A process running in kernel mode cannot be preempted by other processes; in other words, the Unix kernel is non-preemptive. On the other hand, the Unix system does preempt processes running in user mode. A consequence of this is that even when a low priority process makes a system call, high priority processes have to wait until the system call completes. The longest system calls may take up to several hundreds of milliseconds to complete. Worst-case preemption times of several hundreds of milliseconds can easily cause high priority tasks with short deadlines, of the order of a few milliseconds, to miss their deadlines. Let us now investigate why the Unix kernel was designed to be non-preemptive in the first place. Whenever an operating system routine starts to execute, all interrupts are disabled; the interrupts are enabled only after the operating system routine completes. This was a very efficient way of preserving the integrity of the kernel data structures: it saved the overheads associated with setting and releasing locks, and resulted in lower average task preemption times. Though a non-preemptive kernel results in worst-case task response times of up to a second, this was acceptable to the Unix designers, who at the time did not foresee the use of Unix in real-time applications. Of course, it would have been possible to ensure the correctness of kernel data structures by using locks at appropriate places rather than by disabling interrupts, but that would have increased the average task preemption time. In Sec. 4.4.4 we investigate how modern real-time operating systems make the kernel preemptive without unduly increasing the task preemption time.
Fig. 31.4 Multi-Level Feedback Queues (task queues at priority levels 1 to 6)

Unix periodically computes the priority of a task based on the type of the task and its execution history. The priority of a task Ti is recomputed at the end of its j-th time slice using the following two expressions:

    Pr(Ti, j) = Base(Ti) + CPU(Ti, j) + nice(Ti)            (4.1)
    CPU(Ti, j) = U(Ti, j-1)/2 + CPU(Ti, j-1)/2              (4.2)

where Pr(Ti, j) is the priority of the task Ti at the end of its j-th time slice; U(Ti, j) is the utilization of the task Ti for its j-th time slice; and CPU(Ti, j) is the weighted history of CPU utilization of the task Ti at the end of its j-th time slice. Base(Ti) is the base priority of the task Ti and nice(Ti) is the nice value associated with Ti. User processes can have non-negative nice values. Thus, effectively the nice value lowers the priority of a process (i.e. being nice to the other processes). Expr. 4.2 is recursively defined. Unfolding the recursion, we get:

    CPU(Ti, j) = U(Ti, j-1)/2 + U(Ti, j-2)/4 + ...          (4.3)

It can easily be seen from Expr. 4.3 that, in the computation of the weighted history of CPU utilization of a task, the activity (i.e. processing or I/O) of the task in the immediately concluded interval is given the maximum weightage. If the task used up the CPU for the full duration of the slice (i.e. 100% CPU utilization), then CPU(Ti, j) gets a higher value, indicating a lower priority. Observe that the activities of the task in the preceding intervals get progressively lower weightage. It should be clear that CPU(Ti, j) captures the weighted history of CPU utilization of the task Ti at the end of its j-th time slice. Now, substituting Expr. 4.3 in Expr. 4.1, we get:

    Pr(Ti, j) = Base(Ti) + U(Ti, j-1)/2 + U(Ti, j-2)/4 + ... + nice(Ti)    (4.4)

The purpose of the base priority term in the priority computation expression (Expr. 4.4) is to divide all tasks into a set of fixed bands of priority levels.
The values of U(Ti, j) and nice components are restricted to be small enough to prevent a process from migrating from its assigned band. The bands have been designed to optimize I/O, especially
block I/O. The different priority bands under Unix, in decreasing order of priority, are: swapper, block I/O, file manipulation, character I/O and device control, and user processes. Tasks performing block I/O are assigned the highest priority band. As an example of block I/O, consider the I/O that occurs while handling a page fault in a virtual memory system. Such block I/O uses DMA-based transfer, and hence makes efficient use of the I/O channel. Character I/O includes mouse and keyboard transfers. The priority bands were designed to provide the most effective use of the I/O channels. Dynamic recomputation of priorities was motivated by the following consideration. The Unix designers observed that in any computer system, I/O is the bottleneck. Processors are extremely fast compared to the transfer rates of I/O devices. I/O devices such as keyboards are necessarily slow, to match human response times. Other devices such as printers and disks deploy mechanical components that are inherently slow and therefore cannot sustain very high rates of data transfer. Therefore, effective use of the I/O channels is very important for increasing the overall system throughput, and the I/O channels should be kept as busy as possible so that interactive tasks get good response times. To keep the I/O channels busy, any task performing I/O should not be kept waiting for the CPU. For this reason, as soon as a task blocks for I/O, its priority is increased by the priority recomputation rule given in Expr. 4.4. However, if a task makes full use of its last assigned time slice, it is determined to be computation-bound and its priority is reduced. Thus, the basic philosophy of the Unix operating system is that interactive tasks are made to assume higher priority levels and are processed at the earliest. This gives interactive users good response times.
This technique has now become an accepted way of scheduling soft real-time tasks across almost all available general purpose operating systems. From the above observations, we can summarize the overall effect of recomputing priority values using Expr. 4.4 as follows: in Unix, I/O intensive tasks migrate to higher and higher priorities, whereas CPU-intensive tasks sink to lower priority levels. No doubt the approach taken by Unix is very appropriate for maximizing the average task throughput, and it does indeed provide good average response times to interactive (soft real-time) tasks. In fact, almost every modern operating system performs a very similar dynamic recomputation of task priorities to maximize the overall system throughput and to provide good average response times to interactive tasks. However, for hard real-time tasks, dynamic shifting of priority values is clearly not appropriate.
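The decay behaviour of Exprs. 4.1-4.2 can be made concrete with a small sketch. Here a larger Pr() value means a lower priority; utilizations are given as a percentage of the time slice used, and the base and nice values are illustrative, not taken from any particular Unix variant:

```python
# Sketch of the Unix priority recomputation of Exprs. 4.1 and 4.2.
# Larger Pr() means *lower* priority.

def recompute(base, nice, utilizations):
    """Return Pr(Ti, j) after the given per-slice CPU utilizations
    (oldest first), using CPU(Ti,j) = U(Ti,j-1)/2 + CPU(Ti,j-1)/2."""
    cpu = 0.0
    for u in utilizations:
        cpu = u / 2.0 + cpu / 2.0
    return base + cpu + nice

# A CPU-bound task (100% of every slice) drifts to a larger priority
# value (lower priority) than an I/O-bound task that hardly uses the CPU.
print(recompute(base=60, nice=0, utilizations=[100, 100, 100]))  # 147.5
print(recompute(base=60, nice=0, utilizations=[0, 0, 0]))        # 60.0
```

Note how the most recent slice contributes with weight 1/2, the one before with weight 1/4, and so on, exactly as in the unfolded Expr. 4.3.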
Lack of Real-Time File Services: In Unix, file blocks are allocated as and when they are requested by an application. As a consequence, while a task is writing to a file, it may encounter an error when the disk runs out of space. In other words, no guarantee is given that disk space will be available when a task writes a block to a file. Traditional file writing approaches also result in slow writes, since the required space has to be allocated before writing a block. Another problem with traditional file systems is that blocks of the same file may not be contiguously located on the disk. This results in read operations taking unpredictable times, causing jitter in data access. In real-time file systems, significant performance improvement can be achieved by storing files contiguously on the disk. Since the file system pre-allocates space, the times for read and write operations are more predictable.

Inadequate Timer Services Support: In Unix systems, real-time timer support is insufficient for many hard real-time applications. The clock resolution that is provided to applications is 10 milliseconds, which is too coarse for many hard real-time applications.
Fig. 31.5 Schematic Representation of a Host-Target System (host and target board connected over TCP/IP or a serial link)

The main idea behind this approach is that the real-time operating system running on the target board is kept as small and simple as possible. This implies that the operating system on the target board lacks virtual memory management support and does not provide utilities such as compilers, program editors, etc. The processor on the target board runs the real-time operating system. The host system has the program development environment, including compilers, editors, libraries, cross-compilers, debuggers, etc.; these are memory-demanding applications that require virtual memory support. The host is usually connected to the target using a serial port or a TCP/IP connection (see Fig. 31.5). The real-time program is developed on the host. It is then cross-compiled to generate code for the target processor. Subsequently, the executable module is downloaded to the target board. Tasks are executed on the target board and the execution is controlled from the host side using a symbolic cross-debugger. Once the program works successfully, it is fused into a ROM or flash memory and becomes ready to be deployed in applications. Commercial examples of host-target real-time operating systems include PSOS, VxWorks, and VRTX. We examine these commercial products in Lesson 5. We would point out that these operating systems, due to their small size, limited functionality, and optimal design, achieve much better performance figures than full-fledged operating systems. For example, the task preemption times of these systems are of the order of a few microseconds, compared to several hundreds of milliseconds for traditional Unix systems.
the processing of the kernel routine and dispatches the waiting highest priority task immediately. The worst-case preemption latency in this technique therefore becomes the longest time between two consecutive preemption points. As a result, the worst-case response times of tasks are now several fold lower than those for traditional operating systems without preemption points. This makes preemption point-based operating systems suitable for many categories of hard real-time applications, though still not for applications requiring preemption latencies of the order of a few microseconds or less. Another advantage of this approach is that it involves only minor changes to the kernel code. Many operating systems have taken the preemption point approach in the past, a prominent example being HP-UX.
done from efficiency considerations and worked well for non-real-time and uniprocessor applications. Masking interrupts during kernel processing can make even very small critical routines exhibit worst-case response times of the order of a second. Further, this approach does not work in multiprocessor environments: masking the interrupts for one processor does not help, as tasks running on the other processors can still corrupt the kernel data structures. It is now clear that in order to make the kernel preemptive, locks must be used at appropriate places in the kernel code. In fully preemptive Unix systems, two types of locks are normally used: kernel-level locks and spin locks.
Fig. 31.6 Operation of a Spin Lock (task T1 holds the lock on the critical resource while task T2 busy-waits)

A kernel-level lock is similar to a traditional lock. When a task waits for a kernel-level lock to be released, it is blocked and undergoes a context switch; it becomes ready only after the required lock is released by the holding task. This type of lock is inefficient when critical resources are needed only for short durations, of the order of a few milliseconds or less, because in such situations the context switching overhead is not acceptable. Consider a task that requires the lock for carrying out some very small processing (possibly a single arithmetic operation) on a critical resource. If a kernel-level lock is used, another task requesting the lock at that time would be blocked and a context switch would be incurred; in addition, the cache contents, pages of the task, etc. may be swapped. Here the context switching time is comparable to, or even greater than, the time for which the task needs the resource. In such a situation, a spin lock is appropriate. Let us now understand the operation of a spin lock, shown schematically in Fig. 31.6. In Fig. 31.6, a critical resource is required by the tasks T1 and T2 for very short times (comparable to a context switching time), and the resource is protected by a spin lock. The task T1 has acquired the spin lock guarding the resource. Meanwhile, the task T2 requests the resource. When T2 cannot get access to the resource, it simply busy-waits (shown as a loop in the figure); it does not block, and therefore suffers no context switch. T2 gets the resource as soon as T1 relinquishes it.

Real-Time Priorities: Let us now examine how self-host systems address the problem of the dynamic priority levels of traditional Unix systems. In Unix-based real-time operating systems, real-time and idle priorities are supported in addition to dynamic priorities. Fig. 31.7 schematically shows the three available priority levels.
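The busy-wait acquisition of the spin lock in Fig. 31.6 can be sketched as follows. This is a user-level illustration only: a real kernel spin lock relies on an atomic test-and-set instruction, for which a non-blocking Lock.acquire() stands in here, and the SpinLock name is illustrative:

```python
import threading

class SpinLock:
    """Busy-waits instead of blocking: appropriate only when the
    critical section is shorter than a context switch."""
    def __init__(self):
        # acquire(blocking=False) plays the role of an atomic
        # test-and-set here.
        self._flag = threading.Lock()

    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass                 # spin (busy wait); do not block

    def release(self):
        self._flag.release()

lock = SpinLock()
lock.acquire()       # T1 takes the lock
# ... very short critical section ...
lock.release()       # a spinning T2 would proceed immediately
print("lock acquired and released")
```

The trade-off is exactly the one described above: the waiter burns CPU cycles while spinning, but avoids the context switch that a kernel-level lock would incur.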
Fig. 31.7 Priority Changes in Self-host Unix Systems

Idle (Non-Migrating): This is the lowest priority. The task that runs when there are no other tasks to run (the idle task) runs at this level. Idle priorities are static and are not recomputed periodically.

Dynamic: Dynamic priorities are recomputed periodically to improve the average response time of soft real-time tasks. Dynamic recomputation of priorities ensures that I/O-bound tasks migrate to higher priorities and CPU-bound tasks operate at lower priority levels. As shown in Fig. 31.7, dynamic priority levels are higher than the idle priority, but lower than the real-time priorities.

Real-Time: Real-time priorities are static and are not recomputed. Hard real-time tasks operate at these levels, higher than tasks with dynamic priority levels.
Fig. 31.8 Genealogy of Operating Systems from Microsoft's Stable

An organization owning Windows NT systems might be interested in using it for its real-time applications on account of either cost saving or convenience. This is especially true in prototype application development, and also when only a limited number of deployments are required. In the following, we critically analyze the suitability of Windows NT for real-time application development. First, we highlight some features of Windows NT that are very relevant and useful to a real-time application developer. In the subsequent subsection, we point out some of the shortcomings of Windows NT when used in real-time application development.
[Fig.: Windows NT priority levels: the real-time class spans levels 16 (real-time idle) to 31 (time critical); the dynamic class spans levels 1 (dynamic idle) to 15 (dynamic time-critical), including the dynamic normal level; level 0 is the idle level.]
These problems have been avoided by the Windows CE operating system through a priority inheritance mechanism. 2. Support for Resource Sharing Protocols: We had discussed in Chapter 3 that unless appropriate resource sharing protocols are used, tasks accessing shared resources may suffer unbounded priority inversions, leading to deadline misses and even system failure. Windows NT does not provide any support (such as priority inheritance) to let real-time tasks share critical resources among themselves. This is a major shortcoming of Windows NT when used in real-time applications. Since most real-time applications do involve resource sharing among tasks, we outline below the possible ways in which user-level functionalities can be added to the Windows NT system. The simplest approach to let real-time tasks share critical resources without unbounded priority inversions is as follows. As soon as a task succeeds in locking a non-preemptable resource, its priority is raised to the highest priority (31); as soon as it releases the resource, its priority is restored. However, we know that this arrangement would lead to large inheritance-related inversions. Another possibility is to implement the priority ceiling protocol (PCP). To implement this protocol, we need to restrict the real-time tasks to even priorities (i.e. 16, 18, ..., 30). The reason for this restriction is that NT does not support FIFO scheduling among equal-priority tasks. If the highest priority among all tasks needing a resource is 2n, then the ceiling priority of the resource is 2n+1. In Unix, a FIFO option among equal-priority tasks is available; therefore all available priority levels can be used.
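The even-priority scheme described above can be made concrete with a short sketch. The function names are our own, invented for illustration; the rule itself (a resource whose highest requester has even priority 2n gets ceiling 2n+1) is the one stated in the text.

```python
def ceiling_priority(requester_priorities):
    """Ceiling priority of a resource under the even-priority scheme.

    Real-time tasks are restricted to even priority levels
    (16, 18, ..., 30); if the highest priority among all tasks
    needing the resource is 2n, the ceiling is the odd level 2n + 1.
    """
    highest = max(requester_priorities)
    if highest % 2 != 0:
        raise ValueError("real-time tasks must use even priorities")
    return highest + 1

def locking_priority(own_priority, resource_ceiling):
    """While holding the resource, a task runs at the resource ceiling,
    so no equal- or lower-priority task can preempt it inside the
    critical section (NT lacks FIFO scheduling at equal priorities)."""
    return max(own_priority, resource_ceiling)
```

For a resource needed by tasks at priorities 16 and 24, the ceiling is the odd level 25; a holder whose base priority is 16 therefore runs at 25 while inside the critical section.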
Though Windows NT has many of the features desired of a real-time operating system, its implementation of DPCs, together with its lack of protocol support for resource sharing among equal-priority tasks, makes it unsuitable for use in safety-critical real-time applications. A comparison of the extent to which some of the basic features required for real-time programming are provided by Windows NT and Unix V is given in Table 1. With careful programming, Windows NT may be useful for applications that can tolerate occasional deadline misses and have deadlines of the order of hundreds of milliseconds rather than microseconds. Of course, to be used in such applications, the processor utilization must be kept sufficiently low and priority inversion control must be provided at the user level.
1.7. Exercises
1. State whether the following assertions are True or False. Justify your answer in each case.
a. When RMA is used for scheduling a set of hard real-time periodic tasks, the upper bound on achievable utilization improves as the number of tasks in the system being developed increases.
b. Under the Unix operating system, computation-intensive tasks dynamically gravitate towards higher priorities.
c. Normally, task switching time is larger than task preemption time.
d. Suppose a real-time operating system does not support memory protection; then a procedure call and a system call are indistinguishable in that system.
e. Watchdog timers are typically used to start certain tasks at regular intervals.
f. For memory of the same size under segmented and virtual addressing schemes, the segmented addressing scheme would in general incur lower memory access jitter compared to the virtual addressing scheme.
2. Even though the clock frequency of modern processors is of the order of several GHz, why do many modern real-time operating systems not support nanosecond or even microsecond resolution clocks? Is it possible for an operating system to support nanosecond resolution clocks at present? Explain how this can be achieved.
3. Give an example of a real-time application for which simple segmented memory management support by the RTOS is preferred, and another example of an application for which virtual memory management support is essential. Justify your choices.
4. Is it possible to meet the service requirements of hard real-time applications by writing additional layers over the Unix System V kernel? If your answer is no, explain the reason. If your answer is yes, explain what additional features you would implement in the external layer of the Unix System V kernel for supporting hard real-time applications.
5. Briefly indicate how Unix dynamically recomputes task priority values. Why is such recomputation of task priorities required? What are the implications of such priority recomputations on real-time application development?
6. Why is Unix V non-preemptive in kernel mode? How do fully preemptive kernels based on Unix (e.g. Linux) overcome this problem?
7. Briefly describe an experimental set-up that can be used to determine the preemptability of different operating systems by high-priority real-time tasks when a low-priority task has made a system call.
8. Explain how interrupts are handled in Windows NT. Explain how the interrupt processing scheme of Windows NT makes it unsuitable for hard real-time applications. How has this problem been overcome in WinCE?
9. Would you recommend Unix System V for running a few real-time tasks of a data acquisition application? Assume that the computation time for these tasks is of the order of a few hundred milliseconds and their deadlines are of the order of several tens of seconds. Justify your answer.
10. Explain the problems that you would encounter if you tried to develop and run a hard real-time system on the Windows NT operating system.
11. Briefly explain why the traditional Unix kernel is not suitable for use in multiprocessor environments.
12. Define a spin lock and a kernel-level lock and explain their use in realizing a preemptive kernel.
13. What do you understand by a microkernel-based operating system? Explain the advantages of a microkernel-based real-time operating system over a monolithic operating system.
14. What is the difference between a self-host and a host-target based embedded operating system? Give at least one example of a commercial operating system from each category.
15. What problems might a real-time application developer face while using RT-Linux for developing hard real-time applications?
16. What are the important features required in a real-time operating system? Analyze to what extent these features are provided by Windows NT and Unix V.
Module 6
Embedded System Software
Lesson 32
Commercial Real-Time Operating Systems
1. Introduction
Many real-time operating systems are at present available commercially. In this lesson, we analyze some of the popular real-time operating systems and investigate why these popular systems cannot be used across all applications. We also examine the POSIX standards for RTOS and their implications.
1.1. POSIX
POSIX stands for Portable Operating System Interface. The X has been suffixed to the abbreviation to make it sound Unix-like. Over the last decade, POSIX has become an important standard in the operating systems area, including real-time operating systems. The importance of POSIX can be gauged from the fact that nowadays it is uncommon to come across a commercial operating system that is not POSIX-compliant. POSIX started as an open software initiative. Since POSIX has now become overwhelmingly popular, we discuss the POSIX requirements on real-time operating systems. We start with a brief introduction to the open software movement and then trace the historical events that have led to the emergence of POSIX. Subsequently, we highlight the important requirements of real-time POSIX.
Open Source: Provides portability at the source code level. To run an application on a new platform requires only compilation and linking. ANSI and POSIX are important open source standards.

Open Object: This standard provides portability of unlinked object modules across different platforms. To run an application in a new environment, relinking of the object modules is required.

Open Binary: This standard provides complete software portability across hardware platforms based on a common binary language structure. An open binary product is portable at the executable code level. At the moment, no open binary standards exist.

The main goal of POSIX is application portability at the source code level. Before we discuss RT-POSIX, let us explore the historical background under which POSIX was developed.
The POSIX standard has several parts, including:
- system interfaces and system call parameters
- shells and utilities
- test methods for verifying conformance to POSIX
- real-time extensions
Execution scheduling: An operating system, to be POSIX-RT compliant, must provide support for real-time (static) priorities.

Performance requirements on system calls: The standard specifies worst-case execution times for most real-time operating system services.

Priority levels: The number of priority levels supported should be at least 32.

Timers: Periodic and one-shot timers (also called watchdog timers) should be supported. The system clock is called CLOCK_REALTIME when the system supports real-time POSIX.

Real-time files: A real-time file system should be supported. A real-time file system can pre-allocate storage for files and can store file blocks contiguously on the disk. This enables predictable delays in file access even in a virtual memory system.

Memory locking: Memory locking should be supported. POSIX-RT defines the operating system services mlockall() to lock all pages of a process, mlock() to lock a range of pages, and mlockpage() to lock only the current page. The corresponding unlock services are munlockall(), munlock(), and munlockpage(). Memory locking services have been introduced to support deterministic memory access.

Multithreading support: Real-time threading support is mandated. Real-time threads are schedulable entities of a real-time application that have individual timeliness constraints and may have collective timeliness constraints when belonging to a runnable set of threads.
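The one-shot (watchdog) timer requirement can be illustrated with a small sketch. This is not the POSIX timer API itself: Python's threading.Timer stands in for an RTOS timer service, and the class and its kick/stop method names are hypothetical.

```python
import threading
import time

class WatchdogTimer:
    """Toy one-shot watchdog: fires its handler unless kicked in time."""
    def __init__(self, timeout_s, handler):
        self.timeout_s = timeout_s
        self.handler = handler
        self._timer = None

    def start(self):
        self._timer = threading.Timer(self.timeout_s, self.handler)
        self._timer.start()

    def kick(self):
        # A healthy task restarts the countdown before it expires.
        self._timer.cancel()
        self.start()

    def stop(self):
        self._timer.cancel()

expired = []

wd = WatchdogTimer(0.05, lambda: expired.append("task hung"))
wd.start()
time.sleep(0.2)   # the monitored "task" never kicks the watchdog ...
wd.stop()         # ... so the handler has already fired by now
```

A periodic timer would simply re-arm itself inside the handler; the watchdog pattern shown here is the one typically used to detect hung tasks.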
1.6.1. PSOS
PSOS is a popular real-time operating system that is primarily used in embedded applications. It is available from Wind River Systems, a large player in the real-time operating system arena. It is a host-target type of real-time operating system. PSOS is being used in
several commercial embedded products. An example application of PSOS is in the base stations of the cellular systems.
[Fig. 32.1 legend: XRAY+ is the source-level debugger and PROBE is the target debugger. The figure shows the editor, cross-compiler, XRAY+, and libraries on the host computer, connected over TCP/IP to PROBE on the target.]
Fig. 32.1 PSOS-based Development of Embedded Software

PSOS-based application development is shown schematically in Fig. 32.1. The host computer is typically a desktop; both Unix and Windows hosts are supported. The target board contains the embedded processor, ROM, RAM, etc. The host computer runs the editor, cross-compiler, source-level debugger, and library routines. On the target board, PSOS+ and other optional modules such as PNA+, PHILE, and PROBE are installed in RAM. PNA+ is the network manager. It provides TCP/IP communication over Ethernet and FDDI, conforms to Unix 4.3 (BSD) socket syntax, and is compatible with other TCP/IP-based networking standards such as ftp and NFS. Using these, PNA+ provides efficient downloading and debugging communication between the target and the host. PROBE+ is the target debugger and XRAY+ is the source-level debugger. The application is developed on the host machine and downloaded to the target board, where it is debugged using the source-level debugger (XRAY+). Once the application runs satisfactorily, it is fused on a ROM and installed on the target board. We now highlight some important features of PSOS. PSOS supports 32 priority levels. In the minimal configuration, the footprint of the operating system is only 12 KBytes. For sharing critical resources among real-time tasks, it supports the priority inheritance and priority ceiling protocols. It supports segmented memory management and allocates tasks to memory regions. A memory region is a physically contiguous block of memory, created by the operating system in response to a call from an application. In most modern operating systems, control jumps to the kernel when an interrupt occurs. PSOS takes a different approach: the device drivers are outside the kernel and can be loaded and removed at run time.
When an interrupt occurs, the processor jumps directly to the ISR (interrupt service routine) pointed to by the vector table. The intention is not only to gain speed, but also to give the application developer complete control over interrupt handling.
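The direct-to-ISR dispatch of PSOS can be pictured with a toy vector table. The interrupt numbers and handler names below are invented for illustration; the point is only that the table maps an interrupt straight to its service routine, with no kernel hop in between.

```python
# Toy interrupt vector table: each entry maps an interrupt number
# directly to its ISR, mirroring PSOS's kernel-bypassing dispatch.
service_log = []

def timer_isr():
    service_log.append("timer")

def uart_isr():
    service_log.append("uart")

# hypothetical interrupt numbers, chosen only for this sketch
vector_table = {0: timer_isr, 3: uart_isr}

def raise_interrupt(irq):
    # the "processor" jumps straight to the ISR pointed to by the table
    vector_table[irq]()

raise_interrupt(0)
raise_interrupt(3)
```

Besides speed, keeping the table in the application's hands gives the developer the complete control over interrupt handling that the text describes.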
1.6.2. VRTX
VRTX is a POSIX-RT compliant operating system from Mentor Graphics. VRTX has been certified by the US FAA (Federal Aviation Administration) for use in mission- and life-critical applications such as avionics. VRTX has two multitasking kernels: VRTXsa and VRTXmc. VRTXsa is used for large and medium applications. It supports virtual memory, has a POSIX-compliant library, and supports priority inheritance. Its system calls are deterministic and fully preemptable. VRTXmc, on the other hand, is optimized for power consumption and for ROM and RAM sizes, and therefore has a very small footprint. The kernel typically requires only 4 to 8 KBytes of ROM and 1 KByte of RAM, and does not support virtual memory. This version is targeted at cell phones and other small hand-held devices.
1.6.3. VxWorks
VxWorks is a product from Wind River Systems. It is a host-target system; the host can be either a Windows or a Unix machine. It supports most POSIX-RT functionalities. VxWorks comes with an integrated development environment (IDE) called Tornado. In addition to the standard program development tools such as an editor, cross-compiler, and cross-debugger, Tornado contains VxSim and WindView. VxSim simulates a VxWorks target for use as a prototyping and testing environment, and WindView provides debugging tools for the simulator environment. VxMP is the multiprocessor version of VxWorks. VxWorks was deployed in the Mars Pathfinder, which was sent to Mars in 1997. Pathfinder landed on Mars, responded to ground commands, and started to send science and engineering data. However, there was a hitch: it repeatedly reset itself. Using the trace generation, logging, and debugging tools of VxWorks remotely, the cause was found to be unbounded priority inversion: it caused real-time tasks to miss their deadlines, and as a result, the exception handler reset the system each time. Although VxWorks supports priority inheritance, the remote debugging tools revealed that it had been disabled in the configuration file; the problem was fixed by enabling it.
1.6.4. QNX
QNX is a product from QNX Software Systems Ltd. QNX Neutrino offers POSIX-compliant APIs and is implemented using a microkernel architecture, shown in Fig. 32.2. Because of the fine-grained scalability of the microkernel architecture, it can be configured to a very small size, a critical advantage in high-volume devices, where even a 1% reduction in memory costs can return millions of dollars in profit.
1.6.5. µC/OS-II
µC/OS-II is a free RTOS, easily available on the Internet. It is written in ANSI C and contains a small portion of assembly code, which has been kept to a minimum to make it easy to port to different processors. To date, µC/OS-II has been ported to over 100 different processor architectures, ranging from 8-bit to 64-bit microprocessors, microcontrollers, and DSPs. Some important features of µC/OS-II are highlighted in the following.
µC/OS-II was designed so that the programmer can use just a few of the offered services or the entire range of services. This allows the programmer to minimize the amount of memory needed by µC/OS-II on a per-product basis. µC/OS-II has a fully preemptive kernel: it always ensures that the highest priority task that is ready is taken up for execution. µC/OS-II allows up to 64 tasks to be created, and each task operates at a unique priority level; there are 64 priority levels. This means that round-robin scheduling is not supported. The priority levels are used as the PIDs (Process Identifiers) for the tasks. µC/OS-II uses partitioned memory management. Each memory partition consists of several fixed-size blocks. A task obtains memory blocks from a memory partition, and a memory partition must be created before it can be used. Allocation and deallocation of fixed-size memory blocks is done in constant time and is deterministic. A task can create and use multiple memory partitions, so that it can use memory blocks of different sizes. µC/OS-II has been certified by the Federal Aviation Administration (FAA) for use in commercial aircraft, by meeting the demanding requirements of its standard for software used in avionics. To meet the requirements of this standard it was demonstrated through documentation and testing that it is robust and safe.
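The deterministic, constant-time block allocation of µC/OS-II can be sketched as follows. The class below is a conceptual model, not the real µC/OS-II API: blocks are pre-carved when the partition is created, so getting and returning a block are both O(1) list operations.

```python
class MemoryPartition:
    """Toy fixed-size block allocator in the style of uC/OS-II
    partitioned memory management (a sketch, not the real API)."""
    def __init__(self, n_blocks, block_size):
        self.block_size = block_size
        # carve all blocks up front; no searching is ever needed later
        self._free_list = [bytearray(block_size) for _ in range(n_blocks)]

    def get_block(self):
        # O(1), deterministic: pop a block from the free list
        if not self._free_list:
            return None  # the real kernel reports an error code instead
        return self._free_list.pop()

    def put_block(self, block):
        # O(1), deterministic: push the block back onto the free list
        self._free_list.append(block)

    def free_count(self):
        return len(self._free_list)

part = MemoryPartition(n_blocks=4, block_size=32)
blk = part.get_block()
```

A task needing blocks of a different size simply creates another partition; because all blocks in a partition are the same size, repeated get/put cycles never fragment the pool.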
1.6.6. RT Linux
Linux is by and large a free operating system. It is robust, feature-rich, and efficient. Several real-time implementations of Linux (RT-Linux) are available. RT-Linux is a self-host operating system that runs along with a Linux system (see Fig. 32.3). The real-time kernel sits between the hardware and the Linux system and intercepts all interrupts generated by the hardware; Fig. 32.3 schematically shows this aspect. If an interrupt is to cause a real-time task
to run, the real-time kernel preempts Linux, if Linux is running at that time, and lets the real-time task run. Thus, in effect, Linux runs as a task of RT-Linux.
Fig. 32.3 Structure of RT-Linux

The real-time applications are written as loadable kernel modules; in essence, real-time applications run in the kernel space. In the approach taken by RT-Linux, there are effectively two independent kernels: the real-time kernel and the Linux kernel. This approach is therefore also known as the dual-kernel approach, as the real-time kernel is implemented outside the Linux kernel. Any task that requires deterministic scheduling is run as a real-time task. These tasks preempt Linux whenever they need to execute and yield the CPU to Linux only when no real-time task is ready to run. Compared to the microkernel approach, the dual-kernel approach has the following shortcomings.
Duplicated Coding Efforts: Tasks running in the real-time kernel cannot make full use of the Linux system services: file systems, networking, and so on. In fact, if a real-time task invokes a Linux service, it is subject to the same preemption problems that prohibit Linux processes from behaving deterministically. As a result, new drivers and system services must be created specifically for the real-time kernel even when equivalent services already exist for Linux.

Fragile Execution Environment: Tasks running in the real-time kernel do not benefit from the MMU-protected environment that Linux provides to regular non-real-time processes. Instead, they run unprotected in the kernel space. Consequently, any real-time task that contains a coding error, such as a corrupt C pointer, can easily cause a fatal kernel fault. This is a serious problem, since many embedded applications are safety-critical in nature.

Limited Portability: In the dual-kernel approach, the real-time tasks are not Linux processes at all, but programs written using a small subset of the POSIX APIs. To aggravate the matter, different implementations of dual kernels use different APIs. As a result, real-time programs written using one vendor's RT-Linux version may not run on another's.

Programming Difficulty: RT-Linux kernels support only a limited subset of the POSIX APIs, so application development takes more effort and time.
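The interrupt interception at the heart of the dual-kernel approach can be modeled in a few lines. This is a purely conceptual sketch: real interrupt handling happens in kernel space, and the IRQ numbers below are invented.

```python
# Toy model of RT-Linux interrupt interception: the real-time kernel
# sees every interrupt first. Interrupts bound to real-time tasks are
# handled at once; the rest are queued as "soft" interrupts for Linux
# to handle when no real-time task is ready to run.
RT_IRQS = {7}          # hypothetical: IRQ 7 drives a real-time task

rt_handled = []
pending_for_linux = []

def rt_kernel_intercept(irq):
    if irq in RT_IRQS:
        rt_handled.append(irq)         # preempt Linux, run RT handler now
    else:
        pending_for_linux.append(irq)  # defer delivery to Linux

def linux_when_idle():
    """Linux (itself a task of RT-Linux) drains its pending interrupts."""
    drained = list(pending_for_linux)
    pending_for_linux.clear()
    return drained

rt_kernel_intercept(7)   # real-time interrupt: handled immediately
rt_kernel_intercept(1)   # ordinary interrupt: deferred until Linux runs
```

The model makes the scheduling relationship explicit: Linux only makes progress on its interrupts when the real-time side has nothing to do.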
1.6.7. Lynx
Lynx is a self-host system. The currently available version of Lynx (Lynx 3.0) is a microkernel-based real-time operating system, though the earlier versions were based on a monolithic design. Lynx is fully compatible with Linux: with Lynx's binary compatibility, a Linux program's binary image can be run directly on Lynx. In contrast, for other Linux-compatible operating systems such as QNX, Linux applications need to be recompiled in order to run on them. The Lynx microkernel is 28 KBytes in size and provides the essential services in scheduling, interrupt dispatch, and synchronization. The other services are provided as kernel plug-ins (KPIs). By adding KPIs to the microkernel, the system can be configured to support I/O, file systems, sockets, and so on. With full configuration, it can function as a multipurpose Unix machine on which both hard and soft real-time tasks can run. Unlike many embedded real-time operating systems, Lynx supports memory protection.
1.6.8. Windows CE
Windows CE is a stripped-down version of Windows, with a minimum footprint of only 400 KBytes. It provides 256 priority levels. To optimize performance, all threads are run in the kernel mode. The timer accuracy is 1 msec for sleep- and wait-related APIs. The different functionalities of the kernel are broken down into small non-preemptive sections, so during a system call, preemption is turned off only for short periods of time. Also, interrupt servicing is preemptable; that is, nested interrupts are supported. It uses a memory management unit (MMU) for virtual memory management. Windows CE uses a priority inheritance scheme to avoid the priority inversion problem present in Windows NT. Normally, the kernel thread handling a page fault (i.e. the DPC) runs at a priority level higher than NORMAL (refer Sec. 4.5.2). When a thread with priority level NORMAL suffers a page fault, the priority of the kernel thread handling this page fault is raised to the priority of the thread causing the page fault. This ensures that a thread is not blocked by another lower-priority thread even when it suffers a page fault.
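The page-fault inheritance rule can be written down as a one-line policy. The numeric levels below are hypothetical and, for simplicity of the sketch, a larger number means a higher priority (Windows CE itself numbers its 256 levels the other way round, with 0 the highest).

```python
def handler_priority(faulting_thread_prio, handler_base_prio):
    """Windows CE-style inheritance sketch: the kernel thread that
    services a page fault runs at least at the priority of the
    faulting thread, so a high-priority thread is never stalled
    behind the fault handling of a lower-priority one.

    Convention for this sketch only: larger number = higher priority.
    """
    return max(handler_base_prio, faulting_thread_prio)

HANDLER_BASE = 8   # hypothetical base level of the fault-handling thread
```

A thread at level 12 that faults gets its fault serviced at level 12 rather than at the handler's base level 8, while a thread at level 5 leaves the handler at its base level.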
1.6.9. Exercises
1. State whether the following statements are True or False. Justify your answer in each case.
a. In real-time Linux (RT-Linux), real-time processes are scheduled at priorities higher than the kernel processes.
b. EDF scheduling of tasks is commonly supported in commercial real-time operating systems such as PSOS and VRTX.
c. POSIX 1003.4 (the real-time standard) requires that real-time processes be scheduled at priorities higher than kernel processes.
d. POSIX is an attempt by ANSI/IEEE to enable executable files to be portable across different Unix machines.
2. What is the difference between block I/O and character I/O? Give examples of each. Which type of I/O is accorded higher priority by Unix? Why?
3. List four important features that a POSIX 1003.4 (real-time standard) compliant operating system must support.
4. Is preemptability of kernel processes required by POSIX 1003.4? Can a Unix-based operating system using the preemption-point technique claim to be POSIX 1003.4 compliant? Explain your answers.
5. Suppose you are a manufacturer of small embedded components used mainly in consumer electronics goods such as automobiles, MP3 players, and computer-based toys. Would you prefer to use PSOS, WinCE, or RT-Linux in your embedded component? Explain the reasons behind your answer.
6. What is the difference between a system call and a function call? What problems, if any, might arise if system calls were invoked as procedure calls?
7. Explain how a real-time operating system differs from a traditional operating system. Name a few real-time operating systems that are commercially available.
8. What is open software? Does open software mandate portability of the executable files across different platforms? Name an open software standard for real-time operating systems.
9. What is the advantage of using an open software operating system for real-time application development? What are the pros and cons of using an open software product in program development compared to a proprietary product?
10. Identify at least four important advantages of using VxWorks as the operating system for real-time applications compared to using Unix V.3.
11. What is an open source standard? How is it different from the open object and open binary standards? Give some examples of popular open source software products.
12. Can multithreading result in faster response times (compared to single-threaded tasks) even in uniprocessor systems? Explain your answer and identify the reasons that support it.
Module 7
Software Engineering Issues
Lesson 33
Introduction to Software Engineering
1. Introduction
With the advancement of technology, computers have become more powerful and sophisticated. The more powerful a computer is, the more sophisticated the programs it can run. Thus, programmers have been tasked to solve larger and more complex problems. They have coped with this challenge by innovating and by building on their past programming experience. All those past innovations and the experience of writing good quality programs in efficient and cost-effective ways have been systematically organized into a body of knowledge. This body of knowledge forms the basis of the software engineering principles. Thus, we can view software engineering as a systematic collection of past experience, arranged in the form of methodologies and guidelines.
Suppose you have a friend who asked you to build a small wall as shown in fig. 33.1. You would be able to do that using your common sense: you would get building materials like bricks, cement, etc., and you would then build the wall.
Fig. 33.1 A Small Wall But what would happen if the same friend asked you to build a large multistoried building as shown in fig. 33.2?
Fig. 33.2 A Multistoried Building

You do not have a very good idea about building such a huge complex. It would be very difficult to extend your idea about small wall construction into constructing a large building. Even if you tried to build a large building, it would collapse, because you would not have the requisite knowledge about the strength of materials, testing, planning, architectural design, etc. Building a small wall and building a large building are entirely different ball games. You can use your intuition and still be successful in building a small wall, but building a large building requires knowledge of civil, architectural, and other engineering principles. Without using software engineering principles, it would similarly be difficult to develop large programs. In industry, large programs usually need to be developed to accommodate multiple functions. A problem with developing such large commercial programs is that their complexity and difficulty levels increase exponentially with their sizes, as shown in fig. 33.3. For example, a program of size 1,000 lines of code has some complexity, but a program with 10,000 LOC is not just 10 times more difficult to develop; it may well turn out to be 100 times more difficult unless software engineering principles are used. In such situations, software engineering techniques come to the rescue. Software engineering helps to reduce programming complexity. Software engineering principles use two important techniques to reduce problem complexity: abstraction and decomposition. The principle of abstraction (see fig. 33.4) implies that a problem can be simplified by omitting irrelevant details. Once the simpler problem is solved, the omitted details can be taken into consideration to solve the next lower level of abstraction, and so on.
Fig. 33.3 Increase in development time and effort with problem size
The other approach to tackle problem complexity is decomposition. In this technique, a complex problem is divided into several smaller problems, and then the smaller problems are solved one by one. However, any random decomposition of a problem into smaller parts will not help. The problem has to be decomposed such that each component of the decomposed problem can be solved independently and then the solution of the different
Fig. 33.4 A hierarchy of abstraction (figure: the full problem at the base, with successive abstraction levels, 1st through 3rd, above it)
components can be combined to get the full solution. A good decomposition of a problem as shown in fig. 33.5 should minimize interactions among various components. If the different subcomponents are interrelated, then the different components cannot be solved separately and the desired reduction in complexity will not be realized.
Fig. 33.6 Change in the relative cost of hardware and software over time (figure: the hardware cost to software cost ratio falling between 1960 and 2002)
Organizations are spending larger and larger portions of their budget on software. Not only are software products turning out to be more expensive than hardware, but they also present a host of other problems to the customers: software products are difficult to alter, debug, and enhance; use resources non-optimally; often fail to meet user requirements; are far from reliable; frequently crash; and are often delivered late. Among these, the trend of increasing software costs is probably the most important symptom of the present software crisis. Remember that the cost we are talking of here is not on account of increased features, but due to ineffective development of the product, characterized by inefficient resource usage and time and cost over-runs. Many factors have contributed to the making of the present software crisis: larger problem sizes, lack of adequate training in software engineering, increasing skill shortage, and low productivity improvements. It is believed that the only satisfactory solution to the present software crisis can come from a spread of software engineering practices among engineers, coupled with further advancements in the software engineering discipline itself.
where it was introduced and rework those phases - possibly change the design or change the code and so on. Today, software testing has become very systematic and standard testing techniques are available. Testing activity has also become all encompassing in the sense that test cases are being developed right from the requirements specification stage. There is better visibility of design and code. By visibility we mean production of good quality, consistent and standard documents during every phase. In the past, very little attention was paid to producing good quality and consistent documents. In the exploratory style, the design and test activities, even if carried out (in whatever way), were not documented satisfactorily. Today, consciously good quality documents are being developed during product development. This has made fault diagnosis and maintenance smoother. Now, projects are first thoroughly planned. Project planning normally includes preparation of various types of estimates, resource scheduling, and development of project tracking plans. Several techniques and tools for tasks such as configuration management, cost estimation, scheduling, etc. are used for effective software project management. Several metrics are being used to help in software project management and software quality assurance.
prepare the test documents first, and some other engineer might begin with the design phase of the parts assigned to him. This would be one of the perfect recipes for project failure. A software life cycle model defines entry and exit criteria for every phase. A phase can start only if its phase-entry criteria have been satisfied. So without a software life cycle model, the entry and exit criteria for a phase cannot be recognized. Without models (such as classical waterfall model, iterative waterfall model, prototyping model, evolutionary model, spiral model etc.), it becomes difficult for software project managers to monitor the progress of the project. Many life cycle models have been proposed so far. Each of them has some advantages as well as some disadvantages. A few important and commonly used life cycle models are as follows:
Classical Waterfall Model
Iterative Waterfall Model
Prototyping Model
Evolutionary Model
Spiral Model
Fig. 33.7 Classical Waterfall Model (phases: Feasibility Study, Requirements Analysis and Specification, Design, Coding, Testing, Maintenance)
At first project managers or team leaders try to have a rough understanding of what is required to be done by visiting the client side. They study different input data to the system and output data to be produced by the system. They study what kind of processing is needed to be done on these data and they look at the various constraints on the behaviour of the system. After they have an overall understanding of the problem, they investigate the different solutions that are possible. Then they examine each of the solutions in terms of what kinds of resources are required, what would be the cost of development and what would be the development time for each solution. Based on this analysis, they pick the best solution and determine whether the solution is feasible financially and technically. They check whether the customer budget would meet the cost of the product and whether they have sufficient technical expertise in the area of development.
The following is an example of a feasibility study undertaken by an organization. It is intended to give one a feel of the activities and issues involved in the feasibility study phase of a typical software project.
Case Study
A mining company named Galaxy Mining Company Ltd. (GMC) has mines located at various places in India. It has about fifty different mine sites spread across eight states. The company employs a large number of miners at each mine site. Mining being a risky profession, the company intends to operate a special provident fund, which would exist in addition to the standard provident fund that the miners already enjoy. The main objective of having the special provident fund (SPF) would be to quickly distribute some compensation before the standard provident amount is paid. According to this scheme, each mine site would deduct SPF instalments from each miner every month and deposit the same with the CSPFC (Central Special Provident Fund Commissioner). The CSPFC would maintain all details regarding the SPF instalments collected from the miners. GMC employed a reputed software vendor, Adventure Software Inc., to undertake the task of developing software for automating the maintenance of SPF records of all employees. GMC realized that, besides saving manpower on bookkeeping work, the software would help in speedy settlement of claim cases. GMC indicated that the amount it could afford for this software to be developed and installed was 1 million rupees. Adventure Software Inc. deputed their project manager to carry out the feasibility study. The project manager discussed the matter with the top managers of GMC to get an overview of the project. He also discussed the issues involved with the field PF officers at various mine sites to determine the exact details of the project. The project manager identified two broad approaches to solve the problem. One was to have a central database which could be accessed and updated via a satellite connection from the various mine sites. The other approach was to have local databases at each mine site and to update the central database periodically through a dial-up connection.
These periodic updates could be done on a daily or hourly basis depending on the delay acceptable to GMC in invoking the various functions of the software. The project manager found that the second approach was more affordable and more fault-tolerant, as the local mine sites could still operate even when the communication link to the central database temporarily failed. The project manager quickly analyzed the database functionalities required, the user-interface issues, and the software needed to handle communication with the mine sites, and from this analysis arrived at an estimate of the development cost. He found that the solution involving maintenance of local databases at the mine sites and periodic updating of a central database was financially and technically feasible. The project manager discussed his solution with the GMC management and found that the solution was acceptable to them as well.
The goal of the requirements gathering activity is to collect all relevant information from the customer regarding the product to be developed with a view to clearly understand the customer requirements and weed out the incompleteness and inconsistencies in these requirements.
The requirements analysis activity is begun by collecting all relevant data regarding the product to be developed from the users of the product and from the customer through interviews and discussions. For example, to perform the requirements analysis of a business accounting software required by an organization, the analyst might interview all the accountants of the organization to ascertain their requirements. The data collected from such a group of users usually contain several contradictions and ambiguities, since each user typically has only a partial and incomplete view of the system. Therefore it is necessary to identify all ambiguities and contradictions in the requirements and resolve them through further discussions with the customer. After all ambiguities, inconsistencies, and incompleteness have been resolved and all the requirements properly understood, the requirements specification activity can start. During this activity, the user requirements are systematically organized into a Software Requirements Specification (SRS) document. The customer requirements identified during the requirements gathering and analysis activity are organized into an SRS document. The important components of this document are functional requirements, the non-functional requirements, and the goals of implementation.
3.2.3. Design
The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. In technical terms, during the design phase the software architecture is derived from the SRS document. Two distinctly different approaches are available: the traditional design approach and the object-oriented design approach.
Traditional design approach: Traditional design consists of two different activities; first, a structured analysis of the requirements specification is carried out, where the detailed structure of the problem is examined. This is followed by a structured design activity. During structured design, the results of structured analysis are transformed into the software design.
Object-oriented design approach: In this technique, the various objects that occur in the problem domain and the solution domain are first identified, and the different relationships that exist among these objects are identified. The object structure is further refined to obtain the detailed design.
The different modules making up a software product are almost never integrated in one shot. Integration is normally carried out incrementally over a number of steps. During each integration step, the partially integrated system is tested and a set of previously planned modules are added to it. Finally, when all the modules have been successfully integrated and tested, system testing is carried out. The goal of system testing is to ensure that the developed system conforms to the requirements laid out in the SRS document. System testing usually consists of three different kinds of testing activities:
α-testing: the system testing performed by the development team.
β-testing: the system testing performed by a friendly set of customers.
Acceptance testing: the system testing performed by the customer himself after product delivery to determine whether to accept or reject the delivered product.
System testing is normally carried out in a planned manner according to the system test plan document. The system test plan identifies all testing-related activities that must be performed, specifies the schedule of testing, and allocates resources. It also lists all the test cases and the expected outputs for each test case.
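The idea of a test plan that lists test cases alongside their expected outputs can be sketched in code. Everything below (the toy `withdraw` function and the test-case data) is illustrative and not taken from the lesson:

```python
# Illustrative sketch: system test cases recorded as (input, expected output)
# pairs, as a system test plan prescribes, then run against the system.

def withdraw(balance, amount):
    """Toy system under test: returns (new_balance, message)."""
    if amount > balance:
        return balance, "error: insufficient funds"
    return balance - amount, "cash dispensed"

# Each entry mirrors a test-plan row: test id, input, expected output.
test_cases = [
    ("TC1", (500, 200), (300, "cash dispensed")),
    ("TC2", (100, 200), (100, "error: insufficient funds")),
]

for tc_id, (balance, amount), expected in test_cases:
    actual = withdraw(balance, amount)
    status = "PASS" if actual == expected else "FAIL"
    print(tc_id, status)
```

Recording expected outputs up front, as the test plan document requires, is what lets the pass/fail decision be mechanical.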
3.2.6. Maintenance
Maintenance of a typical software product requires much more effort than the effort needed to develop the product itself. Many studies carried out in the past confirm this and indicate that the relative effort of development of a typical software product to its maintenance effort is roughly in a 40:60 ratio. Maintenance involves performing any one or more of the following three kinds of activities:
Correcting errors that were not discovered during the product development phase. This is called corrective maintenance.
Improving the implementation of the system, and enhancing the functionalities of the system according to the customer's requirements. This is called perfective maintenance.
Porting the software to work in a new environment. For example, porting may be required to get the software to work on a new computer platform or with a new operating system. This is called adaptive maintenance.
how the screens might look,
how the user interface would behave, and
how the system would produce outputs, etc.
This is something similar to what the architectural designers of a building do; they show a prototype of the building to their customer. The customer can evaluate whether he likes it or not, and decide on the changes that he would need in the actual product. A similar thing happens in the case of a software product and its prototyping model.
The spiral model of software development is shown in fig. 33.8. The diagrammatic representation of this model appears like a spiral with many loops. The exact number of loops in the spiral is not fixed. Each loop of the spiral represents a phase of the software process. For example, the innermost loop might be concerned with feasibility study, the next loop with requirements specification, the next one with design, and so on. Each phase in this model is split into four sectors (or quadrants) as shown in fig. 33.8. The following activities are carried out during each phase of a spiral model.
First quadrant (Objective Setting): During the first quadrant, we identify the objectives of the phase and examine the risks associated with these objectives.
Second quadrant (Risk Assessment and Reduction): A detailed analysis is carried out for each identified project risk, and steps are taken to reduce the risks. For example, if there is a risk that the requirements are inappropriate, a prototype system may be developed.
Third quadrant (Development and Validation): Develop and validate the next level of the product after resolving the identified risks.
Fourth quadrant (Review and Planning): Review the results achieved so far with the customer and plan the next iteration around the spiral.
With each iteration around the spiral, a progressively more complete version of the software gets built.
technically challenging software products that are prone to several kinds of risks. However, this model is much more complex than the other models. This is probably a factor deterring its use in ordinary projects.
3.6. Exercises
1. Mark the following as True or False. Justify your answer.
   a. All software engineering principles are backed by either a scientific basis or theoretical proof.
   b. There are well-defined steps through which a problem is solved using an exploratory style.
   c. The evolutionary life cycle model is ideally suited for development of very small software products typically requiring a few months of development effort.
   d. The prototyping life cycle model is the most suitable one for undertaking a software development project susceptible to schedule slippage.
   e. The spiral life cycle model is not suitable for products that are vulnerable to a large number of risks.
2. For the following, mark all options which are true.
   a. Which of the following problems can be considered to be contributing to the present software crisis?
      large problem size
      lack of rapid progress of software engineering
      lack of intelligent engineers
      shortage of skilled manpower
   b. Which of the following are essential program constructs (i.e. it would not be possible to develop programs for any given problem without using the construct)?
      Sequence
      Selection
      Jump
      Iteration
   c. In a classical waterfall model, which phase precedes the design phase?
      Coding and unit testing
      Maintenance
      Requirements analysis and specification
      Feasibility study
   d. Among the development phases of the software life cycle, which phase typically consumes the maximum effort?
      Requirements analysis and specification
      Design
      Coding
      Testing
   e. Among all the phases of the software life cycle, which phase consumes the maximum effort?
      Design
      Maintenance
      Testing
      Coding
   f. In the classical waterfall model, during which phase is the Software Requirements Specification (SRS) document produced?
      Design
      Maintenance
      Requirements analysis and specification
      Coding
   g. Which phase is the last development phase of the classical waterfall software life cycle?
      Design
      Maintenance
      Testing
      Coding
   h. Which development phase in the classical waterfall life cycle immediately follows the coding phase?
      Design
      Maintenance
      Testing
      Requirements analysis and specification
3. Identify the problems one would face if one tried to develop a large software product without using software engineering principles.
4. Identify the two important techniques that software engineering uses to tackle the problem of exponential growth of problem complexity with its size.
5. State five symptoms of the present software crisis.
6. State four factors that have contributed to the making of the present software crisis.
7. Suggest at least two possible solutions to the present software crisis.
8. Identify at least four basic characteristics that differentiate a simple program from a software product.
9. Identify two important features that a program must satisfy to be called a structured program.
10. Explain the exploratory program development style.
11. Show at least three important drawbacks of the exploratory programming style.
12. Identify at least two advantages of using high-level languages over assembly languages.
13. State at least two basic differences between control flow-oriented and data flow-oriented design techniques.
14. State at least five advantages of object-oriented design techniques.
15. State at least three differences between the exploratory style and modern styles of software development.
16. Explain the problems that might be faced by an organization if it does not follow any software life cycle model.
17. Differentiate between structured analysis and structured design.
18. Identify at least three activities undertaken in an object-oriented software design approach.
19. State why it is a good idea to test a module in isolation from other modules.
20. Identify why the different modules making up a software product are almost never integrated in one shot.
21. Identify the necessity of integration and system testing.
22. Identify the six different phases of the classical waterfall model.
23. Mention the reasons for which the classical waterfall model can be considered impractical and cannot be used in real projects.
24. Explain what a software prototype is. Identify three reasons for the necessity of developing a prototype during software development, and explain the situations under which it is beneficial to develop a prototype.
25. Identify the activities carried out during each phase of the spiral model, and discuss the advantages of using the spiral model.
Module 7
Software Engineering Issues
Lesson 34
Requirements Analysis and Specification
1. Introduction
The requirements analysis and specification phase starts once the feasibility study phase is complete and the project is found to be technically sound and feasible. The goal of the requirements analysis and specification phase is to clearly understand the customer requirements and to systematically organize these requirements in a specification document. This phase consists of the following two activities:
Requirements gathering and analysis
Requirements specification
What is the problem?
Why is it important to solve the problem?
What are the possible solutions to the problem?
What exactly are the data input to the system and what exactly are the data output by the system?
What are the likely complexities that might arise while solving the problem? If there are external software or hardware with which the developed software has to interface, then what exactly would the data interchange formats with the external system be?
After the analyst has understood the exact customer requirements, he proceeds to identify and resolve the various requirements problems. The most important requirements problems that the analyst has to identify and eliminate are the problems of anomalies, inconsistencies, and incompleteness. When the analyst detects any inconsistencies, anomalies or incompleteness in the gathered requirements, he resolves them by carrying out further discussions with the end-users and the customers.
3. SRS Document
After the analyst has collected all the requirements information regarding the software to be developed, and has removed all incompleteness, inconsistencies, and anomalies from the specification, he starts to systematically organize the requirements in the form of an SRS document. The important parts of the SRS document are:
Functional requirements of the system
Non-functional requirements of the system, and
Goals of implementation
Output data: book details
So, the function Search Book (F1) takes the author's name and transforms it into book details. Functional requirements actually describe a set of high-level requirements, where each high-level requirement takes some data from the user and provides some data to the user as an output. Also, each high-level requirement might consist of several other functions.
to be input to the system, its input data domain, the output data domain, and the type of processing to be carried out on the input data to obtain the output data. Let us first try to document the withdraw-cash function of an ATM (Automated Teller Machine) system. Withdraw-cash is a high-level requirement. It has several sub-requirements corresponding to the different user interactions. These different interaction sequences capture the different scenarios.

Example: Withdraw Cash from ATM
R1: withdraw cash
Description: The withdraw-cash function first determines the type of account that the user has and the account number from which the user wishes to withdraw cash. It checks the balance to determine whether the requested amount is available in the account. If enough balance is available, it outputs the required cash; otherwise it generates an error message.
R1.1: select withdraw amount option
Input: withdraw amount option
Output: user prompted to enter the account type
R1.2: select account type
Input: user option
Output: prompt to enter the amount
R1.3: get required amount
Input: amount to be withdrawn, as an integer greater than 100 and less than 10,000, in multiples of 100
Output: the requested cash and a printed transaction statement
Processing: the amount is debited from the user's account if sufficient balance is available; otherwise an error message is displayed.
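As a rough illustration, the processing rule of R1.3 could be implemented as follows. The class name, the return messages, and the balance handling are assumptions not given in the SRS fragment above:

```python
# Illustrative sketch of the R1.3 processing rule from the SRS fragment.

class ATMAccount:
    def __init__(self, balance):
        self.balance = balance

    def withdraw(self, amount):
        # Input domain from R1.3: an integer greater than 100 and less
        # than 10,000, in multiples of 100.
        if not (isinstance(amount, int)
                and 100 < amount < 10000
                and amount % 100 == 0):
            return "error: invalid amount"
        # Processing: debit only if sufficient balance is available,
        # otherwise report an error.
        if amount > self.balance:
            return "error: insufficient balance"
        self.balance -= amount
        return f"dispensed {amount}"
```

For example, on an account holding 1000, `withdraw(500)` succeeds while `withdraw(150)` is rejected because it is not a multiple of 100.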
Response to undesired events: It should characterize acceptable responses to undesired events. These are called system response to exceptional conditions. Verifiable: All requirements of the system as documented in the SRS document should be verifiable. This means that it should be possible to determine whether or not requirements have been met in an implementation.
3.1.10.
A good SRS document should properly characterize the conditions under which different scenarios of interaction occur. Sometimes such conditions are complex and several alternative interaction and processing sequences may exist. There are two main techniques available to analyze and represent complex processing logic: decision trees and decision tables.
1. Decision Trees
A decision tree gives a graphic view of the processing logic involved in decision making and the corresponding actions taken. The edges of a decision tree represent conditions, and the leaf nodes represent the actions to be performed depending on the outcome of testing the conditions.

Example: Consider Library Membership Automation Software (LMS), which should support the following three options:
New member
Renewal
Cancel membership

New member option
Decision: When the 'new member' option is selected, the software asks for details about the member, such as the member's name, address and phone number.
Action: If proper information is entered, a membership record for the member is created and a bill is printed for the annual membership charge plus the security deposit payable.
Renewal option
Decision: If the 'renewal' option is chosen, the LMS asks for the member's name and membership number to check whether he is a valid member.
Action: If the membership is valid, the membership expiry date is updated and the annual membership bill is printed; otherwise an error message is displayed.
Cancel membership option
Decision: If the 'cancel membership' option is selected, the software asks for the member's name and membership number.
Action: The membership is cancelled, a cheque for the balance amount due to the member is printed and, finally, the membership record is deleted from the database.

Decision tree representation of the above example
The tree in fig. 34.3 shows the graphical representation of the above example. After getting information from the user, the system makes a decision and then performs the corresponding actions.
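The decision logic described above maps naturally onto nested conditionals, with each decision an `if` test and each leaf a list of actions. The sketch below is only an illustration; the function name, the validity flags, and the action strings are assumptions, not part of the lesson:

```python
# Illustrative sketch: the LMS decision tree as nested conditionals.
# Each branch tests a condition; each leaf returns the actions to perform.

def lms_main(option, details_valid=True, member_valid=True):
    actions = []
    if option == "new member":
        actions += ["get details"]
        if details_valid:
            actions += ["create record", "print bill"]
    elif option == "renewal":
        actions += ["get details"]
        if member_valid:
            actions += ["update expiry date", "print bill"]
        else:
            actions += ["print error message"]
    elif option == "cancel membership":
        actions += ["get details", "print cheque", "delete record"]
    else:
        # invalid option: the error leaf of the tree
        actions += ["print error message"]
    return actions
```

For instance, `lms_main("renewal", member_valid=False)` follows the renewal branch to its error leaf.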
Fig. 34.3 Decision tree for LMS

2. Decision Tables
A decision table is used to represent complex processing logic in a tabular or matrix form. The upper rows of the table specify the variables or conditions to be evaluated; the lower rows specify the actions to be taken when the corresponding conditions are satisfied.

Example: Consider the previously discussed LMS example. The decision table shown in fig. 34.4 represents the problem in tabular form. The table is divided into two parts: the upper part shows the conditions and the lower part shows the actions taken. Each column of the table is a rule.
Conditions:
  Valid selection
  New member
  Renewal
  Cancellation
Actions:
  Display error message
  Ask member's details
  Build customer record
  Generate bill
  Update expiry date
  Print cheque
  Delete record

Fig. 34.4 Decision table for LMS
From the above table you can easily see that if the 'valid selection' condition is false, the action taken is 'display error message', and so on.
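The same logic can instead be held as data, one rule per entry, which is the programming analogue of a decision table. The rule entries below are reconstructed from the LMS description above and the exact action lists are an assumption:

```python
# Illustrative sketch: the LMS decision table as data rather than code.
# Each key is a condition combination (a rule); each value is its actions.

decision_table = {
    (False, None):          ["display error message"],
    (True, "new member"):   ["ask member's details", "build customer record",
                             "generate bill"],
    (True, "renewal"):      ["ask member's details", "update expiry date",
                             "generate bill"],
    (True, "cancellation"): ["ask member's details", "print cheque",
                             "delete record"],
}

def decide(valid_selection, option):
    # An invalid selection ignores the option, matching the first rule column.
    key = (valid_selection, option if valid_selection else None)
    return decision_table[key]
```

Keeping the rules in a table makes it easy to check that every condition combination has exactly one set of actions.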
Model-oriented approaches are more suited for use in the later phases of the life cycle; in these approaches even minor changes to a specification may lead to drastic changes to the entire specification, and they do not support logical conjunctions (AND) and disjunctions (OR). Property-oriented approaches are suitable for requirements specification because they can be easily changed: they specify a system as a conjunction of axioms, and one axiom can easily be replaced by another.
For example, Fig. 34.7 shows that we can compare node B with node D, but we can't compare node D with node A.
rigorous specification is more important than the formal specification itself. The construction of a rigorous specification clarifies several aspects of system behaviour that are not obvious in an informal specification.
Formal methods usually have a well-founded mathematical basis. Thus, formal specifications are not only more precise, but also mathematically sound, and can be used to reason about the properties of a specification and to rigorously prove that an implementation satisfies its specifications.
Formal methods have well-defined semantics. Therefore, ambiguity in specifications is automatically avoided when one formally specifies a system.
The mathematical basis of formal methods facilitates automating the analysis of specifications. For example, a tableau-based technique has been used to automatically check the consistency of specifications. Also, automatic theorem-proving techniques can be used to verify that an implementation satisfies its specifications. The possibility of automatic verification is one of the most important advantages of formal methods.
Formal specifications can be executed to obtain immediate feedback on the features of the specified system. This concept of executable specifications is related to rapid prototyping. Informally, a prototype is a toy working model of a system that can provide immediate feedback on the behaviour of the specified system, and is especially useful in checking the completeness of specifications.
from the fact that even moderately complicated problems blow up the complexity of the formal specification and its analysis. Also, a large unstructured set of mathematical formulae is difficult to comprehend.
5. Axiomatic Specification
In the axiomatic specification of a system, first-order logic is used to write the pre- and post-conditions that specify the operations of the system in the form of axioms. The pre-conditions basically capture the conditions that must be satisfied before an operation can successfully be invoked. In essence, the pre-conditions capture the requirements on the input parameters of a function. The post-conditions are the conditions that must be satisfied when a function completes execution for the function to be considered to have executed successfully. Thus, the post-conditions are essentially the constraints on the results produced for the function execution to be considered successful.
1. Establish the range of input values over which the function should behave correctly. Also find out other constraints on the input parameters and write them in the form of a predicate.
2. Specify a predicate defining the conditions which must hold on the output of the function if it behaved properly.
3. Establish the changes made to the function's input parameters after execution of the function. Pure mathematical functions do not change their input, and therefore this type of assertion is not necessary for pure functions.
4. Combine all of the above into the pre- and post-conditions of the function.
5.2. Examples
Example 1
Specify the pre- and post-conditions of a function that takes a real number as argument and returns half the input value if the input is less than or equal to 100, or else returns double the value.

f (x : real) : real
pre : x ∈ R
post : {(x ≤ 100) ∧ (f(x) = x/2)} ∨ {(x > 100) ∧ (f(x) = 2x)}

Example 2
Axiomatically specify a function named search which takes an integer array and an integer key value as its arguments and returns the index in the array where the key value is present.

search(X : IntArray, key : Integer) : Integer
pre : ∃ i ∈ [Xfirst … Xlast] such that X[i] = key
post : {(X[search(X, key)] = key) ∧ (X = X′)}

Here the convention followed is that, if a function changes any of its input parameters, and that parameter is named X, then the parameter after the function completes execution is referred to as X′.
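The pre- and post-conditions of Example 1 can be checked mechanically as runtime assertions. The sketch below is illustrative and not part of the lesson; the function body is one obvious implementation satisfying the axioms:

```python
# Illustrative sketch: Example 1's axioms checked as runtime assertions.

def f(x: float) -> float:
    # pre-condition: x is a real number
    assert isinstance(x, (int, float))

    result = x / 2 if x <= 100 else 2 * x

    # post-condition from the specification:
    # (x <= 100 and f(x) = x/2) or (x > 100 and f(x) = 2x)
    assert (x <= 100 and result == x / 2) or (x > 100 and result == 2 * x)
    return result
```

Any implementation violating the post-condition would trip the assertion, which is the practical value of writing the axioms down.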
6. Algebraic Specification
In the algebraic specification technique an object class or type is specified in terms of relationships existing between the operations defined on that type. It was first brought into prominence by Guttag [1980, 1985] in specification of abstract data types. Various notations of algebraic specifications have evolved, including those based on OBJ and Larch languages.
2. Exceptions section: This section gives the names of the exceptional conditions that might occur when different operations are carried out. These exception conditions are used in the later sections of an algebraic specification.
3. Syntax section: This section defines the signatures of the interface procedures. The collection of sets that form the input domain of an operator, together with the sort where the output is produced, is called the signature of the operator. For example, PUSH takes a stack and an element and returns a new stack:
push : stack × element → stack
4. Equations section: This section gives a set of rewrite rules (or equations) defining the meaning of the interface procedures in terms of each other. In general, this section is allowed to contain conditional expressions.
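As a sketch, a stack whose operations satisfy the usual rewrite rules of an algebraic stack specification might look as follows. Note that only the push signature appears in the text above; the remaining operations and the equations in the comments are standard textbook examples, not taken from this lesson:

```python
# Illustrative sketch: stack operations obeying typical rewrite rules
# of an algebraic specification. Stacks are immutable tuples so each
# operation returns a new stack, matching the equational style.

def create():
    return ()                  # create : -> stack

def push(s, e):
    return s + (e,)            # push : stack x element -> stack

def pop(s):
    return s[:-1]              # equation: pop(push(s, e)) = s

def top(s):
    return s[-1]               # equation: top(push(s, e)) = e

def empty(s):
    return s == ()             # equation: empty(create()) = true
```

Because the operations are pure functions, each equation can be checked directly on sample values, e.g. `pop(push(s, e)) == s` for any stack `s` and element `e`.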
6.2. Operators
By convention, each equation is implicitly universally quantified over all possible values of the variables. Names not mentioned in the syntax section, such as r or e, are variables. The first step in defining an algebraic specification is to identify the set of required operations. After having identified the required operators, it is helpful to classify them as basic construction operators, extra construction operators, basic inspection operators, or extra inspection operators. The definition of these categories of operators is as follows:
1. Basic construction operators: These operators are used to create or modify entities of a type. The basic construction operators are essential to generate all possible elements of the type being specified. For example, create and append are basic construction operators.
2. Extra construction operators: These are the construction operators other than the basic construction operators. For example, the operator remove is an extra construction operator, because even without using remove it is possible to generate all values of the type being specified.
3. Basic inspection operators: These operators evaluate attributes of a type without
modifying them, e.g., eval, get, etc. Let S be the set of operators whose range is not the data type being specified. The set of basic inspection operators S1 is a subset of S, such that each operator from S - S1 can be expressed in terms of the operators from S1.
4. Extra inspection operators. These are the inspection operators that are not basic
inspectors.
The set of rewrite rules should make the specification complete. Using a complete set of rewrite rules, it is possible to simplify an arbitrary sequence of operations on the interface procedures.

A simple way to determine whether an operator is a constructor (basic or extra) or an inspector (basic or extra) is to check the syntax expression for the operator. If the type being specified appears on the right hand side of the expression, then it is a constructor; otherwise it is an inspection operator. For example, in the following example, create is a constructor because point appears on the right hand side of the expression and point is the data type being specified. But xcoord is an inspection operator, since it does not modify the point type.

Example: Let us specify a data type point supporting the operations create, xcoord, ycoord, and isequal, where the operations have their usual meaning.

Types:
defines point
uses boolean, integer

Syntax:
create : integer × integer → point
xcoord : point → integer
ycoord : point → integer
isequal : point × point → boolean

Equations:
xcoord(create(x, y)) = x
ycoord(create(x, y)) = y
isequal(create(x1, y1), create(x2, y2)) = ((x1 = x2) and (y1 = y2))

In this example, we have only one basic constructor (create) and three basic inspectors (xcoord, ycoord, and isequal). Therefore, we have only three equations.
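The point specification translates almost line for line into code, with the three equations serving as properties that any implementation must satisfy. A minimal C sketch (the struct representation is our choice, not dictated by the specification):

```c
#include <stdbool.h>

/* A direct C model of the algebraic specification of point. */
typedef struct { int x; int y; } point;

point create(int x, int y) { point p = { x, y }; return p; }

int xcoord(point p) { return p.x; }   /* xcoord(create(x, y)) = x */
int ycoord(point p) { return p.y; }   /* ycoord(create(x, y)) = y */

/* isequal(create(x1, y1), create(x2, y2)) = ((x1 = x2) and (y1 = y2)) */
bool isequal(point p1, point p2) {
    return p1.x == p2.x && p1.y == p2.y;
}
```

Checking the three rewrite rules against this implementation is exactly the kind of simplification the equations section enables.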
8. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical proof.
b. Functional requirements address maintainability, portability, and usability issues.
c. The edges of a decision tree represent the corresponding actions to be performed according to conditions.
d. The upper rows of a decision table specify the corresponding actions to be taken when an evaluation test is satisfied.
e. A column in a decision table is called an attribute.
f. Pre-conditions of axiomatic specifications state the requirements on the parameters of the function before the function can start executing.
g. Post-conditions of axiomatic specifications state the requirements on the parameters of the function when the function is completed.
h. Homogeneous algebra is a collection of different sets on which several operations are defined.
i. Applications developed using 4GLs would normally be more efficient and run faster compared to applications developed using 3GLs.

2. For the following, mark all options which are true.
j. An SRS document normally contains:
- Functional requirements of the system
- Module structure
- Configuration management plan
- Non-functional requirements of the system
- Constraints on the system
k. The structured specification technique that is used to reduce the effort in writing specification is:
- Incremental specification
- Specification instantiation
- Both of the above
- None of the above
l. Examples of executable specifications are:
- Third generation languages
- Fourth generation languages
- Second generation languages
- First generation languages

3. Identify the roles of a system analyst.
4. Identify the important parts of an SRS document.
5. Identify the problems an organization might face without developing an SRS document.
6. Identify the non-functional requirement-issues that are considered for a given problem description.
7. Discuss the problems that an unstructured specification would create during software development.
8. Identify the necessity of using formal techniques in the context of requirements specification.
9. Identify the differences between model-oriented and property-oriented approaches in the context of requirements specification.
10. Explain the use of operational semantics.
11. Explain the use of algebraic specifications in the context of requirements specification.
12. Identify the requirements of algebraic specifications to define a system.
13. Identify the essential sections of an algebraic specification to define a system.
14. Explain the steps for developing algebraic specification of simple problems.
15. Identify the properties that every good algebraic specification should possess.
16. Identify the basic properties of a structured specification.
17. Discuss the advantages and disadvantages of algebraic specification.
18. Write down the important features of an executable specification language with examples.
Module 7
Software Engineering Issues
Lesson 35
Modelling Timing Constraints
An event may either be instantaneous or may have certain duration. For example, a button press event is described by the duration for which the button was kept pressed. Some authors argue that durational events are really not a basic type of event, but can be expressed using other events. In fact, it is possible to consider a duration event as a combination of two events: a start event and an end event. For example, the button press event can be described by a combination of start button press and end button press events. However, it is often convenient to retain the notion of a durational event. In this text, we consider durational events as a special class of events. Using the preliminary notions about events discussed in this subsection, we classify various types of timing constraints in subsection 1.7.1.
Fig. 35.2 Deadline Constraint between two events e1 and e2 The deadline and delay constraints can further be classified into two types each based on whether the constraint is imposed on the stimulus or on the response event. This has been explained with some examples in section 1.3.
[Figure: the telephone system, with the Call Initiator and the Call Receiver forming its environment]
Once the receiver handset is lifted, the dial tone must be produced by the system within 2 seconds; otherwise a beeping sound is produced until the handset is replaced. In this example, the lifting of the receiver handset represents a stimulus to the telephone system and the production of the dial tone is the response.

Response-Stimulus (RS): Here the deadline is on the occurrence of the next stimulus, counted from the corresponding response. This is a behavioral constraint, since the constraint is imposed on a stimulus event, i.e. on the environment of the system. An example of an RS type of deadline constraint is the following: Once the dial tone appears, the first digit must be dialled within 30 seconds, otherwise the system enters an idle state and an idle tone is produced.

Response-Response (RR): An RR type of deadline constraint is defined on two response events. In this case, once the first response event occurs, the second response event must occur before a certain deadline. This is a performance constraint, since the timing constraint has been defined on a response event. An example of an RR type of deadline constraint is the following: Once the ring tone is given to the callee, the corresponding ring-back tone must be given to the caller within two seconds, otherwise the call is terminated. Here the ring-back tone and the corresponding ring tone are the two response events.

Delay Constraints: We can identify only one type of delay constraint (SS type) in the telephone system example that we are considering. However, in other problems it may be possible to identify different types of delay constraints. An SS type of delay constraint is a behavioral constraint. An example of an SS type of delay constraint is the following: Once a digit is dialled, the next digit should be dialled after at least 1 second; otherwise a beeping sound is produced until the call initiator replaces the handset.
Here the delay constraint is defined on the event of dialling the next digit (a stimulus) after a digit has been dialled (also a stimulus).

Duration Constraint: A duration constraint on an event specifies the time interval over which the event acts. An example of a duration constraint is the following: If you press the button of the handset for less than 15 seconds, it connects to the local operator. If you press the button for any duration lasting between 15 and 30 seconds, it connects to the international operator. If you keep the button pressed for more than 30 seconds, then on releasing it, the dial tone is produced.
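As a small illustration of our own (the function and enum names are not from the lesson), the durational constraint in the button-press example reduces to a classification on the measured press duration:

```c
typedef enum { LOCAL_OPERATOR, INTERNATIONAL_OPERATOR, DIAL_TONE } action_t;

/* Classify a button press by how long the button was held down,
   following the durational constraint described in the text. */
action_t classify_press(double seconds_held) {
    if (seconds_held < 15.0)  return LOCAL_OPERATOR;         /* < 15 s  */
    if (seconds_held <= 30.0) return INTERNATIONAL_OPERATOR; /* 15-30 s */
    return DIAL_TONE;                                        /* > 30 s  */
}
```

In a real system the duration would be measured between the button-press and button-release events; here we simply assume it is already available in seconds.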
[Fig. 35.4: classification tree of timing constraints. Performance constraints and behavioral constraints each subdivide into delay, deadline, and duration types; the performance delay/deadline leaves are of SR and RR types, and the behavioral delay/deadline leaves are of SS and RS types]
A classification of the different types of timing constraints that we discussed in this section is shown in Fig. 35.4. Note that a performance constraint can be of delay, deadline, or durational type. The delay or deadline constraints on performance can either be SR or RR type. Similarly, the behavioral constraints can be of delay, deadline, or durational type. The delay or deadline constraints on the behavior of the environment can either be SS or RS type.
Fig. 35.5 Conventions Used in Drawing an EFSM We have already discussed that events can be considered to be of two types: stimulus events and response events. We had also discussed different types of timing constraints in Section 1.3. Now we explain how these constraints can be modelled by using EFSMs.
The EFSM model for this constraint is shown in Fig. 35.7. In Fig. 35.7, as soon as the dial tone appears, a timer is set to expire in 30 seconds and the system transits to the Await First Digit state. If the timer expires before the first digit arrives, then the system transits to an idle state where an idle tone is produced. Otherwise, if the digit appears first, then the system transits to the Await Second Digit state.

[Fig. 35.7: EFSM with states Idle, Await First Digit, and Await Second Digit; transitions labelled "dial tone / set timer (30 s)", "first digit", and "timer alarm / idle tone"]
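An EFSM of this kind can be sketched as an event-driven transition function. The state and event names below follow the description in the text; the 30-second timer is only indicated in comments, since arming and fielding a real timer is platform-specific:

```c
typedef enum { IDLE, AWAIT_FIRST_DIGIT, AWAIT_SECOND_DIGIT } state_t;
typedef enum { DIAL_TONE_APPEARS, FIRST_DIGIT, TIMER_ALARM } event_t;

/* One step of the EFSM: on dial tone, (re)arm a 30 s timer and await the
   first digit; if the timer fires before the digit, fall back to Idle. */
state_t step(state_t s, event_t e) {
    switch (s) {
    case IDLE:
        if (e == DIAL_TONE_APPEARS) return AWAIT_FIRST_DIGIT; /* set timer (30 s) */
        break;
    case AWAIT_FIRST_DIGIT:
        if (e == FIRST_DIGIT) return AWAIT_SECOND_DIGIT;
        if (e == TIMER_ALARM) return IDLE;                    /* produce idle tone */
        break;
    default:
        break;
    }
    return s;   /* events with no transition in this state are ignored */
}
```

This is a sketch under stated assumptions, not a complete telephone model; a full implementation would attach the timer-management and tone-generation actions to the transitions.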
A timer is set to expire in 2 seconds. If the ring-back tone appears first, the system transits to the Await First Digit state, else it enters the Await Receiver On-hook state, and the call is terminated.
[Fig. 35.11: EFSM for the durational constraint, with intermediate states Await Event 1 and Await Event 2; a button press sets an alarm (15 s), and depending on when the button is released the system connects to the Local Operator, connects to the International Operator, or produces the Dial Tone]
Fig. 35.11 A Model of a Durational Constraint

The EFSM model for this example is shown in Fig. 35.11. Note that we have introduced two intermediate states, Await Event 1 and Await Event 2, to model the durational constraint.
3. Exercises
1. Mark the following as True or False. Justify your answer.
a. A deadline constraint between two stimuli can be considered to be a behavioural constraint on the environment of the system.

2. Identify and represent the timing constraints in the following air-defense system by means of an extended state machine diagram. Classify each constraint into either a performance or a behavioral constraint. Every incoming missile must be detected within 0.2 seconds of its entering the radar coverage area. The intercept missile should be engaged within 5 seconds of detection of the target missile. The intercept missile should be fired after 0.1 seconds of its engagement but no later than 1 second.

3. Represent a wash-machine having the following specification by means of an extended state machine diagram. The wash-machine waits for the start switch to be pressed. After the user presses the start switch, the machine fills the wash tub with either hot or cold water depending upon the setting of the HotWash switch. The water filling continues until the high level is sensed. The machine starts the agitation motor and continues agitating the wash tub until either the preset timer expires or the user presses the stop switch. After the agitation stops, the machine waits for the user to press the startDrying switch. After the user presses the startDrying switch, the machine starts the hot air blower and continues blowing hot air into the drying chamber until either the user presses the stop switch or the preset timer expires.

4. What is the difference between a performance constraint and a behavioral constraint? Give practical examples of each type of constraint.

5. Represent the timing constraints in a collision avoidance task in an air surveillance system as an extended finite state machine (EFSM) diagram. The collision avoidance task consists of the following activities. The first subtask, named radar signal processor, processes the radar signal on a signal processor to generate the track record in terms of the target's location and velocity within 100 msec of receipt of the signal. The track record is transmitted to the data processor within 1 msec after the track record is determined. A subtask on the data processor correlates the received track record with the track records of other targets that come close, to detect a potential collision that might occur within the next 500 msec. If a collision is anticipated, then the corrective action is determined within 10 msec by another subtask running on the data processor. The corrective action is transmitted to the track correction task within 25 msec.

6. Consider the following (partial) specification of a real-time system: The velocity of a spacecraft must be sampled by a computer on board the spacecraft at least once every second (the sampling event is denoted by S). After sampling the velocity, the current position is computed (denoted by event C) within 100 msec; in parallel, the expected position of the spacecraft is retrieved from the database within 200 msec (denoted by event R). Using these data, the deviation from the normal course of the spacecraft must be determined within 100 msec (denoted by event D) and corrective velocity adjustments must be carried out before a new velocity value is sampled in (the velocity adjustment event is denoted by A). Calculated positions must be transmitted to the earth station at least once every minute (the position transmission event is denoted by T). Identify the different timing constraints in the system. Classify these into either performance or behavioral constraints. Construct an EFSM to model the system.

7. Construct the EFSM model of a telephone system whose (partial) behavior is described below: After lifting the receiver handset, the dial tone should appear within 20 seconds. If a dial tone cannot be given within 20 seconds, then an idle tone is produced. After the dial tone appears, the first digit should be dialled within 10 seconds and the subsequent five digits within 5 seconds of each other. If the dialling of any of the digits is delayed, then an idle tone is produced. The idle tone continues until the receiver handset is replaced.

8. What are the different types of timing constraints that can occur in a system? Give examples of each.
Module 7
Software Engineering Issues
Lesson 36
Software Design Part 1
1. Introduction
The goal of the design phase is to transform the requirements specified in the SRS document into a structure that is suitable for implementation in some programming language. A good software design is seldom arrived at by a single-step procedure, but requires several iterations through a series of steps. Design activities can be broadly classified into two important parts: preliminary (or high-level) design and detailed design. High-level design means identification of the different modules, the control relationships among them, and the definition of the interfaces among these modules. The outcome of high-level design is called the program structure or software architecture. During detailed design, the data structures and the algorithms of the different modules are designed. The outcome of the detailed design stage is usually known as the module-specification document.
1.2.1. Cohesion
Most researchers and engineers agree that a good software design implies clean decomposition of the problem into modules, and the neat arrangement of these modules in a hierarchy. The primary characteristics of neat module decomposition are high cohesion and low coupling. Cohesion is a measure of functional strength of a module. A module having high cohesion and low coupling is said to be functionally independent of other modules. By the term functional independence, we mean that a cohesive module performs a single task or function. The different classes of cohesion that a module may possess are depicted in fig. 36.1.
Coincidental (low) → Logical → Temporal → Procedural → Communicational → Sequential → Functional (high)
Fig. 36.1 Classification of Cohesion

Coincidental cohesion: A module is said to have coincidental cohesion if it performs a set of tasks that relate to each other very loosely, if at all. In this case, the module contains a random collection of functions. It is likely that the functions have been put in the module out of pure coincidence, without any thought or design.

Logical cohesion: A module is said to be logically cohesive if all elements of the module perform similar operations, e.g. error handling, data input, data output, etc. An example of logical cohesion is the case where a set of print functions generating different output reports are arranged into a single module.

Temporal cohesion: When a module contains functions that are related by the fact that all the functions must be executed in the same time span, the module is said to exhibit temporal cohesion. The set of functions responsible for initialization, start-up, shutdown of some process, etc. exhibit temporal cohesion.

Procedural cohesion: A module is said to possess procedural cohesion if the set of functions of the module are all part of a procedure (algorithm) in which a certain sequence of steps has to be carried out for achieving an objective, e.g. the algorithm for decoding a message.

Communicational cohesion: A module is said to have communicational cohesion if all functions of the module refer to or update the same data structure, e.g. the set of functions defined on an array or a stack.

Sequential cohesion: A module is said to possess sequential cohesion if the elements of the module form parts of a sequence, where the output from one element of the sequence is input to the next.

Functional cohesion: Functional cohesion is said to exist if different elements of a module cooperate to achieve a single function. For example, a module containing all the functions required to manage employees' pay-roll displays functional cohesion. If a module displays functional cohesion and we are asked to describe what the module does, then we would be able to describe it using a single sentence.
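As a small illustration of our own (not from the lesson), compare a functionally cohesive function with a logically cohesive one that bundles similar-looking conversions behind a mode flag:

```c
/* Functionally cohesive: computes exactly one thing. */
double celsius_to_fahrenheit(double c) {
    return c * 9.0 / 5.0 + 32.0;
}

/* Logically cohesive: similar-looking conversions lumped together and
   selected by a flag; every caller must know the flag's meaning, and
   changing one conversion risks disturbing the others. */
double convert(int mode, double value) {
    switch (mode) {
    case 0:  return value * 9.0 / 5.0 + 32.0;  /* Celsius to Fahrenheit */
    case 1:  return value * 1000.0;            /* kilometres to metres  */
    default: return value;
    }
}
```

The first function can be described in a single sentence; the second cannot, which is exactly the informal test for functional cohesion given above.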
1.2.2. Coupling
Coupling between two modules is a measure of the degree of interdependence or interaction between the two modules. A module having high cohesion and low coupling is said to be functionally independent of other modules. If two modules interchange large amounts of data, then they are highly interdependent. The degree of coupling between two modules depends on their interface complexity. The interface complexity is basically determined by the number and types of parameters that are interchanged while invoking the functions of the module. Even though no techniques exist today to precisely and quantitatively estimate the coupling between two modules, a classification of the different types of coupling helps to roughly estimate the degree of coupling between two modules. Five types of coupling can occur between any two modules, as shown in fig. 36.2:

Data (low) → Stamp → Control → Common → Content (high)

Fig. 36.2 Classification of Coupling

Data coupling: Two modules are data coupled if they communicate using an elementary data item that is passed as a parameter between the two, e.g. an integer, a float, a character, etc.

Stamp coupling: Two modules are stamp coupled if they communicate using a composite data item such as a record in PASCAL or a structure in C.

Control coupling: Control coupling exists between two modules if data from one module is used to direct the order of instruction execution in the other. An example of control coupling is a flag set in one module and tested in another module.

Common coupling: Two modules are common coupled if they share some global data items.

Content coupling: Content coupling exists between two modules if their code is shared, e.g. a branch from one module into another module.
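The contrast between data coupling and stamp coupling shows up directly in function interfaces. In this hypothetical sketch (the names and fields are ours), the first function receives only the elementary items it needs, while the second receives a whole composite record:

```c
/* Data coupling: only elementary data items cross the interface. */
double yearly_pay(double monthly_salary, double bonus) {
    return 12.0 * monthly_salary + bonus;
}

/* Stamp coupling: a composite data item (a struct) crosses the interface,
   even though only two of its fields are actually used by the callee. */
typedef struct {
    int    id;
    char   name[40];
    double monthly_salary;
    double bonus;
} employee;

double yearly_pay_stamped(const employee *e) {
    return 12.0 * e->monthly_salary + e->bonus;
}
```

The stamp-coupled version ties the callee to the layout of the employee record: any change to the struct forces recompilation and re-inspection of the function, which is why data coupling sits lower on the coupling spectrum.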
In the object-oriented design (OOD) approach, the basic abstractions are not real-world functions such as sort, display, track, etc., but real-world entities such as employee, picture, machine, radar system, etc. For example, in OOD an employee pay-roll software is not developed by designing functions such as update-employee-record, get-employee-address, etc., but by designing objects such as employees, departments, etc.
In OOD, state information is not represented in a centralized shared memory but is
distributed among the objects of the system. For example, while developing an employee pay-roll system, the employee data such as the names of the employees, their code numbers, basic salaries, etc. are usually implemented as global data in a traditional programming system; whereas in an object-oriented system these data are distributed among different employee objects of the system. Objects communicate by passing messages. Therefore, one object may discover the state information of another object by interrogating it. Of course, somewhere or the other the real-world functions must be implemented.
Function-oriented techniques such as SA/SD group functions together if, as a group,
they constitute a higher-level function. On the other hand, object-oriented techniques group functions together on the basis of the data they operate on. To illustrate the differences between the object-oriented and the function-oriented design approaches, let us consider an example.

Example: Fire-Alarm System

The owner of a large multi-storied building wants to have a computerized fire alarm system for his building. Smoke detectors and fire alarms would be placed in each room of the building. The fire alarm system would monitor the status of these smoke detectors. Whenever a fire condition is reported by any of the smoke detectors, the fire alarm system should determine the location at which the fire condition has occurred and then sound the alarms only in the neighboring locations. The fire alarm system should also flash an alarm message on the computer console. Fire fighting personnel man the console round the clock. After a fire condition has been successfully handled, the fire alarm system should support resetting of the alarms by the fire fighting personnel.
Function-Oriented Approach:

/* Global data (system state) accessible by various functions */
BOOL detector_status[MAX_ROOMS];
int detector_locs[MAX_ROOMS];
BOOL alarm_status[MAX_ROOMS];      /* alarm activated when status is set */
int alarm_locs[MAX_ROOMS];         /* room number where alarm is located */
int neighbor_alarm[MAX_ROOMS][10]; /* each detector has at most 10 neighboring locations */

The functions which operate on the system state are:

interrogate_detectors();
get_detector_location();
determine_neighbor();
ring_alarm();
reset_alarm();
report_fire_location();

Object-Oriented Approach:

class detector
    attributes: status, location, neighbors
    operations: create, sense_status, get_location, find_neighbors

class alarm
    attributes: location, status
    operations: create, ring_alarm, get_location, reset_alarm

In the object-oriented program, an appropriate number of instances of the classes detector and alarm should be created. If the function-oriented and the object-oriented programs are examined, it can be seen that in the function-oriented program the system state is centralized and several functions are defined on this central data. In the object-oriented program, the state information is distributed among the various objects. It is not necessary that an object-oriented design be implemented using an object-oriented language only. However, an object-oriented language such as C++ supports the definition of all the basic mechanisms of classes, inheritance, objects, methods, etc., and also supports all the key object-oriented concepts that we have just discussed. Thus, an object-oriented language facilitates the implementation of an OOD. However, an OOD can as well be implemented using a conventional procedural language, though it may require more effort to implement an OOD using a procedural language than using an object-oriented language.
Even though object-oriented and function-oriented approaches are remarkably different approaches to software design, they do not replace each other but complement each other in some sense. For example, usually one applies the top-down function oriented techniques to design the internal methods of a class, once the classes are identified. In this case, though outwardly the system appears to have been developed in an object-oriented fashion, inside each class there may be a small hierarchy of functions designed in a top-down manner.
[Fig. 36.3 shows the symbols used for designing DFDs: data store, process, external entity, data flow, and output]
Fig. 36.3 Symbols used for designing DFDs

The main reason why the DFD technique is so popular is probably that DFD is a very simple formalism: it is simple to understand and use. Starting with a set of high-level functions that a system performs, a DFD model hierarchically represents various sub-functions. In fact, any hierarchical model is simple to understand. The human mind can easily grasp a hierarchical model of a system because, starting with a very simple and abstract model of the system, different details are slowly introduced through the different hierarchies. The data flow diagramming technique also follows a very simple set of intuitive concepts and rules. DFD is an elegant modeling technique that turns out to be useful not only for representing the results of structured analysis of a software problem, but also for several other applications, such as showing the flow of documents or items in an organization.
engineers working in a project. A consistent vocabulary for data items is very important, since in large projects different engineers of the project have a tendency to use different terms to refer to the same data, which unnecessarily causes confusion.
The data dictionary provides the analyst with a means to determine the definition of different data items in terms of their component elements.
To develop the context diagram of the system, we have to analyse the SRS document to identify the different types of users who would be using the system, the kinds of data they would be inputting to the system, and the data they would be receiving from the system. Here, the term "users of the system" also includes the external systems which supply data to or receive data from the system. The bubble in the context diagram is annotated with the name of the software system being developed (usually a noun). This is in contrast with the bubbles at all other levels, which are annotated with verbs. This is expected, since the purpose of the context diagram is to capture the context of the system rather than its functionality.

Example 1: RMS Calculating Software

A software system called RMS calculating software would read three integral numbers from the user in the range of -1000 to +1000, determine the root mean square (rms) of the three input numbers, and display it. In this example, the context diagram (fig. 36.5) is simple to draw. The system accepts three integers from the user and returns the result to him.
[Fig. 36.5: the User supplies data-items to the rms Calculator (bubble 0) and receives the rms value]
Fig. 36.5 Context Diagram

Example 2: Tic-Tac-Toe Computer Game

Tic-tac-toe is a computer game in which a human player and the computer make alternate moves on a 3 × 3 square. A move consists of marking a previously unmarked square. The player who is first to place three consecutive marks along a straight line (i.e. along a row, column, or diagonal) on the square wins. As soon as either the human player or the computer wins, a message congratulating the winner should be displayed. If neither player manages to get three consecutive marks along a straight line and all the squares on the board are filled up, then the game is drawn. The computer always tries to win a game. The context diagram of this problem is shown in fig. 36.6.
[Fig. 36.6: the Human Player sends a move to the Tic-Tac-Toe Software and receives the display (context diagram)]
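One function such a game inevitably needs is a test for three consecutive marks. A compact sketch, assuming the board is stored row-major in a 9-character string (our representation, not the lesson's):

```c
/* Returns 1 if `mark` has three in a row on a 3x3 board stored row-major
   in a 9-character string (cells 0..8), else 0. */
int has_won(const char *b, char mark) {
    static const int lines[8][3] = {
        {0, 1, 2}, {3, 4, 5}, {6, 7, 8},   /* rows      */
        {0, 3, 6}, {1, 4, 7}, {2, 5, 8},   /* columns   */
        {0, 4, 8}, {2, 4, 6}               /* diagonals */
    };
    for (int i = 0; i < 8; i++)
        if (b[lines[i][0]] == mark &&
            b[lines[i][1]] == mark &&
            b[lines[i][2]] == mark)
            return 1;
    return 0;
}
```

Tabulating the eight winning lines keeps the check short and makes the row, column, and diagonal cases uniform.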
Level 1 DFD: To develop the level 1 DFD, examine the high-level functional requirements. If there are between three and seven high-level functional requirements, then these can be directly represented as bubbles in the level 1 DFD. We can then examine the input data to these functions and the data output by these functions, and represent them appropriately in the diagram. If a system has more than seven high-level functional requirements, then some of the related requirements have to be combined and represented as a single bubble in the level 1 DFD. Such a bubble can be split in the lower DFD levels. If a system has fewer than three high-level functional requirements, then some of them need to be split into their sub-functions, so that we have roughly five to seven bubbles on the diagram.

Decomposition: Each bubble in the DFD represents a function performed by the system. The bubbles are decomposed into sub-functions at the successive levels of the DFD. Decomposition of a bubble is also known as factoring or exploding a bubble. Each bubble at any level of the DFD is usually decomposed into between three and seven bubbles. Too few bubbles at any level make that level superfluous. For example, if a bubble is decomposed into just one or two bubbles, then this decomposition becomes redundant. Also, too many bubbles, i.e. more than seven bubbles at any level of a DFD, make the DFD model hard to understand. Decomposition of a bubble should be carried on until a level is reached at which the function of the bubble can be described using a simple algorithm.
Numbering the Bubbles: It is necessary to number the different bubbles occurring in the DFD. These numbers help in uniquely identifying any bubble in the DFD from its bubble number. The bubble at the context level is usually assigned the number 0 to indicate that it is the level 0 DFD. Bubbles at level 1 are numbered 0.1, 0.2, 0.3, etc. When a bubble numbered x is decomposed, its child bubbles are numbered x.1, x.2, x.3, etc. In this numbering scheme, by looking at the number of a bubble we can unambiguously determine its level, its ancestors, and its successors.

Example: Supermarket Prize Scheme

A supermarket needs to develop the following software to encourage regular customers. For this, a customer needs to supply his/her residence address, telephone number, and driving license number. Each customer who registers for this scheme is assigned a unique customer number (CN) by the computer. A customer can present his CN to the checkout staff when he makes any purchase. In this case, the value of his purchase is credited against his CN. At the end of each year, the supermarket intends to award surprise gifts to the 10 customers who make the highest total purchase over the year. Also, it intends to award a 22 carat gold coin to every customer whose purchase exceeds Rs. 10,000. The entries against the CN are reset on the last day of every year, after the prize winners' lists are generated.
[Fig. 36.7: context diagram with external entities Sales-clerk, Manager, and Customer; the Sales-clerk supplies sales-details and the Manager receives the winner-list]

Fig. 36.7 Context diagram for supermarket problem

The context diagram for this problem is shown in fig. 36.7, the level 1 DFD in fig. 36.8, and the level 2 DFD in fig. 36.9.
[Fig. 36.8 Level 1 DFD for the supermarket problem, with data stores customer-data and sales-info]
[Fig. 36.9 Level 2 DFD for the supermarket problem: the winner-generation bubble decomposed into gen-surprise-gift-winner (0.2.1) and find-total-sales (0.2.3), driven by generate-winner-command]
Data Dictionary for the DFD Model:
address: name + house# + street# + city + pin
sales-details: {item + amount}* + CN
CN: integer
customer-data: {address + CN}*
sales-info: {sales-details}*
winner-list: surprise-gift-winner-list + gold-coin-winner-list
surprise-gift-winner-list: {address + CN}*
gold-coin-winner-list: {address + CN}*
gen-winner-command: command
total-sales: {CN + integer}*
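As a concrete illustration of the functions identified in the DFDs above (register-customer, record sales details, generate the winner lists), here is a minimal sketch in Python. The class and method names are my own and not part of the DFD model; the top-10 and Rs. 10,000 rules come from the problem statement.

```python
class PrizeScheme:
    """Illustrative sketch of the supermarket prize scheme, not a specified design."""

    def __init__(self):
        self.next_cn = 1
        self.customers = {}   # CN -> address (stands in for the customer-data store)
        self.totals = {}      # CN -> total purchase value (the total-sales data)

    def register_customer(self, address):
        # Assign a unique customer number (CN) to each registered customer.
        cn = self.next_cn
        self.next_cn += 1
        self.customers[cn] = address
        self.totals[cn] = 0
        return cn

    def record_sale(self, cn, amount):
        # Credit the value of a purchase against the customer's CN.
        self.totals[cn] += amount

    def generate_winner_lists(self):
        # Surprise gifts go to the 10 highest total purchasers of the year.
        ranked = sorted(self.totals, key=self.totals.get, reverse=True)
        surprise_gift_winners = ranked[:10]
        # A gold coin goes to every customer whose purchases exceed Rs. 10,000.
        gold_coin_winners = [cn for cn, t in self.totals.items() if t > 10000]
        # Entries against each CN are reset after the lists are generated.
        for cn in self.totals:
            self.totals[cn] = 0
        return surprise_gift_winners, gold_coin_winners
```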
External entities interacting with the system should be represented only in the context diagram; they should not appear at other levels of the DFD.
It is a common oversight to have either too few or too many bubbles in a DFD. Only three to seven bubbles per diagram should be allowed, i.e. each bubble should be decomposed into between three and seven bubbles.
Many beginners leave the different levels of a DFD unbalanced. Another common mistake while developing a DFD model is
attempting to represent control information in a DFD. It is important to realize that a DFD is the data flow representation of a system and it does not represent control information. The following examples represent some mistakes of this kind:
A book can be searched in the library catalogue by inputting its name. If the book is available in the library, then the details of the book are displayed. If the book is not listed in the catalogue, then an error message is generated. While generating the DFD model for this simple problem, many beginners commit the mistake of drawing an arrow (as shown in fig. 36.10) to indicate that the error function is invoked after the search-book function. But this is control information and should not be shown on the DFD.
[Fig. 36.10 An incorrect attempt to show control information: the search-book bubble with its search-results output]
Another error is trying to represent when or in what order different functions (processes) are invoked, and the conditions under which different functions are invoked. If a bubble A invokes either bubble B or bubble C depending on some condition, we need only represent the data that flows between A and B or between A and C, not the condition under which one or the other is invoked.
A data store should be connected only to bubbles, through data arrows; a data store cannot be connected directly to another data store or to an external entity. Only the functionality of the system specified in the SRS document should be represented, i.e. the designer should not assume functionality of the system not specified by the SRS document and then try to represent it in the DFD.
An improper or unsatisfactory data dictionary. The data and function names must be intuitive. Some students and even practicing engineers use symbolic data names such as a, b, c, etc.; such names hinder understanding of the DFD model.
The function performed by a bubble must be inferred from its label. However, a short label may not capture the entire functionality of a bubble. For example, a bubble named find-book-position has only an intuitive meaning and does not specify several things, e.g. what happens when some input information is missing or is incorrect. Further, the find-book-position bubble may not convey anything regarding what happens when the required book is missing.
Control aspects are not defined by a DFD. For instance, the order in which inputs are
consumed and outputs are produced by a bubble is not specified. A DFD model does not specify the order in which the different bubbles are executed. Representation of such aspects is very important for modeling real-time systems.
The method of carrying out decomposition to arrive at the successive levels and the
ultimate level to which decomposition is carried out are highly subjective and depend on the choice and judgment of the analyst. For this reason, even for the same problem, several alternative DFD representations are possible. Further, it is often not possible to say which DFD representation is superior or preferable to another.
The data flow diagramming technique does not provide any specific guidance as to
how exactly to decompose a given function into its sub-functions and we have to use subjective judgment to carry out decomposition.
For this example, the context diagram was drawn earlier. To draw the level 1 DFD (fig. 36.11), a cursory analysis of the problem description shows that the system needs to perform four basic functions: accept the input numbers from the user, validate the numbers, compute the root mean square (RMS) of the inputs, and display the result.
[Fig. 36.11 Level 1 DFD: data-items flow into validate-input (0.1), valid-data into compute-rms (0.2), and rms into display-result (0.3)]
By observing the level 1 DFD, we identify validate-input as the afferent branch, display-result as the efferent branch, and the remaining bubble (compute-rms) as the central transform. Applying steps 2 and 3 of transform analysis, we get the structure chart shown in fig. 36.12.
[Fig. 36.12 Structure chart: main invokes get-good-data, compute-rms, and write-result; get-good-data in turn invokes read-input and validate-input, with data-items, valid-data, and rms passed between the modules]
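The module structure of fig. 36.12 can be sketched directly as code. The function names mirror the structure-chart modules; the fixed input data and the validation rule (keep only numeric values) are illustrative assumptions.

```python
import math

def read_input():
    # In the real program these would come from the user; a fixed
    # list keeps the sketch self-contained.
    return [3.0, 4.0, "oops", 12.0]

def validate_input(items):
    # Assumed rule: keep only numeric values.
    return [x for x in items if isinstance(x, (int, float))]

def get_good_data():
    # The afferent branch: read, then validate.
    return validate_input(read_input())

def compute_rms(numbers):
    # The central transform.
    return math.sqrt(sum(x * x for x in numbers) / len(numbers))

def write_result(rms):
    # The efferent branch.
    print(f"RMS = {rms:.3f}")

def main():
    data = get_good_data()
    rms = compute_rms(data)
    write_result(rms)
    return rms
```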
[Structure chart for the supermarket problem: a root module invokes register-customer, get-sales-details, record-sales-details, find-total-sales, gen-surprise-gift-list, and gen-gold-coin-winner-list, exchanging customer-details, CN, sales-details, total-sales, and the surprise-gift and gold-coin lists]
3. Exercises
1. Mark the following as True or False. Justify your answer.
a. Coupling between two modules is nothing but a measure of the degree of dependence between them.
b. The primary characteristic of a good design is low cohesion and high coupling.
c. A module having high cohesion and low coupling is said to be functionally independent of other modules.
d. The degree of coupling between two modules does not depend on their interface complexity.
e. In the function-oriented design approach, the system state is decentralized and not shared among different functions.
f. The essence of any good function-oriented design technique is to map the functions performing similar activities into a module.
g. In object-oriented design, the basic abstraction is real-world functions.
h. An OOD (Object-Oriented Design) can be implemented using object-oriented languages only.
i. A DFD model of a system represents the functions performed by the system and the data flow taking place among these functions.
j. A data dictionary lists all data items appearing in the DFD model of a system but does not capture the composition relationships among the data.
k. The context diagram of a system represents it using more than one bubble.
l. A DFD captures the order in which the processes (bubbles) operate.
m. There should be at most one control relationship between any two modules in a properly designed structure chart.

2. For the following, mark all options which are true.
a. The desirable characteristics that every good software design needs are:
- correctness
- understandability
- efficiency
- maintainability
- all of the above
b. A module is said to have logical cohesion if:
- it performs a set of tasks that relate to each other very loosely
- all the functions of the module are executed within the same time span
- all elements of the module perform similar operations, e.g. error handling, data input, data output, etc.
- none of the above
c. High coupling among modules makes:
- it difficult to understand and maintain the product
- it difficult to implement and debug
- it expensive to develop the product, as modules having high coupling cannot be developed independently
- all of the above
d. The desirable characteristics that every good software design needs are:
- error isolation
- scope of reuse
- understandability
- all of the above
e. The purpose of structured analysis is:
- to capture the detailed structure of the system as perceived by the user
- to define the structure of the solution that is suitable for implementation in some programming language
- all of the above
f. The structured analysis technique is based on:
- the top-down decomposition approach
- the bottom-up approach
- the divide and conquer principle
- none of the above
g. A Data Flow Diagram (DFD) is also known as a:
- structure chart
- bubble chart
- Gantt chart
- PERT chart
h. The context diagram of a DFD is also known as:
- level 0 DFD
- level 1 DFD
- level 2 DFD
- none of the above
i. Decomposition of a bubble is also known as:
- classification
- factoring
- exploding
- aggregation
j. Decomposition of a bubble should be carried on:
- till the atomic program instructions are reached
- up to two levels
- until a level is reached at which the function of the bubble can be described using a simple algorithm
- none of the above
k. The bubbles in a level-1 DFD represent:
- exactly one high-level functional requirement described in the SRS document
- more than one high-level functional requirement
- part of a high-level functional requirement
- any of the above, depending on the problem
l. By looking at the structure chart, we can:
- say whether a module calls another module just once or many times
- not say whether a module calls another module just once or many times
- tell the order in which the different modules are invoked
- not tell the order in which the different modules are invoked
m. In which of the following ways does a structure chart differ from a flow chart?
- it is always difficult to identify the different modules of the software from its flow chart representation
- data interchange among different modules is not represented in a flow chart
- sequential ordering of tasks inherent in a flow chart is suppressed in a structure chart
- none of the above
n. The input portion in the DFD that transforms input data from physical to logical form is called the:
- central transform
- efferent branch
- afferent branch
- none of the above
o. If during structured design you observe that the data entering a DFD are incident on different bubbles, then you would use:
- transform analysis
- transaction analysis
- a combination of transform and transaction analysis
- neither transform nor transaction analysis
p. During detailed design, which of the following activities take place?
- the pseudo code for the different modules of the structure chart is developed in the form of MSPECs
- data structures are designed for the different modules of the structure chart
- the module structure is designed
- none of the above

3. State the major design activities. Identify separately the activities undertaken during high-level design and detailed design.
4. Why is functional independence of a module a key factor for a good software design?
5. What are the salient features of the function-oriented design approach and the object-oriented design approach? Differentiate between these two approaches.
6. Identify the aim of the structured analysis activity. Which documents are produced at the end of the structured analysis activity?
7. Identify the necessity of constructing DFDs in the context of a good software design.
8. Write down the importance of the data dictionary in the context of good software design.
9. Explain the term "balancing a DFD" with an example.
10. Discuss the essential activities required to develop the DFD of a system more systematically.
11. What do you understand by top-down decomposition in the context of structured analysis? Explain with a suitable example.
12. Identify the common errors made during construction of a DFD model.
13. Identify the shortcomings of the DFD model.
14. Differentiate between a structure chart and a flow chart.
15. Explain transform analysis with a suitable example.
16. Explain transaction analysis with an example.
Module 7
Software Engineering Issues
Lesson 37
Software Design Part 2
[Fig. 37.1 A model of an object: the data sits at the centre, surrounded by the methods m1 through m6 of the object]
The data internal to an object are called the attributes of the object, and the functions supported by an object are called its methods. Fig. 37.2 shows the LibraryMember class with eight attributes and five methods.
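For illustration, a class with attributes and methods might be sketched as below. Since fig. 37.2 itself is not reproduced here, the attribute and method names are plausible guesses rather than the figure's exact list.

```python
class LibraryMember:
    """Sketch of a class: attributes are the object's internal data,
    methods are the functions it supports. Names are illustrative."""

    def __init__(self, name, address, membership_number):
        # Attributes (data internal to the object).
        self.name = name
        self.address = address
        self.membership_number = membership_number
        self.books_issued = []

    # Methods (functions supported by the object).
    def issue_book(self, title):
        self.books_issued.append(title)

    def return_book(self, title):
        self.books_issued.remove(title)

    def find_books_issued(self):
        return list(self.books_issued)
```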
1.3. Inheritance
The inheritance feature allows us to define a new class by extending or modifying an existing class. The original class is called the base class (or super class) and the new class obtained through inheritance is called the derived class (or sub class). A base class is a generalization of its derived classes: it contains only those properties that are common to all of them. Each derived class, in turn, is a specialization of its base class because it modifies or extends the basic properties of the base class in certain ways. Thus, the inheritance relationship can be viewed as a generalization-specialization relationship. Using the inheritance relationship, different classes can be arranged in a class hierarchy (or class tree). In addition to inheriting all properties of the base class, a derived class can define new properties, i.e. new data and methods. It can even give new definitions to methods which already exist in the base class; such redefinition is called method overriding. In fig. 37.3, LibraryMember is the base class for the derived classes Faculty, Student, and Staff. Similarly, Student is the base class for the derived classes Undergraduate, Postgraduate, and Research. Each derived class inherits all the data and methods of its base class, and can also define additional data and methods or modify some of the inherited ones. The different classes in a library automation system and the inheritance relationships among them are shown in fig. 37.3, where the inheritance relationship is represented by a directed arrow drawn from a derived class to its base class. In fig. 37.3, the LibraryMember base class might define the data name, address, and library membership number for each member. Though the Faculty, Student, and Staff classes inherit these data, they might have to redefine their respective issue-book methods, because the number of books that can be borrowed and the duration of the loan may differ across the categories of library members. Thus, the issue-book method is overridden by each of the derived classes, and the derived classes might define additional data max-number-books and max-duration-of-issue, which may vary for the different member categories.
[Fig. 37.3 Library class hierarchy: base class LibraryMember; derived classes Faculty, Student, and Staff; Student further specialized into Undergraduate, Postgraduate, and Research]
An important advantage of inheritance is reuse: instead of defining the same methods and data redundantly in each of the derived classes separately, they are defined only once in the base class and are inherited by each of its subclasses. For example, in the Library Information System example of fig. 37.3, each category of member objects (Faculty, Student, and Staff) needs the data member-name, member-address, and membership-number; therefore these data are defined in the base class LibraryMember and inherited by its subclasses. Another advantage of the inheritance mechanism is the conceptual simplification that comes from reducing the number of independent features of the classes.
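A minimal sketch of inheritance and method overriding in the spirit of fig. 37.3: the common data lives in the base class, while each derived class overrides the borrowing limit. The limit values are assumed for illustration; a real class would carry more data and methods.

```python
class LibraryMember:
    # Data common to all member categories lives in the base class.
    def __init__(self, name, membership_number):
        self.name = name
        self.membership_number = membership_number

    def max_books(self):
        # Default borrowing limit (assumed value).
        return 2

class Faculty(LibraryMember):
    def max_books(self):          # method overriding
        return 10

class Student(LibraryMember):
    def max_books(self):          # method overriding
        return 5
```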
[Fig. 37.4 A class hierarchy of the library classes (Faculty, Student, Staff, Undergraduate, Postgraduate, Research) illustrating multiple inheritance]
1.4. Encapsulation
The property of an object by which it interfaces with the outside world only through messages is referred to as encapsulation. The data of an object are encapsulated within its methods and are available only through message-based communication. This concept is schematically represented in fig. 37.5.
[Fig. 37.5 Schematic representation of encapsulation: the data is enclosed by the methods m1 through m6 and can be accessed only through them]
An important advantage of encapsulation is the protection of an object's data: this includes protection from unauthorized access and protection from the problems that arise from concurrent access to data, such as deadlock and inconsistent values.
Encapsulation hides the internal structure of an object so that interaction with the
object is simple and standardized. This facilitates reuse of objects across different projects. Furthermore, if the internal structure or procedures of an object are modified, other objects are not affected. This results in easy maintenance.
Since objects communicate with one another using messages only, they are weakly coupled. The fact that objects are inherently weakly coupled enhances understanding of the design, since each object can be studied and understood almost in isolation from other objects.
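A small sketch of encapsulation: the balance of the hypothetical Account class below can be read or changed only through its methods, never directly. The class and method names are my own.

```python
class Account:
    """Data is reachable only through methods (message-based access)."""

    def __init__(self, balance):
        # Double underscore triggers Python name mangling, so the attribute
        # is not part of the public interface.
        self.__balance = balance

    def deposit(self, amount):
        # The method can enforce invariants that direct access could not.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.__balance += amount

    def balance(self):
        return self.__balance
```

Name mangling is only a convention-level shield in Python (the data is still reachable as `_Account__balance`), but it makes the intended interface explicit.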
1.5. Polymorphism
Polymorphism literally means "many forms" (poly: many, morph: form). Broadly speaking, polymorphism denotes the following:
The same message can result in different actions when received by different objects. This is also referred to as static binding, and it occurs when multiple methods with the same operation name exist. Further, when we have an inheritance hierarchy, an object of a derived class can be assigned to a variable of its ancestor class. When such an assignment occurs, a method call through the ancestor variable results in the invocation of the appropriate method of the derived-class object. The exact method to which a method call is bound cannot be known at compile time and is decided dynamically at run time. This is known as dynamic binding.
[Fig. 37.7 Class hierarchy of shapes: Circle, Rectangle, Line, Ellipse, and Square derived from Shape]
[Fig. 37.8 Traditional code versus object-oriented code using dynamic binding: the object-oriented version reduces the type dispatch to a single call, shape.draw()]
When a new shape such as Ellipse is introduced, the traditional code must be modified by adding a new if-then-else clause. However, in the case of the object-oriented program, the code need not change; only a new class called Ellipse has to be defined.
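A sketch of dynamic binding with the Shape hierarchy discussed above. The draw methods simply return strings so the example stays self-contained; in a real drawing program they would render the shape.

```python
class Shape:
    def draw(self):
        raise NotImplementedError

class Circle(Shape):
    def draw(self):
        return "circle"

class Rectangle(Shape):
    def draw(self):
        return "rectangle"

class Ellipse(Shape):
    # Adding a new shape requires no change to the dispatching code below.
    def draw(self):
        return "ellipse"

def draw_all(shapes):
    # shape.draw() is bound to the right method at run time (dynamic binding);
    # no if-then-else on the shape's type is needed.
    return [shape.draw() for shape in shapes]
```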
[Fig. 37.9 Different types of diagrams and views supported in UML]
Structural view: The structural view defines the kinds of objects (classes) important to the understanding of the working of a system and to its implementation. It also captures the relationships among the classes (objects). The structural model is also called the static model, since the structure of a system does not change with time.
Behavioural view: The behavioural view captures how objects interact with each other to realize the system behaviour. The system behaviour captures the time-dependent (dynamic) behaviour of the system.
Implementation view: This view captures the important components of the system and their dependencies.
Environmental view: This view models how the different components are implemented on different pieces of hardware.
The line joining an actor and a use case is called the communication relationship. It indicates that the actor makes use of the functionality provided by the use case. Both human users and external systems can be represented by stick person icons. When a stick person icon represents an external system, it is annotated by the stereotype <<external system>>. Example: The use case model for the Tic-Tac-Toe problem is shown in fig. 37.10. This software has only one use case, play-move. Note that a use case named get-user-move is not used here; that name would be inappropriate because use cases should be named from the user's perspective.
[Fig. 37.10 Use case model for the Tic-tac-toe game: the actor Player communicates with the single use case play-move]
Text Description
Each ellipse on the use case diagram should be accompanied by a text description. The text description should define the details of the interaction between the user and the computer and other aspects of the use case. It should include all the behaviour associated with the use case: the mainline sequence, different variations to the normal behaviour, the system responses associated with the use case, the exceptional conditions that may occur, etc. The behaviour description is often written in a conversational style describing the interactions between the actor and the system. The text description may be informal, but some structuring is recommended. In addition to the mainline sequence and the alternative scenarios, a use case text description may include the following:
- Contact persons: This section lists personnel of the client organization with whom the use case was discussed, the date and time of the meeting, etc.
- Actors: In addition to identifying the actors, some information about the actors that may help the implementation of the use case may be recorded.
- Pre-condition: The preconditions describe the state of the system before the use case execution starts.
- Post-condition: This captures the state of the system after the use case has successfully completed.
- Non-functional requirements: This could contain the important constraints for the design and implementation, such as platform and environment conditions, qualitative statements, response time requirements, etc.
- Exceptions, error situations: This contains only the domain-related errors, such as lack of the user's access rights, invalid entries in the input fields, etc. Errors that are not domain related, such as software errors, need not be discussed here.
- Sample dialogs: These serve as examples illustrating the use case.
- Specific user interface requirements: These contain specific requirements for the user interface of the use case. For example, they may contain forms to be used, screen shots, the interaction style, etc.
- Document references: This part contains references to specific domain-related documents which may be useful for understanding the system operation.
[Fig. 37.12 Representation of use case inclusion]
Includes
The includes relationship in the older versions of UML (prior to UML 1.1) was known as the uses relationship. The includes relationship involves one use case including the behaviour of another use case in its sequence of events and actions. It occurs when a chunk of behaviour is similar across a number of use cases. Factoring out such behaviour avoids repeating its specification and implementation across different use cases. Thus, the includes relationship explores the issue of reuse by factoring out the commonality across use cases. It can also be gainfully employed to decompose a large and complex use case into more manageable parts. As shown in fig. 37.12, the includes relationship is represented using the predefined stereotype <<include>>. In the includes relationship, a base use case compulsorily and automatically includes the behaviour of the common use case. As shown in the example of fig. 37.13, issue-book and renew-book both include the check-reservation use case. A base use case may include several use cases; in such cases, it may interleave their associated common use cases together. The common use case becomes a separate use case, and an independent text description should be provided for it.
[Fig. 37.13 Example of use case inclusion: issue-book and renew-book both include check-reservation]
Extends
The main idea behind the extends relationship among use cases is that it allows you to show optional system behaviour. An optional system behaviour is executed only under certain conditions. This relationship among use cases is also predefined as a stereotype, as shown in fig. 37.14. The extends relationship is similar to generalization, but unlike generalization, the extending use case can add additional behaviour only at an extension point, and only when certain conditions are satisfied. The extension points are points within the use case where variation to the mainline (normal) action sequence may occur. The extends relationship is normally used to capture alternate paths or scenarios.
[Fig. 37.14 Representation of use case extension: the <<extends>> stereotype]
Organization of Use Cases
When the use cases are factored, they are organized hierarchically. The high-level use cases are refined into a set of smaller and more refined use cases, as shown in fig. 37.15. Top-level use cases are super-ordinate to the refined use cases, and the refined use cases are sub-ordinate to the top-level use cases. Note that only the complex use cases should be decomposed and organized in a hierarchy; it is not necessary to decompose simple use cases. The functionality of the super-ordinate use cases is traceable to their sub-ordinate use cases; thus, the functionality provided by the super-ordinate use cases is a composite of the functionality of the sub-ordinate use cases. At the highest level of the use case model, only the fundamental use cases are shown and the focus is on the application context; therefore, this level is also referred to as the context diagram, in which the system limits are emphasized. The top-level diagram contains only those use cases with which the external users of the system interact. The subsystem-level use cases specify the services offered by the subsystems to the other subsystems; any number of levels involving the subsystems may be utilized. At the lowest level of the use case hierarchy, the class-level use cases specify the functional fragments or operations offered by the classes.
[Fig. 37.15 Hierarchical organization of use cases: external users interact with the top-level use cases, which are refined into sub-ordinate use cases]
2.4.1. Association
Associations are needed to enable objects to communicate with each other. An association describes a connection between classes. The association relation between two objects is called an object connection or link; links are instances of associations. A link is a physical or conceptual connection between object instances. For example, suppose Amit has borrowed the book Graph Theory. Here, borrowed is the connection between the objects Amit and Graph Theory book. Mathematically, a link can be considered to be a tuple, i.e. an ordered list of object instances. An association describes a group of links with a common structure and common semantics. For example, consider the statement that Library Member borrows Books. Here, borrows is the association between the class LibraryMember and the class Book. Usually, an association is a binary relation (between two classes); however, three or more different classes can be involved in an association. A class can have an association relationship with itself (called a recursive association); in this case, it is usually assumed that two different objects of the class are linked by the association relationship. Association between two classes is represented by drawing a straight line between the concerned classes. Fig. 37.16 illustrates the graphical representation of the association relation. The name of the association is written alongside the association line. An arrowhead may be placed on the association line to indicate the reading direction of the association; the arrowhead should not be misunderstood as indicating the direction of a pointer implementing the association. On each side of the association relation, the multiplicity is noted as an individual number or as a value range. The multiplicity indicates how many instances of one class are associated with instances of the other. Value ranges of multiplicity are noted by specifying the minimum and maximum value, separated by two dots, e.g. 1..5.
An asterisk is a wild card and means many (zero or more). The association of fig. 37.16 should be read as: many Books may be borrowed by one LibraryMember. Observe that associations (and links) appear as verbs in the problem statement.
[Fig. 37.16 Association between two classes: a LibraryMember (1) is associated with many (*) Books via borrowed-by]
Associations are usually realized by assigning appropriate reference attributes to the classes involved. Thus, associations can be implemented using pointers from one object class to another. Links and associations can also be implemented by using a separate class that stores which objects of one class are linked to which objects of another class. Some CASE tools use the role names of the association relation for the corresponding automatically generated attributes.
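The borrowed-by association of fig. 37.16 might be realized with reference attributes, as sketched below. The attribute names are illustrative; the 1-to-many multiplicity shows up as a single reference on the Book side and a list on the LibraryMember side.

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.borrowed_by = None     # reference attribute realizing the association

class LibraryMember:
    def __init__(self, name):
        self.name = name
        self.borrowed = []          # one member may borrow many (*) books

    def borrow(self, book):
        # Establishing a link: one instance of the borrowed-by association.
        book.borrowed_by = self
        self.borrowed.append(book)
```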
2.4.2. Aggregation
Aggregation is a special type of association where the involved classes represent a whole-part relationship. The aggregate takes the responsibility of forwarding messages to the appropriate parts. Thus, the aggregate takes the responsibility of delegation and leadership. When an instance of one object contains instances of some other objects, then aggregation (or composition) relationship exists between the composite object and the component object. Aggregation is
represented by the diamond symbol at the composite end of a relationship. The number of instances of the component class aggregated can also be shown, as in fig. 37.17(a).
[Fig. 37.17(a) Representation of aggregation: a Document (1) aggregates many (*) Paragraphs, and a Paragraph (1) aggregates many (*) Lines]
The aggregation relationship cannot be reflexive (i.e. recursive): an object cannot contain objects of the same class as itself. The aggregation relation is also not symmetric: two classes A and B cannot contain instances of each other. However, the aggregation relationship can be transitive; in this case, the aggregation may consist of an arbitrary number of levels.
2.4.3. Composition
Composition is a stricter form of aggregation, in which the parts are existence-dependent on the whole. This means that the lives of the parts are closely tied to the life of the whole: when the whole is created, the parts are created, and when the whole is destroyed, the parts are destroyed. A typical example of composition is an invoice object with invoice items. As soon as the invoice object is created, all the invoice items in it are created, and as soon as the invoice object is destroyed, all the invoice items in it are also destroyed. The composition relationship is represented as a filled diamond drawn at the composite end. An example of the composition relationship is shown in fig. 37.17(b).
[Fig. 37.17(b) Composition: an Order (1) is composed of many (*) Items]
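A sketch of composition using the invoice example from the text: the InvoiceItem parts are created together with the Invoice and, since nothing else holds a reference to them, they die with it. The class names follow the example; the method names are my own.

```python
class InvoiceItem:
    def __init__(self, description, amount):
        self.description = description
        self.amount = amount

class Invoice:
    """Composition: the items exist only as parts of their invoice."""

    def __init__(self, item_data):
        # Parts are created when the whole is created ...
        self._items = [InvoiceItem(d, a) for d, a in item_data]

    def total(self):
        return sum(item.amount for item in self._items)

    # ... and are destroyed with the whole: no other object holds a
    # reference to the InvoiceItem instances, so garbage collection of
    # the Invoice reclaims its items as well.
```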
The objects participating in an interaction are shown at the top of the chart as boxes attached to a vertical dashed line. Inside the box, the name of the object is written with a colon separating it from the name of the class, and both names are underlined. The objects appearing at the top signify that they already existed when the use case execution was initiated. However, if some object is created during the execution of the use case and participates in the interaction (e.g. receives a method call), then the object should be shown at the appropriate place on the diagram where it is created. The vertical dashed line is called the object's lifeline; it indicates the existence of the object at any particular point of time. The rectangle drawn on the lifeline is called the activation symbol and indicates that the object is active as long as the rectangle exists. Each message is indicated as an arrow between the lifelines of two objects. The messages are shown in chronological order from top to bottom: reading the diagram from top to bottom shows the sequence in which the messages occur. Each message is labeled with the message name. Some control information can also be included; two types are particularly valuable. A condition (e.g. [invalid]) indicates that a message is sent only if the condition is true. An iteration marker shows that the message is sent many times to multiple receiver objects, as would happen when a collection or the elements of an array are being iterated over. The basis of the iteration can also be indicated, e.g. [for every book object].
[Fig. 37.18 Sequence diagram for the renew book use case: the objects LibraryBoundary, LibraryBookRenewalController, LibraryBookRegister, Book, and LibraryMember exchange the messages renewBook, displayBorrowing, findMemberBorrowing, confirm, and updateMemberBorrowing]
The sequence diagram for the book renewal use case for the Library Automation Software is shown in fig. 37.18. The development of the sequence diagram in the development methodology would help us in determining the responsibilities of the different classes; i.e. what methods should be supported by each class.
[Fig. 37.19 Collaboration diagram for the renew book use case; the message labels include 6: *find and 9: update on the Book object]
An activity is a state with an internal action and one or more outgoing transitions which automatically follow the termination of the internal activity. If an activity has more than one outgoing transition, then these must be identified through conditions. An interesting feature of activity diagrams is the swim lanes. Swim lanes enable you to group activities based on who is performing them, e.g. academic department vs. hostel office; thus swim lanes subdivide activities based on the responsibilities of some components. The activities in a swim lane can be assigned to some model elements, e.g. classes or components. Activity diagrams are normally employed in business process modelling, which is carried out during the initial stages of requirements analysis and specification. Activity diagrams can be very useful for understanding complex processing activities involving many components. Later, these diagrams can be used to develop interaction diagrams, which help to allocate activities (responsibilities) to classes.
Fig. 37.20 Activity diagram for student admission procedure at IIT

The student admission process at IIT is shown as an activity diagram in fig. 37.20. This shows the part played by different components of the Institute in the admission procedure. After the fees are received at the accounts section, parallel activities start at the hostel office, the hospital, and the Department. After all these activities are completed (this synchronization is represented as a horizontal line), the identity card can be issued to the student by the academic section.
Fig. 37.21 State chart diagram for an order object

The basic elements of the state chart diagram are as follows:
Initial state. This is represented as a filled circle.
Final state. This is represented by a filled circle inside a larger circle.
State. These are represented by rectangles with rounded corners.
Transition. A transition is shown as an arrow between two states. Normally, the name of the event which causes the transition is placed alongside the arrow. A guard can also be assigned to the transition. A guard is a Boolean logic condition, and the transition can take place only if the guard evaluates to true. The label of a transition has three parts: event[guard]/action. An example state chart for the order object of the Trade House Automation software is shown in fig. 37.21.
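As an illustration, the guarded transitions of the order object's state chart can be sketched in code. This is a hypothetical Python rendering; the class and method names are assumptions for illustration, not part of the original design:

```python
class Order:
    def __init__(self, items_available):
        self.state = "Accepted"           # initial state after order acceptance
        self.items_available = items_available

    def process(self):
        # event: processed; the guard selects the target state
        if self.state == "Accepted":
            if self.items_available:      # [all items available]
                self.deliver()            # /deliver action
                self.state = "Fulfilled"
            else:                         # [some items not available]
                self.state = "Pending"

    def new_supply(self):
        # event: newsupply is meaningful only in the Pending state
        if self.state == "Pending":
            self.items_available = True
            self.deliver()
            self.state = "Fulfilled"

    def deliver(self):
        pass                              # stand-in for the delivery action
```

For instance, an order created with `items_available=False` moves to "Pending" on `process()` and to "Fulfilled" once `new_supply()` fires.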
A design pattern is documented in terms of four parts:
- The problem
- The context in which the problem occurs
- The solution
- The context within which the solution works
Solution: Assign the responsibility to the information expert, i.e. the class that has the information necessary to fulfill the required responsibility. The expert pattern expresses the common intuition that objects do things related to the information they have. The class diagram and collaboration diagram for this solution to the problem of which class should compute the total sales are shown in fig. 37.22.

Fig. 37.22 Expert pattern: (a) Class diagram (b) Collaboration diagram
Creator Pattern
Problem: Which class should be responsible for creating a new instance of some class?
Solution: Assign a class C1 the responsibility to create an instance of class C2, if one or more of the following are true:
- C1 is an aggregation of objects of type C2
- C1 contains objects of type C2
- C1 closely uses objects of type C2
- C1 has the data that would be required to initialize the objects of type C2, when they are created
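The idea can be sketched with hypothetical Order/OrderItem classes (assumed for illustration, not taken from the text): since Order aggregates OrderItem objects and holds the data needed to initialize them, the creator pattern assigns Order the responsibility of creating them.

```python
class OrderItem:
    def __init__(self, name, quantity):
        self.name = name
        self.quantity = quantity

class Order:
    def __init__(self):
        self.items = []            # Order is an aggregation of OrderItem objects

    def add_item(self, name, quantity):
        # Order, not its clients, instantiates OrderItem (creator pattern)
        item = OrderItem(name, quantity)
        self.items.append(item)
        return item
```

A client simply calls `order.add_item("resistor", 100)` and never constructs an OrderItem directly.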
Controller Pattern
Problem: Who should be responsible for handling the actor requests?
Solution: For every use case, there should be a separate controller object responsible for handling requests from the actor. Also, the same controller should be used for all the actor requests pertaining to one use case, so that it becomes possible to maintain the necessary information about the state of the use case. The state information maintained by a controller can be used to identify out-of-sequence actor requests, e.g. whether a voucher request is received before an arrange-payment request.

Model View Separation Pattern

Problem: How should the non-GUI classes communicate with the GUI classes?
Context in which the problem occurs: This is a very commonly occurring pattern which is found in almost every problem. Here, model is a synonym for the domain layer objects, and view is a synonym for the presentation layer objects, such as the GUI objects.
Solution: The model view separation pattern states that model objects should not have direct knowledge of (or be directly coupled to) the view objects. This means that there should not be any direct calls from other objects to the GUI objects. This results in a good solution, because the GUI classes are specific to a particular application whereas the other classes may be reused. There are actually two solutions to this problem, which work in different circumstances.

Solution 1: Polling, or pull-from-above. It is the responsibility of a GUI object to ask for the relevant information from the other objects, i.e. the GUI objects pull the necessary information from the other objects whenever required. This model is frequently used. However, it is inefficient for certain applications. For example, in simulation applications which require visualization, the GUI objects would not know when the necessary information becomes available. Other examples are monitoring applications such as network monitoring, stock market quotes, and so on. In these situations, a push-from-below model of display update is required. Since a direct push from below would violate the model view separation principle, an indirect mode of communication from the other objects to the GUI objects is required.

Solution 2: Publish-subscribe pattern. An event notification system is implemented through which the publisher can indirectly notify the subscribers as soon as the necessary information becomes available. An event manager class can be defined which keeps track of the subscribers and the types of events they are interested in. An event is published by the publisher by sending a message to the event manager object. The event manager notifies all registered subscribers, usually via a parameterized message (called a callback). Some languages specifically support event manager classes. For example, Java provides the EventListener interface for such purposes.
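A minimal sketch of the publish-subscribe solution, with illustrative class and method names assumed for this example (Java's EventListener framework mentioned above is the production analogue):

```python
class EventManager:
    """Keeps track of subscribers per event type and notifies them via callbacks."""
    def __init__(self):
        self._subscribers = {}            # event type -> list of callbacks

    def subscribe(self, event_type, callback):
        self._subscribers.setdefault(event_type, []).append(callback)

    def publish(self, event_type, data):
        # The publisher never calls the GUI objects directly; it only
        # sends a message to the event manager, which calls back the views.
        for callback in self._subscribers.get(event_type, []):
            callback(data)

# A GUI (view) object registers interest in model updates...
manager = EventManager()
received = []
manager.subscribe("stock_quote", received.append)

# ...and the model publishes as soon as new information is available.
manager.publish("stock_quote", 101.5)
```

The model stays decoupled from the views: it only knows the event manager, never the GUI classes.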
they normally do not include any processing logic. However, they may be responsible for validating inputs, formatting outputs, etc. The boundary objects were earlier called interface objects. However, the term interface class is used with different meanings in Java, COM/DCOM, and UML. A recommendation for the initial identification of the boundary classes is to define one boundary class per actor/use case pair.
3.2.4. Example
Let's consider the query book availability use case of the Library Information System (LIS). Realization of this use case involves only matching the given book name against the books available in the catalog. More complex use cases may require more than one controller object to realize the use case; a complex use case can have several controller objects, such as a transaction manager, a resource coordinator, and an error handler. There is another situation where a use case can have more than one controller object: sometimes a use case requires the controller object to transit through a number of states, and in such cases one controller object might have to be created for each execution of the use case.
Fig. 37.23 A typical realization of a use case through the collaboration of boundary, controller, and entity objects
single method, because an object having only a single data element or method is usually implemented as a part of another object. Common operations: A set of operations can be defined for potential objects. If these operations apply to all occurrences of the object, then a class can be defined. An attribute or operation defined for a class must apply to each instance of the class. If some of the attributes or operations apply only to some specific instances of the class, then one or more subclasses may be needed for these special objects. Normally, the actors themselves and the interactions among them should be excluded from the entity identification exercise. However, sometimes there is a need to maintain information about an actor within the system. This is not the same as modeling the actor. Such classes are sometimes called surrogates. For example, in the Library Information System (LIS) we would need to store information about each library member. This is independent of the fact that the library member also plays the role of an actor of the system. Although the grammatical approach is simple and intuitively appealing, a naive use of the approach makes it very difficult to achieve high-quality results. In particular, it is very difficult to come up with useful abstractions simply by doing grammatical analysis of the problem description. Useful abstractions usually result from clever factoring of the problem description into independent and intuitively correct elements.
Class diagram is shown in fig. 37.24. The messages of the sequence diagram have
Fig. 37.24 (a) Initial domain model (b) Refined domain model
Fig. 37.26 Sequence diagram for the play move use case
4. Exercises
1. Mark the following as True or False. Justify your answer.
a. All software engineering principles are backed by either scientific basis or theoretical proof.
b. Data abstraction helps in easy code maintenance and code reuse.
c. Classes can be considered equivalent to Abstract Data Types (ADTs).
d. The inheritance relationship describes a "has a" relationship among classes.
e. The inheritance feature of the object oriented paradigm helps in code reuse.
f. An important advantage of polymorphism is facilitation of reuse.
g. Using dynamic binding a programmer can send a generic message to a set of objects which may be of different types, i.e. belonging to different classes.
h. In dynamic binding, the address of an invoked method is known only at compile time.
i. For any given problem, one should construct all the views using all the diagrams provided by UML.
j. Use cases are explicitly dependent among themselves.
k. Each actor can participate in one and only one use case.
l. Class diagrams developed using UML can serve as the functional specification of a system.
m. The terms method and operation are equivalent concepts and can be used interchangeably.
n. The aggregation relationship can be recursively defined, i.e. an object can contain instances of itself.
o. In a UML class diagram, the aggregation relationship defines an equivalence relationship among objects.
p. The aggregation relationship can be considered to be a special type of association relationship.
q. Normally, you use an interaction diagram to represent how the behaviour of an object changes over its life time.
r. The interaction diagrams can be effectively used to describe how the behaviour of an object changes across several use cases.
s. A state chart diagram is good at describing behaviour that involves multiple objects cooperating with each other to achieve some behaviour.
t. The facade pattern tells how non-GUI classes should communicate with the GUI classes.
u. The use cases should be tightly tied to the GUI.
v. The responsibilities assigned to a controller object are closely related to the realization of a specific use case.
w. There is a one-to-one correspondence between the classes of the domain model and the final class diagram.
x. A large number of message exchanges between objects indicates good delegation and is a sure sign of a design well done.
y. Deep class hierarchies are the hallmark of any good OOD.
z. Cohesiveness of the data and methods within a class is a sign of good OOD.

2. For the following, mark all options which are true.
a. In the object-oriented approach, each object essentially consists of
- some data that are private to the object
- a set of functions (or operations) that operate on those data
- the set of methods it provides to the other objects for accessing and manipulating the data
- none of the above
b. Redefinition of methods in a derived class which existed in the base class is called
- function overloading
- operator overloading
- method overriding
- none of the above
c. The mechanism by which a subclass inherits attributes and methods from more than one base class is called
- single inheritance
- multiple inheritance
- multi-level inheritance
- hierarchical inheritance
d. In the object-oriented approach, the same message can result in different actions when received by different objects. This feature is referred to as
- static binding
- dynamic binding
- genericity
- overloading
e. UML is a
- language to model syntax
- an object-oriented development methodology
- an automatic code generation tool
- none of the above
f. In the context of use case diagrams, the stick person icon is used to represent
- human users
- external systems
- internal systems
- none of the above
g. The design pattern solutions are typically described in terms of
- class diagrams
- object diagrams
- interaction diagrams
- both class and interaction diagrams
h. The class that should be responsible for doing certain things for which it has the necessary information is the solution proposed by the
- creator pattern
- controller pattern
- expert pattern
- facade pattern
i. The class that should be responsible for creating a new instance of some class is the solution proposed by the
- creator pattern
- controller pattern
- expert pattern
- facade pattern
j. The objects identified during domain analysis can be classified into
- boundary objects
- controller objects
- entity objects
- all of the above
k. The most critical part of the domain modelling activity is to identify
- controller objects
- boundary objects
- entity objects
- none of the above
l. The objects which effectively decouple the boundary and entity objects from one another, making the system tolerant to changes of the user interface and processing logic, are
- controller objects
- boundary objects
- entity objects
- none of the above

3. What is the basic difference between a class and its object? Also, identify the basic difference between methods and messages.
4. Explain what you understand by data abstraction. Identify its advantages.
5. Explain the different types of inheritance with examples. Identify the advantages of inheritance.
6. Explain encapsulation in the context of OO programming. State the advantages of encapsulation.
7. Identify the differences between static binding and dynamic binding. What are the advantages of dynamic binding?
8. Explain the advantages of object-oriented design.
9. Explain the need of a model in the context of software development.
10. Describe the different types of views of a system captured by UML diagrams.
11. What is the purpose of a use case? What is the necessity for developing a use case diagram?
12. Which diagrams in UML capture the behavioural view of the system?
13. Which UML diagrams capture the structural aspects of a system?
14. Which UML diagrams capture the important components of the system and their dependencies?
15. Represent the following relations among classes using UML diagrams.
a. Students credit 5 courses each semester. Each course is taught by one or more teachers.
b. A bill contains a number of items. Each item describes some commodity, the price per unit, and the total price.
c. An order consists of one or more order items. Each order item contains the name of the item, its quantity and the date by which it is required. Each order item is described by an item type specification object having details such as its vendor addresses, its unit price, and the manufacturer.
16. How should you identify the use cases of a system?
17. What is the difference between an operation and a method in the context of the OOD technique?
18. What does the association relationship among classes represent? Give examples of the association relationship.
19. What does the aggregation relationship between classes represent? Give examples of the aggregation relationship between classes.
20. Why are objects always passed by reference in all popular programming languages?
21. What are design patterns? What are the advantages of using design patterns? Write down some popular design patterns and their necessities.
22. Give an outline of the object-oriented development process.
23. What is meant by domain modelling? Differentiate the different types of objects that are identified during domain analysis.
Module 8
Testing of Embedded System
Lesson 38
Testing Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to:
- Distinguish between the terms testing and verification
- Describe the common types of faults that occur in embedded systems
- Explain the various types of models that are used to represent the faults
- Describe the methodology of testing systems with embedded cores
- Distinguish among terms like DFT, BIST and on-line testing
- Explain the need and mechanism of Automatic Test Pattern Generation in the context of testing embedded hardware-software systems
What is testing?
Testing is an organized process to verify the behavior, performance, and reliability of a device or system against designed specifications. It ensures that a device or system is as defect-free as possible. Expected behavior, performance, and reliability must be both formally described and measurable.
Test application is performed on every manufactured device, and is responsible for the quality of the devices shipped.
Real-Time System
Most, if not all, embedded systems are "real-time"; the terms "real-time" and "embedded" are often used interchangeably. A real-time system is one in which the correctness of a computation depends not only on its logical correctness, but also on the time at which the result is produced. In hard real-time systems, a missed timing constraint can result in system failure; for example, in mission-critical applications where failure is not an option, time deadlines must be met. In soft real-time systems, no catastrophe occurs if a deadline is missed, and the time limits are negotiable.
In spite of the progress of hardware/software codesign, hardware and software in an embedded system are usually considered separately in the design process. There is a strong interaction between hardware and software in their failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not break in the traditional sense; however, it can perform inappropriately due to faults in the underlying hardware, as well as specification or design flaws in either the hardware or the software. At the same time, the software can be exploited to test for and respond to the presence of faults in the underlying hardware. As the functions of embedded systems have become more complicated, it is necessary to understand the importance of testing them; however, the studies related to embedded system testing are not yet adequate.
2.
Test methodologies and test goals differ in the hardware and software domains. Embedded software development uses specialized compilers and development software that offer means for debugging. Developers build application software on more powerful computers and eventually test the application in the target processing environment.
In contrast, hardware testing is concerned mainly with functional verification and self-test after the chip is manufactured. Hardware developers use tools to simulate the correct behavior of circuit models. Vendors design chips for self-test, which mainly ensures proper operation of circuit models after their implementation. Test engineers who are not the original hardware developers test the integrated system.

This conventional, divided approach to software and hardware development does not address the embedded system as a whole during the system design process. It instead focuses on these two critical issues of testing separately. New problems arise when developers integrate the components from these different domains. In theory, unsatisfactory performance of the system under test should lead to a redesign. In practice, a redesign is rarely feasible because of the cost and delay involved in another complete design iteration. A common engineering practice is to compensate for problems within the integrated system prototype by using software patches. These changes can unintentionally affect the behavior of other parts in the computing system.

At a higher abstraction level, executable specification languages provide an excellent means to assess embedded-system designs. Developers can then test system-level prototypes with either formal verification techniques or simulation. A current shortcoming of many approaches, however, is that the transition from testing at the system level to testing at the implementation level is largely ad hoc. To date, system testing at the implementation level has received attention in the research community only as coverification, which simulates both hardware and software components conjointly. Coverification runs simulations of specifications on powerful computer systems. Commercially available coverification tools link hardware simulators and software debuggers in the implementation phase of the design process.
Since embedded systems are frequently employed in mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Some embedded systems, such as those in automotive applications, are exposed to extremely harsh environments. Preparing embedded systems to meet new and more stringent requirements of safety and reliability in such applications is a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for on-line testing.
3.
Incorrectness in hardware systems may be described in different terms: defect, error, and fault. These three terms are often confused. We define them as follows [1]:
Defect: A defect in a hardware system is the unintended difference between the implemented hardware and its intended design. It may be a process defect, a material defect, an age defect, or a package defect.
Error: A wrong output signal produced by a defective system is called an error. An error is an effect whose cause is some defect. Errors induce failures, that is, deviations from appropriate system behavior. If a failure can lead to an accident, it is a hazard.
Fault: A representation of a defect at the abstraction level is called a fault. Faults are physical or logical defects in the design or implementation of a device.
Fig. 38.1 An example of a stuck-at fault

Bridging faults: These are due to a short between a group of signals. The logic value of the shorted net may be modeled as 1-dominant (OR bridge), 0-dominant (AND bridge), or intermediate, depending upon the technology in which the circuit is implemented.
Stuck-open and stuck-short faults: The MOS transistor is considered an ideal switch, and two types of faults are modeled. In a stuck-open fault a single transistor is permanently stuck in the open state, and in a stuck-short fault a single transistor is permanently shorted irrespective of its gate voltage. These faults are caused by bad connections of signal lines.
Power disturbance faults: These are caused by inconsistent power supplies and affect the whole system.
Spurious current faults: These are caused by exposure to heavy ions and affect the whole system.

Operational faults are usually classified according to their duration:
Permanent faults exist indefinitely if no corrective action is taken. These are mainly manufacturing faults and do not frequently occur due to changes in system operation or environmental disturbances.
Intermittent faults appear, disappear, and reappear frequently. They are difficult to predict, but their effects are highly correlated. Most of these faults are due to marginal design or manufacturing steps. These faults occur under atypical environmental disturbances.
Transient faults appear for an instant and disappear quickly. They are not correlated with each other and occur due to random environmental disturbances. Power disturbance faults and spurious current faults are transient faults.
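The stuck-at model can be illustrated with a small simulation sketch (a hypothetical three-input circuit assumed for this example, not the circuit of fig. 38.1): forcing one net to a fixed value and comparing the faulty response with the fault-free one shows which input patterns detect the fault.

```python
from itertools import product

def circuit(a, b, c, stuck=None):
    """Fault-free function y = (a AND b) OR c; `stuck` = (net, value)
    optionally models a single stuck-at fault on one net."""
    def force(net, value):
        return stuck[1] if stuck and stuck[0] == net else value
    a, b, c = force("a", a), force("b", b), force("c", c)
    n1 = force("n1", a & b)               # internal net: output of the AND gate
    return n1 | c

# Find every input pattern that detects net n1 stuck-at-0.
tests = [p for p in product([0, 1], repeat=3)
         if circuit(*p) != circuit(*p, stuck=("n1", 0))]
print(tests)  # -> [(1, 1, 0)]: a=b=1 activates the fault, c=0 propagates it
```

Only the pattern a=1, b=1, c=0 both activates the fault (the fault-free net carries 1) and propagates its effect to the output.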
currently applied to hardware-software designs have their origins in either the hardware [9] or the software [10] domains.
4.
The system-on-chip test is a single composite test comprised of the individual core tests of each core, the UDL tests, and the interconnect tests. Each individual core or UDL test may involve surrounding components. Certain operational constraints (e.g., safe mode, low power mode, bypass mode) are often required, which necessitates access and isolation modes. In a core-based system-on-chip [5], the system integrator designs the User Defined Logic (UDL) and assembles the pre-designed cores provided by the core vendors. A core is typically a hardware description of a standard IC, e.g., a DSP, a RISC processor, or a DRAM core. Embedded cores represent intellectual property (IP), and in order to protect IP, core vendors do not release the detailed structural information to the system integrator. Instead, a set of test patterns is provided by the core vendor that guarantees a specific fault coverage. Though the cores are tested as part of overall system performance by the system integrator, the system integrator treats each core as a black box. These test patterns must be applied to the cores in a given order, using a specific clock strategy. The core internal test developed by a core provider needs to be adequately described, ported, and ready for plug and play, i.e., for interoperability, with the system chip test. For an internal test to accompany its corresponding core and be interoperable, it needs to be described in a commonly accepted, i.e., standard, format. Such a standard format is currently being developed by IEEE P1500 and is referred to as standardization of a core test description language [22]. In SOCs, cores are often embedded in several layers of user-defined or other core-based logic, and direct physical access to their peripheries is not available from the chip I/Os. Hence, an electronic access mechanism is needed.
This access mechanism requires additional logic, such as a wrapper around the core, and wiring, such as a test access mechanism, to connect the core peripheries to the test sources and sinks. The wrapper performs switching between normal mode and the test mode(s), and the wiring connects the wrapper surrounding the core to the test source and sink. The wrapper can also be utilized for core isolation. Typically, a core needs to be isolated from its surroundings in certain test modes; core isolation is often required on the input side, the output side, or both.
Fig. 38.2 Overview of the three elements in an embedded-core test approach: (1) test pattern source, (2) test access mechanism, and (3) core test wrapper [5]

A conceptual architecture for testing embedded-core-based SOCs is shown in fig. 38.2. It consists of three structural elements: a test pattern source, a test access mechanism, and a core test wrapper.
5. On-Line Testing
On-line testing addresses the detection of operational faults, and is found in computers that support critical or high-availability applications [23]. The goal of on-line testing is to detect fault effects, that is, errors, and take appropriate corrective action. On-line testing can be performed by external or internal monitoring, using either hardware or software; internal monitoring is referred to as self-testing. Monitoring is internal if it takes place on the same substrate as the circuit under test (CUT); nowadays, this usually means inside a single IC, a system-on-a-chip (SOC). There are four primary parameters to consider in the design of an on-line testing scheme:
Error coverage (EC): This is defined as the fraction of all modeled errors that are detected, usually expressed in percent. Critical and highly available systems require very high error coverage to minimize the impact of errors that lead to system failure.
Error latency (EL): This is the difference between the first time an error is activated and the first time it is detected. EL is affected by the time taken to perform a test and by how often tests are executed. A related parameter is fault latency (FL), defined as the difference between the onset of the fault and its detection. Clearly, FL ≥ EL, so when EL is difficult to determine, FL is often used instead.
Space redundancy (SR): This is the extra hardware or firmware needed to perform on-line testing.
Time redundancy (TR): This is the extra time needed to perform on-line testing.

An ideal on-line testing scheme would have 100% error coverage, an error latency of 1 clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT, and impose no functional or structural restrictions on the CUT. To cover all of the fault types described earlier, two different modes of on-line testing are employed: concurrent testing, which takes place during normal system operation, and non-concurrent testing, which takes place while normal operation is temporarily suspended. These operating modes must often be overlapped to provide a comprehensive on-line testing strategy at acceptable cost.
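As a concrete illustration of these trade-offs, consider a deliberately simplified parity scheme (an assumed example, not from the text): one parity bit stored with each memory word gives concurrent testing with a single bit of space redundancy, and detects any single-bit error on the very next read.

```python
def parity(word):
    """Even/odd parity: number of 1-bits modulo 2."""
    return bin(word).count("1") % 2

def write(memory, addr, word):
    memory[addr] = (word, parity(word))      # space redundancy: 1 extra bit

def read(memory, addr):
    word, p = memory[addr]
    if parity(word) != p:                    # check runs on every access, so
        raise RuntimeError("parity error")   # error latency is one access
    return word

memory = {}
write(memory, 0, 0b1011)
assert read(memory, 0) == 0b1011

# A transient fault flips one bit of the stored word...
word, p = memory[0]
memory[0] = (word ^ 0b0100, p)
try:
    read(memory, 0)
except RuntimeError:
    print("single-bit error detected")       # 100% coverage of 1-bit errors
```

The scheme has full coverage of single-bit errors but none for double-bit errors, showing how coverage is traded against redundancy.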
For critical or highly available systems, it is essential to have a comprehensive approach to on-line testing that covers all expected permanent, intermittent, and transient faults. In recent years, built-in self-test (BIST) has emerged as an important method for testing manufacturing faults, and it is increasingly promoted for on-line testing as well.
6.
(with or without heuristics), or by pseudo-random methods. On the other hand, for (2), a test is subsequently applied many times to each integrated circuit and thus must be efficient both in space (storage requirements for the patterns) and in time. The main considerations in evaluating a test set are: (i) the time to construct a minimal test set; (ii) the size of the test set; (iii) the time involved to carry out the test; and (iv) the equipment required (if external).

Most algorithmic test pattern generators are based on the concept of sensitized paths. The sensitized path method is a heuristic approach to generating tests for general combinational logic networks. The circuit is assumed to have only a single fault in it. The method consists of two parts:
1. The creation of a sensitized path from the fault to the primary output. This involves assigning logic values to the gate inputs in the path from the fault site to a primary output, such that the fault effect is propagated to the output.
2. The justification operation, where the assignments made to gate inputs on the sensitized path are traced back to the primary inputs. This may require several backtracks and iterations.

In the case of sequential circuits the same logic is applied, but before that the sequential elements are explicitly driven to a required state using scan-based design-for-test (DFT) circuitry [1,24]. The best-known algorithms are the D-algorithm, PODEM and FAN [1,24]. Three steps can be identified in most automatic test pattern generation (ATPG) programs: (a) listing the signals on the inputs of a gate controlling the line on which a fault should be detected; (b) determining the primary input conditions necessary to obtain these signals (back propagation) and sensitizing the path to the primary outputs such that the signals and faults can be observed; (c) repeating this procedure until all detectable faults in a given fault set have been covered.
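The goal of ATPG, a test set covering every detectable fault, can be sketched by brute force over a tiny made-up circuit (this is not the D-algorithm or PODEM; the netlist and names are assumptions for illustration): for each single stuck-at fault, search the input space for a pattern that distinguishes the faulty circuit from the fault-free one.

```python
from itertools import product

NETS = ["a", "b", "c", "n1", "y"]

def evaluate(a, b, c, fault=None):
    """Fault-free circuit y = (a AND b) OR c, with an optional
    single stuck-at fault given as (net, stuck_value)."""
    def f(net, value):
        return fault[1] if fault and fault[0] == net else value
    a, b, c = f("a", a), f("b", b), f("c", c)
    n1 = f("n1", a & b)                      # AND-gate output net
    return f("y", n1 | c)                    # OR-gate output net

# For every single stuck-at fault, find one detecting input pattern.
test_set = {}
for net in NETS:
    for stuck_value in (0, 1):
        for pattern in product([0, 1], repeat=3):
            if evaluate(*pattern) != evaluate(*pattern, fault=(net, stuck_value)):
                test_set[(net, stuck_value)] = pattern   # fault detected
                break

coverage = 100.0 * len(test_set) / (2 * len(NETS))
print(f"fault coverage: {coverage:.0f}%")
```

Real ATPG algorithms reach the same goal by path sensitization and justification instead of exhaustive search, which is essential once the input space is too large to enumerate.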
concatenating onto it the shortest path to an uncovered transition [26]. A significant limitation of state machine test generation techniques is the time complexity of the state enumeration process performed during test generation. Coverage-directed algorithms seek to improve coverage without targeting any specific fault. These algorithms heuristically modify an existing test set to improve total coverage, and then evaluate the fault coverage produced by the modified test set. If the modified test set corresponds to an improvement in fault coverage, then the modification is accepted. Otherwise the modification is either rejected or another heuristic is used to determine the acceptability of the modification. The modification method is typically either random or directed random. An example of such a technique is presented in [25], which uses a genetic algorithm to successively improve the population of test sequences.
7.
is a significant part of the overall system. This is considered white-box testing. Therefore, software validation testing is also the responsibility of the developer.
8.
In an embedded system, where hardware and software are combined, unexpected situations can occur owing to interaction faults between hardware and software. As the functions of embedded systems become more complicated, it becomes more difficult to detect the faults that cause such failures. Fault injection techniques are therefore strongly recommended: system behavior is observed while faults are injected into the target system, so that interaction faults between hardware and software can be detected. The test data selection technique discussed in [21] first simulates the behavior of the embedded system as a software program derived from the requirement specification. Hardware faults, after being converted to software faults, are then injected into the simulated program. Finally, effective test data are selected to detect faults caused by the interactions between hardware and software.
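The idea of converting a hardware fault into a software fault and injecting it into a simulated program can be sketched as follows. All names here are hypothetical illustrations; a real harness would inject faults into the simulated program derived from the requirement specification, as in [21].

```python
def controller(read_sensor):
    """Toy embedded control loop: commands a heater based on a
    temperature reading (hypothetical application logic)."""
    t = read_sensor()
    if t is None or not (0 <= t <= 150):
        return "FAILSAFE"          # implausible reading detected
    return "HEAT_ON" if t < 20 else "HEAT_OFF"

def healthy_sensor():
    return 18

def make_stuck_sensor(stuck_value):
    """Fault injector: models a hardware fault (e.g. an ADC line
    stuck at full scale) as a software fault -- the conversion step
    described above."""
    return lambda: stuck_value

print(controller(healthy_sensor))            # HEAT_ON
print(controller(make_stuck_sensor(4095)))   # FAILSAFE
```

Test data that drive the faulty and fault-free versions to different outcomes (here, any run at all, since the stuck value is always out of range) are the "effective" data the selection step looks for.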
9. Conclusion
Rapid advances in test development techniques are needed to reduce the test cost of million-gate SOC devices. In this chapter a number of state-of-the-art techniques for testing embedded systems have been discussed. Modular test techniques for digital, mixed-signal, and hierarchical SOCs must develop further to keep pace with design complexity and integration density. The test data bandwidth needs of analog cores differ significantly from those of digital cores; therefore, unified top-level testing of mixed-signal SOCs remains a major challenge. This chapter also described a granularity-based embedded software testing technique.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[2] E. A. Lee, "What's Ahead for Embedded Software?", IEEE Computer, pp. 18-26, September 2000.
[3] E. A. Lee, "Computing for embedded systems", Proc. IEEE Instrumentation and Measurement Technology Conference, Budapest, Hungary, May 2001.
[4] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, 2001 Edition, http://public.itrs.net/Files/2001ITRS/Home.html
[5] Y. Zorian, E. J. Marinissen, and S. Dey, "Testing Embedded-Core Based System Chips", IEEE Computer, vol. 32, pp. 52-60, 1999.
[6] M.-C. Hsueh, T. K. Tsai, and R. K. Iyer, "Fault Injection Techniques and Tools", IEEE Computer, pp. 75-82, April 1997.
[7] V. Encontre, "Testing Embedded Systems: Do You Have The GuTs for It?", www-128.ibm.com/developerworks/rational/library/content/03July/1000/1050/1050.pdf
[8] D. D. Gajski and F. Vahid, "Specification and design of embedded hardware-software systems", IEEE Design and Test of Computers, vol. 12, pp. 53-67, 1995.
[9] S. Dey, A. Raghunathan, and K. D. Wagner, "Design for testability techniques at the behavioral and register-transfer level", Journal of Electronic Testing: Theory and Applications (JETTA), vol. 13, pp. 79-91, October 1998.
[10] B. Beizer, Software Testing Techniques, Second Edition, Van Nostrand Reinhold, 1990.
[11] G. Al Hayek and C. Robach, "From specification validation to hardware testing: A unified method", Proc. International Test Conference, pp. 885-893, October 1996.
[12] A. von Mayrhauser, T. Chen, J. Kok, C. Anderson, A. Read, and A. Hajjar, "On choosing test criteria for behavioral level hardware design verification", Proc. High Level Design Validation and Test Workshop, pp. 124-130, 2000.
[13] L. A. Clarke, A. Podgurski, D. J. Richardson, and S. J. Zeil, "A formal evaluation of data flow path selection criteria", IEEE Trans. on Software Engineering, vol. SE-15, pp. 1318-1332, 1989.
[14] S. C. Ntafos, "A comparison of some structural testing strategies", IEEE Trans. on Software Engineering, vol. SE-14, pp. 868-874, 1988.
[15] J. Laski and B. Korel, "A data flow oriented program testing strategy", IEEE Trans. on Software Engineering, vol. SE-9, pp. 33-43, 1983.
[16] Q. Zhang and I. G. Harris, "A domain coverage metric for the validation of behavioral VHDL descriptions", Proc. International Test Conference, October 2000.
[17] D. Moundanos, J. A. Abraham, and Y. V. Hoskote, "Abstraction techniques for validation coverage analysis and test generation", IEEE Transactions on Computers, vol. 47, pp. 2-14, January 1998.
[18] N. Malik, S. Roberts, A. Pita, and R. Dobson, "Automaton: an autonomous coverage-based multiprocessor system verification environment", Proc. IEEE International Workshop on Rapid System Prototyping, pp. 168-172, June 1997.
[19] K.-T. Cheng and A. S. Krishnakumar, "Automatic functional test bench generation using the extended finite state machine model", Proc. Design Automation Conference, pp. 1-6, 1993.
[20] J. P. Bergmann and M. A. Horowitz, "Improving coverage analysis and test generation for large designs", Proc. International Conference on Computer-Aided Design, pp. 580-583, 1999.
[21] A. Sung and B. Choi, "An Interaction Testing Technique between Hardware and Software in Embedded Systems", Proc. Ninth Asia-Pacific Software Engineering Conference, 4-6 Dec. 2002, pp. 457-464.
[22] IEEE P1500 Web Site, http://grouper.ieee.org/groups/1500/
[23] H. Al-Asaad, B. T. Murray, and J. P. Hayes, "Online BIST for embedded systems", IEEE Design & Test of Computers, vol. 15, no. 4, Oct.-Dec. 1998, pp. 17-24.
[24] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
[25] F. Corno, M. Sonza Reorda, G. Squillero, A. Manzone, and A. Pincetti, "Automatic test bench generation for validation of RT-level descriptions: an industrial experience", Proc. Design Automation and Test in Europe, pp. 385-389, 2000.
[26] R. C. Ho, C. H. Yang, M. A. Horowitz, and D. L. Dill, "Architecture validation for processors", Proc. International Symposium on Computer Architecture, pp. 404-413, 1995.
[27] P. Van Hentenryck, Constraint Satisfaction in Logic Programming, MIT Press, 1989.
Problems
1. How does testing differ from verification?
2. What is an embedded system? Define hard real-time system and soft real-time system with examples.
3. Why is testing embedded systems difficult?
4. How does hardware testing differ from software testing?
5. What is co-testing?
6. Distinguish between defects, errors and faults with examples.
7. Calculate the total number of single and multiple stuck-at faults for a logic circuit with n lines.
8. In the circuit shown in Fig. P1, which of the following tests, if any, detect the fault x1 s-a-0?
   a) (0,1,1,1)  b) (1,0,1,1)  c) (1,1,0,1)  d) (1,0,1,0)

[Fig. P1: a circuit with inputs x1, x2, x3, x4 and output z]

9. Define the following fault models, using examples where possible:
   a) Single and multiple stuck-at fault
   b) Bridging fault
   c) Stuck-open and stuck-short fault
   d) Operational fault
10. What is meant by a co-validation fault model? Describe the different software fault models.
11. Describe the basic structure of the core-based testing approach for embedded systems.
12. What is concurrent or on-line testing? How does it differ from non-concurrent testing? Define error coverage, error latency, space redundancy and time redundancy in the context of on-line testing.
13. What is a test vector? How are test vectors generated? Describe different techniques for test pattern generation.
14. Define the following for software testing:
    a) Software unit testing
    b) Software integration testing
    c) Software validation testing
    d) System unit testing
    e) System integration testing
    f) System validation testing
Module 8
Testing of Embedded System
Lesson 39
Design for Testability
Instructional Objectives
After going through this lesson the student would be able to:
- Explain the meaning of the term Design for Testability (DFT)
- Describe some ad-hoc and some formal methods of incorporating DFT in a system-level design
- Explain the scan-chain based method of DFT
- Highlight the advantages and disadvantages of scan-based designs and discuss alternatives
The embedded system is an information processing system that consists of hardware and software components. Nowadays, the number of embedded computing systems in areas such as telecommunications, automotive electronics, office automation, and military applications is steadily growing. This market expansion arises from greater memory densities as well as improvements in embeddable processor cores, intellectual-property modules, and sensing technologies. At the same time, these improvements have increased the amount of software needed to manage the hardware components, leading to a higher level of system complexity. Designers can no longer develop high-performance systems from scratch but must use sophisticated system modeling tools. The increased complexity of embedded systems and the reduced access to internal nodes have made it not only more difficult to diagnose and locate faulty components, but also harder to measure the functions of embedded components.

Creating testable designs is key to developing complex hardware and/or software systems that function reliably throughout their operational life. Testability can be defined with respect to a fault. A fault is testable if there exists a well-specified procedure (e.g., test pattern generation, evaluation, and application) to expose it, and the procedure is implementable with a reasonable cost using current technologies. Testability of the fault therefore represents the inverse of the cost of detecting the fault. A circuit is testable with respect to a fault set when each and every fault in this set is testable. Design-for-testability techniques improve the controllability and observability of internal nodes, so that embedded functions can be tested.
Two basic properties determine the testability of a node: 1) controllability, which is a measure of the difficulty of setting internal circuit nodes to 0 or 1 by assigning values to primary inputs (PIs), and 2) observability, which is a measure of the difficulty of propagating a node's value to a primary output (PO) [1-3]. A node is said to be testable if it is easily controlled and observed. For sequential circuits, some have added predictability, which represents the ability to obtain known output values in response to given input stimuli. The factors affecting predictability include initializability, races, hazards, oscillations, etc. DFT techniques include analog test busses and scan methods. Testability can also be improved with BIST circuitry, where signal generators and analysis circuitry are implemented on chip [1, 3-4]. Without testability, design flaws may escape detection until a
product is in the hands of users; equally, operational failures may prove difficult to detect and diagnose. Increased embedded system complexity makes thorough assessment of system integrity by testing external black-box behavior almost impossible. System complexity also complicates test equipment and procedures. Design for testability should increase a system's testability, resulting in improved quality while reducing time to market and test costs.

Traditionally, hardware designers and test engineers have focused on proving the correct manufacture of a design and on locating and repairing field failures. They have developed several highly structured and effective solutions to this problem, including scan design and self-test. Design verification has been a less formal task, based on the designer's skills. However, designers have found that structured design-for-test features aiding manufacture and repair can significantly simplify design verification. These features reduce verification cycles from weeks to days in some cases.

In contrast, software designers and test engineers have targeted design validation and verification. Unlike hardware, software does not break during field use. Design errors, rather than incorrect replication or wear-out, cause operational bugs. Efforts have focused on improving specifications and programming styles rather than on adding explicit test facilities. For example, modular design, structured programming, formal specification, and object orientation have all proven effective in simplifying test. Although these different approaches are effective when we can cleanly separate a design's hardware and software parts, problems arise when boundaries blur. For example, in the early design stages of a complex system, we must define system-level test strategies. Yet, we may not have decided which parts to implement in hardware and which in software.
In other cases, software running on general-purpose hardware may initially deliver certain functions that we subsequently move to firmware or hardware to improve performance. Designers must ensure a testable, finished design regardless of implementation decisions. Supporting hardware-software codesign requires cotesting techniques, which draw hardware and software test techniques together into a cohesive whole.
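As an aside, the controllability and observability notions introduced above are often quantified with SCOAP-style cost measures. The sketch below is a much simplified illustration for a hypothetical circuit z = (a AND b) OR c: inputs cost 1, an AND output's 1-controllability sums its input costs (all inputs must be 1) while its 0-controllability takes the minimum (any one input at 0 suffices), the OR gate is the dual, and observing a line requires the side inputs along its path to hold non-controlling values. Exact SCOAP rules are more detailed than this.

```python
# Simplified SCOAP-style measures for z = (a AND b) OR c.
# Primary inputs have controllability 1; each gate level adds 1.
CC0 = {"a": 1, "b": 1, "c": 1}
CC1 = {"a": 1, "b": 1, "c": 1}

# n = a AND b
CC1["n"] = CC1["a"] + CC1["b"] + 1          # both inputs must be 1
CC0["n"] = min(CC0["a"], CC0["b"]) + 1      # any single input at 0 suffices

# z = n OR c
CC1["z"] = min(CC1["n"], CC1["c"]) + 1      # any single input at 1 suffices
CC0["z"] = CC0["n"] + CC0["c"] + 1          # both inputs must be 0

# Observability: to see n at z, the other OR input (c) must be 0;
# to see a at n, the other AND input (b) must be 1.
CO = {"z": 0}
CO["n"] = CO["z"] + CC0["c"] + 1
CO["a"] = CO["n"] + CC1["b"] + 1

print(CC1["z"], CC0["z"], CO["a"])   # 2 4 4
```

Higher numbers flag harder-to-test nodes; DFT techniques such as test point insertion aim precisely at lowering these costs.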
2.
Design for testability (DFT) refers to those design techniques that make the task of subsequent testing easier. There is definitely no single methodology that solves all embedded system-testing problems, nor is there a single DFT technique that is effective for all kinds of circuits. DFT techniques can largely be divided into two categories: ad-hoc techniques and structured (systematic) techniques.

DFT methods for digital circuits:
- Ad-hoc methods
- Structured methods:
  - Scan
  - Partial scan
  - Built-in self-test (discussed in Lesson 34)
  - Boundary scan (discussed in Lesson 34)
Things to be followed
- Large circuits should be partitioned into smaller sub-circuits to reduce test costs. One of the most important steps in designing a testable chip is to first partition the chip in an appropriate way such that for each functional module there is an effective DFT technique to test it. Partitioning must be done at every level of the design process, from architecture to circuit, whether testing is considered or not. Partitioning can be functional (according to functional module boundaries) or physical (based on circuit topology), and can be implemented using multiplexers and/or scan chains.
- Test access points must be inserted to enhance the controllability and observability of the circuit. Test points include control points (CPs) and observation points (OPs). CPs are active test points, while OPs are passive ones; some test points serve as both. Before exercising tests through test points that are not PIs and POs, one should investigate the additional requirements the test equipment places on those test points.
- Circuits (flip-flops) must be easily initializable to enhance predictability. A power-on reset mechanism controllable from primary inputs is the most effective and widely used approach.
- Test control must be provided for difficult-to-control signals.
- Automatic Test Equipment (ATE) requirements such as pin limitation, tri-stating, timing resolution, speed, memory depth, driving capability, analog/mixed-signal support, internal/boundary scan support, etc., should be considered during the design process to avoid project delays and unnecessary investment in equipment.
- Internal oscillators, PLLs and clocks should be disabled during test. To guarantee tester synchronization, internal oscillator and clock generator circuitry should be isolated during the test of the functional circuitry. The internal oscillators and clocks should also be tested separately.
- Analog and digital circuits should be kept physically separate. Analog circuit testing is very different from digital circuit testing: it involves real measurement, since analog signals are continuous (as opposed to the discrete logic signals of digital circuits). The two require different test equipment and different test methodologies, and should therefore be tested separately.
Things to be avoided
- Asynchronous (unclocked) logic feedback in the circuit must be avoided. A feedback loop in combinational logic can give rise to oscillation for certain inputs. Since no clocking is employed, timing is continuous rather than discrete, which makes tester synchronization virtually impossible; only functional test by application board can then be used.
- Monostables and self-resetting logic should be avoided. A monostable (one-shot) multivibrator produces a pulse of constant duration in response to the rising or falling transition of the trigger input. Its pulse duration is usually controlled externally by a resistor and a capacitor (with current technology, these can also be integrated on chip). One-shots are used mainly for 1) pulse shaping, 2) switch-on delays, 3) switch-off delays, and 4) signal delays. Since a one-shot is not controlled by clocks, synchronization and precise duration control are very difficult, which in turn reduces testability by ATE. Counters and dividers are better candidates for delay control.
- Redundant gates must be avoided.
- High fan-in/fan-out combinations must be avoided, as large fan-in makes the inputs of a gate difficult to observe and makes the gate output difficult to control.
- Gated clocks should be avoided, as they degrade the controllability of circuit nodes.

The above guidelines come from experienced practitioners; they are neither complete nor universal. In fact, these ad-hoc methods have drawbacks:
- There is a lack of experts and tools.
- Test generation is often manual.
- High fault coverage cannot be guaranteed.
- Design iterations may increase.
- They are not suitable for large circuits.
accessed by shifting out the chain. Figure 39.1 shows a typical circuit after the scan insertion operation. The input/output of each scan shift register must be available on a PI/PO. Combinational ATPG is used to obtain tests for all testable faults in the combinational logic. Shift register tests are applied, and the ATPG tests are converted into scan sequences for use in manufacturing test.
Fig. 39.1 Scan structure added to a design

Fig. 39.1 shows a scan structure connected to a design: the scan flip-flops (SFFs) form a chain from SCANIN to SCANOUT, controlled by TC and CLK, alongside the combinational logic between the primary inputs and primary outputs. The scan flip-flops must be interconnected in a particular way. This approach effectively turns the sequential testing problem into a combinational one, so the circuit can be fully tested by compact ATPG patterns. Unfortunately, there are two types of overhead associated with this technique that designers care about very much: hardware overhead (three extra pins, multiplexers for all FFs, and extra routing area) and performance overhead (multiplexer delay and FF delay due to the extra load).
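The shift-capture-shift protocol described above can be modeled behaviorally. This is a sketch: the inverting combinational block is an arbitrary assumption used only to make the captured response visible, and a real flow would convert ATPG vectors into such scan sequences automatically.

```python
def scan_cycle(chain, pattern, comb_logic):
    """One full-scan test cycle: shift `pattern` into the scan chain
    (TC=1), apply one normal clock to capture the combinational
    response (TC=0), then shift the captured state back out."""
    # Shift in: new bits enter at SCANIN, old state falls out at SCANOUT.
    for bit in pattern:
        chain = [bit] + chain[:-1]
    # Capture: the flip-flops load the combinational logic's outputs.
    chain = comb_logic(chain)
    # Shift out: observe the captured response serially.
    out = []
    state = list(chain)
    for _ in range(len(state)):
        out.append(state[-1])
        state = [0] + state[:-1]
    return out

# Hypothetical combinational block: next state is the bitwise
# complement of the present state.
invert = lambda bits: [b ^ 1 for b in bits]

response = scan_cycle([0, 0, 0], [1, 0, 1], invert)
print(response)   # [0, 1, 0]
```

In practice the shift-out of one response is overlapped with the shift-in of the next pattern, so a test of V vectors on an n-flip-flop chain costs roughly V x n shift clocks — the source of the test-time overhead discussed above.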
2.3 Scan Variations

There have been many variations of scan, as listed below; a few of these are discussed here.
- MUXed scan
- Scan path
- Scan-hold flip-flop
- Serial scan
- Level-Sensitive Scan Design (LSSD)
- Scan set
- Random access scan
Fig. 39.2 The shift-register modification approach

Fig. 39.2 shows that when the test mode pin T=0 the circuit is in normal operation mode, and when T=1 it is in test mode (shift-register mode). The scan flip-flops (FFs) must be interconnected in a particular way. This approach effectively turns the sequential testing problem into a combinational one, so the circuit can be fully tested by compact ATPG patterns. There are two types of overhead associated with this method: hardware overhead, due to three extra pins, multiplexers for all FFs, and extra routing area; and performance overhead, which includes multiplexer delay and FF delay due to the extra load.
Fig. 39.3 Logic diagram of the two-port raceless D-FF

This approach gives lower hardware overhead (due to a dense layout) and a smaller performance penalty (due to the removal of the MUX in front of the FF) compared to the MUX scan approach. The real figures, however, depend on the circuit style and technology selected, and on the physical implementation.
Fig. 39.4 The level-sensitive polarity-hold latch and its truth table:

C D | +L
0 0 |  L (hold)
0 1 |  L (hold)
1 0 |  0
1 1 |  1

(+L denotes the next value of the latch output L: while the clock C is 0 the latch holds its state, and while C is 1 the output follows D.)
Fig. 39.5 The polarity-hold shift-register latch (SRL)

LSSD requires that the circuit be level-sensitive, so we need LS memory elements as defined above. Figure 39.4 shows an LS polarity-hold latch. The correct change of the latch output (L) does not depend on the rise/fall time of C, but only on C being '1' for a period of time greater than or equal to the data propagation and stabilization time. Figure 39.5 shows the polarity-hold shift-register latch (SRL) used in LSSD as the scan cell. The scan cell is controlled in the following way:
1. Normal mode: A=B=0, C=0->1.
2. SR (test) mode: C=0, AB=10->01 to shift SI through L1 and L2.
Advantages of LSSD
1. Correct operation independent of AC characteristics is guaranteed. 2. FSM is reduced to combinational logic as far as testing is concerned. 3. Hazards and races are eliminated, which simplifies test generation and fault simulation.
Drawbacks of LSSD
1. Complex design rules are imposed on designers. There is no freedom to vary from the overall schemes. It increases the design complexity and hardware costs (4-20% more hardware and 4 extra pins). 2. Asynchronous designs are not allowed in this approach. 3. Sequential routing of latches can introduce irregular structures. 4. Faults changing combinational function to sequential one may cause trouble, e.g., bridging and CMOS stuck-open faults. 5. Test application becomes a slow process, and normal-speed testing of the entire test sequence is impossible. 6. It is not good for memory intensive designs.
Fig. 39.7 The RAM cell

The difference between this approach (random-access scan) and the previous ones is that the state vector can now be accessed in a random sequence. Since neighboring patterns can be arranged so that they differ in only a few bits, and only a few response bits need to be observed, the test application time can be reduced; the test length is thus shorter. This approach also provides the ability to 'watch' a node in normal operation mode, which is impossible with the previous scan methods, and it is suitable for delay testing and embedded-memory testing. The major disadvantage of the approach is its high hardware overhead, due to the address decoder, the gates added to each SFF, the address register, and the extra pins and routing.
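The access-count advantage of random-access scan can be seen in a toy model. The class below is hypothetical; a real implementation reaches individual scan cells through the address decoder mentioned above.

```python
class RandomAccessScan:
    """Toy model: scan flip-flops addressed like a RAM, so test
    patterns that differ in only a few bits need only a few writes."""
    def __init__(self, n):
        self.cells = [0] * n
        self.accesses = 0
    def write(self, addr, bit):
        self.cells[addr] = bit
        self.accesses += 1
    def read(self, addr):
        self.accesses += 1
        return self.cells[addr]

ras = RandomAccessScan(8)
# First pattern: only three bits need setting.
for addr, bit in [(0, 1), (3, 1), (7, 1)]:
    ras.write(addr, bit)
# The next pattern differs in a single position: one write, instead
# of a full 8-bit serial shift as in the serial-scan approaches.
ras.write(3, 0)
print(ras.accesses)   # 4 accesses, versus 16 shift clocks serially
```

Ordering the test patterns so that consecutive ones differ in few bits (a Gray-code-like ordering) maximizes this saving, which is exactly what review question 8 below asks about.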
[Fig. 39.9 Design using a partial scan structure: only some of the flip-flops (the SFFs, accessed through TC, SCANIN and SCANOUT) are included in the scan chain, while the remaining FFs (clocked by CK1/CK2) operate normally alongside the combinational circuit between PI and PO.]
3. Conclusions
Accessibility of internal nodes in a complex circuit is becoming a greater problem, so it is essential that the designer consider how the IC will be tested and what extra structures will be incorporated in the design. Scan design has been the backbone of design for testability in industry for a long time. Design automation tools are available that insert scan into a circuit and then generate test patterns. Scan insertion increases the overhead of a circuit; in ASIC design, a scan overhead of 10 to 15% is generally accepted.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[2] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
[3] V. D. Agrawal, C. R. Kime, and K. K. Saluja, "A Tutorial on Built-In Self-Test, Part 1: Principles", IEEE Design and Test of Computers, vol. 10, no. 1, Mar. 1993, pp. 73-82.
[4] V. D. Agrawal, C. R. Kime, and K. K. Saluja, "A Tutorial on Built-In Self-Test, Part 2: Applications", IEEE Design and Test of Computers, vol. 10, no. 2, June 1993, pp. 69-77.
[5] S. DasGupta, R. G. Walther, and T. W. Williams, "An Enhancement to LSSD and Some Applications of LSSD in Reliability", Proc. of the International Fault-Tolerant Computing Symposium.
[6] B. R. Wilkins, Testing Digital Circuits, An Introduction, Berkshire, UK: Van Nostrand Reinhold, 1986.
[7] T. W. Williams, editor, VLSI Testing, Amsterdam, The Netherlands: North-Holland, 1986.
[8] A. Krstic and K.-T. Cheng, Delay Fault Testing for VLSI Circuits, Boston: Kluwer Academic Publishers, 1998.
Review Questions
1. What is Design for Testability (DFT)? What are the different kinds of DFT techniques used for digital circuit testing?
2. What are the things that must be followed for ad-hoc testability? Describe the drawbacks of ad-hoc testing.
3. Describe a full scan structure implemented in a digital design. What are the scan overheads?
4. Suppose that your chip has 100,000 gates and 2,000 flip-flops. A combinational ATPG produced 500 vectors to fully test the logic. A single scan-chain design will require about 10^6 clock cycles for testing. Find the scan test length if 10 scan chains are implemented. Given that the circuit has 10 PIs and 10 POs, and only one extra pin can be added for test, how much more gate overhead will be needed for the new design?
5. For a circuit with 100,000 gates and 2,000 flip-flops connected in a single chain, what will be the gate overhead for a scan design where scan-hold flip-flops are used?
6. Calculate the syndromes for the carry and sum outputs of a full adder cell. Determine whether there is any single stuck fault on any input for which one of the outputs is syndrome-untestable. If there is, suggest an implementation, possibly with added inputs, which makes the cell syndrome-testable.
7. Describe the operation of a level-sensitive scan design implemented in a digital design. What design rules must be followed to make the design race-free and hazard-free? What are the advantages and disadvantages of LSSD?
8. Consider the random-access scan architecture. How would you organize the test data to minimize the total test time? Describe a simple heuristic for ordering these data.
9. Make a comparison of the different scan variations in terms of scan overhead.
10. Consider the combinational circuit below, which has been partitioned into three cones (two CONE Xs and one CONE Y) and one exclusive-OR gate.
[Figure for Problem 10: inputs A, B, C, D, E, F; two CONE X blocks with outputs G and K; a CONE Y block with output H; the exclusive-OR gate produces output J.]
For these two cones, we have the following information. CONE X has a structure which can be tested 100% by using the following 4 vectors, and its output is also specified:

A/G B/H C/F | OUTPUT
 0   0   1  |   0
 0   1   1  |   0
 1   1   0  |   1
 1   0   0  |   1
CONE Y has a structure which can be tested 100% by using the following 4 vectors, and its output is also specified:

C D E | OUTPUT
0 0 1 |   0
0 1 0 |   1
1 0 1 |   1
1 1 1 |   0
Derive a smallest test set for this circuit so that each partition receives its required 4 test vectors. The XOR gate should also be exhaustively tested.
Fill in the blank entries below. (You may not add additional vectors.)

A B C D E F G H J K
0 0 1 _ _ _ 0 _ _ _
0 1 1 _ _ _ 0 _ _ _
1 1 0 _ _ _ 1 _ _ _
1 0 0 _ _ _ 1 _ _ _
Module 8
Testing of Embedded System
Lesson 40
Built-In-Self-Test (BIST) for Embedded Systems
Instructional Objectives
After going through this lesson the student would be able to:
- Explain the meaning of the term Built-In Self-Test (BIST)
- Identify the main components of BIST functionality
- Describe the various methods of test pattern generation for designing embedded systems with BIST
- Define what a Signature Analysis Register is and describe some methods of designing such units
- Explain what a Built-In Logic Block Observer (BILBO) is and describe how to use this block for designing BIST
BIST is a design-for-testability technique that places the testing functions physically with the circuit under test (CUT), as illustrated in Figure 40.1 [1]. The basic BIST architecture requires the addition of three hardware blocks to a digital circuit: a test pattern generator, a response analyzer, and a test controller. The test pattern generator generates the test patterns for the CUT; examples of pattern generators are a ROM with stored patterns, a counter, and a linear feedback shift register (LFSR). A typical response analyzer is a comparator with stored responses or an LFSR used as a signature analyzer; it compacts and analyzes the test responses to determine the correctness of the CUT. A test control block is necessary to activate the test and analyze the responses. However, in general, several test-related functions can be executed through a test controller circuit.

[Fig. 40.1: a test controller drives a hardware pattern generator (e.g. a ROM) whose patterns reach the CUT through a MUX; the CUT's outputs feed an output response compactor, and a comparator checks the resulting signature against a reference signature to give the good/faulty verdict.]
Fig. 40.1 A Typical BIST Architecture

As shown in Figure 40.1, the wires from the primary inputs (PIs) to the MUX and the wires from the circuit outputs to the primary outputs (POs) cannot be tested by BIST. In normal operation, the CUT receives its inputs from other modules and performs the function for which it was designed. During test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT,
and the test responses are evaluated by an output response compactor. In the most common type of BIST, test responses are compacted in the output response compactor to form (fault) signatures. The response signatures are compared with reference golden signatures generated or stored on-chip, and the error signal indicates whether the chip is good or faulty. Four primary parameters must be considered in developing a BIST methodology for embedded systems; these correspond to the design parameters for on-line testing techniques discussed in an earlier chapter [2]:
- Fault coverage: the fraction of faults of interest that can be exposed by the test patterns produced by the pattern generator and detected by the output response monitor. In the presence of input bit-stream errors there is a chance that the computed signature matches the golden signature and the circuit is reported as fault-free. This undesirable property is called masking or aliasing.
- Test set size: the number of test patterns produced by the test generator, which is closely linked to fault coverage: generally, large test sets imply high fault coverage.
- Hardware overhead: the extra hardware required for BIST. In most embedded systems, high hardware overhead is not acceptable.
- Performance overhead: the impact of BIST hardware on normal circuit performance, such as its worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.
Benefits of BIST
- It reduces testing and maintenance cost, as it requires simpler and less expensive ATE.
- It significantly reduces the cost of automatic test pattern generation (ATPG).
- It reduces the storage and maintenance of test patterns.
- It can test many units in parallel.
- It takes shorter test application times.
- It can test at functional system speed.

BIST can be used for non-concurrent, on-line testing of the logic and memory parts of a system [2]. It can readily be configured for event-triggered testing, in which case the BIST control can be tied to the system reset so that testing occurs during system start-up or shutdown. BIST can also be designed for periodic testing with low fault latency. This requires incorporating a testing process into the CUT that guarantees the detection of all target faults within a fixed time. On-line BIST is usually implemented with the twin goals of complete fault coverage and low fault latency. Hence, the test generator (TG) and response monitor (RM) are generally designed
to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable test set size. These goals are met by different techniques in different parts of the system. TG and RM are often implemented by simple, counter-like circuits, especially linear feedback shift registers (LFSRs) [3]. The LFSR is simply a shift register formed from standard flip-flops, with the outputs of selected flip-flops being fed back (modulo 2) to the shift register's inputs. When used as a TG, an LFSR is set to cycle rapidly through a large number of its states. These states, whose choice and order depend on the design parameters of the LFSR, define the test patterns. In this mode of operation, an LFSR is a source of (pseudo) random tests that are, in principle, applicable to any fault and circuit types. An LFSR can also serve as an RM by counting (in a special sense) the responses produced by the tests. An LFSR RM's final contents after applying a sequence of test responses form a fault signature, which can be compared to a known or generated good signature to see if a fault is present.

Ensuring that the fault coverage is sufficiently high and the number of tests is sufficiently low are the main problems with random BIST methods. Two general approaches have been proposed to preserve the cost advantages of LFSRs while making the generated test sequence much shorter. Test points can be inserted in the CUT to improve controllability and observability; however, they can also cause performance loss. Alternatively, some determinism can be introduced into the generated test sequence, for example by inserting specific seed tests that are known to detect hard faults. A typical BIST architecture using an LFSR is shown in Figure 40.2 [4]. Since the output patterns of the LFSR are time-shifted and repeated, they become correlated; this reduces the effectiveness of the fault detection.
Therefore a phase shifter (a network of XOR gates) is often used to decorrelate the output patterns of the LFSR. The response of the CUT is usually compacted by a multiple-input signature register (MISR) into a small signature, which is compared with a known fault-free signature to determine whether the CUT is faulty.
Fig. 40.2 A generic BIST architecture based on an LFSR, an MISR, and a phase shifter
2. Test Pattern Generation
[Figure residue: Fig. 40.3, a counter-based pattern generator built from D flip-flops (Q1-Q3) with Clock and Reset, extended with two five-bit binary counters selected through a 2-to-1 MUX; Fig. 40.4, an 8-input circuit (X1-X8) partitioned into two cones with outputs h and f.]
Circuit partitioning for pseudo-exhaustive pattern generation can be done by cone segmentation, as shown in Figure 40.4. Here, a cone is defined as the set of fan-ins of an output pin. If the size of the largest cone is K, the pattern set must guarantee that the patterns applied to any K inputs contain all possible combinations. In Figure 40.4, the total circuit is divided into two cones based on the cones of influence. In cone 1, the PO h is influenced by X1, X2, X3, X4 and X5, while PO f is influenced by inputs X4, X5, X6, X7 and X8. Therefore, the total number of test patterns needed for exhaustive testing of cone 1 and cone 2 is (2^5 + 2^5) = 64, whereas the original circuit with 8 inputs requires 2^8 = 256 test patterns for an exhaustive test.
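The saving from cone segmentation can be checked with a few lines of Python. The cone input sets below are taken from the Figure 40.4 description above; the counts are the point of the sketch.

```python
from itertools import product

# Cone input sets from Fig. 40.4 (8 primary inputs X1..X8)
cone1 = ["X1", "X2", "X3", "X4", "X5"]   # drives output h
cone2 = ["X4", "X5", "X6", "X7", "X8"]   # drives output f

# Exhaustive test of the whole circuit: every combination of all 8 inputs.
exhaustive = 2 ** 8

# Pseudo-exhaustive test: each cone is tested exhaustively on its own inputs.
pseudo = 2 ** len(cone1) + 2 ** len(cone2)

print(exhaustive, pseudo)  # 256 64

# Each cone still sees every combination of its own inputs.
patterns_cone1 = list(product([0, 1], repeat=len(cone1)))
assert len(patterns_cone1) == 32
```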
Fig. 40.5 Standard Linear Feedback Shift Register
Figure 40.5 shows a standard, external exclusive-OR linear feedback shift register. There are n flip-flops (X_{n-1}, ..., X_0), and this is called an n-stage LFSR. It can be a near-exhaustive test pattern generator, as it cycles through 2^n - 1 states, excluding the all-0 state. Such an LFSR is known as a maximal-length LFSR. Figure 40.6 shows the implementation of an n-stage LFSR with an actual digital circuit [1].
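The maximal-length property is easy to verify in software. The sketch below models an external-XOR LFSR as an integer bit vector; the polynomial x^4 + x^3 + 1 is a known primitive polynomial chosen for the example.

```python
def lfsr_sequence(taps, n, seed=1):
    """External-XOR (standard) LFSR with n stages.
    taps: exponents of the characteristic polynomial with coefficient 1
    (excluding the x^0 term). Returns the states visited until the seed
    state repeats."""
    state = seed
    states = []
    while True:
        states.append(state)
        fb = 0
        for t in taps:                    # modulo-2 sum of the tapped stages
            fb ^= (state >> (t - 1)) & 1
        state = ((state << 1) | fb) & ((1 << n) - 1)
        if state == seed:
            return states

# x^4 + x^3 + 1 is primitive, so a 4-stage LFSR cycles through all
# 2^4 - 1 = 15 nonzero states (the all-0 state is excluded).
seq = lfsr_sequence([4, 3], 4)
print(len(seq))  # 15
```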
[Figure residue: Fig. 40.7, a weighted pattern generator in which a 1-of-4 MUX with weight-select inputs W1 and W2 picks taps of probability 1/16, 1/8, 1/4, or 1/2 derived from an LFSR; Fig. 40.8, gate networks (a) and a cellular automaton (b) producing output weights such as 0.8, 0.6, 0.5, 0.4, and 0.3.]
Fig. 40.8 Weighted pseudorandom patterns
Figure 40.7 shows a weighted pseudo-random pattern generator implemented with programmable probabilities of generating zeros and ones at the PIs. As we know, an LFSR generates patterns with equal probability of 1s and 0s. As shown in Figure 40.8(a), if a 3-input AND gate is used, the probability of a 1 becomes 0.125; if a 2-input OR gate is used, the probability becomes 0.75. Alternatively, one can use cellular automata to produce patterns of desired weights, as shown in Figure 40.8(b).
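The gate weights quoted above can be confirmed by exhaustive enumeration, assuming equiprobable and independent LFSR bits:

```python
from itertools import product

def prob_one(gate, k):
    """Probability that a k-input gate of equiprobable bits outputs 1,
    computed by enumerating all 2^k input combinations."""
    hits = sum(gate(bits) for bits in product([0, 1], repeat=k))
    return hits / 2 ** k

p_and3 = prob_one(lambda b: all(b), 3)  # 3-input AND: only (1,1,1) hits
p_or2 = prob_one(lambda b: any(b), 2)   # 2-input OR: 3 of 4 patterns hit

print(p_and3, p_or2)  # 0.125 0.75
```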
(b) CA with null cyclic boundary conditions. Fig. 40.9 The structure of cellular automata
In addition to an LFSR, a straightforward way to compress the test response data and produce a fault signature is to use an FSM or an accumulator. However, the FSM hardware overhead and accumulator aliasing are difficult parameters to control. Keeping the hardware overhead acceptably low and reducing aliasing are the main difficulties in RM design.
3. Response Compaction and Analysis
During BIST, a large amount of data in the CUT responses is applied to the Response Monitor (RM). For example, for a circuit with 200 outputs, if we want to generate 5 million random
patterns, then the CUT response sent to the RM will be 1 billion bits. This is not manageable in practice, so it is necessary to compact this enormous amount of circuit response to a manageable size that can be stored on the chip. The response analyzer compresses a very long test response into a single word. Such a word is called a signature. The signature is then compared with a prestored golden signature obtained from the fault-free response using the same compression mechanism. If the signature matches the golden copy, the CUT is regarded as fault-free; otherwise, it is faulty. There are different response analysis methods, such as ones count, transition count, syndrome count, and signature analysis.
Compression: a reversible process used to reduce the size of the response. It is difficult to realize in hardware.
Compaction: an irreversible (lossy) process used to reduce the size of the response. Common compaction schemes are:
a) Parity compression: computes the parity of the bit stream.
b) Syndrome (ones) count: counts the number of 1s in the bit stream.
c) Transition count: counts the number of times a 0-to-1 or 1-to-0 transition occurs in the bit stream.
d) Cyclic Redundancy Check (CRC): also called a signature; computes a CRC check word on the bit stream.
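The first three compaction schemes in the list above are one-liners; a minimal sketch on an arbitrary example stream:

```python
def parity(bits):
    """Parity compression: modulo-2 sum (XOR) of all response bits."""
    p = 0
    for b in bits:
        p ^= b
    return p

def ones_count(bits):
    """Syndrome (ones) count: number of 1s in the stream."""
    return sum(bits)

def transition_count(bits):
    """Number of 0-to-1 and 1-to-0 transitions in the stream."""
    return sum(a != b for a, b in zip(bits, bits[1:]))

stream = [1, 0, 1, 1, 0, 0, 1]   # arbitrary example response stream
print(parity(stream), ones_count(stream), transition_count(stream))
# 0 4 4
```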
Signature analysis: compact the good-machine response into a good-machine signature. The actual signature is generated during testing and compared with the good-machine signature.
Aliasing: compression is like a function that maps a large input space (the responses) into a small output space (the signatures); it is a many-to-one mapping. Errors may occur in the input bit stream, and a faulty response may therefore have a signature that matches the golden signature, so that the circuit is reported as fault-free. Such a situation is referred to as aliasing or masking. The aliasing probability is the probability that a faulty response is treated as fault-free. It is defined as follows. Assume that the possible input patterns are uniformly distributed over the possible signature values. For m-bit responses and r-bit signatures, there are 2^m input patterns and 2^r signatures, so 2^(m-r) input patterns map into a given signature. Then the aliasing or masking probability is
P(M) = (number of erroneous responses that map into the golden signature) / (number of faulty responses)
     = (2^(m-r) - 1) / (2^m - 1) ≈ 2^(-r)
The aliasing probability is the major consideration in response analysis. Due to the many-to-one mapping property of the compression, diagnosis after compression is unlikely to succeed; the diagnostic resolution is very poor. In addition to the aliasing probability, hardware overhead and hardware compatibility are also important issues. Here, hardware compatibility refers to how well the BIST hardware can be incorporated into the CUT or its DFT structure.
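The aliasing formula above is a one-line function; the parameter values in the example are arbitrary:

```python
def aliasing_probability(m, r):
    """P(M) = (2^(m-r) - 1) / (2^m - 1): the fraction of the 2^m - 1
    faulty m-bit responses that map to the same r-bit signature as the
    fault-free response."""
    return (2 ** (m - r) - 1) / (2 ** m - 1)

# For m = 12, r = 4 the result is close to the 2^-r = 0.0625 approximation.
print(aliasing_probability(12, 4))
```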
Fig. 40.10 Ones count compression circuit structure
For an N-bit test length with r ones in the fault-free response, the masking probability is derived as follows. Out of the 2^N possible output sequences, C(N, r) contain exactly r ones, and only one of them is the fault-free response. The number of masking sequences is therefore C(N, r) - 1, out of 2^N - 1 possible faulty sequences.
Fig. 40.11 Transition count compression circuit structure
For an N-bit test length with r transitions, the masking probability is derived as follows. For a test length of N, there are N-1 possible transition positions, so C(N-1, r) sequences with a fixed first bit have exactly r transitions. Since the first output bit can be either one or zero, the total number of sequences with the same transition count is 2·C(N-1, r). Again, only one of them is fault-free, so the number of masking sequences is 2·C(N-1, r) - 1.
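The two masking-probability counts above can be written directly with binomial coefficients; the N and r in the example are arbitrary:

```python
from math import comb

def ones_count_masking(N, r):
    """Of the 2^N - 1 faulty sequences, C(N, r) - 1 also contain r ones."""
    return (comb(N, r) - 1) / (2 ** N - 1)

def transition_count_masking(N, r):
    """Sequences of length N with r transitions: 2 * C(N-1, r), of which
    one is the fault-free response."""
    return (2 * comb(N - 1, r) - 1) / (2 ** N - 1)

print(ones_count_masking(4, 2))        # (6 - 1) / 15
print(transition_count_masking(4, 2))  # (2*3 - 1) / 15
```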
Fig. 40.12 Syndrome testing circuit structure
The original design of syndrome testing applies exhaustive patterns. Hence, the syndrome is S = K / 2^n, where n is the number of inputs and K is the number of minterms. A circuit is syndrome testable if all single stuck-at faults are syndrome detectable. The interesting part of syndrome testing is that any function can be redesigned to be syndrome testable.
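The syndrome S = K / 2^n is straightforward to compute by enumeration. The example function f = AB + BC is the one used in the problems below:

```python
from itertools import product

def syndrome(f, n):
    """S = K / 2^n, where K is the number of minterms (inputs for which
    f evaluates to 1) of an n-input function."""
    K = sum(f(*bits) for bits in product([0, 1], repeat=n))
    return K / 2 ** n

# f = AB + BC has minterms 011, 110, 111, so K = 3 and S = 3/8.
f = lambda a, b, c: (a & b) | (b & c)
print(syndrome(f, 3))  # 0.375
```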
Fig. 40.13 Two types of LFSR
One of the most important properties of LFSRs is their recurrence relation, which guarantees that the states of an LFSR repeat in a fixed order. For a given sequence of numbers a_0, a_1, a_2, ..., a_m, ..., define the generating function

G(x) = a_0 + a_1 x + a_2 x^2 + ... + a_m x^m + ... = Σ_{m=0..∞} a_m x^m

For an n-stage LFSR with feedback coefficients c_1, ..., c_n, each output bit obeys the recurrence a_m = Σ_{i=1..n} c_i a_{m-i} (all arithmetic modulo 2), so

G(x) = Σ_{m=0..∞} ( Σ_{i=1..n} c_i a_{m-i} ) x^m
     = Σ_{i=1..n} c_i x^i Σ_{m=0..∞} a_{m-i} x^{m-i}
     = Σ_{i=1..n} c_i x^i ( a_{-i} x^{-i} + ... + a_{-1} x^{-1} + G(x) )

Solving for G(x):

G(x) = [ Σ_{i=1..n} c_i x^i ( a_{-i} x^{-i} + ... + a_{-1} x^{-1} ) ] / ( 1 + Σ_{i=1..n} c_i x^i )

where a_{-1}, ..., a_{-n} is the initial state of the register (modulo 2, subtraction and addition coincide). G(x) has thus been expressed in terms of the initial state and the feedback coefficients. The denominator of G(x),

f(x) = 1 + Σ_{i=1..n} c_i x^i

is called the characteristic polynomial of the LFSR.
Any divisor polynomial G(x) with two or more non-zero coefficients will detect all single-bit errors.
Figure 40.15 illustrates an m-stage MISR. After test cycle i, the test responses are stable on the CUT outputs, but the shifting clock has not yet been applied. Let R_i(x) be the degree-(m-1) polynomial representing the test responses after test cycle i, and S_i(x) the polynomial representing the state of the MISR after test cycle i:

R_i(x) = r_{i,m-1} x^(m-1) + r_{i,m-2} x^(m-2) + ... + r_{i,1} x + r_{i,0}
S_i(x) = s_{i,m-1} x^(m-1) + s_{i,m-2} x^(m-2) + ... + s_{i,1} x + s_{i,0}

Let G(x) be the characteristic polynomial and assume the initial state of the MISR is 0, so S_0(x) = 0. On each cycle the MISR performs

S_{i+1}(x) = [ R_i(x) + x S_i(x) ] mod G(x)

Hence:

S_1(x) = [ R_0(x) + x S_0(x) ] mod G(x) = R_0(x)
S_2(x) = [ R_1(x) + x S_1(x) ] mod G(x) = [ R_1(x) + x R_0(x) ] mod G(x)
...
S_n(x) = [ x^(n-1) R_0(x) + x^(n-2) R_1(x) + ... + x R_{n-2}(x) + R_{n-1}(x) ] mod G(x)

This is the signature left in the MISR after n patterns are applied. For the masking probability, note that the error polynomial has degree at most m+n-2, giving 2^(m+n-1) - 1 possible non-zero error polynomials, while G(x) has 2^(n-1) - 1 non-zero multiples of degree <= m+n-2. The probability of masking is therefore

P(M) = (2^(n-1) - 1) / (2^(m+n-1) - 1) ≈ 1 / 2^m
[Figure residue: a BIST configuration in which CUTs alternate with BILBO registers; each BILBO register is built from D flip-flops (Q1 ... Qn) with control inputs B1 and B2, a multiplexer select S1, and scan output SO, and can operate as a normal register, scan register, LFSR, or MISR.]
Phase 1
In this mode of operation, BILBO1 operates in MISR mode and BILBO2 operates in LFSR mode. CUT A and CUT C are tested in parallel.
Phase 2
In this mode of operation, BILBO1 operates in LFSR mode and BILBO2 operates in MISR mode. Only CUT B is tested in this phase.
Since each test requires stimuli to be scanned in and responses to be scanned out, the test speed is much slower than in the test-per-clock approach. The number of clocks required for a test cycle is the maximum of the scan-stage counts of the input and output scan registers. Architectures such as CEBS, LOCST, and STUMPS also fall into this category.
[Figure residue: a test-per-scan structure in which an LFSR feeds the serial input (SI) of scan registers wrapped around the CUT.]
3.7.5 Self-Testing Using MISR and Parallel Shift Register Sequence Generator (STUMPS)
The architecture of self-testing using MISR and parallel SRSG (STUMPS) is shown in Figure 40.20. Instead of using only one scan chain, it uses multiple scan chains to minimize the test time. Since the scan chains may have different lengths, the LFSR runs for N cycles (the length of the longest scan chain) to load all the chains. For such a design, the internal-type LFSR is preferred: if the external type is used, the difference between two LFSR output bits is only a time shift, and hence the correlation between two scan chains can be very high.
[Figure residue: Fig. 40.20, the STUMPS architecture: a pseudo-random test pattern generator loads parallel scan chains SR1 ... SRn through the CUT, and the responses are compacted by a MISR.]
Test Procedure of STUMPS
1. Scan in patterns from the LFSR into all scan chains.
2. Switch to normal function mode and apply one clock.
3. Scan out the chains into the MISR.
4. Overlap steps 1 and 3.
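The four steps above can be walked through in a toy simulation. Everything concrete below is invented for the sketch: the chain lengths, the PRPG (stand-in: seeded random bits), the CUT function (per-chain parity), and a 16-bit shift-XOR accumulator standing in for the MISR.

```python
import random

random.seed(42)             # stand-in for the on-chip PRPG
chains = [5, 3, 4]          # scan-chain lengths; the PRPG must run
N = max(chains)             # N = 5 cycles to fill the longest chain
signature = 0

def cut_response(stimulus):
    # Stand-in CUT: each chain captures the parity of its own stimulus.
    return [sum(stimulus) % 2] * len(stimulus)

for pattern in range(10):
    # Step 1: scan pseudo-random bits into every chain.
    loaded = [[random.randint(0, 1) for _ in range(n)] for n in chains]
    # Step 2: one clock in normal function mode captures the responses.
    captured = [cut_response(bits) for bits in loaded]
    # Step 3: scan the chains out into the signature register.
    for bits in captured:
        for b in bits:
            signature = ((signature << 1) ^ b) & 0xFFFF
    # Step 4: in hardware, steps 1 and 3 overlap; shift-in of the next
    # pattern happens while the previous response shifts out.

print(hex(signature))
```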
4. BIST for Structured Circuits
Structured design techniques are the keys to the high integration of VLSI circuits. Structured circuits include read-only memories (ROMs), random access memories (RAMs), programmable logic arrays (PLAs), and many others. In this section we focus on PLAs, because they are tightly coupled with the logic circuits; memories are usually treated as a separate category. Due to the regularity of the structure and the simplicity of the design, PLAs are commonly used in digital systems. PLAs are efficient and effective for the implementation of arbitrary logic functions, combinational or sequential. Therefore, in this section, we discuss BIST for PLAs. A PLA is conceptually a two-level AND-OR realization of a Boolean function. Figure 40.21 shows the general structure of a PLA. A PLA typically consists of four parts: input decoders, the AND plane, the OR plane, and the output buffers. The input decoders are usually implemented as single-bit decoders that produce the direct and complement forms of the inputs. The AND plane generates all the product terms. The OR plane sums the required product terms to form the output bits. In the physical implementation, they are realized as NAND-NAND or NOR-NOR structures.
[Figure residue: Fig. 40.21, the general PLA structure: the PLA inputs feed input decoders, and output buffers drive the PLA outputs.]
As mentioned earlier in the fault model section, PLAs have the following faults: stuck-at faults, bridging faults, and crosspoint faults. Test generation for PLAs is more difficult than for conventional logic, because PLAs have more complicated fault models. Further, a typical PLA may have as many as 50 inputs, 67 outputs, and 190 product terms [10-11]. Functional testing of such PLAs can be a difficult task. PLAs often contain unintentional and unidentifiable redundancy, which might cause fault masking. Furthermore, PLAs are often embedded in the logic, which complicates test application and response observation. Therefore, many people have proposed the use of BIST to handle the testing of PLAs.
5. BIST Applications
Manufacturers are increasingly employing BIST in real products. Examples of such applications illustrate the use of BIST in the semiconductor, communications, and computer industries.
References
[1] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits, Kluwer Academic Publishers, Norwell, MA, 2000.
[2] H. Al-Asaad, B. T. Murray, and J. P. Hayes, "On-line BIST for embedded systems," IEEE Design & Test of Computers, Vol. 15, No. 4, Oct.-Dec. 1998, pp. 17-24.
[3] M. Abramovici, M. A. Breuer, and A. D. Friedman, Digital Systems Testing and Testable Design, IEEE Press, 1990.
[4] R. Zurawski, Embedded Systems Handbook, Taylor & Francis, 2005.
[5] C. V. Krishna, A. Jas, and N. A. Touba, "Test vector encoding using partial LFSR reseeding," in Proc. International Test Conference, 2001, pp. 885-893.
[6] J. Rajski, J. Tyszer, and N. Zacharia, "Test data decompression for multiple scan designs with boundary scan," IEEE Transactions on Computers, Vol. 47, 1998, pp. 1188-1200.
[7] N. A. Touba and E. J. McCluskey, "Altering a pseudo-random bit sequence for scan-based BIST," in Proc. International Test Conference, 1996, pp. 167-175.
[8] S. Wang, "Low hardware overhead scan based 3-weight weighted random BIST," in Proc. International Test Conference, 2001, pp. 868-877.
[9] H.-J. Wunderlich and G. Kiefer, "Bit-flipping BIST," in Proc. International Conference on Computer-Aided Design, 1996, pp. 337-343.
[10] C.-Y. Liu, K. K. Saluja, and J. S. Upadhyaya, "BIST-PLA: A built-in self-test design of large programmable logic arrays," in Proc. 24th Design Automation Conference, June 1987, pp. 385-391.
[11] C.-Y. Liu and K. K. Saluja, "Built-in self-test techniques for programmable logic arrays," in VLSI Fault Modeling and Testing Techniques, G. W. Zobrist, ed., Ablex Publishing, Norwood, NJ, 1993.
[12] P. Gelsinger, "Design and test of the 80386," IEEE Design & Test of Computers, Vol. 4, No. 3, June 1987, pp. 42-50.
[13] I. M. Ratiu and H. B. Bakoglu, "Pseudorandom built-in self-test methodology and implementation for the IBM RISC System/6000 processor," IBM J. Research and Development, Vol. 34, 1990, pp. 78-84.
[14] A. L. Crouch, M. Pressly, and J. Circello, "Testability features of the MC68060 microprocessor," in Proc. International Test Conference, 1994, pp. 60-69.
[15] J. Broseghini and D. H. Lenhert, "An ALU-based programmable MISR/pseudorandom generator for a MC68HC11 family self-test," in Proc. International Test Conference, 1993, pp. 349-358.
Problems
1. What is Built-In Self-Test? Discuss the issues and benefits of BIST. Describe the BIST architecture and its operation.
2. Excluding the circuit under test, what are the four basic components of BIST and what function does each component perform?
3. Which two BIST components are necessary for system-level testing and why?
4. What are the different techniques for test pattern generation?
5. Discuss exhaustive and pseudo-exhaustive pattern generation. Give an example to show that pseudo-exhaustive testing requires fewer test patterns than exhaustive testing.
6. What is pseudorandom pattern generation? What is an LFSR? Describe pattern generation using an LFSR.
7. Make a comparison of different test strategies based on fault coverage, hardware overhead, test time overhead, and design effort.
8. An LFSR-based signature register compresses an n-bit input pattern into an m-bit signature. Derive an expression for the probability of aliasing. Clearly state any assumptions you make.
9. Design a weighted pseudo-random pattern generator with programmable weights 1/2, 1/4, 11/32 and 1/16.
10. Prove that the number of 1s in an m-sequence differs from the number of 0s by one.
11. Consider an LFSR-based pattern generator where the feedback network is a single XOR gate before the first stage. If the number of (feedback) inputs to the XOR is odd, is it possible for the LFSR to generate a maximal-length sequence? Justify or contradict.
12. Show the schematic diagram of a 4-bit BILBO register.
13. A given data path has p n-bit registers. For BIST capability, suppose a% of the registers are converted to BILBO. Estimate the percentage overhead in the registers in terms of extra hardware. All gates may be assumed to have unit cost in your calculation.
14. It is said that by adding some extra hardware, a combinational circuit can be made syndrome testable for single stuck-at faults. Illustrate the process for a circuit realizing the Boolean function f = AB + BC.
15. Define the following: a) Compression b) Compaction c) Signature analysis d) Aliasing or masking
16. Describe different response compaction techniques.
17. What are the different types of LFSR? What is a modular LFSR? What is a characteristic polynomial?
18. Implement a standard LFSR for the characteristic polynomial f(x) = x^8 + x^7 + x^2 + 1.
19. Given the polynomial P(x) = x^4 + x^2 + x + 1:
a. Design an external feedback LFSR with characteristic polynomial P(x).
b. Starting this LFSR in the all-1s state, determine the sequence produced.
c. Is this a maximal-length LFSR?
d. Is the characteristic polynomial primitive?
20. Describe how an LFSR is used in signature analysis for response compaction.
21. For an internal feedback Signature Analysis Register (SAR) with characteristic polynomial P(x) = x^6 + x^2 + 1:
a) Draw a logic diagram for the complete register.
b) Determine the resultant signature that would be obtained for the following serial sequence of output responses produced by a known good CUT, assuming the SAR is initialized to the all-0s state. Give the binary value of the resultant signature as it would be contained in the SAR in your logic diagram above: 101001010010 (time order).
22. What is a MISR?
Give the architecture of an m-stage MISR and derive its signature. What is the masking probability of a MISR?
23. Describe, with examples and diagrams, what test-per-clock and test-per-scan systems are. What is the difference between them?
24. What is BILBO? Describe the BILBO architecture and its operation.
25. Describe how BILBO is implemented in digital circuits.
26. Describe the STUMPS testing system and its test procedure.
27. Give some examples of practical BIST applications in industry.
Module 8
Testing of Embedded System
Lesson 41
Boundary Scan Methods and Standards
Instructional Objectives
After going through this lesson the student would be able to:
- Explain the meaning of the term Boundary Scan
- List the IEEE 1149 series of standards with their important features
- Describe the architecture of IEEE 1149.1 boundary scan and explain the functionality of each of its components
- Explain, with the help of an example, how a board-level design can be equipped with the boundary scan feature
- Describe the advantages and disadvantages of the boundary scan technique
Boundary Scan Methods and Standards
1. Boundary Scan History and Family
Boundary Scan is a family of test methodologies aimed at resolving many test problems: from chip level to system level, from logic cores to interconnects between cores, and from digital circuits to analog or mixed-mode circuits. It is now widely accepted in industry and has become an industry standard in most large IC system designs. Boundary scan, as defined by the IEEE Std. 1149.1 standard [1-3], is an integrated method for testing interconnects on printed circuit boards that is implemented at the IC level. Earlier, most printed circuit board (PCB) testing was done using bed-of-nails in-circuit test equipment. Advances in VLSI technology now enable microprocessors and application-specific integrated circuits (ASICs) to be packaged in fine-pitch, high-pin-count packages. The miniaturization of device packaging, the development of surface-mount packaging, and the use of double-sided and multi-layer boards to accommodate the extra interconnects between the increased density of devices reduce the physical accessibility of test points for traditional bed-of-nails in-circuit testers and pose a great challenge to testing for manufacturing defects. The long-term solution to this reduction in physical probe access was to build the access inside the device, i.e., a boundary scan register. In 1985, a group of European companies formed the Joint European Test Action Group (JETAG), and by 1988 the Joint Test Action Group (JTAG) had been formed by several companies to tackle these challenges. JTAG developed a specification for boundary-scan testing that was standardized in 1990 by the IEEE as IEEE Std. 1149.1-1990. In 1993 a new revision of the IEEE Std. 1149.1 standard was introduced (1149.1a); it contained many clarifications, corrections, and enhancements. In 1994, a supplement containing a description of the Boundary Scan Description Language (BSDL) was added to the standard.
Since that time, this standard has been adopted by major electronics companies all over the world. Applications are found in high volume, high-end consumer products, telecommunication products, defense systems, computers, peripherals, and avionics. Now, due to its economic advantages, smaller companies that cannot afford expensive in-circuit testers are using boundary-scan. Figure 41.1 gives an overview of the boundary scan family, now known as the IEEE 1149.x standards.
Standard       | Year | Description
IEEE 1149.1    | 1990 | Testing of digital chips and interconnections between chips
IEEE 1149.1a   | 1993 | Added Supplement A; rewrite of the chapter describing the boundary register
IEEE 1149.1b   | 1994 | Supplement B: formal description of the Boundary Scan Description Language (BSDL)
IEEE 1149.1c   | 2001 | Corrections, clarifications and enhancements of IEEE Std 1149.1a and Std 1149.1b; combines 1149.1a and 1149.1b
IEEE 1149.2    | -    | Extended Digital Serial Interface; obsolete, has merged with the 1149.1 group
IEEE 1149.5    | 1995 | Standard Module Test and Maintenance (MTM) Bus Protocol; deals with test at the system level
IEEE 1532      | 2000 | A derivative standard for in-system programming (ISP) of digital devices
Fig. 41.1 IEEE 1149 Family
The Std. 1149.1, usually referred to as the digital boundary scan, is the one that has been used most widely. It can be divided into two parts: 1149.1a, the digital Boundary Scan Standard, and 1149.1b, the Boundary Scan Description Language (BSDL) [1,6]. Std. 1149.1a defines the chip-level test architecture for digital circuits, and Std. 1149.1b is a hardware description language used to describe the boundary scan architecture. The 1149.2 defines the extended digital serial interface at the chip level; it has merged with the 1149.1 group. The 1149.3 defines the direct access interface, in contrast to 1149.2; unfortunately, this work has been discontinued. IEEE Std. 1149.4 deals with the Mixed-Signal Test Bus [4]. This standard extends the test structure defined in IEEE Std. 1149.1 to allow testing and measurement of mixed-signal circuits, and describes the architecture and the means of control and access to analog and digital test data. The Std. 1149.5 defines the bus protocol at the module level; by combining this level and Std. 1149.1a, one can easily carry out the testing of a PC board. The IEEE Std. 1149.6 for Boundary-Scan Testing of Advanced Digital Networks was released in 2002. This standard augments 1149.1 for the testing of conventional digital networks and 1149.4 for analog networks. The 1149.6 standard defines boundary-scan structures and methods
required to test advanced digital networks that are not fully covered by IEEE Std. 1149.1, such as networks that are AC-coupled, differential, or both. The IEEE 1532 Standard was developed for In-System Configuration of Programmable Devices [5]. This extension of 1149.1 standardizes programming access and methodology for programmable integrated circuit devices. Devices such as CPLDs and FPGAs, regardless of vendor, that implement this standard may be configured (written), read back, erased and verified, singly or concurrently, with a standardized set of resources based upon the algorithm description contained in the 1532 BSDL file. JTAG Technologies programming tools contain support for 1532-compliant devices and automatically generate the applications. Clearly the testing of mixed-mode circuits at the various levels of integration will be a critical test issue for system-on-chip design. Therefore there is a demand to combine all the boundary scan standards into an integrated one.
2. Boundary Scan Architecture
The boundary-scan test architecture provides a means to test interconnects between integrated circuits on a board without using physical test probes. It adds a boundary-scan cell, which includes a multiplexer and latches, to each pin on the device. Figure 41.2 [1] illustrates the main elements of a universal boundary-scan device:
- A Test Access Port (TAP) with a set of four dedicated test pins: Test Data In (TDI), Test Mode Select (TMS), Test Clock (TCK), Test Data Out (TDO), and one optional test pin, Test Reset (TRST*).
- A boundary-scan cell on each device primary input and primary output pin, connected internally to form a serial boundary-scan register.
- A TAP controller with inputs TCK, TMS, and TRST*.
- An n-bit (n >= 2) instruction register holding the current instruction.
- A 1-bit Bypass register.
- An optional 32-bit Identification register capable of being loaded with a permanent device identification code.
Fig. 41.2 Main Elements of an IEEE 1149.1 Device Architecture
The test access port (TAP), which defines the bus protocol of boundary scan, comprises the additional I/O pins needed for each chip employing Std. 1149.1a. The TAP controller is a 16-state finite state machine that controls each step of the boundary scan operations. Each instruction to be carried out by the boundary scan architecture is stored in the Instruction Register, and the various control signals associated with the instruction are then provided by a decoder. Several Test Data Registers are used to store test data or system-related information such as the chip ID, company name, etc.
Test Clock (TCK): the input that synchronizes the test operations; test instructions and data are loaded from system input pins on the rising edge of TCK and driven through system output pins on its falling edge. TCK is pulsed by the equipment controlling the test and not by the tested device. It can be pulsed at any frequency (up to a maximum of some MHz), even at varying rates.
Test Data Input (TDI): an input line that allows the test instruction and test data to be loaded into the instruction register and the various test data registers, respectively.
Test Data Output (TDO): an output line used to serially output the data from the JTAG registers to the equipment controlling the test.
Test Mode Select (TMS): the test control input to the TAP controller. It controls the transitions of the test interface state machine; the test operations are controlled by the sequence of 1s and 0s applied to this input. Usually this is the most important input that has to be controlled by external testers or the on-board test controller.
Test Reset Input (TRST*): the optional TRST* pin is used to initialize the TAP controller; if the TRST* pin is used, the TAP controller can be asynchronously reset to the Test-Logic-Reset state when a 0 is applied at TRST*. This pin can also be used to reset the circuit under test; however, this is not a recommended application.
[Figure residue: Fig. 41.3, the BC_1 cell: Data_In (PI) enters through front-end multiplexing, with ClockDR and UpdateDR clocking the two internal flip-flops.]
Figure 41.3 [1] shows a basic universal boundary-scan cell, known as BC_1. The cell has four modes of operation: normal, update, capture, and serial shift. The memory elements are two D-type flip-flops with front-end and back-end multiplexing of data. It is important to note that the circuit shown in Figure 41.3 is only an example of how the requirements defined in the Standard could be realized; the IEEE 1149.1 Standard does not mandate the design of the circuit, only its functional specification. The four modes of operation are as follows:
1) During normal mode, Data_In is passed straight through to Data_Out.
2) During update mode, the content of the Update Hold cell is passed through to Data_Out. Signal values already present in the output scan cells are passed out through the device output pins, and signal values already present in the input scan cells are passed into the internal logic.
3) During capture mode, the Data_In signal is routed to the Capture Scan cell and the value is captured by the next ClockDR (a derivative of TCK). Signal values on device input pins are loaded into input cells, and signal values passing from the internal logic to device output pins are loaded into output cells.
4) During shift mode, the Scan_Out of one Capture Scan cell is passed to the Scan_In of the next Capture Scan cell via a hard-wired path.
The test clock, TCK, is fed in via yet another dedicated device input pin, and the various modes of operation are controlled by a dedicated Test Mode Select (TMS) serial control signal. Note that both capture and shift operations do not interfere with the normal passing of data from the parallel-in terminal to the parallel-out terminal. This allows on-the-fly capture of operational values and the shifting out of these values for inspection without interference.
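The four-mode behavior of the cell can be sketched as a small behavioral model. This follows the functional specification described above, not the mandated circuit; the class and method names are invented for the sketch.

```python
class BoundaryScanCell:
    """Behavioral sketch of the BC_1 cell of Fig. 41.3: two flip-flops
    (Capture Scan and Update Hold) exercised in four modes."""
    def __init__(self):
        self.capture_ff = 0   # Capture Scan flip-flop
        self.update_ff = 0    # Update Hold flip-flop

    def normal(self, data_in):
        return data_in                    # Data_In passes straight through

    def capture(self, data_in):
        self.capture_ff = data_in         # next ClockDR loads the pin value

    def shift(self, scan_in):
        scan_out = self.capture_ff        # Scan_Out feeds the next cell
        self.capture_ff = scan_in
        return scan_out

    def update(self):
        self.update_ff = self.capture_ff  # UpdateDR
        return self.update_ff             # drives Data_Out in update mode

# Chain two cells and shift the captured values along, as in a scan chain.
c1, c2 = BoundaryScanCell(), BoundaryScanCell()
c1.capture(1); c2.capture(0)
bit = c2.shift(c1.shift(0))   # c1's captured 1 moves into c2
print(bit)  # 0  (c2's previously captured value comes out first)
```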
This application of the boundary-scan register has tremendous potential for real-time monitoring of the operational status of a system (a sort of electronic camera taking snapshots) and is one reason why TCK is kept separate from any system clocks.
[Figure residue: Fig. 41.4, several chips on the system interconnect linked in a single serial boundary-scan chain, sharing TMS and TCK, with serial data out on TDO.]
Fig. 41.4 MCM with Serial Boundary Scan Chain
The advantage of this configuration is that only two pins on the PCB/MCM are needed for boundary-scan data register support. The disadvantage is very long shifting sequences to deliver test patterns to each component and to shift out the test responses, which leads to expensive time on the external tester. As shown in Figure 41.5 [1], the single scan chain can be broken into two parallel boundary-scan chains that share a common test clock (TCK); the extra pin overhead is one more pin. As there are two boundary-scan chains, the test patterns are half as long and the test time is roughly halved. Here both chains share common TDI and TDO pins, so when the top two chips are being shifted, the bottom two chips must be disabled so that they do not drive their TDO lines. The opposite must hold when the bottom two chips are being tested.
[Figure residue: Fig. 41.5, two parallel boundary-scan chains sharing common TDI and TDO pins and a common TCK.]
[Figure residue: the TAP controller, a 16-state Moore-type FSM with inputs TMS, TCK, and the optional TRST*, producing ClockDR, ShiftDR, UpdateDR, Reset*, Select, ClockIR, ShiftIR, UpdateIR, and Enable.]
Fig. 41.6 Top-level view of the TAP Controller
Figure 41.6 shows a top-level view of the TAP controller. TMS and TCK (and the optional TRST*) go to a 16-state finite-state machine controller, which produces the various control signals. These include dedicated signals to the Instruction register (ClockIR, ShiftIR, UpdateIR) and generic signals to all data registers (ClockDR, ShiftDR, UpdateDR). The data register that actually responds is the one enabled by the conditional control signals generated at the parallel outputs of the Instruction register, according to the particular instruction. The other signals are distributed as follows: Reset goes to the Instruction register and to the target Data Register; Select goes to the output multiplexer; Enable goes to the output driver amplifier.
It must be noted that the Standard uses the term Data Register to mean any target register except the Instruction register.
Fig. 41.7 State transition diagram of TAP controller

Figure 41.7 shows the 16-state transition diagram for the TAP controller. The value on each transition arc is the value of TMS. A state transition occurs on the positive edge of TCK, and the controller output values change on the negative edge of TCK. The 16 states can be divided into three parts. The first part contains the reset and idle states; the second and third parts control the operations of the data and instruction registers, respectively. Since the second and third parts differ only in the registers they deal with, only the states in the first and second parts are described below; a similar description applies to the third part.
1. Test-Logic-Reset: In this state, the boundary scan circuitry is disabled and the system performs its normal function. Whenever a Reset* signal is applied to the BS circuit, it also returns to this state. Note also that whatever state the TAP controller is in, it will return to this state if 5 consecutive 1's are applied through TMS.
2. Run-Test/Idle: This is a state in which the boundary scan circuitry waits for some test operation, such as a BIST operation, to complete. For example, if a BIST operation requires 2^16 cycles to complete, then after setting up the initial condition for the BIST operation, the TAP controller will return to this state and wait for 2^16 cycles before it starts to shift out the test results.
3. Select-DR-Scan: This is a temporary state that allows the test data sequence for the selected test-data register to be initiated.
4. Capture-DR: In this state, data can be loaded in parallel into the data registers selected by the current instruction.
5. Shift-DR: In this state, test data are scanned in series through the data registers selected by the current instruction. The TAP controller may stay in this state as long as TMS=0. For each clock cycle, one data bit is shifted into (out of) the selected data register through TDI (TDO).
6. Exit1-DR: All parallel-loaded (from the Capture-DR state) or shifted (from the Shift-DR state) data are held in the selected data register in this state.
7. Pause-DR: The BS circuitry pauses here to wait for some external operation. For example, when long test data are to be loaded into the chip(s) under test, the external tester may need to reload the data from time to time. Pause-DR allows the boundary scan architecture to wait for more data to shift in.
8. Exit2-DR: This state marks the end of the Pause-DR operation and allows the TAP controller to go back to the Shift-DR state for more data to shift in.
9. Update-DR: The test data stored in the first stage of the boundary scan cells is loaded into the second stage in this state.
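The state sequencing described above can be captured in a small simulator. This is an illustrative sketch of the 16-state TAP next-state function; the state names and transitions follow IEEE 1149.1, but the code itself is an assumption-free-standing example, not part of the lesson.

```python
# Next-state table of the TAP controller, indexed by (state, TMS).
# A transition occurs on each rising edge of TCK.

TAP_NEXT = {
    #  state:            (TMS=0,            TMS=1)
    "Test-Logic-Reset": ("Run-Test/Idle",  "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR",     "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR",       "Exit1-DR"),
    "Shift-DR":         ("Shift-DR",       "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR",       "Update-DR"),
    "Pause-DR":         ("Pause-DR",       "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR",       "Update-DR"),
    "Update-DR":        ("Run-Test/Idle",  "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR",     "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR",       "Exit1-IR"),
    "Shift-IR":         ("Shift-IR",       "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR",       "Update-IR"),
    "Pause-IR":         ("Pause-IR",       "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR",       "Update-IR"),
    "Update-IR":        ("Run-Test/Idle",  "Select-DR-Scan"),
}

def tap_walk(state, tms_bits):
    """Apply a sequence of TMS values, one per TCK rising edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state
```

The table makes two properties of the text easy to check: the sequence TMS = 0,1,0,0 from reset reaches Shift-DR, and five consecutive 1's on TMS return the controller to Test-Logic-Reset from any state.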
It is also possible to load (Capture) internal hard-wired values into the shift section of the Instruction register. The Instruction register must be at least two bits long to allow coding of the four mandatory instructions Extest, Bypass, Sample, Preload, but the maximum length of the Instruction register is not defined. In capture mode, the two least significant bits must capture a 01 pattern. (Note: by convention, the least-significant bit of any register connected between the device TDI and TDO pins is always the bit closest to TDO.) The values captured into higher-order bits of the Instruction register are not defined in the Standard. One possible use of these higher-order bits is to capture an informal identification code if the optional 32-bit Identification register is not implemented. In practice, the only mandated bits for the Instruction register capture are the 01 pattern in the two least-significant bits. We will return to the value of capturing this pattern later in the tutorial.
[Figure: Instruction register structure. IR control signals from the TAP controller drive the Instruction register, which sits between TDI and TDO; its decode logic routes the DR select and control signals to the selected target register. The two least-significant bits capture 01; the higher-order bits may hold the current instruction, status bits, an informal ident, or the results of a power-up self-test.]
Standard Instructions
Instruction: Selected Data Register
Mandatory:
Extest: Boundary scan (formerly all-0s code)
Bypass: Bypass (initialized state, all-1s code)
Sample: Boundary scan (device in functional mode)
Preload: Boundary scan (device in functional mode)
Optional:
Intest: Boundary scan
Idcode: Identification (initialized state if present)
Usercode: Identification (for PLDs)
Runbist: Result register
Clamp: Bypass (output pins in safe state)
HighZ: Bypass (output pins in high-Z state)
NB. All unused instruction codes must default to Bypass.

EXTEST: This instruction is used to test the interconnect between two chips. The code for Extest used to be defined as the all-0s code. The EXTEST instruction places an IEEE 1149.1 compliant device into an external boundary test mode and selects the boundary scan register to be connected between TDI and TDO. During this instruction, the boundary scan cells associated with outputs are preloaded with test patterns to test downstream devices. The input boundary cells are set up to capture the input data for later analysis.

BYPASS: A device's boundary scan chain can be skipped using the BYPASS instruction, allowing the data to pass through the bypass register. The Bypass instruction must be assigned the all-1s code and, when executed, causes the Bypass register to be placed between the TDI and TDO pins. This allows efficient testing of a selected device without incurring the overhead of traversing other devices. The BYPASS instruction allows an IEEE 1149.1 compliant device to remain in functional mode while serial data is transferred from the TDI pin to the TDO pin without affecting the operation of the device.

SAMPLE/PRELOAD: The Sample and Preload instructions, and their predecessor the combined Sample/Preload instruction, select the Boundary-Scan register when executed. The instruction sets up the boundary-scan cells either to sample (capture) values or to preload known values into the boundary-scan cells prior to some follow-on operation. During this instruction, the boundary scan register can be accessed via a data scan operation to take a sample of the functional data entering and leaving the device. This instruction is also used to preload test data into the boundary-scan register prior to loading an EXTEST instruction.

INTEST: With this instruction the boundary scan register (BSR) is connected between the TDI and TDO signals. The chip's internal core-logic signals are sampled and captured by the BSR cells on entry to the Capture-DR state, as shown in the TAP state transition diagram. The contents of the BSR are shifted out via the TDO line on exit from the Shift-DR state. As the contents of the BSR (the captured data) are shifted out, new data are shifted in on entry to the Shift-DR state. The new contents of the BSR are applied to the chip's core-logic signals during the Update-DR state.
IDCODE: This is used to select the Identification register between TDI and TDO, preparatory to loading the internally-held 32-bit identification code and reading it out through TDO. The 32 bits identify the manufacturer of the device, its part number and its version number.

USERCODE: This instruction selects the same 32-bit register as IDCODE, but allows an alternative 32 bits of identity data to be loaded and serially shifted out. This instruction is used for dual-personality devices, such as Complex Programmable Logic Devices and Field Programmable Gate Arrays.

RUNBIST: An important optional instruction is RunBist. Because of the growing importance of internal self-test structures, the behavior of RunBist is defined in the Standard. The self-test routine must be self-initializing (i.e., no external seed values are allowed), and the execution of RunBist essentially targets a self-test result register between TDI and TDO. At the end of the self-test cycle, the targeted data register holds the Pass/Fail result. With this instruction one can control the execution of a memory BIST from the TAP controller, thereby reducing the hardware overhead of the BIST controller.

CLAMP: Clamp is an instruction that uses boundary-scan cells to drive preset values, established initially with the Preload instruction, onto the outputs of devices, and then selects the Bypass register between TDI and TDO (unlike the Preload instruction, which leaves the boundary-scan register selected until a new instruction is executed or the device is returned to the Test-Logic-Reset state). Clamp can be used to set up safe guarding values on the outputs of certain devices, for example to avoid bus contention problems.

HIGH-Z: This is similar to the Clamp instruction, but it leaves the device output pins in a high-impedance state rather than driving fixed logic-1 or logic-0 values. HighZ also selects the Bypass register between TDI and TDO.
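The effect of BYPASS described above, a single-bit register per skipped device, can be sketched in a few lines. This is an illustrative model only (the function name and the convention that each bypass flip-flop starts at 0, the value loaded in Capture-DR, are assumptions):

```python
# Serial data moving through the 1-bit Bypass registers of k bypassed
# devices: each register adds exactly one TCK cycle of delay.

def shift_through_bypass(bits, n_bypassed):
    regs = [0] * n_bypassed          # one Bypass flip-flop per device
    out = []
    for b in bits:
        out.append(regs[-1])         # bit appearing on the board's TDO
        regs = [b] + regs[:-1]       # shift TDI -> device 1 -> ... -> TDO
    return out
```

Shifting a stream through three bypassed devices yields the same stream delayed by three cycles, which is why BYPASS makes access to a selected device so much cheaper than traversing every device's full boundary register.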
3. Board-Level Test Control

So far the test architecture of boundary scan inside the chip under test has been discussed. A major problem remains: who is going to control the whole boundary scan test procedure? In general there are two solutions: using an external tester, or using a special on-board controller. The former is usually expensive because it involves an IC tester. The latter provides an economical way to complete the whole test procedure. As is clear from the above description, in addition to the test data, the most important signal that a test controller has to provide is the TMS signal. There are two methods of providing this signal on a board: the star configuration and the ring configuration, as shown in Figure 41.10. In the star configuration the TMS is broadcast to all chips, so all chips must execute the same operation at any time. In the ring structure, the test controller provides one independent TMS signal for each chip, giving the test procedure great flexibility.
Fig. 41.10 BUS master for chips with BS: (a) star structure, (b) ring structure
4. Board-Level Scan Chain Configurations

In a board design there can be many JTAG-compliant devices. All these devices can be connected together to form a single scan chain, as illustrated in Figure 41.11, "Single Boundary Scan Chain on a Board." Alternatively, multiple scan chains can be established so that devices can be checked in parallel. Figure 41.11 also shows the onboard TAP controllers connected to an offboard TAP control device, such as a personal computer, through a TAP access connector. The offboard TAP control device can perform different tests during board manufacturing without the need for bed-of-nails equipment.
5. Board-Level Test Strategy

One of the first tests that should be performed on a PCB is the infrastructure test. This test is used to determine whether all the components are installed correctly. It relies on the fact that the last two bits of the instruction register (IR) are always "01". By shifting out the IR of each device in the chain, it can be determined whether the device is properly installed. This is accomplished by sequencing the TAP controller for an IR read. After the infrastructure test is successful, the board-level interconnect test can begin. This is accomplished through the EXTEST command. This test can be used to check for "opens" and "shorts" on the PCB. The test patterns are preloaded into the output pins of the driving devices. Then they are propagated to the receiving devices and captured in the input boundary scan cells. The result can then be shifted out through the TDO pin for analysis. These patterns can be generated and analyzed automatically via software programs. This feature is normally offered through tools like Automatic Test Pattern Generation (ATPG) or Boundary Scan Test Pattern Generation (BTPG).
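The infrastructure check can be sketched as a pure software step on the shifted-out IR bits. The function name, the bit-list representation, and the convention that the stream arrives LSB-first from the device nearest TDO are all illustrative assumptions; per-device IR lengths would in practice come from the BSDL files.

```python
# Infrastructure test sketch: every 1149.1 device must capture "01" in the
# two least-significant bits of its Instruction Register, so a readout of
# the whole IR chain reveals missing, misoriented, or dead devices.

def infrastructure_ok(ir_stream, ir_lengths):
    """ir_stream: bits shifted out of the board TDO, LSB of the device
    nearest TDO first. ir_lengths: IR length per device, TDO to TDI."""
    pos = 0
    for length in ir_lengths:
        word = ir_stream[pos:pos + length]   # this device's captured IR
        if word[:2] != [1, 0]:               # LSB first: "01" -> 1 then 0
            return False
        pos += length
    return pos == len(ir_stream)             # chain length must match
```

A stream whose per-device words all begin 1,0 passes; any other low-order pattern, or a chain length that disagrees with the BSDL data, fails the board immediately.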
6. Boundary Scan Description Language (BSDL)

Boundary Scan Description Language (BSDL) has been approved as IEEE Std. 1149.1b (the original boundary scan standard being IEEE Std. 1149.1a) [1,6]. This VHDL-compatible language can greatly reduce the effort needed to incorporate boundary scan into a chip, and hence is quite useful when a designer wishes to design boundary scan in his own style. Basically, for the parts that are mandatory in Std. 1149.1a, such as the TAP controller and the BYPASS register, the designer does not need to describe them; they can be generated automatically. The designer only has to describe the specifications related to his own design, such as the length of the boundary scan register, the user-defined boundary scan instructions, the decoder for those instructions, and the I/O pin assignment. In general these descriptions are quite easy to prepare. In fact, many CAD tools already implement the boundary scan generation procedure, so a designer may not even need to write the BSDL file: the tools can automatically generate the needed boundary scan circuitry for any circuit design as long as the I/O of the design is specified. Any manufacturer of a JTAG-compliant device must provide a BSDL file for that device. The BSDL file contains information on the function of each of the pins on the device - which are used as I/Os, power or ground. BSDL files describe the boundary scan architecture of a JTAG-compliant device, and are written in VHDL. The BSDL file includes:
1. Entity Declaration: a VHDL construct used to identify the name of the device described by the BSDL file.
2. Generic Parameter: specifies which package is described by the BSDL file.
3. Logical Port Description: lists all of the pads on a device, and states whether each pin is an input (in bit;), output (out bit;), bidirectional (inout bit;) or unavailable for boundary scan (linkage bit;).
4. Package Pin Mapping: shows how the pads on the device die are wired to the pins on the device package.
5. Use Statements: call VHDL packages that contain attributes, types, constants, etc. that are referenced in the BSDL file.
6. Scan Port Identification: identifies the JTAG pins: TDI, TDO, TMS, TCK and TRST (if used).
7. TAP Description: provides additional information on the device's JTAG logic: the Instruction Register length, instruction opcodes, device IDCODE, etc. These characteristics are device specific.
8. Boundary Register Description: provides the structure of the boundary scan cells on the device. Each pin on a device may have up to three boundary scan cells, each cell consisting of a register and a latch.
Fig. 41.12 Example to illustrate BSDL (a) core logic (b) after BS insertion
7. Benefits and Penalties of Boundary Scan

The decision whether to use boundary scan usually comes down to economics. Designers often hesitate to use boundary scan because of the additional silicon involved. In many cases it may appear that the penalties outweigh the benefits for an ASIC. However, in an analysis spanning all assembly levels and all test phases of the system's life, the benefits will usually outweigh the penalties.
Benefits
The benefits provided by boundary-scan include the following:
lower test generation costs
reduced test time
reduced time to market
simpler and less costly testers
compatibility with tester interfaces
accommodation of high-density packaging devices
By providing access to the scan chain I/Os, the need for physical test points on the board is eliminated or greatly reduced, leading to significant savings as a result of simpler board layouts, less costly test fixtures, reduced time on in-circuit test systems, increased use of standard interfaces, and faster time-to-market. In addition to board testing, boundary scan allows programming almost all types of CPLDs and flash memories, regardless of size or package type, on the board, after PCB assembly. In-system programming saves money and improves throughput by reducing device handling, simplifying inventory management, and integrating the programming steps into the board production line.
Penalties
The penalties incurred in using boundary-scan include the following:
extra silicon due to boundary scan circuitry
added pins
additional design effort
degradation in performance due to gate delays through the additional circuitry
increased power consumption
Table 1: Gate requirements for a gate-array boundary-scan design

It must be noted that in Table 1 the boundary-scan implementation requires 868 gates, an estimated 8 percent overhead. It should also be noted that the cells used in this example were created prior to publication of the IEEE 1149.1 standard. If specific cell designs had been available to support the standard, or if the vendor had placed the boundary-scan circuitry in areas of the ASIC not available to the user, then the design would have required less overhead.
9. Conclusion
Board-level testing has become more complex with the increasing use of fine-pitch, high pin count devices. With boundary scan, however, board-level testing can be implemented more efficiently and at lower cost. This standard provides a unique opportunity to simplify the design, debug, and test processes by enabling a simple and standard means of automatically creating and applying tests at the device, board, and system levels. Boundary scan is the only solution for MCMs and limited-access SMT/ML boards. The standard supports external testing with an ATE. The IEEE 1532-2000 In-System Configuration (ISC) standard makes use of 1149.1 boundary-scan structures within CPLD and FPGA devices.
References
[1] IEEE Standard Test Access Port and Boundary-Scan Architecture, IEEE Std 1149.1-2001 (Revision of IEEE Std 1149.1-1990), IEEE-SA Standards Board, 3 Park Avenue, New York, NY 10016-5997, USA. http://grouper.ieee.org/groups/1149/1 or http://standards.ieee.org/catalog/
[2] K. Parker, The Boundary-Scan Handbook: Analog and Digital, 2nd edition, Kluwer Academic Publishers, 1998.
[3] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing, Kluwer Academic Publishers, Norwell, MA, 2000.
[4] IEEE 1149.4 Mixed-Signal Test Bus Standard web site: http://grouper.ieee.org/groups/1149/4
[5] IEEE 1532 In-System Configuration Standard web site: http://grouper.ieee.org/groups/1532/
[6] Agilent Technologies BSDL verification service: http://www.agilent.com/see/bsdl_service
Problems
1. What is boundary scan? What is the motivation behind boundary scan?
2. How does the boundary scan technique differ from so-called bed-of-nails techniques?
3. What are the different device packaging styles?
4. What is JTAG?
5. Give an overview of the boundary scan family, i.e., IEEE 1149.
6. Show the boundary scan architecture and describe the functions of its elements.
7. Show the basic cell of a boundary-scan register. Describe the different modes of its operation.
8. A board is composed of 100 chips with 100 pins each, so the length of the total scan chain is 10,000 bits. Find a possible testing strategy to reduce the scan chain length.
9. What is the TAP controller? What are its main functions?
10. Describe a serial boundary scan chain and its operation. What are its disadvantages? Discuss a strategy to overcome them.
11. Discuss the different instructions and their functions.
12. Considering a board populated by IEEE 1149.1-compliant devices (a "pure" boundary-scan board), summarize a board-test strategy.
13. What is the goal of the infrastructure test? Is the infrastructure test mandatory or optional? What are the main steps of an infrastructure test?
14. Consider the example depicted in the following figure. This circuit has two primary inputs, two primary outputs and two nets that connect the ICs to each other. There is only 1 TAP, which connects the TDI and TDO of both ICs. Prepare a test plan for this circuit.
15. Consider a board composed of 100 40-pin boundary-scan devices, 2,000 interconnects, an 8-bit Instruction Register per device, a 32-bit Identification Register per device, and a 10 MHz test application rate. Compute the test time to execute a test session.
16. What is BSDL? What are the different parts of a BSDL file?
Module 8
Testing of Embedded System
Version 2 EE IIT, Kharagpur 1
Lesson 42
On-line Testing of Embedded Systems
Version 2 EE IIT, Kharagpur 2
Instructional Objectives
After going through this lesson the student would be able to:
Explain the meaning of the term On-line Testing
Describe the main issues in on-line testing and identify applications where on-line testing is required for embedded systems
Distinguish between concurrent and non-concurrent testing and their relation to BIST and on-line testing
Describe an application of on-line testing for System-on-Chip
EMBEDDED SYSTEMS are computers incorporated in consumer products or other devices to perform application-specific functions. The product user is usually not even aware of the existence of these systems. From toys to medical devices, from ovens to automobiles, the range of products incorporating microprocessor-based, software-controlled systems has expanded rapidly since the introduction of the microprocessor in 1971. The lure of embedded systems is clear: they promise previously impossible functions that enhance the performance of people or machines. As these systems gain sophistication, manufacturers are using them in increasingly critical applications: products that can result in injury, economic loss, or unacceptable inconvenience when they do not perform as required. Embedded systems can contain a variety of computing devices, such as microcontrollers, application-specific integrated circuits, and digital signal processors. A key requirement is that these computing devices continuously respond to external events in real time. Makers of embedded systems take many measures to ensure safety and reliability throughout the lifetime of products incorporating the systems. Here, we consider techniques for identifying faults during normal operation of the product, that is, online-testing techniques. We evaluate them on the basis of error coverage, error latency, space redundancy, and time redundancy.
2. The Need for On-line Testing
Cost constraints in consumer products typically translate into stringent constraints on product components. Thus, embedded systems are particularly cost sensitive. In many applications, low production and maintenance costs are as important as performance. Moreover, as people become dependent on computer-based systems, their expectations of these systems' availability increase dramatically. Nevertheless, most people still expect significant downtime with computer systems, perhaps a few hours per month. People are much less patient with computer downtime in other consumer products, since the items in question did not demonstrate this type of failure before embedded systems were added. Thus, complex consumer products with high availability requirements must be quickly and easily repaired. For this reason, automobile manufacturers, among others, are increasingly providing online detection and diagnosis, capabilities previously found only in very complex and expensive applications such as aerospace systems. Using embedded systems to incorporate functions previously considered exotic in low-cost, everyday products is a growing trend. Since embedded systems are frequently components of mobile products, they are exposed to vibration and other environmental stresses that can cause them to fail. Embedded systems in automotive applications are exposed to extremely harsh environments, even beyond those experienced by most portable devices. These applications are proliferating rapidly, and their more stringent safety and reliability requirements pose a significant challenge for designers. Critical applications and applications with high availability requirements are the main candidates for online testing. Embedded systems consist of hardware and software, each usually considered separately in the design process, despite progress in the field of hardware-software co-design. A strong synergy exists between hardware and software failure mechanisms and diagnosis, as in other aspects of system performance. System failures often involve defects in both hardware and software. Software does not break in the common sense of the term. However, it can perform inappropriately due to faults in the underlying hardware, or due to specification or design flaws in either hardware or software. At the same time, one can exploit the software to test for and respond to the presence of faults in the underlying hardware. Online software testing aims at detecting design faults (bugs) that escape detection before the embedded system is incorporated and used in a product. Even with extensive testing and formal verification of the system, some bugs escape detection. Residual bugs in well-tested software typically behave as intermittent faults, becoming apparent only in rare system states. Online software testing relies on two basic methods: acceptance testing and diversity [1].
Acceptance testing checks for the presence or absence of well-defined events or conditions, usually expressed as true-or-false conditions (predicates), related to the correctness or safety of preceding computations. Diversity techniques compare replicated computations, either with minor variations in data (data diversity) or with procedures written by separate, unrelated design teams (design diversity). This chapter focuses on digital hardware testing, including techniques by which hardware tests itself: built-in self-test (BIST). Nevertheless, we must consider the role of software in detecting, diagnosing, and handling hardware faults. If we can use software to test hardware, why should we add hardware to test hardware? There are two possible answers. First, it may be cheaper or more practical to use hardware for some tasks and software for others. In an embedded system, programs are stored online in hardware-implemented memories such as ROMs (for this reason, embedded software is sometimes called firmware). This program storage space is a finite resource whose cost is measured in exactly the same way as other hardware. A function such as a test is soft only in the sense that it can easily be modified or omitted in the final implementation. The second answer involves the time that elapses between a fault's occurrence and a problem arising from that fault. For instance, a fault may induce an erroneous system state that can ultimately lead to an accident. If the elapsed time between the fault's occurrence and the corresponding accident is short, the fault must be detected immediately. Acceptance tests can detect many faults and errors in both software and hardware. However, their exact fault coverage is hard to measure, and even when coverage is complete, acceptance tests may take a long time to detect some faults. BIST typically targets relatively few hardware faults, but it detects them quickly.
These two issues, cost and latency, are the main parameters in deciding whether to use hardware or software for testing and which hardware or software technique to use. This decision requires system-level analysis. We do not consider software methods here. Rather, we emphasize the appropriate use of widely implemented BIST methods for online hardware testing. These methods are components in the hardware-software trade-off.
3. Online Testing
Faults are physical or logical defects in the design or implementation of a digital device. Under certain conditions, they lead to errors, that is, incorrect system states. Errors induce failures, deviations from appropriate system behavior. If the failure can lead to an accident, it is a hazard. Faults can be classified into three groups: design, fabrication, and operational. Design faults are made by human designers or CAD software (simulators, translators, or layout generators) during the design process. Fabrication defects result from an imperfect manufacturing process. For example, shorts and opens are common manufacturing defects in VLSI circuits. Operational faults result from wear or environmental disturbances during normal system operation. Such disturbances include electromagnetic interference, operator mistakes, and extremes of temperature and vibration. Some design defects and manufacturing faults escape detection and combine with wear and environmental disturbances to cause problems in the field. Operational faults are usually classified by their duration:
Permanent faults remain in existence indefinitely if no corrective action is taken. Many are residual design or manufacturing faults. The rest usually occur during changes in system operation such as system start-up or shutdown, or as a result of a catastrophic environmental disturbance such as a collision.
Intermittent faults appear, disappear, and reappear repeatedly. They are difficult to predict, but their effects are highly correlated. When intermittent faults are present, the system works well most of the time but fails under atypical environmental conditions.
Transient faults appear and disappear quickly and are not correlated with each other. They are most commonly induced by random environmental disturbances.
One generally uses online testing to detect operational faults in computers that support critical or high-availability applications.
The goal of online testing is to detect fault effects, or errors, and take appropriate corrective action. For example, in some critical applications, the system shuts down after an error is detected. In other applications, error detection triggers a reconfiguration mechanism that allows the system to continue operating, perhaps with some performance degradation. Online testing can take the form of external or internal monitoring, using either hardware or software. Internal monitoring, also called self-testing, takes place on the same substrate as the circuit under test (CUT). Today, this usually means inside a single IC, a system on a chip. There are four primary parameters to consider in designing an online-testing scheme:
Error coverage: the fraction of modeled errors detected, usually expressed as a percentage. Critical and highly available systems require very good error coverage to minimize the probability of system failure.
Error latency: the difference between the first time an error becomes active and the first time it is detected. Error latency depends on the time taken to perform a test and how often tests are executed. A related parameter is fault latency, the difference between the onset of the fault and its detection. Clearly, fault latency is greater than or equal to error latency, so when error latency is difficult to determine, test designers often consider fault latency instead.
Space redundancy: the extra hardware or firmware needed for online testing.
Time redundancy: the extra time needed for online testing.
The ideal online-testing scheme would have 100% error coverage, error latency of 1 clock cycle, no space redundancy, and no time redundancy. It would require no redesign of the CUT and impose no functional or structural restrictions on it. Most BIST methods meet some of these constraints without addressing others. Considering all four parameters in the design of an online-testing scheme may create conflicting goals. High coverage requires high error latency, space redundancy, and/or time redundancy. Schemes with immediate detection (error latency equaling 1) minimize time redundancy but require more hardware. On the other hand, schemes with delayed detection (error latency greater than 1) reduce time and space redundancy at the expense of increased error latency. Several proposed delayed-detection techniques assume equiprobability of input combinations and try to establish a probabilistic bound on error latency [2]. As a result, certain faults remain undetected for a long time because tests for them rarely appear at the CUT's inputs. To cover all the operational fault types described earlier, test engineers use two different modes of online testing: concurrent and non-concurrent. Concurrent testing takes place during normal system operation, and non-concurrent testing takes place while normal operation is temporarily suspended. One must often overlap these test modes to provide a comprehensive online-testing strategy at acceptable cost.
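The dependence of fault latency on test frequency and duration can be made concrete with a back-of-the-envelope bound. This is an illustrative sketch under a simplifying assumption not stated in the text: a periodic test detects a covered permanent fault only at the end of the first full pass that runs after the fault appears, so in the worst case the fault arises just as a pass begins.

```python
# Worst-case fault latency for periodic (time-triggered) testing of a
# covered permanent fault: nearly one full test period can elapse before
# the next pass starts, plus the duration of that pass itself.

def worst_case_fault_latency(test_period_ms, test_duration_ms):
    return test_period_ms + test_duration_ms

# Illustrative numbers: a 5 ms self-test scheduled once per second.
latency = worst_case_fault_latency(test_period_ms=1000.0, test_duration_ms=5.0)
```

The example makes the trade-off visible: shortening the period reduces fault latency but raises time redundancy, since the test consumes a larger fraction of every period.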
4. Non-concurrent testing
This form of testing is either event-triggered (sporadic) or time-triggered (periodic) and is characterized by low space and time redundancy. Event-triggered testing is initiated by key events or state changes, such as start-up or shutdown, and its goal is to detect permanent faults. Detecting and repairing permanent faults as soon as possible is usually advisable. Event-triggered tests resemble manufacturing tests. Any such test can be applied online, as long as the required testing resources are available. Typically, the hardware is partitioned into components, each exercised by specific tests. RAMs, for instance, are tested with manufacturing tests such as March tests [3].

Time-triggered testing occurs at predetermined times in the operation of the system. It detects permanent faults, often using the same types of tests applied by event-triggered testing. The periodic approach is especially useful in systems that run for extended periods during which no significant events occur to trigger testing. Periodic testing is also essential for detecting intermittent faults. Such faults typically behave as permanent faults for short periods. Since they usually represent conditions that must be corrected, diagnostic resolution is important. Periodic testing can identify latent design or manufacturing flaws that appear only under certain environmental conditions. Time-triggered tests are frequently partitioned and interleaved so that only part of the test is applied during each test period.
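A March test sweeps every RAM address with a fixed pattern of reads and writes in prescribed address orders. As an illustration only (not the tool flow of [3]), here is a minimal Python sketch of the March C- algorithm; the one-bit-wide `FaultyRam` model and its `read`/`write` interface are assumptions made for this demo:

```python
def march_c_minus(ram, n):
    """March C-: {up(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); (r0)}.
    Returns True if the n-cell RAM passes, False if a fault is detected."""
    up, down = range(n), range(n - 1, -1, -1)
    for a in up:                       # M0: write 0 everywhere
        ram.write(a, 0)
    for a in up:                       # M1: ascending, read 0 then write 1
        if ram.read(a) != 0:
            return False
        ram.write(a, 1)
    for a in up:                       # M2: ascending, read 1 then write 0
        if ram.read(a) != 1:
            return False
        ram.write(a, 0)
    for a in down:                     # M3: descending, read 0 then write 1
        if ram.read(a) != 0:
            return False
        ram.write(a, 1)
    for a in down:                     # M4: descending, read 1 then write 0
        if ram.read(a) != 1:
            return False
        ram.write(a, 0)
    for a in up:                       # M5: final read 0 sweep
        if ram.read(a) != 0:
            return False
    return True

class FaultyRam:
    """Hypothetical one-bit-wide RAM model with an optional stuck-at cell."""
    def __init__(self, n, stuck_at=None):
        self.cells = [0] * n
        self.stuck_at = stuck_at              # (address, stuck value) or None
    def write(self, a, bit):
        self.cells[a] = bit
        if self.stuck_at and self.stuck_at[0] == a:
            self.cells[a] = self.stuck_at[1]  # the fault overrides the write
    def read(self, a):
        return self.cells[a]
```

A fault-free RAM passes all six sweeps, while a single stuck-at cell is caught by the first read that expects the opposite value; the same recipe scales to any memory size, which is why March tests suit both manufacturing and periodic online testing.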
5. Concurrent testing
Non-concurrent testing cannot detect transient or intermittent faults whose effects disappear quickly. Concurrent testing, on the other hand, continuously checks for errors due to such faults. However, concurrent testing is not particularly useful for diagnosing the source of errors, so test designers often combine it with diagnostic software. They may also combine concurrent and non-concurrent testing to detect or diagnose complex faults of all types.

A common method of providing hardware support for concurrent testing, especially for detecting control errors, is a watchdog timer [4]. This is a counter that the system resets repeatedly to indicate that the system is functioning properly. The watchdog concept assumes that the system is fault-free, or at least alive, if it can reset the timer at appropriate intervals. The ability to perform this simple task implies that control flow is correctly traversing timer-reset points. One can monitor system sequencing very precisely by guarding the watchdog-reset operations with software-based acceptance tests that check signatures computed while control flow traverses various checkpoints. To implement this last approach in hardware, one can construct more complex hardware watchdogs.

A key element of concurrent testing for data errors is redundancy. For example, the duplication-with-comparison (DWC) technique [5] detects any single error at the expense of 100% space redundancy. This technique requires two copies of the CUT, which operate in tandem with identical inputs. Any discrepancy in their outputs indicates an error. In many applications, DWC's high hardware overhead is unacceptable. Moreover, it is difficult to prevent minor timing variations between duplicated modules from invalidating the comparison. A possible lower-cost alternative is time redundancy. A technique called double execution, or retry, executes critical operations more than once at diverse time points and compares their results. Transient faults are likely to affect only one instance of the operation and thus can be detected. Another technique, recomputing with shifted operands (RESO) [5], achieves almost the same error coverage as DWC with 100% time redundancy but very little space redundancy. However, no one has demonstrated the practicality of double execution and RESO for online testing of general logic circuits. A third, widely used form of redundancy is information redundancy: the addition of redundant coded information such as a parity-check bit [5]. Such codes are particularly effective for detecting memory and data transmission errors, since memories and networks are susceptible to transient errors. Coding methods can also detect errors in data computed during critical operations.
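As a minimal sketch of the information-redundancy idea just described, the following Python fragment (illustrative only; the 8-bit word width and function names are assumptions for the demo) appends an even-parity check bit to a data word and shows that a single-bit transient error is caught:

```python
def add_parity(word: int, width: int = 8) -> int:
    """Append an even-parity check bit to a `width`-bit data word."""
    parity = bin(word & ((1 << width) - 1)).count("1") & 1
    return (word << 1) | parity        # coded word: data bits plus parity bit

def check_parity(coded: int) -> bool:
    """True if the overall parity is even, i.e. no error is detected."""
    return bin(coded).count("1") % 2 == 0

coded = add_parity(0b10110010)         # store or transmit the coded word
assert check_parity(coded)             # fault-free word passes the check
corrupted = coded ^ (1 << 4)           # inject a single-bit transient fault
assert not check_parity(corrupted)     # the error is detected
```

A single parity bit detects any odd number of bit flips but misses double-bit errors; stronger codes such as Hamming or CRC codes trade additional check bits for wider detection or even correction.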
6. Built-in self-test
For critical or highly available systems, a comprehensive online-testing approach that covers all expected permanent, intermittent, and transient faults is essential. In recent years, BIST has emerged as an important method of testing manufacturing faults, and researchers increasingly promote it for online testing as well. BIST is a design-for-testability technique that places test functions physically on chip with the CUT, as illustrated in Figure 42.1. In normal operating mode, the CUT receives its inputs from other modules and performs the function for which it was designed. In test mode, a test pattern generator circuit applies a sequence of test patterns to the CUT, and a response monitor evaluates the test responses. In the most common type of BIST, the response monitor compacts the test responses to form fault signatures. It compares the fault signatures with reference signatures generated or stored on chip, and an error signal indicates any discrepancies detected. We assume this type of BIST in the following discussion. In developing a BIST methodology for embedded systems, we must consider four primary parameters related to those listed earlier for online-testing techniques:

- Fault coverage: the fraction of faults of interest that the test patterns produced by the test generator can expose and the response monitor can detect. Most monitors produce a fault-free signature for some faulty response sequences, an undesirable property called aliasing.
- Test set size: the number of test patterns produced by the test generator. Test set size is closely linked to fault coverage; generally, large test sets imply high fault coverage. However, for online testing, test set size must be small to reduce fault and error latency.
- Hardware overhead: the extra hardware needed for BIST. In most embedded systems, high hardware overhead is not acceptable.
- Performance penalty: the impact of BIST hardware on normal circuit performance, such as worst-case (critical) path delays. Overhead of this type is sometimes more important than hardware overhead.
System designers can use BIST for non-concurrent, online testing of a system's logic and memory [6]. They can readily configure the BIST hardware for event-triggered testing, tying the BIST control to the system reset so that testing occurs during system start-up or shutdown. BIST can also be designed for periodic testing with low fault latency. This requires incorporating a test process that guarantees the detection of all target faults within a fixed time. Designers usually implement online BIST with the goals of complete fault coverage and low fault latency. Hence, they generally design the test generator and the response monitor to guarantee coverage of specific fault models, minimum hardware overhead, and reasonable test set size. Different parts of the system meet these goals by different techniques.

Test generator and response monitor implementations often consist of simple, counter-like circuits, especially linear-feedback shift registers (LFSRs) [5]. An LFSR is formed from standard flip-flops, with outputs of selected flip-flops being fed back (modulo 2) to its inputs. When used as a test generator, an LFSR is set to cycle rapidly through a large number of its states. These states, whose choice and order depend on the LFSR's design parameters, define the test patterns. In this mode of operation, an LFSR is a source of pseudorandom tests that are, in principle, applicable to any fault and circuit types. An LFSR can also serve as a response monitor by counting (in a special sense) the responses produced by the tests. After receiving a sequence of test responses, an LFSR response monitor forms a fault signature, which it compares to a known or generated good signature to determine whether a fault is present. Ensuring that fault coverage is sufficiently high and the number of tests is sufficiently low are the main problems with random BIST methods.
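The dual use of an LFSR described above, as a pseudorandom test generator and as a signature-forming response monitor, can be sketched behaviorally in Python. This is an illustration under stated assumptions (a 16-bit Galois-style LFSR with the well-known maximal-length tap mask 0xB400, and a toy bitwise-complement CUT), not the exact hardware structure:

```python
def lfsr_step(state: int) -> int:
    """One step of a 16-bit Galois LFSR; tap mask 0xB400 (taps 16, 14,
    13, 11) gives a maximal-length sequence of period 2^16 - 1."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def pseudorandom_tests(seed: int, count: int) -> list:
    """Test-generator mode: successive LFSR states are the test patterns."""
    patterns, state = [], seed
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns

def signature(responses, seed: int = 0xFFFF) -> int:
    """Response-monitor mode: fold each response into the register to
    compact the whole response stream into a 16-bit fault signature."""
    state = seed
    for r in responses:
        state = lfsr_step(state ^ (r & 0xFFFF))
    return state

# Toy CUT: bitwise complement of its input.
tests = pseudorandom_tests(0xACE1, 100)
responses = [~t & 0xFFFF for t in tests]
good = signature(responses)            # reference (fault-free) signature
faulty = list(responses)
faulty[5] ^= 0x0004                    # one transient bit-flip in one response
assert signature(faulty) != good       # a single response error never aliases
```

Because the register update is linear and invertible over GF(2), an error confined to a single response word can never alias to the good signature, which the final assertion exercises; aliasing only becomes possible when multiple responses are corrupted.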
Researchers have proposed two general approaches to preserve the cost advantages of LFSRs while greatly shortening the generated test sequence. One approach is to insert test points in the CUT to improve controllability and observability. However, this approach can result in performance loss. Alternatively, one can introduce some determinism into the generated test sequence, for example, by inserting specific seed tests known to detect hard faults. Some CUTs, including data path circuits, contain hard-to-detect faults that are detectable by only a few test patterns, denoted Thard. An N-bit LFSR can generate a sequence that eventually includes 2^N - 1 patterns (essentially all possibilities). However, the probability that the tests in Thard will appear early in the sequence is low. In such cases, one can use deterministic testing, which tailors the generated test sequence to the CUT's functional properties, instead of random testing. Deterministic testing is especially suited to RAMs, ROMs, and other highly regular components. A deterministic technique called transparent BIST [3] applies BIST to RAMs while preserving the RAM contents, a particularly desirable feature for online testing.

Keeping hardware overhead acceptably low is the main difficulty with deterministic BIST. A straightforward way to generate a specific test set is to store it in a ROM and address each stored test pattern with a counter. Unfortunately, ROMs tend to be much too expensive for storing entire test sequences. An alternative method is to synthesize a finite-state machine that directly generates the test set. However, the relatively large test set size and test vector width, as well as the test set's irregular structure, are much more than current FSM synthesis programs can handle. Another group of test generator design methods, loosely called deterministic, attempt to embed a complete test set in a specific generated sequence.
Again, the generated tests must meet the coverage, overhead, and test size constraints we've discussed. An earlier article [7] presents a representative BIST design method for data path circuits that meets these requirements. The test generator's structure, based on a twisted-ring counter, is tailored to produce a regular, deterministic test sequence of reasonable size. One can systematically rescale the test generator as the size of a non-bit-sliced data path CUT, such as a carry-look-ahead adder, changes.

Instead of using an LFSR, a straightforward way to compress test response data and produce a fault signature is to use an FSM or an accumulator. However, FSM hardware overhead and accumulator aliasing are difficult parameters to control. Keeping hardware overhead acceptably low and reducing aliasing are the main difficulties in response monitor design.
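The twisted-ring (Johnson) counter mentioned above steps through a very regular sequence of walking blocks of 1s and 0s, which is what makes the generator easy to rescale with the data path width. A minimal behavioral sketch (illustrative only; the actual generator of [7] is more elaborate) in Python:

```python
def twisted_ring_sequence(width: int) -> list:
    """All 2*width states of a twisted-ring (Johnson) counter: the
    register shifts left one position and feeds the complement of the
    bit shifted out back into the low end."""
    state, seq = 0, []
    for _ in range(2 * width):
        seq.append(state)
        msb = (state >> (width - 1)) & 1
        state = ((state << 1) & ((1 << width) - 1)) | (msb ^ 1)
    return seq

# Width 4 walks 0000, 0001, 0011, 0111, 1111, 1110, 1100, 1000, then repeats.
```

Rescaling to a wider data path only changes `width`: an n-bit counter yields exactly 2n distinct, highly regular patterns, in contrast to the 2^n - 1 states of a maximal-length LFSR.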
[Fig. 42.1: Generic BIST architecture. A multiplexer, steered by a Control line, selects between the normal Inputs and the test pattern sequence from the Test generator and feeds the Circuit under test (CUT); a Response monitor checks the CUT's Outputs and raises an Error signal.]
An Example
IEEE 1149.4 based Architecture for OLT of a Mixed Signal SoC. Analog/mixed-signal blocks like DC-DC converters, PLLs, ADCs, etc., and digital modules like application-specific processors, microcontrollers, UARTs, bus controllers, etc., typically exist in SoCs. These have been used as cores of the SoC benchmark, a Controller for Electro-Hydraulic Actuators, which is being used as the case study. It is to be noted that this case study is used only for illustration; the architecture is generic and applies to all mixed-signal SoCs. All the digital blocks, like the instruction-specific processor, microcontroller, bus controller, etc., have been designed with OLT capability using the CAD tool described in [8]. Further, all these digital cores are IEEE 1149.1 compliant. In other words, all the digital cores are designed with a blanket comprising an on-line monitor and IEEE 1149.1 compliance circuitry. For the analog modules, the observers have been designed using ADCs and digital logic [9]. The test blanket for the analog/mixed-signal cores comprises IEEE 1149.4 circuitry. A dedicated test controller is designed and placed on-chip that schedules the various on-line tests during the operation of the SoC. The block diagram of the SoC being used as the case study is illustrated in Figure 42.2. The basic functionality of the SoC under consideration is discussed below.
motion of the dual tandem hydraulic jack. The motion of the spool of the hydraulic servo valve (Master Control Valve) regulates the flow of oil to the tandem jacks, thereby determining the ram position. The spool and ram positions are controlled by means of feedback loops. The actuator system is controlled by the on-board flight electronics. A lot of work has been done on on-line fault detection and diagnosis of the mechanical system; however, OLT of the electronic subsystems has hardly been looked into. It is to be noted that, as electro-hydraulic actuators are mainly used in mission-critical systems like avionics, on-line fault detection and diagnosis is required for reliable operation of both the mechanical and the electronic subsystems. The IEEE 1149.1 and 1149.4 circuitry is utilized to perform the BIST of the interconnecting buses between the cores. It may be noted that on-line tests are carried out only for cores that are more susceptible to failures. However, the interconnecting buses are tested during start-up and at intervals when the cores connected by them are idle. The test scheduling logic can be designed as suggested in [10]. The following three classes of tests are carried out in the SoC:
7. References
1) M.R. Lyu, ed., Software Fault Tolerance, John Wiley & Sons, New York, 1995.
2) K.K. Saluja, R. Sharma, and C.R. Kime, "A Concurrent Testing Technique for Digital Circuits," IEEE Trans. Computer-Aided Design, Vol. 7, No. 12, Dec. 1988, pp. 1250-1259.
3) M. Nicolaidis, "Theory of Transparent BIST for RAMs," IEEE Trans. Computers, Vol. 45, No. 10, Oct. 1996, pp. 1141-1156.
4) A. Mahmood and E. McCluskey, "Concurrent Error Detection Using Watchdog Processors: A Survey," IEEE Trans. Computers, Vol. 37, No. 2, Feb. 1988, pp. 160-174.
5) B.W. Johnson, Design and Analysis of Fault Tolerant Digital Systems, Addison-Wesley, Reading, Mass., 1989.
6) B.T. Murray and J.P. Hayes, "Testing ICs: Getting to the Core of the Problem," Computer, Vol. 29, No. 11, Nov. 1996, pp. 32-45.
7) H. Al-Asaad, J.P. Hayes, and B.T. Murray, "Scalable Test Generators for High-Speed Data Path Circuits," J. Electronic Testing: Theory and Applications, Vol. 12, No. 1/2, Feb./Apr. 1998, pp. 111-125 (reprinted in On-Line Testing for VLSI, M. Nicolaidis, Y. Zorian, and D.K. Pradhan, eds., Kluwer, Boston, 1998).
8) S. Biswas, S. Mukhopadhyay, and A. Patra, "A Formal Approach to On-Line Monitoring of Digital VLSI Circuits: Theory, Design and Implementation," J. Electronic Testing: Theory and Applications, Vol. 20, Oct. 2005, pp. 503-537.
9) S. Biswas, B. Chatterjee, S. Mukhopadhyay, and A. Patra, "A Novel Method for On-Line Testing of Mixed Signal System On a Chip: A Case Study of Base Band Controller," 29th National Systems Conference, IIT Mumbai, India, 2005, pp. 2.1-2.23.
10) A.T. Dahbura, M.U. Uyar, and C.W. Yau, "An Optimal Test Sequence for the JTAG/IEEE P1149.1 Test Access Port Controller," International Test Conference, USA, 1998, pp. 55-62.
[Fig. 42.2 legend: ADC, DAC; analog buses (IEEE 1149.4) AB1 and AB2; power supply to the cores; data and control paths; IEEE 1149.4/1149.1 boundary-scan bus; digital cores with on-line digital monitors [6] (FPGA); analog/mixed-signal cores with analog monitors [3] (ASIC); program running in a PC with data I/O cards (HILS).]
Fig. 42.2 Block Diagram of the SoC Representing On-Line Test Capability