
UNIT II PROGRAMMING FOR EMBEDDED SYSTEMS

The advantages of C

C is one of a large number of high-level languages designed for general-purpose programming, in other words, for writing anything from small programs for personal amusement to complex industrial applications. C has many advantages:

Before C, machine-language programmers criticized high-level languages because, with their black-box approach, they shielded the user from the working details of the computer and all its facilities. C, however, was designed to give access to any level of the computer down to raw machine language, and because of this it is perhaps the most flexible high-level language.

C has features that allow the programmer to organize programs in a clear, easy, logical way. For example, C allows meaningful names for variables without any loss of efficiency, yet it gives complete freedom of programming style, including flexible ways of making decisions and a set of flexible commands for performing tasks repetitively (for, while, do).

C is succinct. It permits the creation of tidy, compact programs. This feature can be a mixed blessing, however, and the C programmer must balance simplicity and readability.

C allows commands that are invalid in other languages. This is no defect, but a powerful freedom which, when used with caution, makes many things possible. It does mean that there are concealed difficulties in C, but if you write carefully and thoughtfully, you can create fast, efficient programs.

With C, you can use every resource your computer offers. C tries to link closely with the local environment, providing facilities for gaining access to common peripherals like disk drives and printers. When new peripherals are invented, the GNU community quickly provides the ability to program them in C as well. In fact, most of the GNU project is written in C (as are many other operating systems).

DATA TYPES

Like most programming languages, C is able to use and process named variables and their contents. Variables are simply names used to refer to some location in memory, a location that holds a value with which we are working. It may help to think of variables as placeholders for a value. You can think of a variable as being equivalent to its assigned value. So, if you have a variable i that is initialized (set equal) to 4, then it follows that i+1 will equal 5. Since C is a relatively low-level programming language, before a C program can utilize memory to store a variable it must claim the memory needed to store the values for that variable. This is done by declaring variables. Declaring variables is the way in which a C program shows the number of variables it needs, what they are going to be named, and how much memory they will need.

All variables in C are typed. That is, every variable declared must be declared as a certain type of variable.

Declaring variables

All type declarations using the above simple types follow a standard format:

type_name variable_name;

where type_name represents one of the C type names, and variable_name would be replaced by the programmer with a name for the variable. According to the C standards there are a few restrictions on variable names. A variable name:

- must be at least one character long
- must be less than some maximum character length: 32 on some systems, 256 on others, and probably everything in between on yet others
- must start with a letter
- must be composed of letters, numbers and/or the underscore character
- must not contain spaces
- must not be equal to reserved words such as "int", "char", "float", et cetera; any keyword used for the C language itself is off-limits
- is case sensitive, meaning that "NUMBER", "number", "Number", "numbeR", and "NuMbEr" are all different variables

For example, if we want to declare a variable called number of type int, we write:

int number;

In C, all declarations are terminated by semicolons just as if they were statements. If we wish to declare a number of variables of the same type, we can write a comma-separated list of variable names after the type. For example:

int variable1, variable2, variable3;

which declares variable1, variable2, and variable3 all to be of type integer.
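As a quick illustration, the following fragment (a minimal sketch; the variable names are invented for this example) combines the declaration forms described above:

    #include <stdio.h>

    int main(void)
    {
        int number;                    /* declaration only; value is indeterminate */
        int count = 0, limit = 10;     /* comma-separated list with initializers */

        number = 4;                    /* assignment after declaration */
        printf("%d %d %d\n", number + 1, count, limit);   /* prints 5 0 10 */
        return 0;
    }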

Literals

Any time within a program in which you specify a value explicitly instead of referring to a variable or some other form of data, that value is referred to as a literal. In the example above where i is initialized to 4, the 4 is a literal. Literals can either take a form defined by their type (more on that soon), or one can use hexadecimal (hex) notation to directly insert data into a variable regardless of its type. Hex numbers are always preceded with 0x. For now, though, you probably shouldn't be too concerned with hex.

The Four Basic Types

In Standard C there are four basic data types: int, char, float, and double.

The int type

The int type stores integers in the form of whole numbers. An integer is typically the size of one machine word, which on most modern home PCs is 32 bits (4 octets). Examples of int literals are whole numbers such as 1, 2, 3, 10, 100... When int is 32 bits (4 octets), it can store any whole number (integer) between -2147483648 and 2147483647. A 32-bit word can represent any one number out of 4294967296 possibilities (2 to the power of 32). If you want to declare a new int variable, use the int keyword. For example:

int numberOfStudents, i, j=5;

In this declaration we declare 3 variables: numberOfStudents, i, and j; j here is assigned the literal 5.

The char type

The char type is capable of holding any member of the execution character set. It stores the same kind of data as an int (i.e. integers), but always has a size of one byte. The size of a byte is specified by the macro CHAR_BIT, which specifies the number of bits in a char (byte). In standard C it can never be less than 8 bits. A variable of type char is most often used to store character data, hence its name. Most implementations use the ASCII character set as the execution character set, but it's best not to know or care about that unless the actual values are important. Examples of character literals are 'a', 'b', '1', etc., as well as some special characters such as '\0' (the null character) and '\n' (newline, recall "Hello, World"). Note that a char value must be enclosed within single quotations.

When we initialize a character variable, we can do it two ways. One is preferred; the other way is bad programming practice. The first way is to write

char letter1 = 'a';

This is good programming practice in that it allows a person reading your code to understand that letter1 is being initialized with the letter 'a' to start off with.

The second way, which should not be used when you are coding letter characters, is to write

char letter2 = 97; /* in ASCII, 97 = 'a' */

This is considered by some to be extremely bad practice, if we are using it to store a character rather than a small number, in that someone reading your code is forced to look up what character corresponds to the number 97 in the encoding scheme. In the end, letter1 and letter2 both store the same thing, the letter 'a', but the first method is clearer, easier to debug, and much more straightforward. One important thing to mention is that characters for numerals are represented differently from their corresponding number, i.e. '1' is not equal to 1.

There is one more kind of literal that needs to be explained in connection with chars: the string literal. A string is a series of characters, usually intended to be displayed. They are surrounded by double quotations (" ", not ' '). An example of a string literal is the "Hello, world!\n" in the "Hello, World" example.

The float type

float is short for floating point. It stores real numbers as well, but is only one machine word in size, so it is used when less precision than a double provides is required. float literals must be suffixed with F or f, otherwise they will be interpreted as doubles. Examples are: 3.1415926f, 4.0f, 6.022e+23f. float variables can be declared using the float keyword.

The double type

The double and float types are very similar. The float type allows you to store single-precision floating point numbers, while the double keyword allows you to store double-precision floating point numbers, real numbers, in other words, both integer and non-integer values. Its size is typically two machine words, or 8 bytes on most machines. Examples of double literals are 3.1415926535897932, 4.0, 6.022e+23 (scientific notation). If you use 4 instead of 4.0, the 4 will be interpreted as an int.

The distinction between floats and doubles was made because of the differing sizes of the two types. When C was first used, space was at a minimum and so the judicious use of a float instead of a double saved some memory. Nowadays, with memory more freely available, you do not really need to conserve memory like this; it may be better to use doubles consistently. Indeed, some C implementations use doubles instead of floats when you declare a float variable. If you want to use a double variable, use the double keyword.

Data type modifiers

One can alter the data storage of any data type by preceding it with certain modifiers:

auto: Unnecessary for local variables. Compare with static.
const: Allocates memory in ROM.
extern: Flags the reference for later resolution from within a library.
far: Depends upon addressing scheme of target.
near: Depends upon addressing scheme of target.
signed: Generates extra code compared with unsigned.
static: Preserves a local variable across function calls.
unsigned: Creates significant savings in generated code.
volatile: No specific notes; consult the ISO standard for more information.
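To illustrate how several of these modifiers typically appear in embedded code, here is a brief sketch (the register address 0x40021000 and all names are invented for this example):

    const unsigned char lookup[4] = {1, 2, 4, 8};   /* const: may be placed in ROM */

    /* volatile: the hardware can change this location at any time,
       so the compiler must not cache reads or drop writes */
    #define STATUS_REG (*(volatile unsigned int *)0x40021000)

    unsigned int count_calls(void)
    {
        static unsigned int calls = 0;   /* static: value preserved across calls */
        calls++;
        return calls;
    }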

Using the const keyword

The const keyword helps eradicate magic numbers. By declaring a const variable at the beginning of a block, a programmer can simply change that const and not have to worry about setting the value elsewhere. There is also another method for avoiding magic numbers. It is much more flexible than const, and also much more problematic in many ways. It also involves the preprocessor, as opposed to the compiler. Behold... #define

When you write programs, you can create what is known as a macro, so that when the computer is reading your code, it will replace all instances of a word with the specified expression. Here's an example. If you write

#define PRICE_OF_CORN 0.99

then when you want to, for example, print the price of corn, you use the word PRICE_OF_CORN instead of the number 0.99; the preprocessor will replace all instances of PRICE_OF_CORN with 0.99, which the compiler will interpret as the literal double 0.99. The preprocessor performs substitution, that is, PRICE_OF_CORN is replaced by 0.99, so there is no need for a semicolon.

It is important to note that #define has basically the same functionality as the "find-and-replace" function in a lot of text editors/word processors. For some purposes, #define can be used harmfully, and it is usually preferable to use const where #define is unnecessary. It is possible, for instance, to #define, say, a macro DOG as the number 3, but if you try to print the macro, thinking that DOG represents a string that you can show on the screen, the program will have an error. #define also has no regard for type. It disregards the structure of your program, replacing the text everywhere (in effect, disregarding scope), which could be advantageous in some circumstances, but can be the source of problematic bugs.

You will see further instances of the #define directive later in the text. It is good convention to write #defined words in all capitals, so a programmer will know that this is not a variable that you have declared but a #defined macro.
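To make the difference concrete, here is a small sketch contrasting the two approaches (PRICE_OF_CORN is from the text above; sales_tax is invented for this example):

    #include <stdio.h>

    #define PRICE_OF_CORN 0.99           /* textual substitution: no type, no scope */

    int main(void)
    {
        const double sales_tax = 0.07;   /* a typed, scoped constant */
        double total = PRICE_OF_CORN * (1.0 + sales_tax);

        printf("total: %.2f\n", total);  /* prints total: 1.06 */
        return 0;
    }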

Bit Manipulation

In C and C++, values are represented as binary values. The exact limits will vary from computer to computer, but the most common maximum values at the time of this writing are 127 for char, 32767 for short, and 2147483647 for int and long. The unsigned maximums are 255, 65535, and 4294967295 respectively. Now, these values are confusing to many programmers, much less non-programmers. They are obviously one less than powers of two, but when seen in source code they appear to be magic numbers.

An easier way to represent these values is with the hexadecimal numbering system, which shows the binary structure more clearly than decimal values. In hexadecimal, each digit corresponds to four bits of the binary value, with values from 10 to 15 being represented by the letters A through F. So the values 0x7F and 0xFF are the hexadecimal equivalents of the decimal values for char shown above, signed and unsigned respectively. You'll notice that the hexadecimal number has two digits (the 0x merely states that the number is in hexadecimal format); each of these digits corresponds to four bits of the binary value. 255 in binary is

1111 1111

We know that binary 1111 is 15 in decimal, which is 0xF in hexadecimal. So to convert the binary value to hexadecimal, simply replace every four bits with the corresponding hexadecimal digit:

1111 = F
1111 = F
--------
0xFF

Signed and Unsigned

Integer values come in two flavors in C and C++: signed and unsigned. Unsigned values are represented by a format where each bit represents a power of two; each position has a weight (1, 2, 4, 8, 16, 32, etc.) and the value of the number is determined by adding the weights of each position whose bit is set to 1. A binary value of 0000 0010 is valued at 2, since the weight of the second position is 2 and no other bits are set to 1.

Signed values are more complicated because they must also be able to represent negative numbers. There are many different ways to go about this, increasing the confusion. The more common ways include one's complement, two's complement, and sign-magnitude. All of

these methods use a particular bit to mark the sign of the value, that is, whether the value is positive or negative: 0 is positive and 1 is negative. Each method goes about marking the sign in different ways:

One's complement - This method inverts all of the bits of the positive number to create the negative number. Ex.:
 1 - 00000001
-1 - 11111110

Two's complement - This method performs a one's complement, but also adds one to the resulting number. Ex.:
 1 - 00000001
-1 - 11111111

Sign-magnitude - This method simply toggles the sign bit. Ex.:
 1 - 00000001
-1 - 10000001

Because of the different methods of calculating the signedness of a value, and other complications when manipulating signed bits, it is highly recommended that unsigned values be used when working with individual bits; all of the code below will be using unsigned values to avoid many of the problems that can occur. We will also restrict ourselves to unsigned int as the smallest type, because many of the bit operations promote char and short values to int. Even if the char and short were unsigned to begin with, the promotion could make the value signed, which is just begging for trouble.

Bit Operations

C and C++ programmers have several tools to work with bits effectively, but they appear arcane at first. We will be spending a little bit of time on what each of the operations does and how they can be chained together to manipulate bits in a simple and effective manner. There are six operators that C and C++ support for bit manipulation:

&   Bitwise AND
|   Bitwise OR
^   Bitwise Exclusive-OR
<<  Bitwise left shift
>>  Bitwise right shift

~   Bitwise complement

The bitwise AND tests two binary numbers and returns bit values of 1 for positions where both numbers had a one, and bit values of 0 where both numbers did not have one:

  01001011
& 00010101
  --------
  00000001

Notice that a 0,0 combination being tested results in 0, as does a 1,0 combination. Only a 1,1 combination results in a binary 1 in the resulting value. The bitwise AND is often used to mask a set of bits for testing.

The bitwise OR tests two binary numbers and returns bit values of 1 for positions where either bit or both bits are one; the result of 0 only happens when both bits are 0:

  01001011
| 00010101
  --------
  01011111

Notice that a 1,0 combination being tested results in 1, as does a 1,1 combination. Only a 0,0 combination results in a binary 0 in the resulting value. The bitwise OR is used to turn bits on if they were off.

The bitwise Exclusive-OR tests two binary numbers and returns bit values of 1 for positions where both bits are different; if they are the same then the result is 0:

  01001011
^ 00010101
  --------
  01011110

The bitwise left shift moves all bits in the number to the left and fills vacated bit positions with 0:

  01001011 << 2
  --------
  00101100

Shifting is very useful for dealing with individual bits in a binary number. If you want to affect every bit position, then instead of working out a new mask for each bit, simply shift the value and work with the same bit position.

The bitwise right shift moves all bits in the number to the right:

  01001011 >> 2
  --------
  ??010010

Note the use of ? for the fill bits. Where the left shift filled the vacated positions with 0, a right shift will do the same only when the value is unsigned. If the value is signed, then a right shift will fill the vacated bit positions with the sign bit or 0; which one is implementation-defined. So the best option is to never right shift signed values.

The bitwise complement inverts the bits in a single binary number:

~01001011
---------
 10110100

The bitwise complement operator is unary, meaning it is used on a single number (~num) instead of two numbers like the previous binary operators (num1 & num2, num1 << num2).

The bitwise operators in C and C++ can be chained together and used for a huge number of operations. For example, if you wanted to clear the lowest-order 1 bit you would say something like val & (val - 1). To clear all 1 bits except for the lowest bit, the expression could be changed to val & -val. There are many different combinations that can be used to do just about anything with a binary number.

Following are two functions which will help in playing around with the operators to figure out just how they work. A tutorial will never be able to explain such operations adequately, so you are encouraged to try things out for yourself. The two functions reverse the bits in a number and print all of the bits to an output stream. The print function prints the bits in reverse order, so the reversal function can be used to improve readability of the output. The rev_bits function is a template function which can be used with any reasonable type for bit manipulation; it works by calculating the number of bits in the type passed to it by multiplying the size of the type by CHAR_BIT from <climits>. It then copies val to ret in reverse, simply by copying the lowest-order bit of val to ret and then shifting ret left by one, then shifting val right by one.
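The original C++ template code is not reproduced in this text; the following is a plain-C adaptation for unsigned int, written to match the description above (an illustrative sketch, not the tutorial's original code):

    #include <stdio.h>
    #include <limits.h>   /* CHAR_BIT */

    /* Print the bits of val, lowest-order bit first (i.e., in reverse). */
    void print_bits(unsigned int val)
    {
        int bits = sizeof(val) * CHAR_BIT;
        while (bits--) {
            putchar((val & 1u) ? '1' : '0');
            val >>= 1;
        }
        putchar('\n');
    }

    /* Reverse the bits of val: copy the lowest bit of val into ret,
       shifting ret left and val right, once per bit in the type. */
    unsigned int rev_bits(unsigned int val)
    {
        unsigned int ret = 0;
        int bits = sizeof(val) * CHAR_BIT;
        while (bits--) {
            ret = (ret << 1) | (val & 1u);
            val >>= 1;
        }
        return ret;
    }

    int main(void)
    {
        unsigned int x = 0x4Bu;      /* 01001011 in the examples above */
        print_bits(x);               /* bits printed lowest-order first */
        print_bits(rev_bits(x));     /* reversing first makes the output read MSB-first */
        return 0;
    }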

For a 4-bit example with val = 1101, the sequence is as follows:

        val    ret
Start:  1101   0000
copy:   1101   0001
shift:  0110   0010
copy:   0110   0010
shift:  0011   0100
copy:   0011   0101
shift:  0001   1010
copy:   0001   1011
shift:  0000   no_shift
End:    0000   1011

OPERATORS IN C

Variables and constants can be used in conjunction with C operators to create more complex expressions. Table 6-1 presents the set of C operators.

Table 6-1. C Operators

Operator        Example      Description/Meaning
()              f()          Function call
[]              a[10]        Array reference
->              s->a         Structure and union member selection
.               s.a          Structure and union member selection
+ [unary]       +a           Value of a
- [unary]       -a           Negative of a
* [unary]       *a           Reference to object at address a
& [unary]       &a           Address of a
~               ~a           One's complement of a
++ [prefix]     ++a          The value of a after increment
++ [postfix]    a++          The value of a before increment
-- [prefix]     --a          The value of a after decrement
-- [postfix]    a--          The value of a before decrement
sizeof          sizeof (t1)  Size in bytes of object with type t1
sizeof          sizeof e     Size in bytes of object having the type of expression e
+ [binary]      a + b        a plus b
- [binary]      a - b        a minus b
* [binary]      a * b        a times b
/               a / b        a divided by b
%               a % b        Remainder of a/b
>>              a >> b       a, right-shifted b bits
<<              a << b       a, left-shifted b bits
<               a < b        1 if a < b; 0 otherwise
>               a > b        1 if a > b; 0 otherwise
<=              a <= b       1 if a <= b; 0 otherwise
>=              a >= b       1 if a >= b; 0 otherwise
==              a == b       1 if a equal to b; 0 otherwise
!=              a != b       1 if a not equal to b; 0 otherwise
& [binary]      a & b        Bitwise AND of a and b
|               a | b        Bitwise OR of a and b
^               a ^ b        Bitwise XOR (exclusive OR) of a and b
&&              a && b       Logical AND of a and b (yields 0 or 1)
||              a || b       Logical OR of a and b (yields 0 or 1)
!               !a           Logical NOT of a (yields 0 or 1)
?:              a ? e1 : e2  Expression e1 if a is nonzero; expression e2 if a is zero
=               a = b        a, after b is assigned to it
+=              a += b       a plus b (assigned to a)
-=              a -= b       a minus b (assigned to a)
*=              a *= b       a times b (assigned to a)
/=              a /= b       a divided by b (assigned to a)
%=              a %= b       Remainder of a/b (assigned to a)
>>=             a >>= b      a, right-shifted b bits (assigned to a)
<<=             a <<= b      a, left-shifted b bits (assigned to a)
&=              a &= b       a AND b (assigned to a)
|=              a |= b       a OR b (assigned to a)
^=              a ^= b       a XOR b (assigned to a)
,               e1, e2       e2 (e1 evaluated first)

The C operators fall into the following categories:

- Postfix operators, which follow a single operand.
- Unary prefix operators, which precede a single operand.
- Binary operators, which take two operands and perform a variety of arithmetic and logical operations.
- The conditional operator (a ternary operator), which takes three operands and evaluates either the second or third expression, depending on the evaluation of the first expression.
- Assignment operators, which assign a value to a variable.
- The comma operator, which guarantees left-to-right evaluation of comma-separated expressions.
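As a brief illustration of the less common categories (a sketch; the values are arbitrary), the following fragment exercises the conditional, compound assignment, and comma operators:

    #include <stdio.h>

    int main(void)
    {
        int a = 5, b = 3, max;

        max = (a > b) ? a : b;       /* conditional: max gets 5 */
        a += 2;                      /* compound assignment: a is now 7 */

        for (int i = 0, j = 10; i < j; i++, j--)   /* comma operator */
            ;                        /* loop body intentionally empty */

        printf("max = %d, a = %d\n", max, a);      /* prints max = 5, a = 7 */
        return 0;
    }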

Operator precedence determines the grouping of terms in an expression. This affects how an expression is evaluated. Certain operators have higher precedence than others; for example, the multiplication operator has higher precedence than the addition operator:

x = 7 + 3 * 2; /* x is assigned 13, not 20 */

The previous statement is equivalent to the following:

x = 7 + ( 3 * 2 );

Using parentheses in an expression alters the default precedence. For example:

x = (7 + 3) * 2; /* (7 + 3) is evaluated first */

In an unparenthesized expression, operators of higher precedence are evaluated before those of lower precedence. Consider the following expression:

A + B * C

The identifiers B and C are multiplied first because the multiplication operator (*) has higher precedence than the addition operator (+). Table 6-2 shows the precedence the compiler uses to evaluate the C operators. Operators with the highest precedence appear at the top of the table; those with the lowest appear at the bottom. Operators of equal precedence appear in the same row.

Table 6-2. Precedence of C Operators

Category        Operators                               Associativity
Postfix         () [] -> . ++ --                        Left to right
Unary           + - ! ~ ++ -- (type) * & sizeof         Right to left
Multiplicative  * / %                                   Left to right
Additive        + -                                     Left to right
Shift           << >>                                   Left to right
Relational      < <= > >=                               Left to right
Equality        == !=                                   Left to right
Bitwise AND     &                                       Left to right
Bitwise XOR     ^                                       Left to right
Bitwise OR      |                                       Left to right
Logical AND     &&                                      Left to right
Logical OR      ||                                      Left to right
Conditional     ?:                                      Right to left
Assignment      = += -= *= /= %= >>= <<= &= ^= |=       Right to left
Comma           ,                                       Left to right

Associativity relates to precedence, and resolves any ambiguity over the grouping of operators with the same precedence. In the following statement, the rules of C specify that a * b is evaluated first:

y = a * b / c;

In a more complicated example, associativity rules specify that b ? c : d is evaluated first in the following example:

a ? b ? c : d : e;

The associativity of the conditional operator is right-to-left. The assignment operator also associates right-to-left; for example:

int x = 0, y = 5, z = 3;
x = y = z; /* x has the value 3, not 5 */

Other operators associate left-to-right; for example, the binary addition, subtraction, multiplication, and division operators all have left-to-right associativity. Associativity applies to each row of operators in Table 6-2 and is right-to-left for some rows and left-to-right for others. The kind of associativity determines the order in which operators from the same row are evaluated in an unparenthesized expression. Consider the following expression:

A * B % C

This expression is evaluated as follows, because the multiplicative operators (*, /, %) are evaluated from left to right:

(A * B) % C

Parentheses can always be used to control precedence and associativity within an expression.

Introduction to C structures

In some programming contexts, you need to access multiple data items under a single name for easier data manipulation; for example, you may want to refer to an address with multiple data items like house number, street, zip code, and country. C supports the structure, which allows you to wrap one or more variables with different data types. A structure can contain any valid data type such as int, char, float, arrays, or even other structures. Each variable in a structure is called a structure member.

Defining structures

To define a structure, you use the struct keyword. Here is the common syntax of a structure definition:

struct struct_name {
    structure_members
};

The name of a structure follows the rules for variable names. Here is an example of defining an address structure:

struct address {
    unsigned int house_number;
    char street_name[50];
    int zip_code;
    char country[50];
};

The address structure contains the house number as a positive integer, the street name as a string, the zip code as an integer, and the country as a string.

Declaring structures

The above example only defines an address structure without creating any structure instance. To create or declare a structure instance, you can do it in two ways. The first way is to declare structure instances together with the structure definition, like this:

struct struct_name {
    structure_members;
    ...
} instance_1, instance_2, ..., instance_n;

In the second way, you can declare structure instances at a different location in your source code after the structure definition. Here is the structure declaration syntax:

struct struct_name instance_1, instance_2, ..., instance_n;

Complex structures

If a structure contains arrays or other structures, it is called a complex structure. For example, the address structure is a structure. We can define a complex structure

called customer which contains the address structure as follows:

struct customer {
    char name[50];
    struct address billing_addr;
    struct address shipping_addr;
};

Accessing structure members

To access structure members we can use the dot operator (.) between the structure variable name and the structure member name as follows:

structure_name.structure_member

For example, to set the country of an address structure we do as follows (note that a char array cannot be assigned with =, so strcpy from <string.h> is used):

struct address billing_addr;
strcpy(billing_addr.country, "US");

If the structure contains another structure, we can use the dot operator to access the nested structure, and use the dot operator again to access the variables of the nested structure:

struct customer jack;

strcpy(jack.billing_addr.country, "US");

Initializing structures

The C programming language treats a structure as a custom data type, therefore you can initialize a structure like a variable. Here is an example that initializes a product structure:

struct product {
    char name[50];
    double price;
} book = { "C programming language", 40.5 };

In the above example, we define the product structure, then we declare and initialize the book structure with its name and price.

Structures and pointers

A structure can contain pointers as structure members, and we can create a pointer to a structure as follows:

struct invoice {
    char* code;
    char date[20];
};

struct invoice this_invoice;
struct invoice *pinv = &this_invoice;

Shorthand structures with the typedef keyword

To make your source code more concise, you can use the typedef keyword to create a synonym for a structure. This is an example of using the typedef keyword to define the address structure so that when you want to create an instance of it you can omit the keyword struct:

typedef struct {
    unsigned int house_number;
    char street_name[50];
    int zip_code;
    char country[50];
} address;

address billing_addr;
address shipping_addr;

Copying a structure into another structure

One major advantage of structures is that you can copy one with the = operator. The syntax is as follows:

struct_instance1 = struct_instance2;

It should be noted that some old C compilers may not support structure assignment, so with them you have to assign each member variable one by one.

Structures and the sizeof operator

sizeof is used to get the size of any data type, including structures. Let's take a look at a simple program:

#include <stdio.h>

typedef struct __address {
    int house_number; // 4 bytes
    char street[50];  // 50 bytes
    int zip_code;     // 4 bytes
    char country[20]; // 20 bytes
} address;            // 78 bytes in total

int main(void)
{
    // it prints 80 bytes on the author's system
    printf("size of address is %zu bytes\n", sizeof(address));
    return 0;
}

You will rarely get exactly the size of a structure that you expect. sizeof may report a size larger than the sum of the members' sizes, because the compiler pads structure members so that each one can be accessed without delays. So you should be careful when you read a whole structure from a file that was written by another program.

Source code example of using C structures

In this example, we will show you how to use a structure to wrap student information and manipulate it, by reading information into an array of student structures and printing them to the console screen:

#include <stdio.h>

typedef struct _student {
    char name[50];
    unsigned int mark;
} student;


void print_list(student list[], int size);
void read_list(student list[], int size);

int main(void)
{
    const int size = 3;
    student list[size];     /* C99 variable-length array */

    read_list(list, size);
    print_list(list, size);
    return 0;
}

void read_list(student list[], int size)
{
    printf("Please enter the student information:\n");
    for (int i = 0; i < size; i++) {
        printf("\nname:");
        scanf("%49s", list[i].name);     /* was %S: a name is read as a plain string */
        printf("\nmark:");
        scanf("%u", &list[i].mark);      /* was %U: unsigned int uses %u */
    }
}

void print_list(student list[], int size)
{
    printf("Students' information:\n");
    for (int i = 0; i < size; i++) {
        printf("\nname: %s, mark: %u", list[i].name, list[i].mark);
    }
}

Here is the program's output:

Please enter the student information:

name:Jack
mark:5

name:Anna
mark:7

name:Harry
mark:8

Students' information:

name: Jack, mark: 5
name: Anna, mark: 7
name: Harry, mark: 8

9.2. Using I/O Ports

I/O ports are the means by which drivers communicate with many devices, at least part of the time. This section covers the various functions available for making use of I/O ports; we also touch on some portability issues.

9.2.1. I/O Port Allocation

As you might expect, you should not go off and start pounding on I/O ports without first ensuring that you have exclusive access to those ports. The kernel provides a registration interface that allows your driver to claim the ports it needs. The core function in that interface is request_region:

#include <linux/ioport.h>
struct resource *request_region(unsigned long first, unsigned long n, const char *name);

This function tells the kernel that you would like to make use of n ports, starting with first. The name parameter should be the name of your device. The return value is non-NULL if the allocation succeeds. If you get NULL back from request_region, you will not be able to use the desired ports. All port allocations show up in /proc/ioports. If you are unable to allocate a needed set of ports, that is the place to look to see who got there first.

When you are done with a set of I/O ports (at module unload time, perhaps), they should be returned to the system with:

void release_region(unsigned long start, unsigned long n);

There is also a function that allows your driver to check to see whether a given set of I/O ports is available:

int check_region(unsigned long first, unsigned long n);

Here, the return value is a negative error code if the given ports are not available. This function is deprecated because its return value provides no guarantee of whether an allocation would succeed; checking and later allocating are not an atomic operation. We list it here because several drivers are still using it, but you should always use request_region, which performs the required locking to ensure that the allocation is done in a safe, atomic manner.

9.2.2. Manipulating I/O ports

After a driver has requested the range of I/O ports it needs to use in its activities, it must read and/or write to those ports. To this end, most hardware differentiates between 8-bit, 16-bit, and 32-bit ports. Usually you can't mix them like you normally do with system memory access. (Sometimes I/O ports are arranged like memory, and you can, for example, bind two 8-bit writes into a single 16-bit operation. This applies, for instance, to PC video boards. But generally, you can't count on this feature.)
A C program, therefore, must call different functions to access different size ports. As suggested in the previous section, computer architectures that support only memory-mapped I/O registers fake port I/O by remapping port addresses to memory addresses, and the kernel hides the details from the driver in order to ease portability. The Linux kernel headers (specifically, the architecture-dependent header <asm/io.h>) define the following inline functions to access I/O ports:

unsigned inb(unsigned port);
void outb(unsigned char byte, unsigned port);

Read or write byte ports (eight bits wide). The port argument is defined as unsigned long for some platforms and unsigned short for others. The return type of inb is also different across architectures.

unsigned inw(unsigned port);
void outw(unsigned short word, unsigned port);

These functions access 16-bit ports (one word wide); they are not available when compiling for the S390 platform, which supports only byte I/O.

unsigned inl(unsigned port);
void outl(unsigned longword, unsigned port);

These functions access 32-bit ports. longword is declared as either unsigned long or unsigned int, according to the platform.
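Putting the allocation and access calls together, a driver's port handling typically looks something like the following sketch (the base address 0x378, the port count, and all names are invented for this example; this is not code from the book):

    #include <linux/ioport.h>
    #include <linux/errno.h>
    #include <asm/io.h>

    #define MYDEV_BASE   0x378   /* hypothetical base port */
    #define MYDEV_NPORTS 3       /* hypothetical number of ports */

    static int mydev_setup(void)
    {
        /* claim the ports before touching them */
        if (!request_region(MYDEV_BASE, MYDEV_NPORTS, "mydev"))
            return -EBUSY;

        outb(0x00, MYDEV_BASE);         /* write one byte to the data port */
        return inb(MYDEV_BASE + 1);     /* read one byte from the next port */
    }

    static void mydev_teardown(void)
    {
        release_region(MYDEV_BASE, MYDEV_NPORTS);   /* give the ports back */
    }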

9.2.3. I/O Port Access from User Space

The functions just described are primarily meant to be used by device drivers, but they can also be used from user space, at least on PC-class computers. The GNU C library defines them in <sys/io.h>. The following conditions should apply in order for inb and friends to be used in user-space code:

- The program must be compiled with the -O option to force expansion of inline functions.
- The ioperm or iopl system calls must be used to get permission to perform I/O operations on ports. ioperm gets permission for individual ports, while iopl gets permission for the entire I/O space. Both of these functions are x86-specific.
- The program must run as root to invoke ioperm or iopl. (Technically, it must have the CAP_SYS_RAWIO capability, but that is the same as running as root on most current systems.) Alternatively, one of its ancestors must have gained port access running as root.
If the host platform has no ioperm and no iopl system calls, user space can still access I/O ports by using the /dev/port device file. Note, however, that the meaning of the file is very platform-specific and not likely useful for anything but the PC.

The sample sources misc-progs/inp.c and misc-progs/outp.c are a minimal tool for reading and writing ports from the command line, in user space. They expect to be installed under multiple names (e.g., inb, inw, and inl) and manipulate byte, word, or long ports depending on which name was invoked by the user. They use ioperm or iopl on x86, and /dev/port on other platforms. The programs can be made setuid root if you want to live dangerously and play with your hardware without acquiring explicit privileges. Please do not install them setuid on a production system, however; they are a security hole by design.

9.2.4. String Operations

In addition to the single-shot in and out operations, some processors implement special instructions to transfer a sequence of bytes, words, or longs to and from a single I/O port of the same size. These are the so-called string instructions, and they perform the task more quickly than a C-language loop can do. The following macros implement the concept of string I/O either by using a single machine instruction or by executing a tight loop if the target processor has no instruction that performs string I/O. The macros are not defined at all when compiling for the S390 platform. This should not be a portability problem, since this platform doesn't usually share device drivers with other platforms, because its peripheral buses are different.

The prototypes for the string functions are:

void insb(unsigned port, void *addr, unsigned long count);
void outsb(unsigned port, void *addr, unsigned long count);

Read or write count bytes starting at the memory address addr. Data is read from or written to the single port port.

void insw(unsigned port, void *addr, unsigned long count);
void outsw(unsigned port, void *addr, unsigned long count);

Read or write 16-bit values to a single 16-bit port.

void insl(unsigned port, void *addr, unsigned long count);
void outsl(unsigned port, void *addr, unsigned long count);

Read or write 32-bit values to a single 32-bit port.

There is one thing to keep in mind when using the string functions: they move a straight byte stream to or from the port. When the port and the host system have different byte ordering rules, the results can be surprising. Reading a port with inw swaps the bytes, if need be, to

make the value read match the host byte ordering. The string functions, instead, do not perform this swapping.

9.2.5. Pausing I/O

Some platforms, most notably the i386, can have problems when the processor tries to transfer data too quickly to or from the bus. The problems can arise when the processor is overclocked with respect to the peripheral bus (think ISA here) and can show up when the device board is too slow. The solution is to insert a small delay after each I/O instruction if another such instruction follows. On the x86, the pause is achieved by performing an outb instruction to port 0x80 (normally but not always unused), or by busy waiting. See the io.h file under your platform's asm subdirectory for details.

If your device misses some data, or if you fear it might miss some, you can use pausing functions in place of the normal ones. The pausing functions are exactly like those listed previously, but their names end in _p; they are called inb_p, outb_p, and so on. The functions are defined for most supported architectures, although they often expand to the same code as nonpausing I/O, because there is no need for the extra pause if the architecture runs with a reasonably modern peripheral bus.

9.2.6. Platform Dependencies

I/O instructions are, by their nature, highly processor dependent. Because they work with the details of how the processor handles moving data in and out, it is very hard to hide the differences between systems. As a consequence, much of the source code related to port I/O is platform-dependent. On platforms where I/O space is memory-mapped, versions of the port functions are defined to work with unsigned long ports. The curious reader can extract more information from the io.h files, which sometimes define a few architecture-specific functions in addition to those we describe in this chapter. Be warned that some of these files are rather difficult reading, however.

It's interesting to note that no processor outside the x86 family features a different address space for ports, even though several of the supported families are shipped with ISA and/or PCI slots (and both buses implement separate I/O and memory address spaces). Moreover, some processors (most notably the early Alphas) lack instructions that move one or two bytes at a time. Therefore, their peripheral chipsets simulate 8-bit and 16-bit I/O accesses by mapping them to special address ranges in the memory address space. Thus, an inb and an inw instruction that act on the same port are implemented by two 32-bit memory reads that operate on different addresses. Fortunately, all of this is hidden from the device driver writer by the internals of the macros described in this section, but we feel it's an interesting feature to note. If you want to probe further, look for examples in include/asm-alpha/core_lca.h.
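Before moving on to the parallel port, here is a sketch tying together the string and pausing variants from Sections 9.2.4 and 9.2.5 (the port address and buffer are hypothetical, invented for this example):

    #include <asm/io.h>

    #define DATA_PORT 0x378   /* hypothetical 8-bit data port */

    static unsigned char buf[64];

    static void fill_from_port(void)
    {
        /* string read: 64 bytes from one port into buf, faster than a C loop */
        insb(DATA_PORT, buf, sizeof(buf));
    }

    static void drain_to_slow_port(void)
    {
        unsigned long i;

        /* pausing write: outb_p inserts a small delay after each byte,
           for devices on a slow bus that might otherwise miss data */
        for (i = 0; i < sizeof(buf); i++)
            outb_p(buf[i], DATA_PORT);
    }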

9.3.1. An Overview of the Parallel Port

Because we expect most readers to be using an x86 platform in the form called "personal computer," we feel it is worth explaining how the PC parallel port is designed. The parallel port is the peripheral interface of choice for running digital I/O sample code on a personal computer. Although most readers probably have parallel port specifications available, we summarize them here for your convenience.

The parallel interface, in its minimal configuration (we overlook the ECP and EPP modes), is made up of three 8-bit ports. The PC standard starts the I/O ports for the first parallel interface at 0x378 and for the second at 0x278. The first port is a bidirectional data register; it connects directly to pins 2-9 on the physical connector. The second port is a read-only status register; when the parallel port is being used for a printer, this register reports several aspects of printer status, such as being online, out of paper, or busy. The third port is an output-only control register, which, among other things, controls whether interrupts are enabled.

The signal levels used in parallel communications are standard transistor-transistor logic (TTL) levels: 0 and 5 volts, with the logic threshold at about 1.2 volts. You can count on the ports at least meeting the standard TTL LS current ratings, although most modern parallel ports do better in both current and voltage ratings. The bit specifications are outlined in Figure 9-1. You can access 12 output bits and 5 input bits, some of which are logically inverted over the course of their signal path. The only bit with no associated signal pin is bit 4 (0x10) of port 2, which enables interrupts from the parallel port. We use this bit as part of our implementation of an interrupt handler in Chapter 10.

Figure 9-1. The pinout of the parallel port (figure not reproduced)

9.3.2. A Sample Driver

The driver we introduce is called short (Simple Hardware Operations and Raw Tests). All it does is read and write a few 8-bit ports, starting from the one you select at load time. By default, it uses the port range assigned to the parallel interface of the PC. Each device node (with a unique minor number) accesses a different port. The short driver doesn't do anything beyond reading and writing those ports.
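As a user-space illustration of driving the parallel data port directly (a sketch using the ioperm mechanism from Section 9.2.3; it must run as root on x86, and the base address 0x378 is assumed, not probed):

    /* compile with -O so that outb/inb expand inline (see Section 9.2.3) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/io.h>

    #define LPT_DATA 0x378   /* first parallel interface, data register */

    int main(void)
    {
        /* ask for access to the three parallel-port registers */
        if (ioperm(LPT_DATA, 3, 1) < 0) {
            perror("ioperm");
            return EXIT_FAILURE;
        }

        outb(0xAA, LPT_DATA);                 /* drive pins 2-9 with 10101010 */
        printf("data register reads 0x%02x\n", inb(LPT_DATA));

        ioperm(LPT_DATA, 3, 0);               /* release access */
        return 0;
    }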

9.4. Using I/O Memory

Despite the popularity of I/O ports in the x86 world, the main mechanism used to communicate with devices is through memory-mapped registers and device memory. Both are called I/O memory because the difference between registers and memory is transparent to software.

I/O memory is simply a region of RAM-like locations that the device makes available to the processor over the bus. This memory can be used for a number of purposes, such as holding video data or Ethernet packets, as well as implementing device registers that behave just like I/O ports (i.e., they have side effects associated with reading and writing them). The way to access I/O memory depends on the computer architecture, bus, and device being used, although the principles are the same everywhere. The discussion in this chapter touches mainly on ISA and PCI memory, while trying to convey general information as well. Although access to PCI memory is introduced here, a thorough discussion of PCI is deferred to Chapter 12.

Depending on the computer platform and bus being used, I/O memory may or may not be accessed through page tables. When access passes through page tables, the kernel must first arrange for the physical address to be visible from your driver, and this usually means that you must call ioremap before doing any I/O. If no page tables are needed, I/O memory locations look pretty much like I/O ports, and you can just read and write to them using proper wrapper functions.

Whether or not ioremap is required to access I/O memory, direct use of pointers to I/O memory is discouraged. Even though (as introduced in Section 9.1) I/O memory is addressed like normal RAM at the hardware level, the extra care outlined in Section 9.1.1 suggests avoiding normal pointers. The wrapper functions used to access I/O memory are safe on all platforms and are optimized away whenever straight pointer dereferencing can perform the operation. Therefore, even though dereferencing a pointer works (for now) on the x86, failure to use the proper macros hinders the portability and readability of the driver.

9.4.1. I/O Memory Allocation and Mapping

I/O memory regions must be allocated prior to use. The interface for allocation of memory regions (defined in <linux/ioport.h>) is:

struct resource *request_mem_region(unsigned long start, unsigned long len, char *name);

This function allocates a memory region of len bytes, starting at start. If all goes well, a non-NULL pointer is returned; otherwise the return value is NULL. All I/O memory allocations are listed in /proc/iomem.

Memory regions should be freed when no longer needed:

void release_mem_region(unsigned long start, unsigned long len);

There is also an old function for checking I/O memory region availability:

int check_mem_region(unsigned long start, unsigned long len);

But, as with check_region, this function is unsafe and should be avoided.

Allocation of I/O memory is not the only required step before that memory may be accessed. You must also ensure that this I/O memory has been made accessible to the kernel. Getting

at I/O memory is not just a matter of dereferencing a pointer; on many systems, I/O memory is not directly accessible in this way at all. So a mapping must be set up first. This is the role of the ioremap function, introduced in Section 8.4 in Chapter 8. The function is designed specifically to assign virtual addresses to I/O memory regions.

Once equipped with ioremap (and iounmap), a device driver can access any I/O memory address, whether or not it is directly mapped to virtual address space. Remember, though, that the addresses returned from ioremap should not be dereferenced directly; instead, accessor functions provided by the kernel should be used. Before we get into those functions, we'd better review the ioremap prototypes and introduce a few details that we passed over in the previous chapter. The functions are called according to the following definition:

#include <asm/io.h>
void *ioremap(unsigned long phys_addr, unsigned long size);
void *ioremap_nocache(unsigned long phys_addr, unsigned long size);
void iounmap(void * addr);

First of all, you notice the new function ioremap_nocache. We didn't cover it in Chapter 8, because its meaning is definitely hardware related. Quoting from one of the kernel headers: "It's useful if some control registers are in such an area, and write combining or read caching is not desirable." Actually, the function's implementation is identical to ioremap on most computer platforms: in situations where all of I/O memory is already visible through noncacheable addresses, there's no reason to implement a separate, noncaching version of ioremap.

These functions read or write count values from the given buf to the given addr. Note that count is expressed in the size of the data being written; ioread32_rep reads count 32-bit values starting at buf.

The functions described above perform all I/O to the given addr. If, instead, you need to operate on a block of I/O memory, you can use one of the following:

void memset_io(void *addr, u8 value, unsigned int count);
void memcpy_fromio(void *dest, void *source, unsigned int count);
void memcpy_toio(void *dest, void *source, unsigned int count);

These functions behave like their C library analogs.

If you read through the kernel source, you see many calls to an older set of functions when I/O memory is being used. These functions still work, but their use in new code is discouraged. Among other things, they are less safe because they do not perform the same sort of type checking. Nonetheless, we describe them here:

unsigned readb(address);
unsigned readw(address);
unsigned readl(address);

These macros are used to retrieve 8-bit, 16-bit, and 32-bit data values from I/O memory.

void writeb(unsigned value, address);
void writew(unsigned value, address);
void writel(unsigned value, address);

Like the previous functions, these functions (macros) are used to write 8-bit, 16-bit, and 32-bit data items. Some 64-bit platforms also offer readq and writeq, for quad-word (8-byte) memory operations on the PCI bus. The quad-word nomenclature is a historical leftover from the times when all real processors had 16-bit words. Actually, the L naming used for 32-bit values has become incorrect too, but renaming everything would confuse things even more.

9.4.3. Ports as I/O Memory

Some hardware has an interesting feature: some versions use I/O ports, while others use I/O memory. The registers exported to the processor are the same in either case, but the access method is different. As a way of making life easier for drivers dealing with this kind of hardware, and as a way of minimizing the apparent differences between I/O port and memory accesses, the 2.6 kernel provides a function called ioport_map:

void *ioport_map(unsigned long port, unsigned int count);

This function remaps count I/O ports and makes them appear to be I/O memory. From that point thereafter, the driver may use ioread8 and friends on the returned addresses and forget that it is using I/O ports at all. This mapping should be undone when it is no longer needed:

void ioport_unmap(void *addr);

These functions make I/O ports look like memory. Do note, however, that the I/O ports must still be allocated with request_region before they can be remapped in this way.
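The full sequence, allocation, mapping, access, and teardown, might look like this sketch (the physical address 0x10000000, the length, and the register offsets are invented for this example):

    #include <linux/ioport.h>
    #include <linux/errno.h>
    #include <asm/io.h>

    #define MYDEV_PHYS 0x10000000UL   /* hypothetical device address */
    #define MYDEV_LEN  0x100UL

    static void *regs;

    static int mydev_map(void)
    {
        if (!request_mem_region(MYDEV_PHYS, MYDEV_LEN, "mydev"))
            return -EBUSY;

        regs = ioremap(MYDEV_PHYS, MYDEV_LEN);    /* map physical to virtual */
        if (!regs) {
            release_mem_region(MYDEV_PHYS, MYDEV_LEN);
            return -ENOMEM;
        }

        iowrite32(0x1, regs);                        /* hypothetical control register */
        return ioread32((char *)regs + 4);           /* hypothetical status register */
    }

    static void mydev_unmap(void)
    {
        iounmap(regs);
        release_mem_region(MYDEV_PHYS, MYDEV_LEN);
    }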

9.4.4. Reusing short for I/O Memory

The short sample module, introduced earlier to access I/O ports, can be used to access I/O memory as well. To this aim, you must tell it to use I/O memory at load time; also, you need to change the base address to make it point to your I/O region. For example, this is how we used short to light the debug LEDs on a MIPS development board:

mips.root# ./short_load use_mem=1 base=0xb7ffffc0
mips.root# echo -n 7 > /dev/short0

Use of short for I/O memory is the same as it is for I/O ports. The following fragment shows the loop used by short in writing to a memory location:

while (count--) {
    iowrite8(*ptr++, address);
    wmb( );
}

Note the use of a write memory barrier here. Because iowrite8 likely turns into a direct assignment on many architectures, the memory barrier is needed to ensure that the writes happen in the expected order. short uses inb and outb to show how that is done. It would be a straightforward exercise for the reader, however, to change short to remap I/O ports with ioport_map, and simplify the rest of the code considerably.

C REGISTER USAGE

The number of registers available on a processor and the operations that can be performed using those registers have a significant impact on the efficiency of code generated by optimizing compilers. The Strahler number defines the minimum number of registers required to evaluate an expression tree.

REGISTER ALLOCATION

In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers. Register allocation can happen over a basic block (local register allocation), over a whole function/procedure (global register allocation), or in-between functions as a calling convention (interprocedural register allocation).

Introduction
In many programming languages, the programmer has the illusion of allocating arbitrarily many variables. However, during compilation, the compiler must decide how to allocate these variables to a small, finite set of registers. Not all variables are in use (or "live") at the same time, so some registers may be assigned to more than one variable. However, two variables in use at the same time cannot be assigned to the same register without corrupting its value. Variables which cannot be assigned to some register must be kept in RAM and loaded in/out for every read/write, a process called spilling. Accessing RAM is significantly slower than accessing registers and slows down the execution speed of the compiled program, so an optimizing compiler aims to assign as many variables to registers as possible. Register pressure is the term used when there are fewer hardware registers available than would have been optimal; higher pressure usually means that more spills and reloads are needed.

In addition, programs can be further optimized by assigning the same register to the source and destination of a move instruction whenever possible. This is especially important if the compiler is using other optimizations such as SSA analysis, which artificially generates additional move instructions in the intermediate code. Two related hardware structures are described below:

1. CPU cache
2. Shift registers

SHIFT REGISTERS

In digital circuits, a shift register is a cascade of flip-flops, sharing the same clock, in which the output of each flip-flop but the last is connected to the "data" input of the next one in the chain. The result is a circuit that shifts the one-dimensional "bit array" stored in it by one position, shifting in the data present at its input and shifting out the last bit in the array, whenever it is enabled to do so by a transition of the clock input. More generally, a shift register may be multidimensional, such that its "data in" input and stage outputs are themselves bit arrays; this is implemented simply by running several shift registers of the same bit-length in parallel.

One of the most common uses of a shift register is to convert between serial and parallel interfaces. This is useful because many circuits work on groups of bits in parallel, but serial interfaces are simpler to construct. Shift registers can be used as simple delay circuits. Several bidirectional shift registers could also be connected in parallel for a hardware implementation of a stack.

CPU CACHE

A CPU cache is a cache used by the central processing unit of a computer to reduce the average time to access memory. The cache is a smaller, faster memory which stores copies of the data from the most frequently used main memory locations. As long as most memory accesses are to cached memory locations, the average latency of memory accesses will be closer to the cache latency than to the latency of main memory.

When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache. If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory. Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetch, a data cache to speed up data fetch and store, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data.

Cache entry structure

Cache row entries usually have the following structure:

tag | data blocks | valid bit

The data blocks (cache line) contain the actual data fetched from main memory. The valid bit denotes that this particular entry holds valid data. An effective memory address is split (MSB to LSB) into the tag, the index, and the displacement (offset):

tag | index | displacement

The index length is log2(number of cache rows) bits and describes which row the data has been put in. The displacement length is log2(bytes per data block) bits and specifies which part of the stored block we need. The tag length is address_length - index_length - displacement_length and contains the most significant bits of the address, which are checked against the current row (the row has been retrieved by index) to see if it is the one we need or

another, irrelevant memory location that happened to have the same index bits as the one we want.

TYPES OF CACHE
Direct mapped cache: the best (fastest) hit times, and so the best tradeoff for "large" caches
2-way set associative cache
2-way skewed associative cache: "the best tradeoff for .... caches whose sizes are in the range 4K-8K bytes" (André Seznec [2])
4-way set associative cache
Fully associative cache: the best (lowest) miss rates, and so the best tradeoff when the miss penalty is very high
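To make the tag/index/displacement split described above concrete, here is a small C sketch that extracts the three fields from an address. The cache geometry (a direct mapped cache with 128 rows of 32-byte data blocks) is an assumed example, not something specified in the text:

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 5   /* log2(32-byte data blocks) */
#define INDEX_BITS  7   /* log2(128 cache rows)      */

int main(void)
{
    uint32_t addr   = 0x12345678u;
    uint32_t offset = addr & ((1u << OFFSET_BITS) - 1);                 /* displacement   */
    uint32_t index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* cache row      */
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);               /* remaining MSBs */

    printf("tag=0x%x index=%u offset=%u\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}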

Mixed C and Assembly
Embedded systems code lives in a much more spartan environment than traditional application software. Resorting directly to assembly code is undesirable unless you have to observe fixed timing, or you want to use pre-existing assembly code in your current project.

Calling Conventions
Embedded C cross-compilers generate less-standardized code for calling functions. When debugging your program, you should know the answers to the following questions. Does your compiler set up page bits, or perform bank switching, prior to calling a subroutine? Does the compiler or processor handle saving and restoring state during an interrupt? How are function arguments passed? How are results returned? It's almost guaranteed that an 8-bit result will be left in the accumulator.

Access to C Variables from Assembly
Does your assembly code properly address C identifiers? While the compiler may allow you to use a C identifier as an argument in an assembly mnemonic, it may not check the size of the value against the prescribed size of the instruction. As a result, the program may load one byte of a multiple-byte value, without regard for its significance.

Register usage
Registers are faster to access than memory, so the variables which are most frequently used in a C program can be placed in registers using the register keyword. The keyword register hints to the compiler that a given variable can be put in a register. It is the compiler's choice whether to actually do so. Generally, compilers perform such optimizations themselves and put variables in registers.

1) If you use the & operator with a register variable, the compiler may give an error or warning (depending upon the compiler you are using), because when we declare a variable register it may be stored in a register instead of memory, and taking the address of a register is invalid. Try the program below.

#include <stdio.h>

int main()
{
    register int i = 10;
    int *a = &i;     /* error: address of register variable requested */
    printf("%d", *a);
    getchar();
    return 0;
}

2) The register keyword can be used with pointer variables. Obviously, a register can hold the address of a memory location. There is no problem with the program below.

#include <stdio.h>

int main()
{
    int i = 10;
    register int *a = &i;   /* the pointer itself may live in a register */
    printf("%d", *a);
    getchar();
    return 0;
}

3) register is a storage class, and C doesn't allow multiple storage-class specifiers for a variable, so register cannot be combined with static. Try the program below.

#include <stdio.h>

int main()
{
    int i = 10;
    register static int *a = &i;   /* error: multiple storage classes */
    printf("%d", *a);
    getchar();
    return 0;
}

4) There is no limit on the number of register variables in a C program, but the compiler may put some variables in registers and some not.

Functions

When the compiler reaches the function definition, it generates machine instructions to implement

the functionality, and reserves enough program memory to hold the statements in the function. The address of the function is available through the symbol table. A function definition includes a statement block that contains all function statements. Even if a function has only a single executable statement, it must be enclosed in a statement block.

Embedded C supports function prototypes. Function prototype declarations ensure that the compiler knows about a function and its parameter types even if its definition has yet to appear in the compiler's input. Prototypes assist in checking forward calls: the function name is recorded as an identifier, and is therefore known when invoked in code prior to its definition. Header files of function prototypes provide the foundation for using libraries. The syntax for a function call in C is the function name and a list of actual parameters surrounded by parentheses.

Function calling is one area in which embedded C differs substantially from traditional C. The way that parameters are passed differs significantly, as does the permitted number of parameters. Functions that produce extensive side effects are harder to maintain and debug, especially for members of a development team. To safely use abstract functions, you need to know only the data that goes in and comes out of the function interface. When a function produces side effects, you need to know about both the interface and the behaviour to use it safely. Some C programmers insist that functions that just produce side effects should return a value to indicate success, failure, or error. Since ROM space is at a premium, the code needed to evaluate the return status is a luxury.

Function Parameters
C for embedded processors places some unique restrictions on function calls. Some compilers restrict the number of parameters that can be passed to a function. Two byte-sized parameters (or one 16-bit parameter) can be passed within the common processor registers (accumulator and index register). To pass by reference, pass a pointer as usual. See Section 6.7.1, Pointers, for extra information about the relative cost of using pointers. A function with no parameters can be declared with an empty parameter list:

int myFunc()

However, it is good practice to specify that the function has no parameters with the void parameter type:

int myFunc(void)

In embedded programs, main() does not accept any parameters.

WHY USE FUNCTIONS
Two reasons: (i) Writing functions avoids rewriting the same code over and over. Suppose that there is

a section of code in a program that calculates the area of a triangle. If, later in the program, we want to calculate the area of a different triangle, we won't want to write the same instructions all over again. Instead we would prefer to jump to a section of code that calculates the area and then jump back to the place from where we left off. This section of code is nothing but a function. (ii) Using functions makes it easier to write programs and keep track of what they are doing. If the operation of a program can be divided into separate activities, and each activity placed in a different function, then each can be written and checked more or less independently. Separating the code into modular functions also makes the program easier to design and understand.

CALL BY VALUE
In the preceding examples we have seen that whenever we called a function we always passed the values of variables to the called function. Such function calls are called calls by value; by this we mean that on calling a function we are passing values of variables to it. Examples of call by value are shown below:

sum = calsum(a, b, c);
f = factr(a);

In this method the value of each of the actual arguments in the calling function is copied into the corresponding formal arguments of the called function. With this method, changes made to the formal arguments in the called function have no effect on the values of the actual arguments in the calling function. The following program illustrates this:

#include <stdio.h>

void swapy(int x, int y);

int main(void)
{
    int a = 10, b = 20;
    swapy(a, b);
    printf("\na = %d b = %d", a, b);
    return 0;
}

void swapy(int x, int y)
{
    int t;
    t = x;
    x = y;
    y = t;
    printf("\nx = %d y = %d", x, y);
}

The output of the above program would be:

x = 20 y = 10
a = 10 b = 20

CALL BY REFERENCE
In the second method the addresses of the actual arguments in the calling function are copied into the formal arguments of the called function. This means that, using these addresses, we have access to the actual arguments and hence we are able to manipulate them. The following program illustrates this:

#include <stdio.h>

void swapr(int *x, int *y);

int main(void)
{
    int a = 10, b = 20;
    swapr(&a, &b);
    printf("\na = %d b = %d", a, b);
    return 0;
}

void swapr(int *x, int *y)
{
    int t;
    t = *x;
    *x = *y;
    *y = t;
}

The output of the above program would be:

a = 20 b = 10

Memory Management
The memory management subsystem is one of the most important parts of the operating system. Since the early days of computing, there has been a need for more memory than exists physically in a system. Strategies have been developed to overcome this limitation and the most successful of these is virtual memory. Virtual memory makes the system appear to have more memory than it actually has by sharing it between competing processes as they need it.

Virtual memory does more than just make your computer's memory go further. The memory management subsystem provides:

Large Address Spaces
The operating system makes the system appear as if it has a larger amount of memory than it actually has. The virtual memory can be many times larger than the physical memory in the system.

Protection
Each process in the system has its own virtual address space. These virtual address spaces are completely separate from each other, so a process running one application cannot affect another. Also, the hardware virtual memory mechanisms allow areas of memory to be protected against writing. This protects code and data from being overwritten by rogue applications.

Memory Mapping
Memory mapping is used to map image and data files into a process's address space. In memory mapping, the contents of a file are linked directly into the virtual address space of a process.

Fair Physical Memory Allocation
The memory management subsystem allows each running process in the system a fair share of the physical memory of the system.

Shared Virtual Memory
Although virtual memory allows processes to have separate (virtual) address spaces, there are times when you need processes to share memory. For example, there could be several processes in the system running the bash command shell. Rather than have several copies of bash, one in each process's virtual address space, it is better to have only one copy in physical memory and all of the processes running bash share it. Dynamic libraries are another common example of executing code shared between several processes.

Shared memory can also be used as an Inter Process Communication (IPC) mechanism, with two or more processes exchanging information via memory common to all of them. Linux supports the Unix System V shared memory IPC.

3.1 An Abstract Model of Virtual Memory

Figure 3.1: Abstract model of Virtual to Physical address mapping

Before considering the methods that Linux uses to support virtual memory it is useful to consider an abstract model that is not cluttered by too much detail. As the processor executes a program it reads an instruction from memory and decodes it. In decoding the instruction it may need to fetch or store the contents of a location in memory. The processor then executes the instruction and moves on to the next instruction in the program. In this way the processor is always accessing memory, either to fetch instructions or to fetch and store data.

In a virtual memory system all of these addresses are virtual addresses, not physical addresses. These virtual addresses are converted into physical addresses by the processor based on information held in a set of tables maintained by the operating system. To make this translation easier, virtual and physical memory are divided into handy-sized chunks called pages. These pages are all the same size; they need not be, but if they were not, the system would be very hard to administer. Linux on Alpha AXP systems uses 8 Kbyte pages and on Intel x86 systems it uses 4 Kbyte pages. Each of these pages is given a unique number: the page frame number (PFN). In this paged model, a virtual address is composed of two parts: an offset and a virtual page frame number. If the page size is 4 Kbytes, bits 11:0 of the virtual address contain the offset and bits 12 and above are the virtual page frame number. Each time the processor encounters

a virtual address it must extract the offset and the virtual page frame number. The processor must translate the virtual page frame number into a physical one and then access the location at the correct offset into that physical page. To do this the processor uses page tables. Figure 3.1 shows the virtual address spaces of two processes, process X and process Y, each with their own page tables. These page tables map each process's virtual pages into physical pages in memory. This shows that process X's virtual page frame number 0 is mapped into memory in physical page frame number 1 and that process Y's virtual page frame number 1 is mapped into physical page frame number 4. Each entry in the theoretical page table contains the following information:

Valid flag. This indicates if this page table entry is valid.
The physical page frame number that this entry is describing.
Access control information. This describes how the page may be used. Can it be written to? Does it contain executable code?

The page table is accessed using the virtual page frame number as an offset. Virtual page frame 5 would be the 6th element of the table (0 is the first element). To translate a virtual address into a physical one, the processor must first work out the virtual address's page frame number and the offset within that virtual page. By making the page size a power of 2 this can be done easily by masking and shifting. Looking again at Figure 3.1 and assuming a page size of 0x2000 bytes (which is decimal 8192) and an address of 0x2194 in process Y's virtual address space, the processor would translate that address into offset 0x194 into virtual page frame number 1.

The processor uses the virtual page frame number as an index into the process's page table to retrieve its page table entry. If the page table entry at that offset is valid, the processor takes the physical page frame number from this entry. If the entry is invalid, the process has accessed a non-existent area of its virtual memory. In this case, the processor cannot resolve the address and must pass control to the operating system so that it can fix things up. Just how the processor notifies the operating system that the current process has attempted to access a virtual address for which there is no valid translation is specific to the processor. However the processor delivers it, this is known as a page fault and the operating system is notified of the faulting virtual address and the reason for the page fault.

Assuming that this is a valid page table entry, the processor takes that physical page frame number and multiplies it by the page size to get the address of the base of the page in physical memory. Finally, the processor adds in the offset to the instruction or data that it needs. Using the above example again, process Y's virtual page frame number 1 is mapped to physical page frame number 4, which starts at 0x8000 (4 x 0x2000). Adding in the 0x194 byte offset gives us a final physical address of 0x8194.
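The masking and shifting described above can be written out in a few lines of C. This is a sketch of the worked example only; the 8 Kbyte page size matches the example, and the page table is reduced to a single assumed mapping (virtual page 1 to physical page 4):

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT 13                  /* log2(0x2000): 8 Kbyte pages */
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1)

int main(void)
{
    uint32_t vaddr  = 0x2194;                  /* address in process Y      */
    uint32_t vpfn   = vaddr >> PAGE_SHIFT;     /* virtual page frame: 1     */
    uint32_t offset = vaddr & PAGE_MASK;       /* offset in page: 0x194     */
    uint32_t pfn    = 4;                       /* assumed page table lookup */
    uint32_t paddr  = (pfn << PAGE_SHIFT) + offset;

    printf("vpfn=%u offset=0x%x -> physical 0x%x\n",
           (unsigned)vpfn, (unsigned)offset, (unsigned)paddr);  /* 0x8194 */
    return 0;
}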

By mapping virtual to physical addresses this way, the virtual memory can be mapped into the system's physical pages in any order. For example, in Figure 3.1 process X's virtual page frame number 0 is mapped to physical page frame number 1, whereas virtual page frame number 7 is mapped to physical page frame number 0 even though it is higher in virtual memory than virtual page frame number 0. This demonstrates an interesting byproduct of virtual memory: the pages of virtual memory do not have to be present in physical memory in any particular order.

3.1.1 Demand Paging

As there is much less physical memory than virtual memory, the operating system must be careful that it does not use the physical memory inefficiently. One way to save physical memory is to load only those virtual pages that are currently being used by the executing program. For example, a database program may be run to query a database. In this case not all of the database needs to be loaded into memory, just those data records that are being examined. If the database query is a search query then it does not make sense to load the code from the database program that deals with adding new records. This technique of only loading virtual pages into memory as they are accessed is known as demand paging.

When a process attempts to access a virtual address that is not currently in memory, the processor cannot find a page table entry for the virtual page referenced. For example, in Figure 3.1 there is no entry in process X's page table for virtual page frame number 2, so if process X attempts to read from an address within virtual page frame number 2 the processor cannot translate the address into a physical one. At this point the processor notifies the operating system that a page fault has occurred.

If the faulting virtual address is invalid, the process has attempted to access a virtual address that it should not have. Maybe the application has gone wrong in some way, for example writing to random addresses in memory. In this case the operating system will terminate it, protecting the other processes in the system from this rogue process. If the faulting virtual address was valid but the page that it refers to is not currently in memory, the operating system must bring the appropriate page into memory from the image on disk. Disk access takes a long time, relatively speaking, and so the process must wait quite a while until the page has been fetched. If there are other processes that could run, the operating system will select one of them to run. The fetched page is written into a free physical page frame and an entry for the virtual page frame number is added to the process's page table. The process is then restarted at the machine instruction where the memory fault occurred. This time the virtual memory access is made, the processor can make the virtual to physical address translation, and so the process continues to run.

Linux uses demand paging to load executable images into a process's virtual memory. Whenever a command is executed, the file containing it is opened and its contents are mapped into the process's virtual memory. This is done by modifying the data structures describing this process's memory map and is known as memory mapping. However, only the first part of the image is actually brought into physical memory. The rest of the image is left on disk. As the image executes, it generates page faults and Linux uses the process's memory map to determine which parts of the image to bring into memory for execution.

3.1.2 Swapping

If a process needs to bring a virtual page into physical memory and there are no free physical pages available, the operating system must make room for this page by discarding another page from physical memory.

If the page to be discarded from physical memory came from an image or data file and has not been written to, then the page does not need to be saved. Instead it can be discarded, and if the process needs that page again it can be brought back into memory from the image or data file. However, if the page has been modified, the operating system must preserve the contents of that page so that it can be accessed at a later time. This type of page is known as a dirty page and when it is removed from memory it is saved in a special sort of file called the swap file. Accesses to the swap file are very slow relative to the speed of the processor and physical memory, and the operating system must juggle the need to write pages to disk with the need to retain them in memory to be used again.

If the algorithm used to decide which pages to discard or swap (the swap algorithm) is not efficient, then a condition known as thrashing occurs. In this case, pages are constantly being written to disk and then read back, and the operating system is too busy to allow much real work to be performed. If, for example, physical page frame number 1 in Figure 3.1 is being regularly accessed then it is not a good candidate for swapping to hard disk. The set of pages that a process is currently using is called the working set. An efficient swap scheme would make sure that all processes have their working set in physical memory.

Linux uses a Least Recently Used (LRU) page aging technique to fairly choose pages which might be removed from the system. This scheme involves every page in the system having an age which changes as the page is accessed. The more that a page is accessed, the younger it is; the less that it is accessed, the older and more stale it becomes. Old pages are good candidates for swapping.

3.1.3 Shared Virtual Memory

Virtual memory makes it easy for several processes to share memory. All memory accesses are made via page tables and each process has its own separate page table. For two processes sharing a physical page of memory, its physical page frame number must appear in a page table entry in both of their page tables. Figure 3.1 shows two processes that each share physical page frame number 4. For process X this is virtual page frame number 4 whereas for process Y this is virtual page frame number 6. This illustrates an interesting point about sharing pages: the shared physical page does not have to exist at the same place in virtual memory for any or all of the processes sharing it.

3.1.4 Physical and Virtual Addressing Modes

It does not make much sense for the operating system itself to run in virtual memory. This would be a nightmare situation where the operating system must maintain page tables for itself. Most multi-purpose processors support the notion of a physical address mode as well as a virtual address mode. Physical addressing mode requires no page tables and the processor does not attempt to perform any address translations in this mode. The Linux kernel is linked to run in physical address space.

The Alpha AXP processor does not have a special physical addressing mode. Instead, it divides up the memory space into several areas and designates two of them as physically

mapped addresses. This kernel address space is known as the KSEG address space and it encompasses all addresses upwards from 0xfffffc0000000000. In order to execute code linked in KSEG (by definition, kernel code) or access data there, the code must be executing in kernel mode. The Linux kernel on Alpha is linked to execute from address 0xfffffc0000310000.

3.1.5 Access Control

The page table entries also contain access control information. As the processor is already using the page table entry to map a process's virtual address to a physical one, it can easily use the access control information to check that the process is not accessing memory in a way that it should not. There are many reasons why you would want to restrict access to areas of memory. Some memory, such as that containing executable code, is naturally read-only; the operating system should not allow a process to write data over its executable code. By contrast, pages containing data can be written to, but attempts to execute that memory as instructions should fail. Most processors have at least two modes of execution: kernel and user. You would not want kernel code to be executed by a user process, or kernel data structures to be accessible, except when the processor is running in kernel mode.

Figure 3.2: Alpha AXP Page Table Entry

The access control information is held in the PTE and is processor specific; Figure 3.2 shows the PTE for Alpha AXP. The bit fields have the following meanings:

V: Valid. If set, this PTE is valid.
FOE: "Fault on Execute". Whenever an attempt to execute instructions in this page occurs, the processor reports a page fault and passes control to the operating system.
FOW: "Fault on Write". As above, but the page fault occurs on an attempt to write to this page.

FOR: "Fault on Read". As above, but the page fault occurs on an attempt to read from this page.
ASM: Address Space Match. This is used when the operating system wishes to clear only some of the entries from the Translation Buffer.
KRE: Code running in kernel mode can read this page.
URE: Code running in user mode can read this page.
GH: Granularity hint, used when mapping an entire block with a single Translation Buffer entry rather than many.
KWE: Code running in kernel mode can write to this page.
UWE: Code running in user mode can write to this page.
page frame number: For PTEs with the V bit set, this field contains the physical page frame number for this PTE. For invalid PTEs, if this field is not zero, it contains information about where the page is in the swap file.

The following two bits are defined and used by Linux:

_PAGE_DIRTY: If set, the page needs to be written out to the swap file.
_PAGE_ACCESSED: Used by Linux to mark a page as having been accessed.

3.2 Caches

If you were to implement a system using the above theoretical model then it would work, but not particularly efficiently. Both operating system and processor designers try hard to extract more performance from the system. Apart from making the processors, memory and so on faster, the best approach is to maintain caches of useful information and data that make some operations faster. Linux uses a number of memory management related caches:

Buffer Cache
The buffer cache contains data buffers that are used by the block device drivers. These buffers are of fixed sizes (for example 512 bytes) and contain blocks of information that have either been read from a block device or are being written to it. A block device is one that can only be accessed by reading and writing fixed-sized blocks of data. All hard disks are block devices. The buffer cache is indexed via the device identifier and the desired block number and is used to quickly find a block of data. Block devices are only ever accessed via the buffer cache. If data can be found in the buffer cache then it does not need to be read from the physical block device, for example a hard disk, and access to it is much faster.

Page Cache
This is used to speed up access to images and data on disk. It is used to cache the logical contents of a file a page at a time and is accessed via the file and offset within the file. As pages are read into memory from disk, they are cached in the page cache.

Swap Cache
Only modified (or dirty) pages are saved in the swap file. So long as these pages are not modified after they have been written to the swap file, then the next time the page is swapped out there is no need to write it to the swap file, as the page is already there. Instead the page can simply be discarded. In a heavily swapping system this saves many unnecessary and costly disk operations.

Hardware Caches
One commonly implemented hardware cache is in the processor: a cache of page table entries. In this case, the processor does not always read the page table directly but instead caches translations for pages as it needs them. These are the Translation Look-aside Buffers (TLBs) and contain cached copies of the page table entries from one or more processes in the system. When a reference to a virtual address is made, the processor attempts to find a matching TLB entry. If it finds one, it can directly translate the virtual address into a physical one and perform the correct operation on the data. If the processor cannot find a matching TLB entry, it must get the operating system to help. It does this by signalling the operating system that a TLB miss has occurred. A system specific mechanism is used to deliver that exception to the operating system code that can fix things up. The operating system generates a new TLB entry for the address mapping. When the exception has been cleared, the processor will make another attempt to translate the virtual address. This time it will work because there is now a valid entry in the TLB for that address.

The drawback of using caches, hardware or otherwise, is that Linux must use more time and space maintaining these caches and, if the caches become corrupted, the system will crash.

3.4 Page Allocation and Deallocation

There are many demands on the physical pages in the system. For example, when an image is loaded into memory the operating system needs to allocate pages. These will be freed when the image has finished executing and is unloaded. Another use for physical pages is to hold kernel specific data structures such as the page tables themselves. The mechanisms and data structures used for page allocation and deallocation are perhaps the most critical in maintaining the efficiency of the virtual memory subsystem.

All of the physical pages in the system are described by the mem_map data structure, which is a list of mem_map_t structures initialized at boot time. Each mem_map_t describes a single physical page in the system. Important fields (so far as memory management is concerned) are:

count: A count of the number of users of this page. The count is greater than one when the page is shared between many processes.
age: Describes the age of the page and is used to decide if the page is a good candidate for discarding or swapping.
map_nr: The physical page frame number that this mem_map_t describes.

The free_area vector is used by the page allocation code to find and free pages. The whole buffer management scheme is supported by this mechanism and, so far as the code is concerned, the size of the page and the physical paging mechanisms used by the processor are irrelevant.

Each element of free_area contains information about blocks of pages. The first element in the array describes single pages, the next blocks of 2 pages, the next blocks of 4 pages and so on upwards in powers of two. The list element is used as a queue head and has pointers to the page data structures in the mem_map array. Free blocks of pages are queued here. map is a pointer to a bitmap which keeps track of allocated groups of pages of this size. Bit N of the bitmap is set if the Nth block of pages is free.

Figure 3.4 shows the free_area structure. Element 0 has one free page (page frame number 0) and element 2 has 2 free blocks of 4 pages, the first starting at page frame number 4 and the second at page frame number 56.

3.4.1 Page Allocation

Linux uses the Buddy algorithm to effectively allocate and deallocate blocks of pages. The page allocation code attempts to allocate a block of one or more physical pages. Pages are allocated in blocks which are powers of 2 in size: it can allocate a block of 1 page, 2 pages, 4 pages and so on. So long as there are enough free pages in the system to grant the request (nr_free_pages > min_free_pages), the allocation code will search the free_area for a block of pages of the size requested. Each element of the free_area has a map of the allocated and free blocks of pages for that sized block. For example, element 2 of the array has a memory map that describes free and allocated blocks each 4 pages long.

The allocation algorithm first searches for blocks of pages of the size requested. It follows the chain of free pages that is queued on the list element of the free_area data structure. If no blocks of pages of the requested size are free, blocks of the next size (which is twice that of the size requested) are looked for. This process continues until all of the free_area has been searched or until a block of pages has been found. If the block of pages found is larger than that requested, it must be broken down until there is a block of the right size. Because the

blocks are each a power of 2 pages big, this breaking down process is easy: you simply break the blocks in half. The free blocks are queued on the appropriate queue and the allocated block of pages is returned to the caller.

Figure 3.4: The free_area data structure

For example, in Figure 3.4, if a block of 2 pages was requested, the first block of 4 pages (starting at page frame number 4) would be broken into two 2-page blocks. The first, starting at page frame number 4, would be returned to the caller as the allocated pages, and the second block, starting at page frame number 6, would be queued as a free block of 2 pages onto element 1 of the free_area array.

3.4.2 Page Deallocation

Allocating blocks of pages tends to fragment memory, with larger blocks of free pages being broken down into smaller ones. The page deallocation code recombines pages into larger blocks of free pages whenever it can. In fact the page block size is important as it allows for easy combination of blocks into larger blocks. Whenever a block of pages is freed, the adjacent or buddy block of the same size is checked to see if it is free. If it is, then it is combined with the newly freed block of pages to form a new free block of pages of the next block size. Each time two blocks of pages are recombined into a bigger block of free pages, the page deallocation code attempts to recombine that block into a yet larger one. In this way the blocks of free pages are as large as memory usage will allow.
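The buddy of a block is cheap to locate because blocks are power-of-2 sized and naturally aligned: the buddy's page frame number differs in exactly one bit. A minimal C sketch of this computation (an illustration of the idea, not the kernel's actual code):

#include <stdio.h>

/* For a block of 2^order pages starting at page frame number pfn,
   its buddy starts at pfn with bit 'order' flipped. */
static unsigned long buddy_of(unsigned long pfn, unsigned int order)
{
    return pfn ^ (1UL << order);
}

int main(void)
{
    /* The 2-page block at page frame 4 and the 2-page block at page
       frame 6 are buddies; if both are free they combine into a
       4-page block starting at page frame 4. */
    printf("%lu\n", buddy_of(4, 1));   /* prints 6 */
    printf("%lu\n", buddy_of(6, 1));   /* prints 4 */
    return 0;
}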

For example, in Figure 3.4, if page frame number 1 were to be freed, it would be combined with the already free page frame number 0 and queued onto element 1 of the free_area as a free block of 2 pages.

3.5 Memory Mapping

When an image is executed, the contents of the executable image must be brought into the process's virtual address space. The same is also true of any shared libraries that the executable image has been linked to use. The executable file is not actually brought into physical memory; instead it is merely linked into the process's virtual memory. Then, as the parts of the program are referenced by the running application, the image is brought into memory from the executable image. This linking of an image into a process's virtual address space is known as memory mapping.

Figure 3.5: Areas of Virtual Memory

Every process's virtual memory is represented by an mm_struct data structure. This contains information about the image that it is currently executing (for example bash) and also has pointers to a number of vm_area_struct data structures. Each vm_area_struct data structure describes the start and end of the area of virtual memory, the process's access rights to that memory and a set of operations for that memory. These operations are a set of routines that Linux must use when manipulating this area of virtual memory. For example, one of the virtual memory operations performs the correct actions when the process has attempted to access this virtual memory but finds (via a page fault) that the memory is not actually in physical memory. This operation is the nopage operation. The nopage operation is used when Linux demand pages the pages of an executable image into memory.
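To make the shape of this data structure concrete, here is a deliberately simplified C sketch; the field names and types are illustrative only, and the real kernel structure has many more members:

/* Simplified sketch of the idea behind vm_area_struct. */
struct vm_operations {
    /* called on a page fault when the page is not in physical memory */
    unsigned long (*nopage)(unsigned long address);
};

struct vm_area {
    unsigned long         start;  /* first virtual address of the area    */
    unsigned long         end;    /* first virtual address past the area  */
    unsigned int          prot;   /* the process's access rights          */
    struct vm_operations *ops;    /* routines for manipulating the area   */
};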

When an executable image is mapped into a process's virtual address space, a set of vm_area_struct data structures is generated. Each vm_area_struct data structure represents a part of the executable image: the executable code, initialized data (variables), uninitialized data and so on. Linux supports a number of standard virtual memory operations and, as the vm_area_struct data structures are created, the correct set of virtual memory operations is associated with them.

3.6 Demand Paging

Once an executable image has been memory mapped into a process's virtual memory it can start to execute. As only the very start of the image is physically pulled into memory, it will soon access an area of virtual memory that is not yet in physical memory. When a process accesses a virtual address that does not have a valid page table entry, the processor will report a page fault to Linux. The page fault describes the virtual address where the page fault occurred and the type of memory access that caused it.

Linux must find the vm_area_struct that represents the area of memory that the page fault occurred in. As searching through the vm_area_struct data structures is critical to the efficient handling of page faults, these are linked together in an AVL (Adelson-Velskii and Landis) tree structure. If there is no vm_area_struct data structure for this faulting virtual address, this process has accessed an illegal virtual address. Linux will signal the process, sending a SIGSEGV signal, and if the process does not have a handler for that signal it will be terminated. Linux next checks the type of page fault that occurred against the types of accesses allowed for this area of virtual memory. If the process is accessing the memory in an illegal way, say writing to an area that it is only allowed to read from, it is also signalled with a memory error.

Now that Linux has determined that the page fault is legal, it must deal with it. Linux must differentiate between pages that are in the swap file and those that are part of an executable image on a disk somewhere. It does this by using the page table entry for this faulting virtual address. If the page's page table entry is invalid but not empty, the page fault is for a page currently being held in the swap file. For Alpha AXP page table entries, these are entries which do not have their valid bit set but which have a non-zero value in their PFN field. In this case the PFN field holds information about where in the swap file (and which swap file) the page is being held. How pages in the swap file are handled is described later in this chapter.

Not all vm_area_struct data structures have a set of virtual memory operations, and even those that do may not have a nopage operation. This is because, by default, Linux will fix up the access by allocating a new physical page and creating a valid page table entry for it. If there is a nopage operation for this area of virtual memory, Linux will use it. The generic Linux nopage operation is used for memory mapped executable images and it uses the page cache to bring the required image page into physical memory.

However the required page is brought into physical memory, the process's page tables are updated. It may be necessary for hardware specific actions to update those entries, particularly if the processor uses translation look-aside buffers. Now that the page fault has been handled it can be dismissed and the process is restarted at the instruction that made the faulting virtual memory access.

3.7 The Linux Page Cache

Figure 3.6: The Linux Page Cache

The role of the Linux page cache is to speed up access to files on disk. Memory mapped files are read a page at a time and these pages are stored in the page cache. Figure 3.6 shows that the page cache consists of the page_hash_table, a vector of pointers to mem_map_t data structures. Each file in Linux is identified by a VFS inode data structure (described in the filesystem chapter) and each VFS inode is unique and fully describes one and only one file. The index into the page cache is derived from the file's VFS inode and the offset into the file.

Whenever a page is read from a memory mapped file, for example when it needs to be brought back into memory during demand paging, the page is read through the page cache. If the page is present in the cache, a pointer to the mem_map_t data structure representing it is returned to the page fault handling code. Otherwise the page must be brought into memory from the file system that holds the image. Linux allocates a physical page and reads the page from the file on disk. If it is possible, Linux will initiate a read of the next page in the file. This single-page read-ahead means that if the process is accessing the pages in the file serially, the next page will be waiting in memory for the process.

Over time the page cache grows as images are read and executed. Pages will be removed from the cache as they are no longer needed, say as an image is no longer being used by any process. As Linux uses memory it can start to run low on physical pages. In this case Linux will reduce the size of the page cache.

3.8 Swapping Out and Discarding Pages

When physical memory becomes scarce the Linux memory management subsystem must attempt to free physical pages. This task falls to the kernel swap daemon (kswapd). The kernel swap daemon is a special type of process, a kernel thread. Kernel threads are processes that have no virtual memory; instead they run in kernel mode in the physical address space. The kernel swap daemon is slightly misnamed in that it does more than merely swap pages out to the system's swap files. Its role is to make sure that there are enough free pages in the system to keep the memory management system operating efficiently.

The kernel swap daemon (kswapd) is started by the kernel init process at startup time and sits waiting for the kernel swap timer to periodically expire. Every time the timer expires, the swap daemon looks to see if the number of free pages in the system is getting too low. It uses two variables, free_pages_high and free_pages_low, to decide if it should free some pages. So long as the number of free pages in the system remains above free_pages_high, the kernel swap daemon does nothing; it sleeps again until its timer next expires. For the purposes of this check the kernel swap daemon takes into account the number of pages currently being written out to the swap file. It keeps a count of these in nr_async_pages; this is incremented each time a page is queued waiting to be written out to the swap file and decremented when the write to the swap device has completed. free_pages_low and free_pages_high are set at system startup time and are related to the number of physical pages in the system.

If the number of free pages in the system has fallen below free_pages_high or, worse still, free_pages_low, the kernel swap daemon will try three ways to reduce the number of physical pages being used by the system:

Reducing the size of the buffer and page caches,
Swapping out System V shared memory pages,
Swapping out and discarding pages.

If the number of free pages in the system has fallen below free_pages_low, the kernel swap daemon will try to free 6 pages before it next runs. Otherwise it will try to free 3 pages. Each of the above methods is tried in turn until enough pages have been freed. The kernel swap daemon remembers which method it was using the last time that it attempted to free physical pages. Each time it runs it will start trying to free pages using this last successful method.

After it has freed sufficient pages, the swap daemon sleeps again until its timer expires. If the reason that the kernel swap daemon freed pages was that the number of free pages in the system had fallen below free_pages_low, it only sleeps for half its usual time. Once the number of free pages is more than free_pages_low the kernel swap daemon goes back to sleeping longer between checks.
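The threshold rule above can be summarized in a short C sketch; the variable names follow the text, while the real kernel code is organized quite differently:

/* Sketch of kswapd's decision, using the variable names from the text. */
static int pages_to_free(unsigned long free_pages,
                         unsigned long free_pages_low,
                         unsigned long free_pages_high)
{
    if (free_pages < free_pages_low)
        return 6;   /* dangerously low: try to free 6 pages              */
    if (free_pages < free_pages_high)
        return 3;   /* getting low: try to free 3 pages                  */
    return 0;       /* enough free pages: sleep until the timer expires  */
}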

3.8.1 Reducing the Size of the Page and Buffer Caches

The pages held in the page and buffer caches are good candidates for being freed into the free_area vector. The Page Cache, which contains pages of memory mapped files, may contain unnecessary pages that are filling up the system's memory. Likewise the Buffer Cache, which contains buffers read from or being written to physical devices, may also contain unneeded buffers. When the physical pages in the system start to run out, discarding pages from these caches is relatively easy as it requires no writing to physical devices (unlike swapping pages out of memory). Discarding these pages does not have too many harmful side effects other than making access to physical devices and memory mapped files slower. However, if the discarding of pages from these caches is done fairly, all processes will suffer equally.

Every time the kernel swap daemon tries to shrink these caches it examines a block of pages in the mem_map page vector to see if any can be discarded from physical memory. The size of the block of pages examined is higher if the kernel swap daemon is intensively swapping; that is, if the number of free pages in the system has fallen dangerously low. The blocks of pages are examined in a cyclical manner; a different block of pages is examined each time an attempt is made to shrink the memory map. This is known as the clock algorithm as, rather like the minute hand of a clock, the whole mem_map page vector is examined a few pages at a time.

Each page being examined is checked to see if it is cached in either the page cache or the buffer cache. You should note that shared pages are not considered for discarding at this time and that a page cannot be in both caches at the same time. If the page is not in either cache then the next page in the mem_map page vector is examined.

Pages are cached in the buffer cache (or rather the buffers within the pages are cached) to make buffer allocation and deallocation more efficient. The memory map shrinking code tries to free the buffers that are contained within the page being examined. If all the buffers are freed, then the pages that contain them are also freed. If the examined page is in the Linux page cache, it is removed from the page cache and freed.

When enough pages have been freed on this attempt, the kernel swap daemon will wait until the next time it is periodically woken. As none of the freed pages were part of any process's virtual memory (they were cached pages), no page tables need updating. If there were not enough cached pages discarded, the swap daemon will try to swap out some shared pages.

3.8.2 Swapping Out System V Shared Memory Pages

System V shared memory is an inter-process communication mechanism which allows two or more processes to share virtual memory in order to pass information amongst themselves. How processes share memory in this way is described in more detail in the IPC chapter. For now it is enough to say that each area of System V shared memory is described by a shmid_ds data structure. This contains a pointer to a list of vm_area_struct data structures, one for each process sharing this area of virtual memory. The vm_area_struct data structures describe where in each process's virtual memory this area of System V shared memory goes. Each vm_area_struct data structure for this System V shared memory is linked together using the vm_next_shared and vm_prev_shared pointers. Each shmid_ds data structure also contains

a list of page table entries, each of which describes the physical page that a shared virtual page maps to.

The kernel swap daemon also uses a clock algorithm when swapping out System V shared memory pages. Each time it runs it remembers which page of which shared virtual memory area it last swapped out. It does this by keeping two indices: the first is an index into the set of shmid_ds data structures, the second into the list of page table entries for this area of System V shared memory. This makes sure that it fairly victimizes the areas of System V shared memory.

As the physical page frame number for a given virtual page of System V shared memory is contained in the page tables of all of the processes sharing this area of virtual memory, the kernel swap daemon must modify all of these page tables to show that the page is no longer in memory but is now held in the swap file. For each shared page it is swapping out, the kernel swap daemon finds the page table entry in each of the sharing processes' page tables (by following a pointer from each vm_area_struct data structure). If this process's page table entry for this page of System V shared memory is valid, it converts it into an invalid but swapped out page table entry and reduces this (shared) page's count of users by one. The format of a swapped out System V shared page table entry contains an index into the set of shmid_ds data structures and an index into the page table entries for this area of System V shared memory.

If the page's count is zero after the page tables of the sharing processes have all been modified, the shared page can be written out to the swap file. The page table entry in the list pointed at by the shmid_ds data structure for this area of System V shared memory is replaced by a swapped out page table entry. A swapped out page table entry is invalid but contains an index into the set of open swap files and the offset in that file where the swapped out page can be found. This information will be used when the page has to be brought back into physical memory.

3.8.3 Swapping Out and Discarding Pages

The swap daemon looks at each process in the system in turn to see if it is a good candidate for swapping. Good candidates are processes that can be swapped (some cannot) and that have one or more pages which can be swapped or discarded from memory. Pages are swapped out of physical memory into the system's swap files only if the data in them cannot be retrieved another way. A lot of the contents of an executable image come from the image's file and can easily be re-read from that file. For example, the executable instructions of an image will never be modified by the image and so will never be written to the swap file. These pages can simply be discarded; when they are again referenced by the process, they will be brought back into memory from the executable image. Once the process to swap has been located, the swap daemon looks through all of its virtual memory regions looking for areas which are not shared or locked.

Linux does not swap out all of the swappable pages of the process that it has selected; instead it removes only a small number of pages. Pages cannot be swapped or discarded if they are locked in memory.

The Linux swap algorithm uses page aging. Each page has a counter (held in the mem_map_t data structure) that gives the kernel swap daemon some idea whether or not a page is worth swapping. Pages age when they are unused and rejuvenate on access; the swap daemon only swaps out old pages. The default action when a page is first allocated is to give it an initial age of 3. Each time it is touched, its age is increased by 3, to a maximum of 20. Every time the kernel swap daemon runs it ages pages, decrementing their age by 1. These default actions can be changed and for this reason they (and other swap related information) are stored in the swap_control data structure.

If the page is old (age = 0), the swap daemon will process it further. Dirty pages are pages which can be swapped out. Linux uses an architecture specific bit in the PTE to describe pages this way (see Figure 3.2). However, not all dirty pages are necessarily written to the swap file. Every virtual memory region of a process may have its own swap operation (pointed at by the vm_ops pointer in the vm_area_struct) and that method is used. Otherwise, the swap daemon will allocate a page in the swap file and write the page out to that device. The page's page table entry is replaced by one which is marked as invalid but which contains information about where the page is in the swap file: an offset into the swap file where the page is held and an indication of which swap file is being used. Whatever the swap method used, the original physical page is made free by putting it back into the free_area. Clean (or rather, not dirty) pages can be discarded and put back into the free_area for re-use.

If enough of the swappable process's pages have been swapped out or discarded, the swap daemon will again sleep. The next time it wakes it will consider the next process in the system. In this way, the swap daemon nibbles away at each process's physical pages until the system is again in balance. This is much fairer than swapping out whole processes.
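The default aging rule can be written out as a few lines of C; this is a sketch of the policy as described above, not the kernel's implementation:

/* Page aging policy: start at 3, +3 per access (capped at 20),
   -1 per kswapd run; age 0 makes a page a candidate for swapping. */
#define INITIAL_AGE 3
#define MAX_AGE     20

static int age_on_touch(int age)     /* the page was accessed     */
{
    age += 3;
    return (age > MAX_AGE) ? MAX_AGE : age;
}

static int age_on_sweep(int age)     /* one kswapd run has passed */
{
    return (age > 0) ? age - 1 : 0;
}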

3.9 The Swap Cache

When swapping pages out to the swap files, Linux avoids writing pages if it does not have to. There are times when a page is both in a swap file and in physical memory. This happens when a page that was swapped out of memory is then brought back into memory when it is again accessed by a process. So long as the page in memory is not written to, the copy in the swap file remains valid.

Linux uses the swap cache to track these pages. The swap cache is a list of page table entries, one per physical page in the system. Each is a page table entry for a swapped out page and describes which swap file the page is being held in together with its location in the swap file. If a swap cache entry is non-zero, it represents a page which is being held in a swap file and which has not been modified. If the page is subsequently modified (by being written to), its entry is removed from the swap cache.

When Linux needs to swap a physical page out to a swap file it consults the swap cache and, if there is a valid entry for this page, it does not need to write the page out to the swap file. This is because the page in memory has not been modified since it was last read from the swap file. The entries in the swap cache are page table entries for swapped out pages. They are marked as invalid but contain information which allows Linux to find the right swap file and the right page within that swap file.

3.10 Swapping Pages In

The dirty pages saved in the swap files may be needed again, for example when an application writes to an area of virtual memory whose contents are held in a swapped out physical page. Accessing a page of virtual memory that is not held in physical memory causes a page fault to occur. The page fault is the processor signalling the operating system that it cannot translate a virtual address into a physical one. In this case this is because the page table entry describing this page of virtual memory was marked as invalid when the page was swapped out. The processor cannot handle the virtual to physical address translation and so hands control back to the operating system, describing as it does so the virtual address that faulted and the reason for the fault. The format of this information and how the processor passes control to the operating system is processor specific.

The processor specific page fault handling code must locate the vm_area_struct data structure that describes the area of virtual memory that contains the faulting virtual address. It does this by searching the vm_area_struct data structures for this process until it finds the one containing the faulting virtual address. This is very time-critical code and a process's vm_area_struct data structures are so arranged as to make this search take as little time as possible.

Having carried out the appropriate processor specific actions and found that the faulting virtual address is for a valid area of virtual memory, the page fault processing becomes generic and applicable to all processors that Linux runs on. The generic page fault handling code looks for the page table entry for the faulting virtual address. If the page table entry it finds is for a swapped out page, Linux must swap the page back into physical memory. The format of the page table entry for a swapped out page is processor specific, but all processors mark these pages as invalid and put the information necessary to locate the page within the swap file into the page table entry. Linux needs this information in order to bring the page back into physical memory.

At this point, Linux knows the faulting virtual address and has a page table entry containing information about where this page has been swapped to. The vm_area_struct data structure may contain a pointer to a routine which will swap any page of the area of virtual memory that it describes back into physical memory. This is its swapin operation. If there is a swapin operation for this area of virtual memory then Linux will use it. This is, in fact, how swapped out System V shared memory pages are handled: they require special handling because the format of a swapped out System V shared page is a little different from that of an ordinary swapped out page. There may not be a swapin operation, in which case Linux will assume that this is an ordinary page that does not need to be specially handled.

It allocates a free physical page and reads the swapped out page back from the swap file. Information telling it where in the swap file (and which swap file) the page is held is taken from the invalid page table entry.

If the access that caused the page fault was not a write access, then the page is left in the swap cache and its page table entry is not marked as writable. If the page is subsequently written to, another page fault will occur and, at that point, the page is marked as dirty and its entry is removed from the swap cache. If the page is not written to and it needs to be swapped out again, Linux can avoid the write of the page to its swap file because the page is already in the swap file. If the access that caused the page to be brought in from the swap file was a write operation, this page is removed from the swap cache and its page table entry is marked as both dirty and writable.

Memory-Mapped I/O
Memory-mapped I/O is a mechanism by which the processor performs I/O access by using memory access techniques. This is often done because the memory bus is frequently much faster than the I/O bus. Another reason that memory-mapped I/O might be used is that the architecture in use does not have a separate I/O bus. In memory-mapped I/O, a certain range of the CPU's address space is kept aside for the external peripherals. These locations can be accessed using the same instructions as are used for other memory accesses, but reads and writes to these addresses are interpreted as accesses to a device rather than to a location in main memory. A CPU may expect a particular device at a fixed location or can dynamically assign a space for it.

The way this works is that memory interfaces are often designed as a bus (a shared communications resource), where many devices are attached. These devices are usually arranged as master and slave devices, where a master device can send and receive data from any of the slave devices. A typical system would have:

A CPU as the master
One or more RAM and/or ROM devices for program code and data storage
Peripheral devices for interfacing with the outside world. Examples of these might be a UART (serial communications), display device or input device
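As a concrete illustration, a memory-mapped UART might be driven as in the C sketch below. The base address, register layout and status bit are invented for the example; on real hardware they come from the device datasheet:

#include <stdint.h>

#define UART_BASE 0x40001000u   /* hypothetical peripheral address */

typedef struct {
    volatile uint32_t data;     /* transmit/receive data register   */
    volatile uint32_t status;   /* bit 0 assumed to mean "TX ready" */
} uart_regs_t;

#define UART ((uart_regs_t *)UART_BASE)

static void uart_putc(char c)
{
    while (!(UART->status & 1u))
        ;                        /* spin until the transmitter is ready     */
    UART->data = (uint32_t)c;    /* an ordinary store, routed to the device */
}

Note the volatile qualifier: it tells the compiler that each access has a side effect and must not be optimised away or reordered.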

These limitations are avoided by using dynamic memory allocation, in which memory is more explicitly (but more flexibly) managed, typically by allocating it from the heap, an area of memory structured for this purpose. In C, the library function malloc is used to allocate a block of memory on the heap. The program accesses this block of memory via a pointer that malloc returns. When the memory is no longer needed, the pointer is passed to free, which deallocates the memory so that it can be used for other purposes.

Some platforms provide library calls which allow run-time dynamic allocation from the C stack rather than the heap (e.g. Unix alloca(), Microsoft Windows CRT's _malloca()). This memory is automatically freed when the calling function ends. The need for this is lessened by changes in the C99 standard, which added support for variable-length arrays of block scope having sizes determined at run time.

Static memory allocation:

Definition: Static memory allocation refers to the process of allocating memory at compile time, before the associated program is executed, unlike dynamic memory allocation or automatic memory allocation where memory is allocated as required at run time.

An application of this technique involves a program module (e.g. a function or subroutine) declaring static data locally, such that these data are inaccessible in other modules unless references to them are passed as parameters or returned. A single copy of static data is retained and accessible through many calls to the function in which it is declared. Static memory allocation therefore has the advantage of modularising data within a program design in the situation where these data must be retained through the runtime of the program. The use of static variables within a class in object-oriented programming enables a single copy of such data to be shared between all the objects of that class.

Object constants known at compile time, like string literals, are usually allocated statically. In object-oriented programming, the virtual method tables of classes are usually allocated statically. A statically defined value can also be global in its scope, ensuring the same immutable value is used throughout a run for consistency.

Automatic variable allocation (memory stack):

Definition: In computer programming, an automatic variable is a lexically scoped variable which is allocated and deallocated automatically when program flow enters and leaves the variable's scope. The term local variable is usually synonymous with automatic variable, since these are the same thing in many programming languages.

Automatic variables may be allocated in the stack frame of the procedure in which they are declared; this has the useful effect of allowing recursion and re-entrancy. (For efficiency, the optimizer will try to allocate some of these variables in processor registers.) In specific programming languages (C/C++) all variables declared within a block of code are automatic by default, but this can be made explicit with the auto keyword. An uninitialized automatic variable has an undefined value until it is assigned a valid value of its type. Using the storage class register instead of auto is a hint to the compiler to cache the variable in a processor register. Other than not allowing the address-of operator (&) to be used on the variable or any of its subcomponents, the compiler is free to ignore the hint.
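To make the contrast concrete, here is a minimal sketch (our own illustration, not taken from the discussion above) showing how a static-duration local variable persists across calls while an automatic local is recreated each time:

#include <stdio.h>

void counter(void)
{
    static int calls = 0;  /* static duration: initialized once, persists */
    int temp = 0;          /* automatic duration: recreated on every call */
    calls++;
    temp++;
    printf("calls = %d, temp = %d\n", calls, temp);
}

int main(void)
{
    counter();   /* prints: calls = 1, temp = 1 */
    counter();   /* prints: calls = 2, temp = 1 */
    return 0;
}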
In C++ the constructor of a local object is called when execution reaches the place of declaration, and the destructor is called when execution reaches the end of the given program block (program blocks are surrounded by curly brackets). This feature is often used to manage resource allocation and deallocation, such as opening and then automatically closing files or freeing up memory.

Dynamic memory allocation:

Definition: Dynamic memory allocation (also known as heap-based memory allocation) is the allocation of memory storage for use in a computer program during the runtime of that program. It can also be seen as a way of distributing ownership of limited memory resources among many pieces of data and code. Dynamically allocated memory exists until it is released, either explicitly by the programmer or by a garbage collector. This is in contrast to static memory allocation, which has a fixed duration. An object so allocated is said to have a dynamic lifetime.

Constraints: The task of fulfilling an allocation request consists of finding a block of unused memory of sufficient size. Problems that arise while fulfilling an allocation request:
o Internal and external fragmentation.
o The allocator's metadata can inflate the size of (individually) small allocations.

Usually, memory is allocated from a large pool of unused memory called the heap (also called the free store). Since the precise location of the allocation is not known in advance, the memory is accessed indirectly, usually via a pointer reference. The precise algorithm used to organize the memory area and allocate and deallocate chunks is hidden behind an abstract interface and may use any of the methods described below.

Methods to reduce the constraints

1. Fixed-size-blocks allocation
Fixed-size-blocks allocation, also called memory pool allocation, uses a free list of fixed-size blocks of memory (often all of the same size). This works well for simple embedded systems.

2. Buddy blocks
In this system, memory is allocated from a large block whose size is a power of two. If the block is more than twice as large as desired, it is broken in two. One of the halves is selected, and the process repeats (checking the size again and splitting if needed) until the block is just large enough. All the blocks of a particular size are kept in a sorted linked list or tree. When a block is freed, it is compared to its buddy. If both are free, they are combined and placed in the next-largest-size buddy-block list. (When a block is allocated, the allocator starts with the smallest sufficiently large block, avoiding needlessly breaking blocks.)

Dynamic memory allocation in C

The malloc function is one of the functions in standard C used to allocate memory. Its function prototype is

void *malloc(size_t size);

which allocates size bytes of memory. If the allocation succeeds, a pointer to the block of memory is returned which is guaranteed to be suitably aligned for any type (including structs and the like); otherwise a NULL pointer is returned.

Memory allocated via malloc is persistent: it will continue to exist until the program terminates or the memory is explicitly deallocated by the programmer (that is, until the block is "freed"). This is achieved by use of the free function. Its prototype is

void free(void *pointer);

which releases the block of memory pointed to by pointer. pointer must have been previously returned by malloc, calloc, or realloc and must only be passed to free once. It is safe to call free on a NULL pointer, which has no effect.
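As a sketch of the fixed-size-blocks (memory pool) method described above, which suits simple embedded systems, a free list can be threaded through the unused blocks themselves. The block count and block size here are arbitrary illustration values:

#define NUM_BLOCKS      16
#define WORDS_PER_BLOCK 8     /* 8 pointer-sized words per block */

/* Blocks are arrays of void * so each block is suitably aligned to
   hold the free-list link in its first word. */
static void *pool[NUM_BLOCKS][WORDS_PER_BLOCK];
static void *free_list;       /* head of the list of unused blocks */

void pool_init(void)
{
    int i;
    free_list = 0;
    for (i = 0; i < NUM_BLOCKS; i++) {
        pool[i][0] = free_list;   /* link the block onto the free list */
        free_list = pool[i];
    }
}

void *pool_alloc(void)
{
    void *block = free_list;
    if (block)
        free_list = *(void **)block;  /* pop the head of the free list */
    return block;                     /* NULL when the pool is exhausted */
}

void pool_free(void *block)
{
    *(void **)block = free_list;      /* push the block back */
    free_list = block;
}

Note that, as written, pool_alloc and pool_free manipulate a shared free list non-atomically, so they are not reentrant; the reentrancy discussion later in this unit applies to them directly.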

Example: The standard method of creating an array of 10 int objects:

int array[10];

However, if one wishes to allocate a similar array dynamically, the following code could be used:

/* Allocate space for an array with ten elements of type int. */
int *ptr = malloc(10 * sizeof(int));
if (ptr == NULL) {
    /* Memory could not be allocated; the program should
       handle the error here as appropriate. */
} else {
    /* Allocation succeeded. Do something. */
    free(ptr);  /* We are done with the int objects, so free the block. */
    ptr = NULL; /* The pointer must not be used again,
                   unless re-assigned using malloc again. */
}

malloc returns a null pointer to indicate that no memory is available, or that some other error occurred which prevented memory from being allocated.

Reentrant Function / Reentrancy:

Virtually every embedded system uses interrupts; many support multitasking or multithreaded operations. These sorts of applications can expect the program's control flow to change contexts at just about any time. When an interrupt comes, the current operation is put on hold and another function or task starts running. What happens if functions and tasks share variables? Disaster surely looms if one routine corrupts the other's data. By carefully controlling how data is shared, we create "reentrant" functions: those that allow multiple concurrent invocations that do not interfere with each other. The word "pure" is sometimes used interchangeably with "reentrant".

Like so many embedded concepts, reentrancy came from the mainframe era, in the days when memory was a valuable commodity. In those days compilers and other programs were often written to be reentrant, so a single copy of the tool lived in memory yet was shared by perhaps a hundred users. Each person had his or her own data area, yet everyone running the compiler quite literally executed identical code. As the operating system changed contexts from user to user it swapped data areas, so one person's work didn't affect any other's. Share the code, but not the data.

In the embedded world a routine must satisfy the following conditions to be reentrant:
1. It uses all shared variables in an atomic way, unless each is allocated to a specific instance of the function.
2. It does not call non-reentrant functions.
3. It does not use the hardware in a non-atomic way.

Quite a mouthful! Let's look at each of these in more detail.

Atomic Variables

Both the first and last rules use the word "atomic", which comes from the Greek word meaning "indivisible". In the computer world "atomic" means an operation that cannot be interrupted. Consider the assembly language instruction:

mov ax,bx

Since nothing short of a reset can stop or interrupt this instruction, it's atomic. It will start and complete without any interference from other tasks or interrupts.

The first part of rule 1 requires the atomic use of shared variables. Suppose two functions each share the global variable "foobar". Function A contains:

temp = foobar;
temp += 1;
foobar = temp;

This code is not reentrant, because foobar is used non-atomically. That is, it takes three statements to change its value, not one. The foobar handling is not indivisible; an interrupt can come between these statements and switch context to the other function, which may then also try to change foobar. Clearly there's a conflict; foobar will wind up with an incorrect value, the autopilot will crash and hundreds of screaming people will wonder "why didn't they teach those developers about reentrancy?"

Suppose, instead, function A looks like:

foobar += 1;

Now the operation is atomic; an interrupt will not suspend processing with foobar in a partially changed state, so the routine is reentrant. Except: do you really know what your C compiler generates? On an x86 processor the code might look like:

mov ax,[foobar]
inc ax
mov [foobar],ax

which is clearly not atomic, and so not reentrant. The atomic version is:

inc [foobar]

The moral is to be wary of the compiler; assume it generates atomic code and you may find 60 Minutes knocking at your door.

The second part of the first reentrancy rule reads "unless each is allocated to a specific instance of the function". This is an exception to the atomic rule that skirts the issue of shared variables. An "instance" is a path through the code. There's no reason a single function can't be called from many other places. In a multitasking environment it's quite possible that several copies of the function may indeed be executing concurrently. (Suppose the routine is a driver that retrieves data from a queue; many different parts of the code may want queued data more or less simultaneously.) Each execution path is an "instance" of the code. Consider:

int foo;

void some_function(void){
    foo++;
}

foo is a global variable whose scope extends beyond that of the function. Even if no other routine uses foo, some_function can trash the variable if more than one instance of it runs at any time.

C and C++ can save us from this peril. Use automatic variables. That is, declare foo inside the function. Then each instance of the routine will use a new version of foo created on the stack, as follows:

void some_function(void){
    int foo;
    foo++;
}

Another option is to dynamically allocate memory (using malloc), again so that each incarnation uses a unique data area. The fundamental reentrancy problem is thus avoided, as it's impossible for multiple instances to stamp on a common version of the variable.

Two More Rules

The rest of the rules are very simple. Rule 2 tells us a calling function inherits the reentrancy problems of the callee. That makes sense; if other code inside the function trashes shared variables, the system is going to crash. Using a compiled language, though, there's an insidious problem. Are you sure, really sure, that the runtime package is reentrant? Obviously string operations and a lot of other complicated things use runtime calls to do the real work. An awful lot of compilers also generate runtime calls to do, for instance, long math, or even integer multiplications and divisions. If a function must be reentrant, talk to the compiler vendor to ensure that the entire runtime package is pure. If you buy software packages (like a protocol stack) that may be called from several places, take similar precautions to ensure the purchased routines are also reentrant.

Rule 3 is a uniquely embedded caveat. Hardware looks a lot like a variable; if it takes more than a single I/O operation to handle a device, reentrancy problems can develop. Consider Zilog's SCC serial controller. Accessing any of the device's internal registers requires two steps: first write the register's address to a port, then read or write the register through the same port, at the same I/O address. If an interrupt comes between setting the port and accessing the register, another function might take over and access the device. When control returns to the first function, the register address you set will be incorrect.

Keeping Code Reentrant

What are our best options for eliminating non-reentrant code? The first rule of thumb is to avoid shared variables. Globals are the source of no end of debugging woes and failed code. Use automatic variables or dynamically allocated memory. Yet globals are also the fastest way to pass data around, so it's not entirely possible to eliminate them from real-time systems. When using a shared resource (variable or hardware) we must therefore take a different sort of action.

The most common approach is to disable interrupts during non-reentrant code (see the sketch after this discussion). With interrupts off the system suddenly becomes a single-process environment: there will be no context switches. Disable interrupts, do the non-reentrant work, and then turn interrupts back on. Shutting interrupts down does increase system latency, reducing its ability to respond to external events in a timely manner.

A kinder, gentler approach is to use a semaphore to indicate when a resource is busy. Semaphores are simple on-off state indicators whose processing is inherently atomic; they are often used as "in-use" flags that make routines idle when a shared resource is not available. Nearly every commercial real-time operating system includes semaphores; if this is your way of achieving reentrant code, by all means use an RTOS.
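A minimal sketch of the disable-interrupts approach follows. The interrupt-control macros are hypothetical placeholders (defined as empty here so the fragment compiles); on a real target they would be the compiler's or RTOS's interrupt intrinsics:

/* Hypothetical stand-ins for target-specific interrupt intrinsics. */
#define DISABLE_INTERRUPTS()   /* target specific */
#define ENABLE_INTERRUPTS()    /* target specific */

int shared_count;              /* shared between mainline code and an ISR */

void increment_shared(void)
{
    DISABLE_INTERRUPTS();      /* no context switch can occur now */
    shared_count++;            /* the read-modify-write is now safe */
    ENABLE_INTERRUPTS();       /* re-enable as quickly as possible */
}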

Device Drivers (Accessing the shared memory device driver)

A device driver is the set of kernel routines that makes a hardware device respond to the programming interface defined by the canonical set of VFS functions that control a device. The actual implementation of all these functions is delegated to the device driver. Because each device has a different I/O controller, and thus different commands and different state information, most I/O devices have their own drivers.

There are many types of device drivers. They mainly differ in the level of support that they offer to User Mode applications, as well as in their buffering strategies for the data collected from the hardware devices. Because these choices greatly influence the internal structure of a device driver, we discuss them in the sections "Direct Memory Access (DMA)" and "Buffering Strategies for Character Devices."

A device driver does not consist only of the functions that implement the device file operations. Before a device driver can be used, several activities must have taken place. We'll examine them in the following sections.

Device Driver Registration

We know that each system call issued on a device file is translated by the kernel into an invocation of a suitable function of a corresponding device driver. To achieve this, a device driver must register itself. In other words, registering a device driver means allocating a new device_driver descriptor, inserting it in the data structures of the device driver model, and linking it to the corresponding device file(s). Accesses to device files whose corresponding drivers have not been previously registered return an error code.

If a device driver is statically compiled into the kernel, its registration is performed during the kernel initialization phase. Conversely, if a device driver is compiled as a kernel module, its registration is performed when the module is loaded. In the latter case, the device driver can also unregister itself when the module is unloaded.

Let us consider, for instance, a generic PCI device. To properly handle it, its device driver must allocate a descriptor of type pci_driver, which is used by the PCI kernel layer to handle the device. After having initialized some fields of this descriptor, the device driver invokes the pci_register_driver( ) function. Actually, the pci_driver descriptor includes an embedded device_driver descriptor; the pci_register_driver( ) function simply initializes the fields of the embedded driver descriptor and invokes driver_register( ) to insert the driver in the data structures of the device driver model.

When a device driver is being registered, the kernel looks for unsupported hardware devices that could possibly be handled by the driver. To do this, it relies on the match method of the relevant bus_type bus type descriptor, and on the probe method of the device_driver object. If a hardware device that can be handled by the driver is discovered, the kernel allocates a device object and invokes device_register( ) to insert the device in the device driver model.

Device Driver Initialization

Registering a device driver and initializing it are two different things. A device driver is registered as soon as possible, so User Mode applications can use it through the corresponding device files. In contrast, a device driver is initialized at the last possible moment. In fact, initializing a driver means allocating precious resources of the system, which are therefore not available to other drivers.
The assignment of IRQs to devices is usually made dynamically, right before using them, because several devices may share the same IRQ line. Other resources that can be allocated at the last possible moment are page frames for DMA transfer buffers and the DMA channel itself (for old non-PCI devices such as the floppy disk driver).
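To round off the registration discussion, here is a hypothetical sketch of the registration step for a generic "foo" PCI device, using 2.6-era kernel interfaces consistent with this section; the vendor/device IDs and the probe/remove bodies are placeholders:

#include <linux/module.h>
#include <linux/pci.h>

static struct pci_device_id foo_ids[] = {
    { PCI_DEVICE(0x1234, 0x5678) },  /* hypothetical vendor/device pair */
    { 0, }
};

static int foo_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
    return pci_enable_device(dev);   /* wake up and claim the device */
}

static void foo_remove(struct pci_dev *dev)
{
    pci_disable_device(dev);
}

static struct pci_driver foo_driver = {
    .name     = "foo",
    .id_table = foo_ids,
    .probe    = foo_probe,
    .remove   = foo_remove,
};

static int __init foo_init(void)
{
    /* inserts the embedded device_driver descriptor in the driver model */
    return pci_register_driver(&foo_driver);
}

static void __exit foo_exit(void)
{
    pci_unregister_driver(&foo_driver);
}

module_init(foo_init);
module_exit(foo_exit);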

To make sure the resources are obtained when needed but are not requested redundantly when they have already been granted, device drivers usually adopt the following schema:

o A usage counter keeps track of the number of processes that are currently accessing the device file. The counter is increased in the open method of the device file and decreased in the release method.

o The open method checks the value of the usage counter before the increment. If the counter is zero, the device driver must allocate the resources and enable interrupts and DMA on the hardware device.

o The release method checks the value of the usage counter after the decrement. If the counter is zero, no more processes are using the hardware device. If so, the method disables interrupts and DMA on the I/O controller, and then releases the allocated resources.

Monitoring I/O Operations

The duration of an I/O operation is often unpredictable. It can depend on mechanical considerations (the current position of a disk head with respect to the block to be transferred), on truly random events (when a data packet arrives on the network card), or on human factors (when a user presses a key on the keyboard or notices that a paper jam has occurred in the printer). In any case, the device driver that started an I/O operation must rely on a monitoring technique that signals either the termination of the I/O operation or a time-out.

In the case of a terminated operation, the device driver reads the status register of the I/O interface to determine whether the I/O operation was carried out successfully. In the case of a time-out, the driver knows that something went wrong, because the maximum time interval allowed to complete the operation has elapsed and nothing happened. The two techniques available to monitor the end of an I/O operation are called polling mode and interrupt mode.

Polling mode

With this technique, the CPU checks (polls) the device's status register repeatedly until its value signals that the I/O operation has been completed. We have already encountered a technique based on polling: when a processor tries to acquire a busy spin lock, it repeatedly polls the variable until its value becomes 0. However, polling applied to I/O operations is usually more elaborate, because the driver must also remember to check for possible time-outs. A simple example of polling looks like the following:

for (;;) {
    if (read_status(device) & DEVICE_END_OPERATION)
        break;
    if (--count == 0)
        break;
}

The count variable, which was initialized before entering the loop, is decreased at each iteration, and thus can be used to implement a rough time-out mechanism. Alternatively, a more precise time-out mechanism could be implemented by reading the value of the tick counter jiffies at each iteration and comparing it with the value read before starting the wait loop. If the time required to complete the I/O operation is relatively long, say on the order of milliseconds, this schema becomes inefficient because the CPU wastes precious machine cycles while waiting for the I/O operation to complete. In such cases, it is preferable to voluntarily relinquish the CPU after each polling operation by inserting an invocation of the schedule( ) function inside the loop.
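Putting those two refinements together, a sketch of the polling loop with a jiffies-based time-out and a voluntary call to schedule( ) might look like this fragment; read_status( ), device, and DEVICE_END_OPERATION are the placeholders already used above, and the one-second budget (HZ ticks) is an arbitrary choice:

unsigned long timeout = jiffies + HZ;   /* give up after about one second */

while (!(read_status(device) & DEVICE_END_OPERATION)) {
    if (time_after(jiffies, timeout))
        return -ETIMEDOUT;              /* the operation never completed */
    schedule();                         /* voluntarily relinquish the CPU */
}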

Interrupt mode

Interrupt mode can be used only if the I/O controller is capable of signaling, via an IRQ line, the end of an I/O operation. We'll show how interrupt mode works in a simple case. Suppose we want to implement a driver for a simple input character device. When the user issues a read( ) system call on the corresponding device file, an input command is sent to the device's control register. After an unpredictably long time interval, the device puts a single byte of data in its input register. The device driver then returns this byte as the result of the read( ) system call. This is a typical case in which it is preferable to implement the driver using interrupt mode. Essentially, the driver includes two functions:
1. The foo_read( ) function that implements the read method of the file object.
2. The foo_interrupt( ) function that handles the interrupt.

The foo_read( ) function is triggered whenever the user reads the device file:

ssize_t foo_read(struct file *filp, char *buf, size_t count, loff_t *ppos)
{
    foo_dev_t *foo_dev = filp->private_data;
    if (down_interruptible(&foo_dev->sem))
        return -ERESTARTSYS;
    foo_dev->intr = 0;
    outb(DEV_FOO_READ, DEV_FOO_CONTROL_PORT);
    wait_event_interruptible(foo_dev->wait, (foo_dev->intr == 1));
    if (put_user(foo_dev->data, buf))
        return -EFAULT;
    up(&foo_dev->sem);
    return 1;
}

The device driver relies on a custom descriptor of type foo_dev_t; it includes a semaphore sem that protects the hardware device from concurrent accesses, a wait queue wait, a flag intr that is set when the device issues an interrupt, and a single-byte buffer data that is written by the interrupt handler and read by the read method. In general, all I/O drivers that use interrupts rely on data structures accessed by both the interrupt handler and the read and write methods. The address of the foo_dev_t descriptor is usually stored in the private_data field of the device file's file object or in a global variable.

The main operations of the foo_read( ) function are the following:
1. Acquires the foo_dev->sem semaphore, thus ensuring that no other process is accessing the device.
2. Clears the intr flag.
3. Issues the read command to the I/O device.
4. Executes wait_event_interruptible to suspend the process until the intr flag becomes 1.

After some time, our device issues an interrupt to signal that the I/O operation is completed and that the data is ready in the proper DEV_FOO_DATA_PORT data port. The interrupt handler sets the intr flag and wakes the process. When the scheduler decides to re-execute the process, the second part of foo_read( ) is executed and does the following:
1. Copies the character ready in the foo_dev->data variable into the user address space.
2. Terminates after releasing the foo_dev->sem semaphore.

For simplicity, we didn't include any time-out control. In general, time-out control is implemented through static or dynamic timers; the timer must be set to the right time before starting the I/O operation and removed when the operation terminates.

Let's now look at the code of the foo_interrupt( ) function:

irqreturn_t foo_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    foo->data = inb(DEV_FOO_DATA_PORT);
    foo->intr = 1;
    wake_up_interruptible(&foo->wait);
    return 1;
}

The interrupt handler reads the character from the input register of the device and stores it in the data field of the foo_dev_t descriptor of the device driver, which is pointed to by the foo global variable. It then sets the intr flag and invokes wake_up_interruptible( ) to wake the process blocked in the foo->wait wait queue.
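For completeness, here is a sketch of how a handler like foo_interrupt( ) might be attached to its IRQ line, following the usage-counter schema described earlier. FOO_IRQ and foo_usage_count are hypothetical names, foo is the global descriptor pointer from the text above, and the flags match the 2.6-era request_irq( ) interface that the handler prototype implies:

static int foo_usage_count;   /* processes currently using the device */

int foo_open(struct inode *inode, struct file *filp)
{
    if (foo_usage_count++ == 0) {
        /* first opener: allocate the IRQ line now, not at registration */
        if (request_irq(FOO_IRQ, foo_interrupt, SA_INTERRUPT,
                        "foo", NULL)) {
            foo_usage_count--;
            return -EBUSY;
        }
    }
    filp->private_data = foo;  /* the global descriptor pointer */
    return 0;
}

int foo_release(struct inode *inode, struct file *filp)
{
    if (--foo_usage_count == 0)
        free_irq(FOO_IRQ, NULL);  /* last user gone: release the line */
    return 0;
}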

Accessing the I/O Shared Memory with a Device Driver

Depending on the device and on the bus type, I/O shared memory in the PC's architecture may be mapped within different physical address ranges. Typically:

For most devices connected to the ISA bus, the I/O shared memory is usually mapped into the physical addresses ranging from 0xa0000 to 0xfffff; this gives rise to the "hole" between 640 KB and 1 MB.

For devices connected to the PCI bus, the I/O shared memory is mapped into 32-bit physical addresses near the 4 GB boundary. This kind of device is much simpler to handle.

A few years ago, Intel introduced the Accelerated Graphics Port (AGP) standard, which is an enhancement of PCI for high-performance graphics cards. Besides having its own I/O shared memory, this kind of card is capable of directly addressing portions of the motherboard's RAM by means of a special hardware circuit named the Graphics Address Remapping Table (GART). The GART circuitry enables AGP cards to sustain much higher data transfer rates than older PCI cards. From the kernel's point of view, however, it doesn't really matter where the physical memory is located, and GART-mapped memory is handled like the other kinds of I/O shared memory.

How does a device driver access an I/O shared memory location? Let's start with the PC's architecture, which is relatively simple to handle, and then extend the discussion to other architectures.

Remember that kernel programs act on linear addresses, so the I/O shared memory locations must be expressed as addresses greater than PAGE_OFFSET. In the following discussion, we assume that PAGE_OFFSET is equal to 0xc0000000; that is, that the kernel linear addresses are in the fourth gigabyte.

Device drivers must translate physical addresses of I/O shared memory locations into linear addresses in kernel space. In the PC architecture, this can be achieved simply by ORing the 32-bit physical address with the 0xc0000000 constant. For instance, suppose the kernel needs to store the value at the I/O location with physical address 0x000b0fe4 in t1 and the value at the I/O location with physical address 0xfc000000 in t2. One might think that the following statements could do the job:

t1 = *((unsigned char *)(0xc00b0fe4));
t2 = *((unsigned char *)(0xfc000000));

During the initialization phase, the kernel maps the available RAM's physical addresses into the initial portion of the fourth gigabyte of the linear address space. Therefore, the Paging Unit maps the 0xc00b0fe4 linear address appearing in the first statement back to the original I/O physical address 0x000b0fe4, which falls inside the "ISA hole" between 640 KB and 1 MB. There is a problem, however, with the second statement, because the I/O physical address is greater than the last physical address of the system RAM. Therefore, the 0xfc000000 linear address does not correspond to the 0xfc000000 physical address. In such cases, the kernel Page Tables must be modified to include a linear address that maps the I/O physical address. This can be done by invoking the ioremap( ) or ioremap_nocache( ) functions. The first function, which is similar to vmalloc( ), invokes get_vm_area( ) to create a new vm_struct descriptor for a linear address interval that has the size of the required I/O shared memory area. The functions then update the corresponding Page Table entries of the canonical kernel Page Tables appropriately. The ioremap_nocache( ) function differs from ioremap( ) in that it also disables the hardware cache when the remapped linear addresses are referenced.

The correct form for the second statement might therefore look like:

io_mem = ioremap(0xfb000000, 0x200000);
t2 = *((unsigned char *)(io_mem + 0x100000));

The first statement creates a new 2 MB linear address interval, which maps physical addresses starting from 0xfb000000; the second one reads the memory location that has the 0xfc000000 physical address. To remove the mapping later, the device driver must use the iounmap( ) function.

On some architectures other than the PC, I/O shared memory cannot be accessed by simply dereferencing the linear address pointing to the physical memory location. Therefore, Linux defines the following architecture-dependent functions, which should be used when accessing I/O shared memory:

readb( ), readw( ), readl( )
Read 1, 2, or 4 bytes, respectively, from an I/O shared memory location

writeb( ), writew( ), writel( )
Write 1, 2, or 4 bytes, respectively, into an I/O shared memory location

memcpy_fromio( ), memcpy_toio( )
Copy a block of data from an I/O shared memory location to dynamic memory and vice versa

memset_io( )
Fills an I/O shared memory area with a fixed value

The recommended way to access the 0xfc000000 I/O location is thus:

io_mem = ioremap(0xfb000000, 0x200000);
t2 = readb(io_mem + 0x100000);

Variable Scope

A variable declared inside a function is local. A variable declared outside a function is visible anywhere inside that .c file.

A variable declared outside a function can also be made visible in other .c files by using extern. A declaration with the extern prefix refers to a variable that is defined outside a function in another .c file.

Function Scope

Functions cannot be used without a function prototype. Each .c file must have a function prototype for each function which is used in that .c file. #include files can help manage your function prototypes.

Productivity Tools: refer to K.V.K.K. Prasad, Embedded/Real-Time Systems: Concepts, Design and Programming, Dreamtech, Wiley, 2003.
