
I. INSTRUCTION SET ARCHITECTURE

The Instruction Set Architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. The ISA serves as the boundary between software and hardware. We will briefly describe the instruction sets found in many of the microprocessors used today. The ISA of a processor can be described using 5 categories:

1. Operand storage in the CPU - Where are the operands kept other than in memory?
2. Number of explicitly named operands - How many operands are named in a typical instruction?
3. Operand location - Can any ALU instruction operand be located in memory, or must all operands be kept internally in the CPU?
4. Operations - What operations are provided in the ISA?
5. Type and size of operands - What is the type and size of each operand, and how is it specified?

Of all the above, the most distinguishing factor is the first.

The 3 most common types of ISAs are:

1. Stack - The operands are implicitly on top of the stack.
2. Accumulator - One operand is implicitly the accumulator.
3. General Purpose Register (GPR) - All operands are explicitly named; they are either registers or memory locations.

Let's look at the assembly code for A = B + C in all 3 architectures:

Stack:
PUSH B
PUSH C
ADD
POP A

Accumulator:
LOAD B
ADD C
STORE A

GPR:
LOAD R1,B
ADD R1,C
STORE R1,A
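To make the three models concrete, here is a rough Python sketch (the variable values and register name are invented for illustration) that mimics how each style would carry out A = B + C:

    # Minimal simulators for the three ISA styles evaluating A = B + C.
    # Memory holds the named variables; the initial values are arbitrary.
    memory = {"A": 0, "B": 2, "C": 3}

    def run_stack():
        # Stack machine: operands are implicitly on top of the stack.
        stack = []
        stack.append(memory["B"])                  # PUSH B
        stack.append(memory["C"])                  # PUSH C
        right, left = stack.pop(), stack.pop()     # ADD pops both operands...
        stack.append(left + right)                 # ...and pushes their sum
        memory["A"] = stack.pop()                  # POP A

    def run_accumulator():
        # Accumulator machine: one operand is implicitly the accumulator.
        acc = memory["B"]                          # LOAD B
        acc = acc + memory["C"]                    # ADD C
        memory["A"] = acc                          # STORE A

    def run_gpr():
        # Register-memory GPR machine: operands are named explicitly.
        regs = {"R1": 0}
        regs["R1"] = memory["B"]                   # LOAD R1,B
        regs["R1"] = regs["R1"] + memory["C"]      # ADD R1,C
        memory["A"] = regs["R1"]                   # STORE R1,A

    for run in (run_stack, run_accumulator, run_gpr):
        memory["A"] = 0
        run()
        print(run.__name__, memory["A"])           # each prints 5

Note how the stack version never names its ALU operands, the accumulator version names only one, and the GPR version names all of them.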

Not all processors can be neatly placed into one of the above categories. The i8086 has many instructions that use implicit operands although it has a general register set. The i8051 is another example: it has 4 banks of GPRs, but most instructions must use the A register as one of their operands. What are the advantages and disadvantages of each of these approaches?

Stack
Advantages: A simple model of expression evaluation (reverse Polish notation); short instructions.
Disadvantages: A stack can't be randomly accessed, which makes it hard to generate efficient code. The stack itself is accessed on every operation and becomes a bottleneck.

Accumulator
Advantages: Short instructions.
Disadvantages: The accumulator is only temporary storage, so memory traffic is the highest for this approach.

GPR
Advantages: Makes code generation easy; data can be stored in registers for long periods.
Disadvantages: All operands must be named, leading to longer instructions.

Earlier CPUs were of the first 2 types, but in the last 15 years all CPUs made have been GPR processors. The 2 major reasons are that registers are faster than memory (the more data that can be kept internally in the CPU, the faster the program will run) and that registers are easier for a compiler to use.

Reduced Instruction Set Computer (RISC)

As we mentioned before, most modern CPUs are of the GPR (General Purpose Register) type. A few examples of such CPUs are the IBM 360, DEC VAX, Intel 80x86 and Motorola 68xxx. But while these CPUs were clearly better than the earlier stack and accumulator based CPUs, they were still lacking in several areas:

1. Instructions were of varying length, from 1 byte to 6-8 bytes. This causes problems with the pre-fetching and pipelining of instructions.
2. ALU (Arithmetic Logical Unit) instructions could have operands that were memory locations. Because the number of cycles it takes to access memory varies, so does the time taken by the whole instruction. This isn't good for compiler writers, pipelining, or multiple issue.
3. Most ALU instructions had only 2 operands, where one of the operands is also the destination. This means that operand is destroyed during the operation, or it must be saved somewhere beforehand.

Thus in the early 80's the idea of RISC was introduced. The RISC project was started at Berkeley (it later led to SPARC) and the MIPS project at Stanford. RISC stands for Reduced Instruction Set Computer. The ISA is composed of instructions that all have exactly the same size, usually 32 bits. Thus they can be pre-fetched and pipelined successfully. All ALU instructions have 3 operands, which can only be registers. The only memory access is through explicit LOAD/STORE instructions. Thus A = B + C will be assembled as:

LOAD R1,B
LOAD R2,C
ADD R3,R1,R2
STORE A,R3

Although it takes 4 instructions, we can reuse the values left in the registers.

Why is this architecture called RISC? What is reduced about it? The answer is that to make all instructions the same length, the number of bits used for the opcode is reduced, so fewer instructions are provided. The instructions that were thrown out are the less important string and BCD (binary-coded decimal) operations. In fact, now that memory access is restricted, there aren't several kinds of MOV or ADD instructions. By contrast, the older style of architecture is called CISC (Complex Instruction Set Computer). RISC architectures are also called LOAD/STORE architectures.

The number of registers in RISC is usually 32 or more. One of the first commercial RISC CPUs, the MIPS R2000, has 32 GPRs, as opposed to 16 in the 68xxx architecture and 8 in the 80x86 architecture. The only disadvantage of RISC is its code size: usually more instructions are needed, and the fixed-length encoding wastes space on operations that a CISC could express with short instructions (such as PUSH and POP).
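To see what a fixed-width, three-register instruction format looks like in practice, here is a small Python sketch; the field widths and opcode numbers are invented for illustration and do not match any real RISC encoding:

    # Hypothetical fixed-width RISC encoding: an 8-bit opcode, three 5-bit
    # register fields, and 9 unused bits. Every instruction is exactly 32 bits.
    OPCODES = {"ADD": 0x01, "LOAD": 0x02, "STORE": 0x03}  # invented values

    def encode(op, rd, rs1, rs2):
        # Pack "op rd, rs1, rs2" into a single 32-bit instruction word.
        return ((OPCODES[op] << 24) | (rd << 19) | (rs1 << 14) | (rs2 << 9)) & 0xFFFFFFFF

    def decode(word):
        # Unpack a 32-bit word back into its opcode and register fields.
        return ((word >> 24) & 0xFF, (word >> 19) & 0x1F,
                (word >> 14) & 0x1F, (word >> 9) & 0x1F)

    # ADD R3,R1,R2 always occupies exactly one 32-bit word, so the fetch and
    # decode stages of a pipeline never have to guess where the next
    # instruction starts.
    word = encode("ADD", 3, 1, 2)
    print(hex(word), decode(word))

Because every instruction has the same shape, the reduction in opcode bits described above is the price paid for this simple, uniform decoding.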

II. LEVELS OF PROGRAMMING LANGUAGE

There is only one programming language that any computer can actually understand and execute: its own native binary machine code. This is the lowest possible level of language in which it is possible to write a computer program. All other languages are said to be high level or low level according to how closely they resemble machine code. In this context, a low-level language corresponds closely to machine code, so that a single low-level language instruction translates to a single machine-language instruction. A high-level language instruction typically translates into a series of machine-language instructions.

Low-level languages have the advantage that they can be written to take advantage of any peculiarities in the architecture of the central processing unit (CPU), which is the "brain" of any computer. Thus, a program written in a low-level language can be extremely efficient, making optimum use of both computer memory and processing time. However, writing a low-level program takes a substantial amount of time, as well as a clear understanding of the inner workings of the processor itself. Therefore, low-level programming is typically used only for very small programs, or for segments of code that are highly critical and must run as efficiently as possible.

High-level languages permit faster development of large programs. The final program as executed by the computer is not as efficient, but the savings in programmer time generally far outweigh the inefficiencies of the finished product. This is because the cost of writing a program is nearly constant for each line of code, regardless of the language. Thus, a high-level language in which each line of code translates to 10 machine instructions costs only one tenth as much in program development as a low-level language in which each line of code represents only a single machine instruction.

In addition to the distinction between high-level and low-level languages, there is a further distinction between compiler languages and interpreter languages. Let's take a look at the various levels.

Absolute Machine Code

The very lowest possible level at which you can program a computer is in its own native machine code, consisting of strings of 1's and 0's and stored as binary numbers. The main problems with using machine code directly are that it is very easy to make a mistake, and very hard to find it once you realize the mistake has been made.

Assembly Language

Assembly language is nothing more than a symbolic representation of machine code, which also allows symbolic designation of memory locations. Thus, an instruction to add the contents of a memory location to an internal CPU register called the accumulator might be written as ADD NUMBER instead of as a string of binary digits (bits).

No matter how close assembly language is to machine code, the computer still cannot understand it. The assembly-language program must be translated into machine code by a separate program called an assembler. The assembler program recognizes the character strings that make up the symbolic names of the various machine operations and substitutes the required machine code for each instruction. At the same time, it also calculates the required address in memory for each symbolic name of a memory location, and substitutes those addresses for the names. The final result is a machine-language program that can run on its own at any time; the assembler and the assembly-language program are no longer needed. To help distinguish between the "before" and "after" versions of the program, the original assembly-language program is also known as the source code, while the final machine-language program is designated the object code.

If an assembly-language program needs to be changed or corrected, it is necessary to make the changes to the source code and then re-assemble it to create a new object program.
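A toy two-pass assembler makes this translation process concrete. The sketch below is written in Python; the mnemonics, opcode numbers, the ".word" data directive, and the one-word-per-line memory model are all invented for illustration:

    # Toy two-pass assembler: pass 1 assigns an address to every label,
    # pass 2 substitutes opcodes for mnemonics and addresses for names.
    OPCODES = {"LOAD": 1, "ADD": 2, "STORE": 3, "HALT": 4}   # invented encoding

    def assemble(lines):
        # Pass 1: walk the program and record the address of every label.
        symbols, address, stripped = {}, 0, []
        for line in lines:
            if ":" in line:                     # "name: rest-of-line"
                label, line = line.split(":", 1)
                symbols[label.strip()] = address
            line = line.strip()
            if line:
                stripped.append(line)
                address += 1                    # each line occupies one word
        # Pass 2: replace mnemonics with opcodes and names with addresses.
        code = []
        for line in stripped:
            parts = line.split()
            if parts[0] == ".word":             # data directive: literal value
                code.append(int(parts[1]))
            else:
                operand = symbols[parts[1]] if len(parts) > 1 else 0
                code.append((OPCODES[parts[0]] << 8) | operand)
        return code

    source = ["        LOAD  b",
              "        ADD   c",
              "        STORE a",
              "        HALT",
              "a:      .word 0",
              "b:      .word 2",
              "c:      .word 3"]
    print([hex(word) for word in assemble(source)])

The output is pure numbers (the object code); the symbolic names a, b and c have been replaced by the memory addresses the assembler chose for them, exactly as described above.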

Compiler Language

Compiler languages are the high-level equivalent of assembly language. Each instruction in the compiler language can correspond to many machine instructions. Once the program has been written, it is translated to the equivalent machine code by a program called a compiler. Once the program has been compiled, the resulting machine code is saved separately and can be run on its own at any time.

As with assembly-language programs, updating or correcting a compiled program requires that the original (source) program be modified appropriately and then recompiled to form a new machine-language (object) program. Typically, the compiled machine code is less efficient than the code produced when using assembly language. This means that it runs a bit more slowly and uses a bit more memory than the equivalent assembled program. To offset this drawback, however, it takes much less time to develop a compiler-language program, so it can be ready to go sooner than the assembly-language program.
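As a sketch of how one high-level line expands into several machine-level instructions, here is a deliberately naive Python "compiler" that only handles assignments of the form x = y + z and emits the register instruction sequence used in Section I (the instruction spellings are the same invented ones as before):

    # Naive one-statement "compiler": translates an "A = B + C" style
    # assignment into a load/store instruction sequence.
    def compile_assignment(statement):
        dest, expression = [s.strip() for s in statement.split("=")]
        left, right = [s.strip() for s in expression.split("+")]
        return ["LOAD  R1," + left,
                "LOAD  R2," + right,
                "ADD   R3,R1,R2",
                "STORE " + dest + ",R3"]

    # One line of source code becomes four machine-level instructions.
    for instruction in compile_assignment("A = B + C"):
        print(instruction)

A real compiler does the same kind of translation for every construct of the language, and the resulting machine code is saved so it can later run without the compiler.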

Interpreter Language

An interpreter language, like a compiler language, is considered to be high level. However, it operates in a totally different manner from a compiler language: instead of the source program being translated ahead of time, the interpreter program resides in memory and directly executes the high-level program without preliminary translation to machine code.

This use of an interpreter program to directly execute the user's program has both advantages and disadvantages. The primary advantage is that you can run the program to test its operation, make a few changes, and run it again directly. There is no need to recompile because no new machine code is ever produced. This can enormously speed up the development and testing process. On the down side, this arrangement requires that both the interpreter and the user's program reside in memory at the same time. In addition, because the interpreter has to scan the user's program one line at a time and execute internal portions of itself in response, execution of an interpreted program is much slower than for a compiled program.
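A minimal sketch of the difference, assuming a made-up mini-language of "name = name + name" and "PRINT name" lines, is shown below; no machine code is ever produced, and the interpreter simply re-reads and executes each source line every time the program runs:

    # Toy interpreter: executes the source program line by line, keeping the
    # variables in a dictionary. Nothing is translated ahead of time, so an
    # edited program can be re-run immediately, at the cost of re-scanning
    # every line on every execution.
    def interpret(program, variables):
        for line in program:
            parts = line.split()
            if parts[0] == "PRINT":                 # PRINT name
                print(variables[parts[1]])
            else:                                   # name = name + name
                dest, _, left, _, right = parts
                variables[dest] = variables[left] + variables[right]

    interpret(["A = B + C", "PRINT A"], {"B": 2, "C": 3})

Here the interpret function plays the role of the interpreter program that must stay in memory alongside the user's program.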
