Abstract

Modern programmers work mainly in third- and fourth-generation languages (3GLs and 4GLs). These languages are efficient for people to write, but they cannot be executed directly by a machine. Generating machine code from them is the task of a translator. This paper surveys the translators most commonly used by present-day programmers and describes their types, working techniques, advantages and disadvantages.

Introduction
Over the past few decades, computer programming has become one of the most important skills for any work involving digital electronics. As this growth took place, repeated attempts were made to simplify programming languages and make them better suited to humans. This approach paved the way for high-level languages, but such languages are not something a machine can understand directly. Working with them therefore requires software that converts human-friendly code into machine-friendly binary. Software that takes human-friendly code as input and produces machine-understandable, executable binary data as output can be regarded as a translator. In everyday life, a translator is a person who reproduces statements given in one language in another language by applying a set of grammatical rules. Similarly, a language translator is system software designed to translate code written in one programming language into another according to a given set of rules.

Why Language Translators Are Needed: Types of Languages


Before discussing the different types of translators, let us first look at the languages between which translation needs to be done. In general, all computer languages can be classified into three categories:

Machine level language: - Machine language is a system of atomic instructions executed directly by a computer's central processing unit, where each instruction performs a very specific task. Every executable program is made up of a series of these atomic instructions. Machine code may be regarded as a primitive (and cumbersome) programming language or as the lowest-level representation of a compiled and/or assembled computer program.

Assembly level language: - An assembly level language is a low-level, architecture-specific programming language for computers, microprocessors, microcontrollers, and other programmable devices. It implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on mnemonics that symbolize processing steps (instructions), processor registers, memory locations, and other language features.

High level language: - A high-level programming language is a programming language with strong abstraction from the details of the computer. It uses natural-language elements that are easy to use and understand, making the process of developing a program simpler and more understandable than in a low-level language. The amount of abstraction provided defines how "high-level" a programming language is.

Types of Translators
A wide variety of language translators are available today for translating code from one language to another. The most common ones are:

Compiler: - A compiler is a program that takes written source code and turns it into machine language. When run, a compiler analyses all of the language statements in the source code and builds the machine-language object code.

Assembler: - An assembler translates assembly language into machine language. Assembly language uses computer-specific commands and a structure similar to machine language, but uses names instead of numbers. An assembler is similar to a compiler, but is specific to programs written in assembly language: it takes basic computer instructions and converts them into a pattern of bits that the processor uses to perform its operations.

Interpreter: - An interpreter is a translator that converts programs into machine-executable form each time they are executed. It analyses and executes each line of source code, in order, without looking at the entire program. Instead of requiring a separate step before program execution, an interpreter processes the program as it is being executed.

Alongside these basic types, there are other translators used for specific purposes in scenarios that are not encountered on a day-to-day basis. Some of these are:

Decompiler: - A computer program that performs, as far as possible, the reverse operation to that of a compiler, i.e., it translates an executable file into a higher-level, human-readable form. When working with a decompiler, it must be kept in mind that it does not reconstruct the original source code, and that its output is far less intelligible to a human than the original source code.

Disassembler: - A computer program that translates machine language into assembly language. A disassembler is principally a reverse-engineering tool: its output, the disassembly, is usually formatted for human readability rather than for suitability as input to an assembler.

Binary recompiler: - Software that takes executable binaries as input, analyses their structure, applies transformations and optimizations, and outputs new optimized executable binaries.

Source-to-source compiler: - A type of compiler that takes a program in one high-level programming language as its input and outputs a program in another high-level language.

Having seen the different types of translators available, we now describe the most basic ones in more detail.

Compiler
As described above, the term compiler is primarily used for programs that translate a high-level programming language into a lower-level language.

Objectives of Compiler
Compilers bridge source programs in high-level languages with the underlying hardware. At a minimum, a compiler must:
Determine whether the syntax of the program is correct,
Generate correct and efficient object code,
Organize the run-time environment, and
Format its output according to assembler and/or linker conventions.

Passes of a Compiler
Compiling a program is not a task that can be completed in a single step. The translation is carried out through a number of steps. Each step is called a pass (or phase), and each is designed to perform a very specific function toward the final goal of translating the code. The steps through which code goes during translation are:

Lexical analysis: - The process of converting a sequence of characters into a sequence of tokens.

Pre-processing: - A pre-processor is a program that processes its input data to produce output that is used as input to another program. The output is said to be a pre-processed form of the input data, which is often used by subsequent programs. The amount and kind of processing done depends on the nature of the pre-processor; some pre-processors are only capable of relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.

Parsing: - The process of analysing a text, made of a sequence of tokens, to determine its grammatical structure with respect to a given formal grammar.

Semantic analysis (Syntax-directed translation): - A method of translating a string into a sequence of actions by attaching one such action to each rule of a grammar. Parsing a string of the grammar thus produces a sequence of rule applications, which provides a simple way to attach semantics to the syntax.

Code generation: - The process by which a compiler converts an intermediate representation of the source code into machine code that can be readily executed by a machine.

Code optimization: - The process of modifying a program or code to make some aspect of it work more efficiently or use fewer resources.
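The first two analysis steps, lexical analysis and parsing, can be illustrated with a minimal Python sketch. It is not taken from any particular compiler: the token categories, the small arithmetic grammar, and the names tokenize and parse are assumptions chosen only for illustration.

```python
import re

# Hypothetical token categories for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def tokenize(text):
    """Lexical analysis: turn a character sequence into (kind, lexeme) tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if match.lastgroup != "SKIP":          # whitespace produces no token
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def parse(tokens):
    """Parsing: build a nested-tuple tree for the assumed grammar
    expr := term (('+'|'-') term)*, term := factor (('*'|'/') factor)*,
    factor := NUMBER | IDENT | '(' expr ')'."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def advance():
        nonlocal pos
        token = peek()
        pos += 1
        return token

    def factor():
        kind, lexeme = advance()
        if kind == "NUMBER":
            return ("num", int(lexeme))
        if kind == "IDENT":
            return ("var", lexeme)
        if lexeme == "(":
            node = expr()
            if advance()[1] != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token {lexeme!r}")

    def term():
        node = factor()
        while peek()[1] in ("*", "/"):
            op = advance()[1]
            node = (op, node, factor())
        return node

    def expr():
        node = term()
        while peek()[1] in ("+", "-"):
            op = advance()[1]
            node = (op, node, term())
        return node

    tree = expr()
    if peek()[0] != "EOF":
        raise SyntaxError(f"unexpected token {peek()[1]!r}")
    return tree

print(tokenize("total = price * 3"))
print(parse(tokenize("a + 2 * (b - 1)")))
```

The tree returned by parse reflects the grammatical structure: the multiplication is nested inside the addition because the assumed grammar gives it higher precedence. A real compiler would attach further actions (type checking, code generation) to the same grammar rules.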

Fig. 1 working of a compiler (schematic)

The Structure of a Compiler


A compiler consists of three main parts:

The front end checks whether the program is correctly written in terms of the programming language's syntax and semantics. Here legal and illegal programs are recognized, and any errors are reported in a useful way. Type checking is also performed by collecting type information. The front end then generates an intermediate representation (IR) of the source code for processing by the middle end.

The middle end is where optimization takes place. Typical transformations are the removal of useless or unreachable code, the discovery and propagation of constant values, the relocation of computation to a less frequently executed place (e.g., out of a loop), and the specialization of computation based on context. The middle end generates another IR for the back end. Most optimization effort is focused on this part.

The back end is responsible for translating the IR from the middle end into assembly code. Target instructions are chosen for each IR instruction, and register allocation assigns processor registers to program variables where possible. The back end exploits the hardware by working out how to keep parallel execution units busy, how to fill delay slots, and so on. Although many of these optimization problems are NP-hard, heuristic techniques for them are well developed.
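As a rough illustration of the middle end and back end, the Python sketch below folds constant subexpressions in a small tree-shaped IR and then selects instructions for it. The tuple-based IR, the function names fold and emit, and the stack-machine mnemonics (PUSH, LOAD, ADD, SUB, MUL) are all hypothetical and serve only to show the idea.

```python
import operator

# Operators the sketch knows how to evaluate at compile time.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Middle-end sketch: precompute subexpressions whose operands are constants."""
    kind = node[0]
    if kind in ("num", "var"):
        return node
    left, right = fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        return ("num", OPS[kind](left[1], right[1]))
    return (kind, left, right)

# Hypothetical target: a stack machine with one instruction per IR operator.
MNEMONIC = {"+": "ADD", "-": "SUB", "*": "MUL"}

def emit(node):
    """Back-end sketch: choose target instructions for each IR node."""
    kind = node[0]
    if kind == "num":
        return [f"PUSH {node[1]}"]
    if kind == "var":
        return [f"LOAD {node[1]}"]
    return emit(node[1]) + emit(node[2]) + [MNEMONIC[kind]]

# IR for the expression x + 60 * 24
tree = ("+", ("var", "x"), ("*", ("num", 60), ("num", 24)))
print(emit(tree))        # five instructions without optimization
print(emit(fold(tree)))  # three instructions after constant folding
```

Without folding, the multiplication is carried out every time the generated code runs; after folding it has been carried out once, at compile time.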

The Compiler Design Issue: One-pass versus Multi-pass compilers


Classifying compilers by number of passes has its background in the hardware resource limitations of computers. Compiling involves performing lots of work and early computers did not have enough memory to contain one program that did all of this work. So compilers were split up into smaller programs which each made a pass over the source (or some representation of it) performing some of the required analysis and translations. The ability to compile in a single pass has classically been seen as a benefit because it simplifies the job of writing a compiler and one-pass compilers generally perform compilations faster than multi-pass compilers. Thus, partly driven by the resource limitations of early systems, many early languages were specifically designed so that they could be compiled in a single pass. The disadvantage of compiling in a single pass is that it is not possible to perform many of the sophisticated optimizations needed to generate high quality code. It can be difficult to count exactly how many passes an optimizing compiler makes. For instance, different phases of optimization may analyse one expression many times but only analyse another expression once.

Advantages of a Compiler
Compiled programs execute fast.
The object/executable code produced by a compiler can be distributed or executed without the compiler being present.
The object program can be used whenever required without the need for recompilation.

Disadvantages of a Compiler
Debugging a program is much harder, so a compiler is less helpful for finding errors.
When an error is found, the whole program has to be recompiled.

Interpreter
An interpreter behaves very differently from compilers and assemblers. It converts programs into machine-executable form each time they are executed. It analyses and executes each line of source code, in order, without looking at the entire program. Instead of requiring a separate step before program execution, an interpreter processes the program as it is being executed. When an interpreter is used to execute code, no object code is produced, so the program has to be interpreted each time it is run. For example, if the program performs a section of code 1000 times, then that section is translated 1000 times, since each line is interpreted and then executed.

More generally, an interpreter is a computer program that executes, i.e. performs, instructions written in a programming language. It does one of the following:
Executes the source code directly.
Translates the source code into some efficient intermediate representation (code) and immediately executes this.
Explicitly executes stored precompiled code made by a compiler which is part of the interpreter system.

While interpreting and compiling are the two main means by which programming languages are implemented, they are not fully mutually exclusive categories, one reason being that most interpreting systems also perform some translation work, just like compilers. The terms "interpreted language" and "compiled language" merely mean that the canonical implementation of that language is an interpreter or a compiler; a high-level language is basically an abstraction which is (ideally) independent of particular implementations.
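The cost of re-analysing a repeated section can be made concrete with a small Python sketch of a line-by-line interpreter. The four-statement language (set, add, jumplt, print) and the function name interpret are invented for this illustration; the counter records how often a line is analysed.

```python
def interpret(lines):
    """Execute a program line by line, re-analysing each line on every visit."""
    variables = {}
    output = []
    analysed = 0   # how many times a line was analysed (translated)
    pc = 0         # index of the line to execute next
    while pc < len(lines):
        op, *args = lines[pc].split()   # analysis happens here, every time
        analysed += 1
        pc += 1
        if op == "set":                 # set NAME VALUE
            variables[args[0]] = int(args[1])
        elif op == "add":               # add NAME VALUE
            variables[args[0]] += int(args[1])
        elif op == "jumplt":            # jumplt NAME LIMIT LINE_INDEX
            if variables[args[0]] < int(args[1]):
                pc = int(args[2])
        elif op == "print":             # print NAME
            output.append(variables[args[0]])
        else:
            raise SyntaxError(f"line {pc}: unknown statement {op!r}")
    return output, analysed

program = [
    "set i 0",
    "add i 1",           # loop body, executed 1000 times
    "jumplt i 1000 1",   # jump back to line index 1 while i < 1000
    "print i",
]
result, count = interpret(program)
print(result, count)
```

The program has only four lines, but the two lines inside the loop are analysed 1000 times each, so count is 2002. Note also that an unknown statement is reported only when execution reaches it, which is why an interpreter stops at the first error it encounters.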

Fig. 2 working of an interpreter (schematic)

Advantages of an Interpreter
Good at locating errors in programs.
Debugging is easier, since the interpreter stops when it encounters an error.
If an error is detected, there is no need to retranslate the whole program.

Disadvantages of an Interpreter
Execution is rather slow.
No object code is produced, so translation has to be done every time the program is run.
For the program to run, the interpreter must be present.

Difference between Compiler and Interpreter


Although both a compiler and an interpreter are used to translate code from a high-level language to a low-level language, there are some differences between them. The major differences are:
A compiler converts the high-level instructions into machine language, while an interpreter converts them into an intermediate form.
A compiler translates the entire program before execution, whereas an interpreter translates the first line, executes it, then moves on to the next line, and so on.
A compiler produces a list of errors after the compilation process, while an interpreter stops translating after the first error.
A compiler creates an independent executable file, whereas an interpreted program requires the interpreter each time it is run.

Fig. 3 comparative flow-charts of interpreters and compilers

Assembler
An assembler is a utility program used to translate assembly language statements into the target computer's machine code. It performs a more or less isomorphic translation (a one-to-one mapping) from mnemonic statements into machine instructions and data. Assembly language implements a symbolic representation of the machine codes and other constants needed to program a given CPU architecture. This representation is usually defined by the hardware manufacturer, and is based on mnemonics that symbolize processing steps (instructions), processor registers, memory locations, and other language features. An assembly language is thus specific to a certain physical (or virtual) computer architecture. This is in contrast to most high-level programming languages, which, ideally, are portable.

Typically, a modern assembler creates object code by translating assembly instruction mnemonics into opcodes, and by resolving symbolic names for memory locations and other entities. The use of symbolic references is a key feature of assemblers, saving tedious calculations and manual address updates after program modifications. Most assemblers also include macro facilities for performing textual substitution, for example to generate common short sequences of instructions inline instead of as called subroutines.

Fig. 4 working of an assembler (schematic)

Types of Assembler
There are basically two types of assemblers, based on how many times the assembler scans the source code to produce the executable program.

One-pass assemblers go through the source code once. Any symbol used before it is defined will require "errata" at the end of the object code (or, at least, no earlier than the point where the symbol is defined) telling the linker or the loader to "go back" and overwrite a placeholder which had been left where the as yet undefined symbol was used.
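The placeholder-and-errata mechanism of a one-pass assembler can be sketched in Python as follows. The instruction set (LOAD, ADD, JMP, HALT), the opcode values, the fixed two-byte encoding, and the function names are assumptions made for this example, and apply_errata merely stands in for the linker or loader step.

```python
# Hypothetical opcodes; each instruction is encoded as [opcode, operand].
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}

def assemble_one_pass(source):
    """Single scan: emit code, leaving placeholders for not-yet-defined symbols."""
    code = []      # object code as a flat list of byte values
    symbols = {}   # label name -> address
    errata = []    # (offset of placeholder, symbol name) to be patched later
    for line in source:
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):                 # label definition
            symbols[line[:-1]] = len(code)
            continue
        mnemonic, *operand = line.split()
        code.append(OPCODES[mnemonic])
        if not operand:
            code.append(0)                     # no operand: pad with zero
        elif operand[0].isdigit():
            code.append(int(operand[0]))       # numeric operand
        elif operand[0] in symbols:
            code.append(symbols[operand[0]])   # symbol already defined
        else:
            errata.append((len(code), operand[0]))
            code.append(0)                     # placeholder for a forward reference
    return code, symbols, errata

def apply_errata(code, symbols, errata):
    """Stand-in for the linker/loader: go back and overwrite each placeholder."""
    patched = list(code)
    for offset, name in errata:
        patched[offset] = symbols[name]
    return patched

source = ["start:", "LOAD 10", "JMP done", "ADD 11", "done:", "HALT"]
code, symbols, errata = assemble_one_pass(source)
print(code, errata)
print(apply_errata(code, symbols, errata))
```

Here "JMP done" refers to a label that is defined two lines later, so the assembler emits a zero operand and records one erratum; applying it replaces the zero with the address of done.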

Fig. 5 flow-chart of single pass assembler

Two-pass assemblers create a table with all symbols and their values in the first pass, and then use the table in a second pass to generate code.

Fig. 6 flow-chart of 1st pass of 2-pass assembler

Fig. 7 flow-chart of 2nd pass of 2-pass assembler

In both cases, the assembler must be able to determine the size of each instruction on the first or only pass in order to calculate the addresses of symbols. This means that if the size of an operation referring to an operand defined later depends on the type or distance of the operand, the assembler will make a pessimistic estimate when first encountering the operation, and if necessary pad it with one or more "no-operation" instructions in the second pass or the errata.

The original reason for the use of one-pass assemblers was speed of assembly; however, modern computers perform two-pass assembly without unacceptable delay. The advantage of the two-pass assembler is that the absence of a need for errata makes the linker (or the loader, if the assembler directly produces executable code) simpler and faster.
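The two-pass scheme can be sketched in Python under simplifying assumptions: a hypothetical instruction set (LOAD, ADD, JMP, HALT) in which every instruction occupies exactly two bytes, so the first pass can compute addresses without any size estimates. The names first_pass and second_pass are illustrative only.

```python
# Hypothetical opcodes; each instruction is encoded as [opcode, operand].
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JMP": 0x03, "HALT": 0xFF}
INSTRUCTION_SIZE = 2   # assumed fixed: one opcode byte plus one operand byte

def first_pass(source):
    """Pass 1: build the symbol table by tracking the address of each line."""
    symbols = {}
    address = 0
    for line in source:
        line = line.strip()
        if line.endswith(":"):
            symbols[line[:-1]] = address       # label takes the current address
        elif line:
            address += INSTRUCTION_SIZE        # instruction advances the address
    return symbols

def second_pass(source, symbols):
    """Pass 2: generate code, looking every symbol up in the finished table."""
    code = []
    for line in source:
        line = line.strip()
        if not line or line.endswith(":"):
            continue
        mnemonic, *operand = line.split()
        if not operand:
            value = 0
        elif operand[0].isdigit():
            value = int(operand[0])
        else:
            value = symbols[operand[0]]        # forward references resolve here
        code += [OPCODES[mnemonic], value]
    return code

source = ["start:", "LOAD 10", "JMP done", "ADD 11", "done:", "HALT"]
symbols = first_pass(source)
print(symbols)
print(second_pass(source, symbols))
```

Because the symbol table is complete before any code is generated, the forward reference in "JMP done" is resolved directly and no errata are needed. With variable-length instructions, the first pass would additionally have to estimate each instruction's size, as described above.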

Basic Elements of Assembly Design


When an assembler is designed, three things must be considered in creating the assembly language that it will assemble. These three elements are:

Opcode mnemonics and extended mnemonics: - A mnemonic is a symbolic name for a single executable machine language instruction (an opcode), and there is at least one opcode mnemonic defined for each machine language instruction. Each instruction typically consists of an operation or opcode plus zero or more operands. Most instructions refer to a single value or a pair of values. Operands can be immediate (typically one-byte values, coded in the instruction itself), registers specified in the instruction, implied, or the addresses of data located elsewhere in storage. This is determined by the underlying processor architecture: the assembler merely reflects how this architecture works. Extended mnemonics are often used to specify a combination of an opcode with a specific operand, in order to support specialized uses of instructions, often for purposes not obvious from the instruction name.

Data sections: - There are instructions used to define data elements to hold data and variables. They define the type of data, the length and the alignment of data. These instructions can also define whether the data is available to outside programs (programs assembled separately) or only to the program in which the data section is defined.

Assembly directives: - Assembly directives, also called pseudo opcodes, are instructions that are executed by an assembler at assembly time, not by a CPU at run time. They can make the assembly of the program dependent on parameters input by a programmer, so that one program can be assembled in different ways, perhaps for different applications. They can also be used to manipulate the presentation of a program to make it easier to read and maintain.

Implementation of Macros in Assemblers


Many assemblers support predefined macros, and others support programmer-defined macros involving sequences of text lines in which variables and constants are embedded. This sequence of text lines may include opcodes or directives. Once a macro has been defined, its name may be used in place of a mnemonic. When the assembler processes such a statement, it replaces the statement with the text lines associated with that macro, and then processes them as if they existed in the source code file (including, in some assemblers, expansion of any macros existing in the replacement text). In the mainframe era, macros were used to customize large-scale software systems for specific customers, and were also used by customer personnel to satisfy their employers' needs by making specific versions of manufacturer operating systems.
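Macro expansion as textual substitution can be sketched in a few lines of Python. The macro table format, the macro names INCR and INCR2, the mnemonics in their bodies, and the function name expand_macros are all hypothetical; real assemblers have their own macro definition syntax.

```python
def expand_macros(lines, macros):
    """Replace each macro invocation with its body, substituting parameters."""
    expanded = []
    for line in lines:
        words = line.split()
        if not words or words[0] not in macros:
            expanded.append(line)              # ordinary statement: keep as is
            continue
        name, args = words[0], words[1:]
        params, body = macros[name]
        if len(args) != len(params):
            raise ValueError(f"macro {name} expects {len(params)} argument(s)")
        bindings = dict(zip(params, args))
        substituted = [
            " ".join(bindings.get(word, word) for word in text.split())
            for text in body
        ]
        # Recurse so that macros used inside the replacement text are expanded too.
        expanded.extend(expand_macros(substituted, macros))
    return expanded

# Macro table: name -> (parameter names, body lines). Purely illustrative.
macros = {
    "INCR": (["target"], ["LOAD target", "ADD one", "STORE target"]),
    "INCR2": (["target"], ["INCR target", "INCR target"]),
}

print(expand_macros(["INCR count", "HALT"], macros))
print(expand_macros(["INCR2 x"], macros))
```

The second call shows nested expansion: INCR2 expands to two INCR statements, each of which is expanded in turn. The sketch does not guard against a macro that invokes itself, which would recurse without end.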

Prejudices Against Assemblers


Although assemblers are widely used today, particularly where software is embedded in hardware to make more efficient use of devices, there are still several prejudices against working in assembler. The most important ones are:
"In assembler, structured programming is impossible." This is untrue. In this area assembler actually offers more facilities than most 3GLs.
"Maintaining assembler programs is vastly more costly than maintaining 3GL programs." When 3GLs were introduced this may have been true. Now, however, this statement is highly debatable.
"Assembler is a cumbersome language, and hard to learn." Assembler is indeed a little less readable to the layman than, e.g., COBOL. Languages such as C and C++, on the other hand, are more difficult to master.

Advantages of Assemblers
The major advantage of using assemblers, i.e. of applying assembly level coding to your projects, is that working with assembler offers a range of capabilities which are not (all) available to 3GL or 4GL programmers:
Easy resolving of parry errors
Efficient usage of available memory
Dynamic memory management
Optimization
Usage of operating system facilities
Virtual look-aside facility
Concurrent access to several datasets
Subtasks
Re-enterability

Disadvantages of Assemblers

The main disadvantages of assembler compared with high-level languages are that assembler is not portable (it is written for a particular instruction set) and that programmers are less productive, since assembler is less expressive than high-level languages.

