
CS 2304 SYSTEM SOFTWARE

G.PRABHAKARAN AP/CSE
S.SELVARANI AP/CSE

UNIT II

ASSEMBLERS

Basic assembler functions


A simple SIC assembler
Assembler algorithm and data structures
Machine dependent assembler features
Instruction formats and addressing modes
Program relocation
Machine independent assembler features
Literals
Symbol-defining statements
Expressions
Program blocks
One pass assemblers and Multi pass assemblers
Implementation examples
MASM assembler.


Assemblers [1]:
(i) Translating mnemonic operation codes to their machine language equivalents and
assigning machine addresses to symbolic labels used by the programmer.
(ii) Some features of an assembler language have no direct relation to the machine
architecture.


1. Basic Assembler Functions [2]:

The SIC assembler language uses the following assembler directives:
START - Specify the name and starting address of the program.
END - Indicate the end of the source program and specify the first executable instruction in
the program.
BYTE - Generate a character or hexadecimal constant, occupying as many bytes as needed to
represent the constant.
WORD - Generate a one-word integer constant.
RESB - Reserve the indicated number of bytes for a data area.
RESW - Reserve the indicated number of words for a data area.
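The storage directives above can be illustrated with a minimal Python sketch that computes how many bytes each one occupies. The function name directive_length is hypothetical, and the sketch assumes the SIC word size of 3 bytes and operands written in the C'...' or X'...' form used in these notes.

```python
def directive_length(opcode: str, operand: str) -> int:
    """Return the number of bytes a SIC storage directive occupies."""
    if opcode == "WORD":
        return 3                       # one SIC word is 3 bytes (24 bits)
    if opcode == "RESW":
        return 3 * int(operand)        # operand counts words
    if opcode == "RESB":
        return int(operand)            # operand counts bytes
    if opcode == "BYTE":
        kind, body = operand[0], operand[2:-1]   # C'EOF' -> 'C', 'EOF'
        if kind == "C":
            return len(body)           # one byte per character
        if kind == "X":
            return len(body) // 2      # two hexadecimal digits per byte
    raise ValueError(f"not a storage directive: {opcode} {operand}")


print(directive_length("BYTE", "C'EOF'"))   # character constant
print(directive_length("RESB", "4096"))     # buffer area
```

START and END generate no storage, so the sketch rejects them.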
2 MARKS
1. Define assembler.
2. What are the basic functions of an assembler? Explain.

III CSE

UNIT II

http://csetube.weebly.com/


Example:
SIC assembler language program.
The program contains a main routine that reads records from an input device and copies
them to an output device.
This main routine calls subroutine RDREC to read a record into a buffer and subroutine
WRREC to write the record from the buffer to the output device.
Each subroutine must transfer the record one character at a time, because the only I/O
instructions available are RD and WD.
The buffer is necessary because the I/O rates for the two devices, such as a disk and a slow
printing terminal, may be very different.
The end of each record is marked with a null character. If a record is longer than the length
of the buffer (4096 bytes), only the first 4096 bytes are copied.
The program does not deal with error recovery.
The end of the file to be copied is indicated by a zero-length record.
When the end of file is detected, the program writes EOF on the output device and terminates
by executing an RSUB instruction.
This program is assumed to have been called by the operating system using a JSUB instruction;
thus the RSUB will return control to the operating system.

PROGRAM (main routine)

LINE  LABEL    OPCODE  OPERAND   EXPLANATION
      COPY     START   1000      COPY FILE FROM I/P TO O/P
10    FIRST    STL     RETADR    SAVE RETURN ADDRESS
15    CLOOP    JSUB    RDREC     READ I/P RECORD
20             LDA     LENGTH    TEST FOR EOF (LENGTH=0)
25             COMP    ZERO
30             JEQ     ENDFIL    EXIT IF EOF FOUND
35             JSUB    WRREC     WRITE O/P RECORD
40             J       CLOOP     LOOP
45    ENDFIL   LDA     EOF       INSERT END OF FILE MARKER
50             STA     BUFFER
55             LDA     THREE     SET LENGTH=3
60             STA     LENGTH
65             JSUB    WRREC     WRITE EOF
70             LDL     RETADR    GET RETURN ADDRESS
75             RSUB              RETURN TO CALLER
80    EOF      BYTE    C'EOF'
85    THREE    WORD    3
90    ZERO     WORD    0
100   RETADR   RESW    1
105   LENGTH   RESW    1
110   BUFFER   RESB    4096
SUBROUTINE TO READ RECORD INTO BUFFER

LINE  LABEL    OPCODE  OPERAND   EXPLANATION
125   RDREC    LDX     ZERO      CLEAR LOOP COUNTER
130            LDA     ZERO      CLEAR A TO ZERO
135   RLOOP    TD      INPUT     TEST I/P DEVICE
140            JEQ     RLOOP     LOOP UNTIL READY
145            RD      INPUT     READ CHARACTER INTO REGISTER A
150            COMP    ZERO      TEST FOR END OF RECORD
155            JEQ     EXIT      EXIT LOOP IF EOR
160            STCH    BUFFER,X  STORE CHARACTER IN BUFFER
165            TIX     MAXLEN    LOOP UNLESS MAX LENGTH HAS BEEN REACHED
170            JLT     RLOOP
175   EXIT     STX     LENGTH    SAVE RECORD LENGTH
180            RSUB              RETURN TO CALLER
185   INPUT    BYTE    X'F1'     CODE FOR I/P DEVICE
190   MAXLEN   WORD    4096
SUBROUTINE TO WRITE RECORD FROM BUFFER

LINE  LABEL    OPCODE  OPERAND   EXPLANATION
200   WRREC    LDX     ZERO      CLEAR LOOP COUNTER
210   WLOOP    TD      OUTPUT    TEST OUTPUT DEVICE
215            JEQ     WLOOP     LOOP UNTIL READY
220            LDCH    BUFFER,X  GET CHARACTER FROM BUFFER
225            WD      OUTPUT    WRITE CHARACTER
230            TIX     LENGTH    LOOP UNTIL ALL CHARACTERS HAVE BEEN WRITTEN
               JLT     WLOOP
235            RSUB              RETURN TO CALLER
240   OUTPUT   BYTE    X'05'     CODE FOR O/P DEVICE
245            END     FIRST

2. A Simple SIC Assembler:

The translation of a source program to object code requires the following functions:
Convert mnemonic operation codes to their machine language equivalents
(example: translate STL to 14).
Convert symbolic operands to their equivalent machine addresses (example: translate
RETADR to 1033).
Build the machine instructions in the proper format.
Convert the data constants specified in the source program into their internal machine
representations (example: translate EOF to 454F46).
Write the object program and the assembly listing.
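The first four functions can be sketched in a few lines of Python. This is only an illustration: the names OPTAB, SYMTAB, assemble_sic and char_constant are chosen here, the tables hold just the entries needed for the examples in the notes, and the sketch assumes the standard SIC instruction layout (8-bit opcode, 1-bit index flag x, 15-bit address).

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00}   # small subset of the SIC opcodes
SYMTAB = {"RETADR": 0x1033}                         # address taken from the example above


def assemble_sic(mnemonic: str, operand: str, indexed: bool = False) -> str:
    """Build a 3-byte SIC instruction as six hexadecimal digits."""
    opcode = OPTAB[mnemonic]                # mnemonic -> machine language equivalent
    address = SYMTAB[operand]               # symbolic operand -> machine address
    x_bit = 0x8000 if indexed else 0        # index flag is the top bit of the address half
    return f"{opcode:02X}{x_bit | address:04X}"


def char_constant(text: str) -> str:
    """Convert a character constant to its internal (ASCII) representation in hex."""
    return text.encode("ascii").hex().upper()


print(assemble_sic("STL", "RETADR"))   # 141033
print(char_constant("EOF"))            # 454F46
```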
Consider the statement
10   1000   FIRST   STL   RETADR
If we translate the program line by line, we cannot process this statement, because we
do not yet know the address that will be assigned to RETADR (a forward reference).
Because of this, most assemblers make two passes over the source program.
The first pass does little more than scan the source program for label definitions and assign
addresses.
The second pass performs most of the actual translation previously described.
In addition to translating the instructions of the source program, the assembler must process
statements called assembler directives or pseudo-instructions.
These statements are not translated into machine instructions. Instead, they provide
instructions to the assembler itself (example: BYTE, WORD).
In our example program
START - Specifies the starting memory address for the object program.
END - Specifies the end of the program.
Finally, the assembler must write the generated object code onto some output device.


The object program format contains three types of records [3]:
Header
Text
End

The header record contains the program name, starting address and length.
Header record:
col 1       H
col 2-7     Program name
col 8-13    Starting address of object program
col 14-19   Length of object program in bytes

The text records contain the translated instructions and data of the program, together with an
indication of the addresses where these are to be loaded.
Text record:
col 1       T
col 2-7     Starting address for object code in this record
col 8-9     Length of object code in this record in bytes
col 10-69   Object code, represented in hexadecimal

3. Define record.Explain. ( 2 MARKS)

The end record marks the end of the object program and specifies the address in the program
where execution is to begin.
End record:
col 1       E
col 2-7     Address of first executable instruction in object program
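The three record layouts can be sketched as Python formatting functions. The function names are hypothetical, and the sketch assumes that addresses and lengths are written as hexadecimal digits (as the object code is) and that a short program name is padded with blanks to six columns. The length value 107A in the usage line is only an illustrative number.

```python
def header_record(name: str, start: int, length: int) -> str:
    """col 1 'H', col 2-7 program name, col 8-13 start address, col 14-19 length."""
    return f"H{name:<6.6}{start:06X}{length:06X}"


def text_record(start: int, object_codes: list[str]) -> str:
    """col 1 'T', col 2-7 start address, col 8-9 length in bytes, col 10-69 object code."""
    body = "".join(object_codes)
    if len(body) > 60:                       # columns 10-69 hold at most 60 hex digits
        raise ValueError("object code does not fit in one text record")
    return f"T{start:06X}{len(body) // 2:02X}{body}"


def end_record(first_executable: int) -> str:
    """col 1 'E', col 2-7 address of the first executable instruction."""
    return f"E{first_executable:06X}"


print(header_record("COPY", 0x1000, 0x107A))
print(text_record(0x1000, ["141033"]))
print(end_record(0x1000))
```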

The job of the assembler is to generate object code. But when it translates an instruction, it
may not yet know the address of a symbol that is defined later in the program. For this reason
the assembler uses a pass 1 algorithm and a pass 2 algorithm [4].
Pass 1:
1. Assign addresses to all statements in the program.
2. Save the values assigned to all labels for use in pass 2.
3. Perform some processing of assembler directives.
Pass 2:
1. Assemble instructions.
2. Generate data values.
3. Perform processing of assembler directives not done during pass 1.
4. Write the object program and the assembly listing.


3. Assembler Algorithm and Data Structures:

Our simple assembler uses two major internal data structures [5]:
-The operation code table (OPTAB)
-The symbol table (SYMTAB)
OPTAB is used to look up mnemonic operation codes and translate them to their machine
language equivalents [6].
SYMTAB is used to store the values (addresses) assigned to labels [7].
LOCCTR - This is a variable that is used to help in the assignment of addresses [8].
LOCCTR is initialized to the beginning address specified in the START statement.
After each source statement is processed, the length of the assembled instruction or data area
to be generated is added to LOCCTR.
Whenever we reach a label in the source program, the current value of LOCCTR gives the
address to be associated with that label.
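The way LOCCTR, OPTAB and SYMTAB work together in pass 1 can be sketched as follows. This is a simplified illustration, not the full algorithm: the function name pass1, the tuple input format and the small OPTAB subset are assumptions, every SIC instruction is taken to be 3 bytes long, and no intermediate file is written.

```python
OPTAB = {"STL": 0x14, "JSUB": 0x48, "LDA": 0x00, "RSUB": 0x4C}   # small subset


def pass1(lines):
    """lines: list of (label, opcode, operand). Returns (SYMTAB, program length)."""
    symtab = {}
    start = locctr = 0
    for label, opcode, operand in lines:
        if opcode == "START":
            start = locctr = int(operand, 16)    # initialize LOCCTR from START
            continue
        if opcode == "END":
            break
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr               # label gets current LOCCTR
        if opcode in OPTAB or opcode == "WORD":
            locctr += 3                          # instruction or one-word constant
        elif opcode == "RESW":
            locctr += 3 * int(operand)
        elif opcode == "RESB":
            locctr += int(operand)
        elif opcode == "BYTE":
            body = operand[2:-1]                 # text between the quotes
            locctr += len(body) if operand[0] == "C" else len(body) // 2
        else:
            raise ValueError(f"invalid operation code {opcode}")
    return symtab, locctr - start


source = [("COPY", "START", "1000"), ("FIRST", "STL", "RETADR"),
          ("", "RSUB", ""), ("RETADR", "RESW", "1"), ("", "END", "FIRST")]
symtab, length = pass1(source)
print({name: hex(addr) for name, addr in symtab.items()}, length)
```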

2 MARKS
4. Why do you go for pass 1 and pass 2 algorithms? State the reason.
5. What are the data structures used in an assembler?
6. Define OPTAB.
7. Define SYMTAB.
8. Define LOCCTR.


3.1 PASS 1 ASSEMBLER ALGORITHM 9:


9. Explain in detail about pass1 assembler algorithm. (8 Marks)


3.2 PASS 2 ASSEMBLER ALGORITHM 10


10. Explain in detail about pass2 assembler algorithm.(8 MARKS)


4. Machine Dependent Assembler Features:

Eg: SIC/XE assembler program.
Immediate and indirect addressing can be used in programs written for SIC/XE.
*Immediate operands are denoted with the prefix #.
*Indirect addressing is indicated by adding the prefix @ to the operand.
Instructions that refer to memory are normally assembled using either program counter relative
or base relative mode.
If the displacement required for both PC relative and base relative addressing is too large, the
4-byte extended instruction format (prefix +) is used.
The main difference between SIC and SIC/XE programs is the use of register-to-register
instructions.
4.1 Advantages of SIC/XE Program:
Execution is faster, since register-to-register instructions execute faster than
register-to-memory instructions.
An immediate operand need not be fetched from anywhere, as it is present as a part of the
instruction.
The large main memory of SIC/XE provides room to load and run several programs at the
same time.
4.2 Instruction Formats and Addressing Modes:


The START statement specifies the starting address of the location where the program is to
be loaded.
Eg: the statement START 0 allows the program to be loaded at address 0.
SYMTAB would be preloaded with the register names (A, X, etc.) and their values (0, 1, etc.).
A register-to-memory instruction is assembled using either program counter relative or base
relative addressing.
The assembler must calculate the displacement, which is stored as a part of the object
instruction.
The displacement is calculated so that the correct target address is obtained when the content
of the program counter (PC) or base register (B) is added to the displacement.
The displacement must be between 0 and 4095 (for base relative mode) or between -2048 and
+2047 (for program counter relative mode).
If neither program counter relative nor base relative addressing can be used, then the 4-byte
extended instruction format is used.
Examples for code generation:


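Line  Loc   Label    Statement        Object code
10    0000  FIRST    STL   RETADR     17202D
12    0003           LDB   #LENGTH    69202D
.
95    0030  RETADR   RESW  1

Reading the object code 17202D of line 10:
17    machine language equivalent (opcode) + value of the first two flag bits (n, i)
2     value of the last four flag bits (x, b, p, e)
02D   displacement value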


PROCEDURE:
THERE ARE THREE OPERATIONS TO FIND THE OBJECT CODE. THE SIX FLAG BITS OF THE
INSTRUCTION ARE n i x b p e.
OPERATION 1: FIND THE MACHINE LANGUAGE EQUIVALENT AND ADD THE VALUE OF THE
FIRST TWO FLAG BITS (n AND i).
STEP 1: FIND THE MACHINE LANGUAGE EQUIVALENT.
STEP 2: FIND THE FIRST TWO FLAG BITS.
STEP 3: CALCULATE THE DECIMAL VALUE OF THE TWO BITS.
E.G: STL - 14
FOR PROGRAM COUNTER RELATIVE ADDRESSING THE FLAG BITS ARE 11 0010. THE FIRST TWO BITS ARE 11.
THE DECIMAL EQUIVALENT IS 3:
1 X 2^0 = 1
1 X 2^1 = 2
1 + 2 = 3
AS PER OUR PROCEDURE, ADD THE MACHINE LANGUAGE EQUIVALENT AND THE
VALUE OF THE TWO BITS: 14 + 3 = 17.

OPERATION 2: FIND THE LAST FOUR FLAG BITS (x, b, p, e) AND CALCULATE THEIR
HEXADECIMAL EQUIVALENT.
E.G: THE LAST FOUR FLAG BITS ARE 0010. THE EQUIVALENT IS 2.

OPERATION 3: FIND THE DISPLACEMENT VALUE.
STEP 1: FIND THE OPERAND ADDRESS.
STEP 2: FIND THE ADDRESS OF THE INSTRUCTION THAT FOLLOWS THE CURRENT
LINE (THE VALUE OF PC).
STEP 3: CONVERT THE STEP 1 HEXADECIMAL VALUE INTO DECIMAL.
STEP 4: CONVERT THE STEP 2 HEXADECIMAL VALUE INTO DECIMAL.
STEP 5: SUBTRACT THE STEP 4 ANSWER FROM THE STEP 3 ANSWER.
STEP 6: CONVERT THE STEP 5 ANSWER INTO HEXADECIMAL.
STEP 7: IF THE STEP 5 ANSWER IS NEGATIVE, USE ITS 12-BIT 2'S COMPLEMENT VALUE.

E.G: THE OPERAND IS RETADR. ITS ADDRESS IS 30 (HEXADECIMAL).
THE NEXT INSTRUCTION ADDRESS IS 3.
30 (HEX) = 3 X 16^1 + 0 X 16^0 = 48 + 0 = 48
 3 (HEX) = 3 X 16^0 = 3
48 - 3 = 45.
45 / 16 = 2, REMAINDER 13 (D)
THE HEXADECIMAL EQUIVALENT OF 45 IS 2D.
THE OBJECT CODE IS 17 2 02D = 17202D.

Example 1:
Line  Loc   Label    Statement        Object code
15    0006  CLOOP    +JSUB  RDREC     4B101036
.
40    0017           J      CLOOP     3F2FEC
45    001A  ENDFIL   LDA    EOF       032010

Program counter relative addressing is used for line 40 (J CLOOP). The flag bits are
n i x b p e
1 1 0 0 1 0
The first two bits are 11 (as n = 1 and i = 1).
The hexadecimal equivalent of 11 is 3; added to the opcode of J (3C) this gives 3F.
Find the displacement value:
The location of CLOOP is 0006 and the PC value is 001A.
The decimal equivalent of 6 is 6; the decimal equivalent of 1A is 26.
6 - 26 = -20.
The hexadecimal equivalent of -20 is -14.
Because the value is negative, the 2's complement of 14 is stored. The displacement field is
12 bits long, so 14 is written as
0000 0001 0100   (0 1 4)
The 2's complement procedure is:
take the 1's complement, then add 1 to the answer.
  0000 0001 0100
  1111 1110 1011   (1's complement)
+              1
  1111 1110 1100   (2's complement)
F = 1111, E = 1110, C = 1100
The last four flag bits (x b p e = 0010) give the digit 2.
The object code is 3F2FEC.
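The worked examples can be checked with a short Python sketch of format 3 assembly using program counter relative addressing. The function name assemble_format3 and its parameters are assumptions made for illustration; the sketch handles only the PC relative case (b = 0, p = 1, e = 0) and relies on masking with FFF to obtain the 12-bit 2's complement.

```python
def assemble_format3(opcode: int, target: int, next_loc: int,
                     n: int = 1, i: int = 1, x: int = 0) -> str:
    """Assemble a SIC/XE format 3 instruction with PC relative addressing."""
    disp = target - next_loc                     # operand address minus PC
    if not -2048 <= disp <= 2047:
        raise ValueError("displacement out of range for PC relative addressing")
    first_byte = opcode | (n << 1) | i           # opcode plus flag bits n and i
    flags = (x << 3) | (0 << 2) | (1 << 1) | 0   # x, b=0, p=1, e=0
    return f"{first_byte:02X}{flags:X}{disp & 0xFFF:03X}"   # 12-bit 2's complement


print(assemble_format3(0x14, 0x0030, 0x0003))   # STL RETADR
print(assemble_format3(0x3C, 0x0006, 0x001A))   # J CLOOP, negative displacement
```

With the values from the notes, the two calls reproduce 17202D and 3F2FEC.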
Difference between PC relative and base relative addressing [11]:
1. When PC relative addressing is used, the assembler already knows what the PC will contain at
execution time (the address of the next instruction), so it can calculate the displacement itself.
2. But in base relative addressing, the programmer must tell the assembler what the base
register will contain during the execution of the program; the assembler then calculates the
displacement.

4.3 Program Relocation:

More than one program can share the memory and other resources of the machine.
If we knew in advance which programs would execute concurrently, we could assign
addresses when the programs were assembled so that they would fit together without
overlap. But in practice this is usually not possible.
So it is desirable to load a program into memory wherever there is space for it.
In such cases the actual starting address of the program is not known until load time.
If the program is loaded beginning at location 1000, the value of the variable THREE will be
located at address 102D.
If the program is loaded starting at some other address, such as 2000, the address 102D will not
contain the actual value of THREE.
So we have to make some changes in the address portion of the instruction in order to
retrieve the correct value.

Eg:
0006   CLOOP   +JSUB   RDREC   4B101036
.
.
1036   RDREC   CLEAR   X       B410

11. Difference between program counter relative addressing and base relative addressing. (2 marks)


Case 1:
The statement RDREC is at memory location 1036 if the program is loaded beginning
at address 0000.
0000
.
0006   4B101036   <-- +JSUB RDREC
.
.
1036   B410       <-- RDREC

Case 2:
The statement RDREC is at memory location 6036 if the program is loaded beginning
at address 5000.
5000
.
.
5006   4B106036   <-- +JSUB RDREC
.
.
6036   B410       <-- RDREC
In both cases the address field of the JSUB instruction must contain the actual address of the
label RDREC.
The assembler does not know the actual location where the program will be loaded. However, the
assembler can identify for the loader those parts of the object program that need modification.
An object program that contains the information needed to perform this kind of modification is
called a relocatable program.

Relocation Program Solving Steps:

When the assembler generates the object code for the JSUB instruction, it inserts the
address of RDREC relative to the start of the program. (This is the reason we initialized the
location counter to 0 for the assembly.)
The assembler also produces a command for the loader, instructing it to add the
beginning address of the program to the address field in the JSUB instruction at load time.

Modification Record [12]:
col 1       M
col 2-7     Starting location of the address field to be modified, relative to the beginning of
            the program.
col 8-9     Length of the address field to be modified, in half bytes
            (i.e. 4 bits = 1 half byte).
Relocation must be performed for every instruction that uses the extended instruction format,
so a modification record must be added for each of them.
Other lines in the program do not require modification, as they use PC relative or base
relative addressing.
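The relocation steps can be sketched in Python as follows. The function names are hypothetical. The sketch assumes that the 20-bit address field of an extended format instruction begins one byte after the start of the instruction and is 5 half bytes long, and that the loader simply adds the load address to that field.

```python
def modification_record(instr_loc: int) -> str:
    """M record for an extended format instruction at instr_loc (relative to program start)."""
    return f"M{instr_loc + 1:06X}05"       # address field starts 1 byte in, 5 half bytes long


def relocate(object_code: str, load_address: int) -> str:
    """Loader side: add the load address to the 20-bit address field."""
    value = int(object_code, 16)
    address = (value & 0xFFFFF) + load_address          # low 20 bits hold the address
    return f"{(value & ~0xFFFFF) | (address & 0xFFFFF):08X}"


print(modification_record(0x0006))       # record for +JSUB RDREC at 0006
print(relocate("4B101036", 0x5000))      # Case 2: program loaded at 5000
```

The second call reproduces the object code 4B106036 shown in Case 2.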
12. Define Modification record.Explain.(2 marks).


5. Machine Independent Assembler Features:

These are features that are commonly found in implementations of this type of software and that
are relatively machine independent.
5.1 Literals [13]:
It is often convenient for the programmer to write the value of a constant operand as a part of
the instruction that uses it.
This avoids having to define the constant elsewhere in the program and make up a label for it.
Such an operand is called a literal, because the value is stated literally in the instruction.
A literal is identified with the prefix '=', which is followed by a specification of the literal
value.
Eg:
45    001A   ENDFIL   LDA   =C'EOF'   032010
215   1062   WLOOP    TD    =X'05'    E32011

Difference between Literal and Immediate Operand [14]:

With immediate addressing, the operand value is assembled as part of the machine
instruction. With a literal, the assembler must generate the value as a constant at some other
memory location.
The address of this generated constant is used as the target address of the instruction.

5.1.1 Literal Pool [15]:

Literals are stored in a literal pool. Normally the pool is placed at the end of the program.
LTORG is an assembler directive.
When the assembler encounters LTORG, it immediately creates a literal pool that contains all
the literals used since the previous LTORG (or since the beginning of the program).
Once a literal has been stored in a literal pool, it is not repeated in a later pool.
In some programs LTORG is placed in the middle of the program, because otherwise all the
literals would be placed in the pool at the end of the program.
For example, when a literal is used at the beginning of a program that has 300 lines,
the literal pool starts only at the end of the program.
The literal operand is then far away from the instruction that refers to it, which can put it out
of range for PC relative addressing. So LTORG statements may be used as often as needed in
the program.
Most assemblers do not allow duplicate literals in the literal pool, although the
same literal may be used in more than one place in the program.
Only one copy of the specified data value is stored in the literal pool.
Before space is allocated for a literal in the pool, the assembler checks whether the same literal
is already in the pool by comparing the character string that defines the new literal with the
literals already in the pool.

13. Define literals. (2 marks)
14. Differentiate literal and immediate operand. (2 marks)
15. Define literal pool. (2 marks)

For example,
the same literal may be used more than once while standing for different values, such as a
literal that refers to the current value of the location counter. If only one copy of such a
literal is kept in the pool, as the rule against duplicates requires, the program may not
execute correctly.

To handle literals, the assembler uses a basic data structure, the literal table (LITTAB) [16].

The literal table contains
-Literal name
-The operand value and length
-Address assigned to the operand
During pass 1 the assembler searches LITTAB for each literal name it finds. If the literal is
already present, no action is needed. If it is not, the literal is added to the literal table.
During pass 2 the assembler searches LITTAB for the literal address needed for object code
generation.
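A literal table with the three fields listed above can be sketched in Python like this. The class name LitTab and its methods are hypothetical, a dictionary stands in for whatever table organization a real assembler uses, and only C'...' and X'...' literals are handled.

```python
def literal_value(literal: str) -> bytes:
    """Convert =C'EOF' or =X'05' to the bytes it stands for."""
    kind, body = literal[1], literal[3:-1]     # "=C'EOF'" -> 'C', 'EOF'
    if kind == "C":
        return body.encode("ascii")
    if kind == "X":
        return bytes.fromhex(body)
    raise ValueError(f"unsupported literal {literal}")


class LitTab:
    def __init__(self):
        self.entries = {}          # literal name -> value, length, address

    def add(self, literal: str) -> None:
        """Pass 1: enter a literal only if it is not already in the table."""
        if literal not in self.entries:
            value = literal_value(literal)
            self.entries[literal] = {"value": value, "length": len(value), "address": None}

    def place_pool(self, locctr: int) -> int:
        """At LTORG or END: assign addresses to unplaced literals, return new LOCCTR."""
        for entry in self.entries.values():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += entry["length"]
        return locctr


littab = LitTab()
for operand in ("=C'EOF'", "=X'05'", "=C'EOF'"):    # the duplicate is stored once
    littab.add(operand)
print(len(littab.entries), hex(littab.place_pool(0x002D)))
```

During pass 2 the assembler would read the "address" field of the entry to build the object code.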

5.2 Symbol-defining Statements [17]:

So far, user-defined symbols in assembler language programs have appeared as labels on
instructions or data areas.
The value of such a label is the address assigned to the statement on which it appears.
Most assemblers provide an assembler directive that allows the programmer to define
symbols and specify their values.
The assembler directive generally used is EQU.
The general form of such a statement is
symbol   EQU   value
This statement defines the given symbol and assigns the specified value to it.
The value may be given as
-A constant.
-An expression involving constants.
-An expression involving previously defined symbols.
One use of EQU is to establish symbolic names that can be used for improved readability in
place of numeric values.
Eg:
+LDT   #4096
loads the value 4096 into register T. This value represents the maximum length of a record that
we could read with subroutine RDREC.
If we include the statement
MAXLEN   EQU   4096
the instruction can be written as
+LDT   #MAXLEN
When the assembler encounters the EQU statement, it enters MAXLEN into SYMTAB with the
value 4096. While assembling the LDT instruction it looks up MAXLEN and uses this value as
the operand, so the replacement takes place at assembly time.
Another common use of EQU is in defining mnemonic names for registers.

16. Define LITTAB. (2 marks)
17. Define symbol defining statements. (2 marks)

Eg:
A       EQU   0
X       EQU   1
L       EQU   2

BASE    EQU   R1
COUNT   EQU   R2
INDEX   EQU   R3

A literal such as =X'05' specifies a 1-byte literal with the hexadecimal value 05. The notation
used for literals varies from assembler to assembler.
It is important to understand the difference between a literal and an immediate operand. With
immediate addressing, the operand value is assembled as part of the machine instruction.
With a literal, the assembler generates the specified value as a constant at some other memory
location.
A literal may also refer to the current value of the location counter:
BASE   *
LDB    =*
Another assembler directive is ORG. It is used to assign values to symbols indirectly.
Its general form is ORG value, where value is a constant or an expression involving constants
and previously defined symbols.
STAB     RESB   1100
         ORG    STAB
SYMBOL   RESB   6
VALUE    RESW   1
FLAGS    RESB   2
         ORG    STAB+1100
The first ORG resets the location counter to the value of STAB. The label on the following
RESB statement defines SYMBOL to have the current value in LOCCTR,
i.e. the same address that is assigned to STAB. The last ORG sets LOCCTR to the address that
follows the table.


5.3 Expressions:
Our previous examples of assembler language statements have used single terms such as
labels and literals as instruction operands.
Most assemblers allow an expression wherever a single operand is permitted.
Such an expression is evaluated by the assembler and the result is used as the operand.
Arithmetic expressions are allowed and must follow the normal rules using the operators
+, -, * and /.
When an ORG statement is encountered during assembly of a program, the assembler resets its
location counter (LOCCTR) to the specified value. For example, we can define a symbol table
in which each entry has the following structure:
SYMBOL
VALUE
FLAGS


In this table, the SYMBOL field contains a 6-byte user-defined symbol; VALUE is a one-word
representation of the value assigned to the symbol; FLAGS is a 2-byte field that specifies the
symbol type and other information.
STAB     RESB   1100
With EQU statements, the fields can be defined as
SYMBOL   EQU    STAB
VALUE    EQU    STAB+6
FLAGS    EQU    STAB+9
With the help of the assembler directive ORG, we can instead begin these statements with
STAB     RESB   1100
         ORG    STAB
Division is usually defined to produce an integer result. Individual terms in an expression
may be constants, user-defined symbols or special terms. A common special term is the current
value of the location counter (designated by *), i.e. the value of the next unassigned memory
location.
BUFEND   EQU    *
The above statement gives BUFEND a value that is the address of the next byte after the
buffer area.
Some values in the object program are relative to the beginning of the program, while others
are absolute.
Similarly, the values of terms and expressions are either relative or absolute.
A constant is an absolute term. Labels on instructions and data areas, and references to the
location counter value, are relative terms.
A symbol whose value is given by EQU may be either an absolute term or a relative term,
depending upon the expression used to define its value.
Expressions are classified as [18]
*Absolute expressions
*Relative expressions
depending upon the type of value they produce.
An expression that contains only absolute terms is an absolute expression.
There are some conditions [19] for using relative terms in expressions:
*In an absolute expression, every relative term is paired with another relative term of
opposite sign.
*In a relative expression, all relative terms except one are paired in this way, and the
remaining unpaired relative term has a positive sign.
*A relative term may not enter into a multiplication or division operation.
Expressions that are neither absolute nor relative are flagged by the assembler as
errors.
When relative terms are paired with opposite signs, the result is an absolute
value.
MAXLEN   EQU   BUFEND-BUFFER
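The pairing rule can be illustrated with a small Python sketch. The function name classify and the idea of passing in the set of relative symbols are assumptions for illustration, and the sketch only handles expressions built with + and -; it does not check the rule about multiplication and division.

```python
import re


def classify(expression: str, relative_symbols: set) -> str:
    """Return 'absolute', 'relative' or 'error' for an expression using + and -."""
    balance = 0                                   # positive minus negative relative terms
    for sign, term in re.findall(r"([+-]?)\s*(\w+)", expression):
        if term in relative_symbols:
            balance += -1 if sign == "-" else 1
    if balance == 0:
        return "absolute"       # all relative terms paired with opposite signs
    if balance == 1:
        return "relative"       # one unpaired relative term with a positive sign
    return "error"


relative = {"BUFEND", "BUFFER"}
print(classify("BUFEND-BUFFER", relative))   # absolute
print(classify("BUFFER+4096", relative))     # relative
print(classify("BUFEND+BUFFER", relative))   # error
```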

18. Define expressions. What are the types of expression? (2 marks)
19. What are the conditions for using relative terms in expressions? (2 marks)

5.4 Program Blocks:

Normally the source program is treated as a unit which contains subroutines, data areas, etc.
The assembler processes the program and produces a single unit of object code.
Some features of assemblers allow the generated machine instructions and data to appear in the
object program in a different order from the corresponding source statements.
Other features create several independent parts of the object program; these parts maintain
their identity and are handled separately by the loader.
We use the term program blocks to refer to segments of code that are rearranged within a single
object program unit, and control sections to refer to segments that are translated into
independent object program units.
Each program block may actually contain several separate segments of the source program.
In this case three blocks [20] are used. The first program block (unnamed) contains the
executable instructions of the program.
The second block (CDATA) contains all data areas that are small in length.
The third (CBLKS) contains all data areas that consist of larger blocks of memory.
The assembler directive USE indicates which portions of the source program belong to the
various blocks.
At the beginning of the program, statements are assumed to be part of the unnamed
(default) block.
If no USE statements are included, the entire program belongs to this single block.
The assembler rearranges these segments to gather together the pieces of each block.
These blocks are then assigned addresses in the object program, with the blocks
appearing in the same order in which they were first begun in the source program.
The assembler accomplishes this logical rearrangement of code by maintaining, during pass
1, a separate location counter for each program block.
The location counter for a block is initialized to 0 when the block is first begun.
The current value of this location counter is saved when switching to another block,
and the saved value is restored when resuming a previous block.
During pass 1 each label in the program is assigned an address that is relative to the start of
the block that contains it.
When labels are entered into the symbol table, the block name or number is stored along with
the assigned relative address.
At the end of pass 1 the latest value of the location counter for each block indicates the
length of that block.
The assembler can then assign to each block a starting address in the object program.
For code generation during pass 2, the assembler needs the address of each symbol relative
to the start of the object program. This is found by adding the starting address of the block to
the relative address of the symbol.

20. What are blocks in a program? How are they classified? Explain. (2 marks)

Block Name   Block Number   Address   Length
(default)    0              0000      0066
CDATA        1              0066      000B
CBLKS        2              0071      1000
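The address assignment at the end of pass 1 can be sketched in Python using the block lengths from the table. The function names are hypothetical, and the symbol used in the usage line (a label at relative address 0003 in CDATA) is an assumed example, not a value taken from these notes.

```python
def assign_block_addresses(block_lengths: dict) -> dict:
    """block_lengths: block name -> length, in the order the blocks were first begun."""
    starts, address = {}, 0
    for name, length in block_lengths.items():
        starts[name] = address        # each block starts where the previous one ends
        address += length
    return starts


def symbol_address(block: str, relative_address: int, starts: dict) -> int:
    """Pass 2: address of a symbol relative to the start of the object program."""
    return starts[block] + relative_address


lengths = {"(default)": 0x0066, "CDATA": 0x000B, "CBLKS": 0x1000}
starts = assign_block_addresses(lengths)
print({name: f"{addr:04X}" for name, addr in starts.items()})
print(f"{symbol_address('CDATA', 0x0003, starts):04X}")   # hypothetical label in CDATA
```

The starting addresses computed this way agree with the Address column of the table (0000, 0066, 0071).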

5.4.1 Control Section and Program Linking:

A control section is a part of the program that maintains its identity after assembly.
Each such control section can be loaded and relocated independently of the others.
Different control sections are most often used for subroutines or other logical subdivisions
of a program.
The programmer can assemble, load and manipulate each of these control sections
separately. The resulting flexibility is a major benefit of using control sections.
When control sections form logically related parts of a program, it is necessary to provide
some means for linking them together.
Instructions in one control section might need to refer to instructions or data located in
another section.
Because control sections are independently loaded and relocated, the assembler is unable to
process these references in the usual way.
The assembler has no idea where any other control section will be located at execution time.
Such references between control sections are called external references.
In this case there are three control sections: one for the main program and one for each
subroutine.
Control sections differ from program blocks in that they are handled separately by the
assembler.
Symbols that are defined in one control section may not be used directly by another section;
they must be identified as external references for the loader to handle.
EXTDEF (external definition) names symbols that are defined in this control section and may
be used by other sections.
EXTREF (external reference) names symbols that are used in this control section and are
defined elsewhere.
The two new record types [21] are DEFINE and REFER. A Define record gives information
about external symbols that are defined in this control section. A Refer record lists symbols
that are used as external references by the control section.


DEFINE RECORD:
COL 1       : D
COL 2-7     : Name of the external symbol defined in this control section.
COL 8-13    : Relative address of the symbol within this control section.
COL 14-73   : Repeat the information in col 2-13 for other external symbols.
21. Define DEFINE record and REFER record. Explain. (2 marks)


REFER RECORD:
COL 1       : R
COL 2-7     : Name of an external symbol referred to in this control section.
COL 8-73    : Names of the other external reference symbols.
MODIFICATION RECORD:
COL 1       : M
COL 2-7     : Starting address of the field to be modified.
COL 8-9     : Length of the field to be modified, in half bytes.
COL 10      : Modification flag (+ or -).
COL 11-16   : External symbol whose value is to be added to or subtracted from the
              indicated field.
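The three record layouts for control sections can be sketched as Python formatting functions. The function names and the symbol names and addresses in the usage lines are illustrative assumptions; the sketch assumes that names are padded with blanks to six columns and that addresses are written in hexadecimal.

```python
def define_record(symbols) -> str:
    """symbols: list of (name, relative address) pairs defined in this control section."""
    return "D" + "".join(f"{name:<6.6}{address:06X}" for name, address in symbols)


def refer_record(names) -> str:
    """names: external symbols referred to in this control section."""
    return "R" + "".join(f"{name:<6.6}" for name in names)


def revised_modification_record(location: int, half_bytes: int,
                                flag: str, symbol: str) -> str:
    """col 2-7 location, col 8-9 length, col 10 flag (+ or -), col 11-16 symbol."""
    if flag not in "+-" or len(flag) != 1:
        raise ValueError("modification flag must be + or -")
    return f"M{location:06X}{half_bytes:02X}{flag}{symbol:<6.6}"


print(define_record([("BUFFER", 0x0033), ("LENGTH", 0x002D)]))
print(refer_record(["RDREC", "WRREC"]))
print(revised_modification_record(0x0004, 5, "+", "RDREC"))
```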
6. One pass assemblers and Multipass assemblers:

6.1 One-Pass Assemblers:
Scenarios for one-pass assemblers:
-The object code is generated in memory for immediate execution (load-and-go assembler).
-External storage for the intermediate file between two passes is slow or is inconvenient
to use.
Main problem - forward references to
-Data items
-Labels on instructions
Solution:
-Require that all data areas be defined before they are referenced.
It is possible, although inconvenient, to do so for data items.
-Forward jumps to instruction labels cannot be easily eliminated.
For these, insert (label, address_to_be_modified) into SYMTAB.
Usually, address_to_be_modified is stored in a linked list.
6.1.1 Forward Reference in One-pass Assembler:
When an operand symbol has not yet been defined, the assembler
-Omits the operand address.
-Enters the undefined symbol into SYMTAB and indicates that it is undefined.
-Adds the address of this operand field to a list of forward references associated
with the SYMTAB entry.
-When the definition of the symbol is encountered, scans the reference list and
inserts the address.
-At the end of the program, reports an error if there are still SYMTAB entries
that indicate undefined symbols.
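The forward reference steps can be sketched in Python as below. The class name, the method names and the use of a dictionary for the object code held in memory are assumptions made for illustration; a Python list stands in for the linked list of addresses to be modified, and the addresses in the usage lines are made up.

```python
class OnePassSymtab:
    def __init__(self):
        self.defined = {}      # symbol -> address
        self.pending = {}      # undefined symbol -> operand locations waiting for it

    def reference(self, symbol: str, operand_loc: int, memory: dict) -> None:
        """Called when an instruction uses symbol as its operand."""
        if symbol in self.defined:
            memory[operand_loc] = self.defined[symbol]
        else:
            memory[operand_loc] = 0                               # omit the address for now
            self.pending.setdefault(symbol, []).append(operand_loc)

    def define(self, symbol: str, address: int, memory: dict) -> None:
        """Called when the label definition is encountered."""
        self.defined[symbol] = address
        for operand_loc in self.pending.pop(symbol, []):          # scan the reference list
            memory[operand_loc] = address                         # insert the address

    def undefined_symbols(self) -> list:
        """Anything left here at the end of the program is an error."""
        return sorted(self.pending)


memory, symtab = {}, OnePassSymtab()
symtab.reference("RDREC", 0x2013, memory)     # forward reference
symtab.reference("RDREC", 0x201F, memory)
symtab.define("RDREC", 0x203D, memory)        # definition fills in both operands
print({hex(loc): hex(addr) for loc, addr in memory.items()}, symtab.undefined_symbols())
```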
6.2 Multi-Pass Assemblers:
For a two-pass assembler, forward references in symbol definitions are not allowed:
ALPHA   EQU    BETA
BETA    EQU    DELTA
DELTA   RESW   1
Symbol definition must be completed in pass 1.
Prohibiting forward references in symbol definitions is not a serious inconvenience.
Forward references tend to create difficulty for a person reading the program.

6.2.1 Implementation:
For a forward reference in a symbol definition, we store in SYMTAB:
-The symbol name
-The defining expression
-The number of undefined symbols in the defining expression
-The undefined symbol (marked with a flag *), associated with a list of the
symbols that depend on this undefined symbol.
When a symbol is defined, we can recursively evaluate the symbol
expressions that depend on the newly defined symbol.
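The bookkeeping described above can be sketched in Python. The function name resolve and the input format are assumptions: each defining expression is given as a list of terms (symbol names or integers) that are simply added together, which is enough to show the counting of undefined symbols and the recursive evaluation. The address 1000 for DELTA is an illustrative value.

```python
def resolve(definitions: dict, known: dict) -> dict:
    """definitions: symbol -> list of terms (symbol names or ints) to be summed.
    known: symbols whose values are already assigned (for example labels)."""
    values = dict(known)
    missing = {}     # symbol -> number of undefined symbols in its expression
    waiting = {}     # undefined symbol -> symbols that depend on it
    for symbol, terms in definitions.items():
        undefined = {t for t in terms if isinstance(t, str) and t not in values}
        missing[symbol] = len(undefined)
        for term in undefined:
            waiting.setdefault(term, []).append(symbol)

    def evaluate(symbol):
        values[symbol] = sum(values[t] if isinstance(t, str) else t
                             for t in definitions[symbol])
        for dependent in waiting.pop(symbol, []):     # symbols waiting on this one
            missing[dependent] -= 1
            if missing[dependent] == 0:
                evaluate(dependent)                   # recursive evaluation

    for symbol in [s for s, count in missing.items() if count == 0]:
        evaluate(symbol)
    return values


result = resolve({"ALPHA": ["BETA"], "BETA": ["DELTA"]}, {"DELTA": 0x1000})
print({name: hex(value) for name, value in result.items()})
```

Symbols that never become fully defined are simply absent from the result, which is where an assembler would report an error.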

7. IMPLEMENTATION EXAMPLE:
MASM assembler
SPARC assembler


MASM assembler
The MASM assembler is written for Pentium and other x86 systems.
Since an x86 system views memory as a collection of segments, a MASM
assembler language program is written as a collection of segments.
Each segment is defined as belonging to a particular class.
Commonly used classes are CODE, DATA, CONST and STACK.
During program execution, segments are addressed via the x86 segment
registers.
Code segments are addressed using register CS.
Stack segments are addressed using register SS.
Data segments are addressed using DS, ES, FS or GS.
Jump instructions are assembled in two different ways:
Near jump - a jump to a target in the same code segment.
Far jump - a jump to a target in a different code segment.
