File Organization and Access Methods

Modes of file access
Serial file
A serial file is one in which records are stored, one after the other, in the order in which they are
added, not in the order of a key field.
This means that new records are always stored at the end of the file.
The following shows a serial file that is used to store the number of entries for EdExcel GCSE
Mathematics. The entries were received in the order: Kettlewood, Queens Park, St Marys, Wilton
High, West Orling.
Centre Number | Centre Name | No of Candidates
27102         | Kettlewood  | 85
38240         | Queens Park | 103
64715         | St Marys    | 121
30446         | Wilton High | 156
12304         | West Orling | 105
Note that the key field in this file would be Centre Number, because it uniquely identifies each school.
Both disks and tapes can be used to store a file serially.
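The behaviour of a serial file can be sketched in a few lines of Python. This is only an illustration, assuming the file is held in memory as a list and each record is a (centre number, centre name, number of candidates) tuple; the function names append_record and find_record are made up for this example.

```python
# A serial file modelled as a list: records stay in the order they arrive.
serial_file = []

def append_record(file, record):
    """Add a record at the end of the file, as a serial file does."""
    file.append(record)

def find_record(file, key):
    """Search from the start; in the worst case every record is examined."""
    for record in file:
        if record[0] == key:
            return record
    return None

# Records arrive in the order given in the text above.
for rec in [
    (27102, "Kettlewood", 85),
    (38240, "Queens Park", 103),
    (64715, "St Marys", 121),
    (30446, "Wilton High", 156),
    (12304, "West Orling", 105),
]:
    append_record(serial_file, rec)

print(find_record(serial_file, 30446))
```

Because the records are not in key order, a search for a key that is not present can only fail after reading the whole file.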
Sequential file
A sequential file is one in which the records are stored, one after the other, in the order of the key
field.
The following shows a sequential file that is used to store the number of entries for EdExcel GCSE
Mathematics. The entries were added in the order: Kettlewood, Queens Park, St Marys, Wilton
High, West Orling, but they are stored in the order of the key field, Centre Number:
Centre Number | Centre Name | No of Candidates
12304         | West Orling | 105
27102         | Kettlewood  | 85
30446         | Wilton High | 156
38240         | Queens Park | 103
64715         | St Marys    | 121
As with a serial file, both tapes and disks can be used to store a file sequentially, and access to the
records must take place from the beginning of the file.
Benefits
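Sequential files allow the records to be displayed in the order of the key field. Keeping the file in
key order makes the process of adding a record slower, but it speeds up searches: a search can stop
as soon as it reaches a key larger than the one it is looking for.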
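The trade-off between slower additions and faster searches can be sketched as follows. This is a minimal illustration, assuming an in-memory list of (centre number, centre name, number of candidates) tuples; insert_record and find_record are hypothetical names, and Python's standard bisect module is used only to locate the insertion point.

```python
import bisect

# A sequential file modelled as a list kept sorted by centre number.
sequential_file = []

def insert_record(file, record):
    """Insert so the file stays in key order; later records must shift along."""
    keys = [r[0] for r in file]
    position = bisect.bisect_left(keys, record[0])
    file.insert(position, record)

def find_record(file, key):
    """Read from the start, but stop as soon as a larger key is seen."""
    for record in file:
        if record[0] == key:
            return record
        if record[0] > key:
            break  # the key cannot appear later in a sorted file
    return None

# Records are added in arrival order but end up stored in key order.
for rec in [
    (27102, "Kettlewood", 85),
    (38240, "Queens Park", 103),
    (64715, "St Marys", 121),
    (30446, "Wilton High", 156),
    (12304, "West Orling", 105),
]:
    insert_record(sequential_file, rec)

print([r[0] for r in sequential_file])
```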
Indexed sequential file

An indexed sequential file is one in which the records are stored, one after the other, in the order of
the key field, but which also has an index that enables records to be accessed directly.
Index
An index is a file with two fields, created from the main file, which contains a list of:
the key fields (sorted sequentially);
pointers to where the records can be found in the main file.
Indexed sequential files are useful when:
it is sometimes necessary to process all the records in sequential order; and
it is sometimes necessary to access individual records randomly.
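The two kinds of access can be sketched as below. This is only an illustration under stated assumptions: the main file is an in-memory list already in key order, the index is a pair of parallel lists (keys and pointers), and a pointer is simply the record's position in the list. The names read_direct and read_all_in_order are hypothetical.

```python
import bisect

# Main file, stored in key order (centre number, centre name, candidates).
main_file = [
    (12304, "West Orling", 105),
    (27102, "Kettlewood", 85),
    (30446, "Wilton High", 156),
    (38240, "Queens Park", 103),
    (64715, "St Marys", 121),
]

# The index: sorted key values, each paired with a pointer into the main file.
index_keys = [record[0] for record in main_file]
index_pointers = list(range(len(main_file)))

def read_direct(key):
    """Use the index to go straight to one record without reading the others."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return main_file[index_pointers[i]]
    return None

def read_all_in_order():
    """Process every record sequentially, in key order."""
    return list(main_file)

print(read_direct(30446))
print(len(read_all_in_order()))
```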
Examples of indexed sequential files

Company employee file
At the end of each month all the records will be processed sequentially, in order to produce
payslips. However, some records will need to be accessed randomly at other times, for example
when an employee changes address.
A school's student file
When an attendance report is printed, the file will be accessed sequentially, but when the details of
an individual student are required the index will be used to find the required record quickly.
Random (direct) access file

A random access file is one in which a record can be written or retrieved without first examining
other records.
A random access file must be stored on disk and the disk address is calculated from the primary
key.
In its simplest form, a record with a primary key of 1 will be stored at block 1, a record with a
primary key of 2 will be stored at block 2, a record with a primary key of 3 will be stored at block 3, and so on.
It should be noted that this very simple method, where [disk address] = [primary key], is very
inefficient in its use of disk space. For example:
if the lowest primary key is 1001, then all the disk space below block 1001 will be wasted;
if there are some values which the primary key never takes (for example, odd values), these
storage spaces will be wasted.
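The amount of wasted space can be estimated with a short calculation. The sketch below assumes that blocks are numbered from 1 and that each block holds one record; the key values are hypothetical (even numbers from 1002 to 1020) and wasted_blocks is a name invented for this example.

```python
def wasted_blocks(keys):
    """Blocks reserved but never used when [disk address] = [primary key]."""
    reserved = max(keys)            # blocks 1 .. highest key must exist
    return reserved - len(set(keys))

# Hypothetical keys: even values only, starting above 1000.
keys = list(range(1002, 1021, 2))   # 1002, 1004, ..., 1020

print(len(keys), "records stored;", wasted_blocks(keys), "blocks wasted")
```

Here only ten records are stored, yet more than a thousand blocks have to be set aside for them.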
In order to be more efficient with the use of disk space, random access files calculate disk addresses
by using a hashing algorithm (also known as just hashing).
Hashing
Hashing is a calculation that is performed on a primary key in order to calculate the storage address
of a record.
A hashing algorithm will typically divide the primary key by the number of disk blocks that are
available for storage, work out the remainder and add the start address. The answer will be the
storage address of the record.
[disk address] = ([primary key] MOD [number of blocks]) + [start address]
Example
If a file were to be stored on the first 5000 blocks of a disk (so that the start address is 0), then:
[disk address] = [primary key] MOD 5000
That is, the primary key of each of the records would be divided by 5000 and the remainder would
be the disk address for the record.
This means that a record with primary key of 27102 would be stored at the disk address calculated
as follows:
27102 ÷ 5000 = 5 remainder 2102
This means that the disk address for this record will be 2102.
The table shows some other disk addresses calculated using the same hashing algorithm:
Centre Number | Centre Name | No of Candidates | Disk Address
27102         | Kettlewood  | 85               | 2102
38240         | Queens Park | 103              | 3240
64715         | St Marys    | 121              | 4715
30446         | Wilton High | 156              | 446
12304         | West Orling | 105              | 2304
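The hashing formula and the disk addresses in the table can be checked with a short Python sketch. The function name disk_address and its parameter names are assumptions made for this example; Python's % operator gives the remainder (MOD) for the positive keys used here.

```python
def disk_address(primary_key, number_of_blocks, start_address=0):
    """Return (primary key MOD number of blocks) + start address."""
    return primary_key % number_of_blocks + start_address

centres = [
    (27102, "Kettlewood"),
    (38240, "Queens Park"),
    (64715, "St Marys"),
    (30446, "Wilton High"),
    (12304, "West Orling"),
]

# File stored on the first 5000 blocks, so the start address is 0.
for key, name in centres:
    print(key, name, disk_address(key, 5000))
```

If the file instead began at some other block, say a hypothetical start address of 200, the same call with start_address=200 would shift every result up by 200.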
Problems with hashing

One problem that can occur with hashing is that the calculated block may already contain a record and be full.
For example, records with key fields of 38240 and 43240 will both be assigned a disk address of
3240.
If this happens then the new record will need to be written somewhere else. Two common ways of
determining this alternative location are:
the record can be written to the next available block; note that if it is the last block
which is full, then the search for an available space will continue from the first block;
the record can be written to a separate overflow area, with a tag placed in the calculated
location to indicate exactly where in this overflow area the record can be found.
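Both ways of handling a full block can be sketched in Python. This is an illustration only, assuming each block holds a single record, the disk is modelled as a list with None for an empty block, and the start address is 0. The function names, the dictionary layout used for the tag, and the centre name given to key 43240 are all hypothetical.

```python
def store_next_available(blocks, record):
    """Method 1: try the hashed block, then each following block,
    wrapping round to the first block after the last one."""
    home = record[0] % len(blocks)
    for step in range(len(blocks)):
        address = (home + step) % len(blocks)
        if blocks[address] is None:
            blocks[address] = record
            return address
    raise RuntimeError("no free block: the file is full")

def store_with_overflow(blocks, overflow, record):
    """Method 2: if the hashed block is taken, append the record to an
    overflow area and tag the hashed block with its overflow position."""
    home = record[0] % len(blocks)
    if blocks[home] is None:
        blocks[home] = {"record": record, "overflow_at": []}
        return ("main", home)
    overflow.append(record)
    blocks[home]["overflow_at"].append(len(overflow) - 1)
    return ("overflow", len(overflow) - 1)

# Keys 38240 and 43240 both hash to block 3240 when there are 5000 blocks.
disk = [None] * 5000
print(store_next_available(disk, (38240, "Queens Park", 103)))
print(store_next_available(disk, (43240, "Hypothetical Centre", 0)))

disk2, overflow_area = [None] * 5000, []
print(store_with_overflow(disk2, overflow_area, (38240, "Queens Park", 103)))
print(store_with_overflow(disk2, overflow_area, (43240, "Hypothetical Centre", 0)))
```

With the first method a later search has to follow the same path, starting at the hashed block and moving forward; with the second it reads the tag and goes directly to the overflow area.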
