
POS Keyed Files

Table of Contents
POS Keyed Files
Objective
Implementation
Which Keyed File Hashing Algorithm Should Be Used?
History of Hashing Algorithms on 4680/90
The Answer
How to CREATE a Keyed File with Algorithm 1 or 2
Other Keyed File Considerations
Packing Factor
Randomizing Divisor
W798 - Message from adding to a keyed file
New Hashing Algorithm for the TOF Item File

POS Keyed Files

Objective
The following are the basic objectives and assumptions for the use and implementation of
keyed files in the 4690 POS Operating System. The primary objective is performance:
to access records from very large files in the shortest time possible.
- 90% of the time, the desired record should be found on the first read of the file.
- The files may contain 100 records or millions of records.
- The keys are random (as opposed to sequential keys).
- Reorganization of the file at intervals is not required.
- Records can be read, updated, added, and deleted. This allows some housekeeping
  of the file: deleted records free up space for new records.
- 'Read with Lock' and 'Write with Unlock' are supported to allow shared read/write
  access to the same file (database).
Implementation
The following are some of the assumptions of the 4690 Keyed File implementation.
They follow largely from the objectives above and from the type of data that is
processed in the POS store environment.
- Records are fixed length.
- The key is fixed length (up to 508 bytes).
- The maximum record size for a file is 508 bytes.
- There is only one level of keys.
- Hashing is the most efficient way to locate one record in a database.
- The hashing algorithm has to handle various types of keys and still give fast access:
  - packed numeric keys (Item record keys)
  - alphabetic keys (Names)
The file size is fixed at origination or build time. It is created based upon a selected
number of records that the user feels will meet the requirements for that file for 'x'
number of years. For example, a store has 100,000 unique items today, but it expects to
grow to possibly handling as many as 500,000 unique item codes in five years. The file
will be created for 625,000 possible records. (That is room for 500,000 records with a
'packing factor' of 80%: 500,000 records fill 80% of the available space. The extra space
lets records be placed efficiently, keeping the number of accesses per record to a minimum.)
This allows the file to grow without reorganization of the file; i.e. no store management
of the database is required. If a file eventually becomes 'full', the file will need to be
rebuilt for a larger size. There are utilities to 'create a direct file from a keyed file' and to
'create a keyed file from a direct file' which can aid in this process.
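The sizing arithmetic in the example above can be sketched as follows. This is a minimal illustration only; the function name `records_to_allocate` is hypothetical and is not part of any 4690 utility. The packing factor is passed as a whole-number percentage so that the calculation stays in integer arithmetic.

```python
def records_to_allocate(expected_records: int, packing_percent: int) -> int:
    """Return the number of record slots to build the file for.

    expected_records: largest number of records the file is expected to hold.
    packing_percent:  target packing factor as a whole percentage (1-100).
    """
    if expected_records < 0:
        raise ValueError("expected_records must not be negative")
    if not 1 <= packing_percent <= 100:
        raise ValueError("packing_percent must be between 1 and 100")
    # Ceiling division in integer arithmetic: round up so the target
    # packing factor is never exceeded.
    return -(-expected_records * 100 // packing_percent)


# The example from the text: 500,000 records at an 80% packing factor.
print(records_to_allocate(500_000, 80))  # 625000
```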
The limits of fixed length records and one level of key may be disadvantages for some
generic requirements. You cannot have multiple records with the same key. For example,
you cannot find all the 'Smiths' in the database using the keyed file services.
Support of 'Read with Lock' and 'Write with Unlock' allows shared read/write access to
the same file. Locking occurs only at the block level. A block is one sector, 512 bytes
in 4690, and it is the size of the 'record' that the keyed file system manages. A hashed key
points to a block. The system reads the block, and the desired record is likely to be one of
several records in that block. 4690 OS locks at the block level in order to reduce the
possibility of locking out other concurrent updates to other records in the same database.

Which Keyed File Hashing Algorithm Should Be Used?
History of Hashing Algorithms on 4680/90
To review, there are three hashing algorithms offered to users on current versions of
4680/90 OS. These are:
0 - The IBM Folding Algorithm
1 - The XOR Rotation Algorithm
2 - The Polynomial Hashing Algorithm
The IBM Folding Algorithm (0) was the first algorithm offered, in 4680 OS V1R1, and it
remains the default algorithm. It was found that algorithm 0 did not achieve a good
distribution of records in the keyed file when the file exceeded 3 megabytes in size.
The XOR Rotation Hashing Algorithm (1) was eventually introduced because it was
found that it could achieve a more even distribution of records in keyed files greater than
3 megabytes in size.
Eventually it was discovered that neither of these two hashing algorithms would provide
good enough distribution of records through the file if the keys were made up of ASCII
characters. Hashing algorithm 2, the polynomial hashing algorithm, was introduced to
remedy the problem of ASCII keys.
The Answer
Faced with these three choices, users can understandably become confused over the
question of which algorithm will give the best results in any given case. Fortunately,
there is a simple answer.
In all but one case, hashing algorithm 2, the polynomial hashing algorithm,
should give results as good as or better than algorithms 0 and 1. Algorithm 2 is the way to go.
The one exception to this rule is with the Supermarket Application Item Movement File,
EAMIMOVE.DAT. With EAMIMOVE.DAT, you must use algorithm 0. The reason is
that the Supermarket Application does its own hashing into this file by implementing
algorithm 0 within the application.

How to CREATE a Keyed File with Algorithm 1 or 2
The Keyed File Utility (KFU) provides an option to select any of the three algorithms
when it is used to create a file. However, selecting an algorithm when creating a keyed file
from within an application is not as obvious as it is when using the KFU.
The 4680 BASIC "CREATE" statement does not allow an algorithm to be selected. It
defaults to the Folding algorithm (0). In order to create a keyed file from an application
with either the XOR Rotation Algorithm or the Polynomial Hashing Algorithm, a User
Logical Name must have been previously defined. The new User Logical Name is the file
logical name with an "H" appended. The value assigned should be "0", "1", or "2",
depending upon the algorithm to be used.
For example, to have the Item File, whose logical name is EALITEMR, use the
polynomial algorithm when the application CREATEs the file, set the User Logical
Name EALITEMRH to the value 2.
For more information, see the "4680 Store Systems Programming Guide" - 'Using the
Alternate Hashing Algorithms'.
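The naming rule described above (file logical name plus "H", with a value of "0", "1", or "2") can be sketched as a small helper. This is illustrative only: the function names are hypothetical, and the code merely computes the name/value pair. It does not define a User Logical Name on a 4690 system; that is done with the operating system's own facilities.

```python
# Algorithm numbers and names as listed in this document.
HASHING_ALGORITHMS = {
    "0": "IBM Folding",
    "1": "XOR Rotation",
    "2": "Polynomial",
}


def hashing_logical_name(file_logical_name: str) -> str:
    """Return the User Logical Name that selects the hashing algorithm."""
    return file_logical_name + "H"


def hashing_definition(file_logical_name: str, algorithm) -> tuple:
    """Return the (logical name, value) pair to define before CREATE is run."""
    value = str(algorithm)
    if value not in HASHING_ALGORITHMS:
        raise ValueError('algorithm must be "0", "1", or "2"')
    return hashing_logical_name(file_logical_name), value


# The Item File example from the text.
print(hashing_definition("EALITEMR", 2))  # ('EALITEMRH', '2')
```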

Other Keyed File Considerations
Packing Factor
The "packing factor" is the percentage of the whole file containing actual records. If a file
is built for 100,000 records and it contains 65,000 records, that file's packing factor is
65%.
Seventy-five percent packing is considered a good planning guide. This means that if
the largest number of records that you ever expect to have is 100,000,
then the file should be built for 100,000 / 0.75, or about 133,000 records.
If the logical record length (LRECL) is greater than 169 bytes, then one sector of the
keyed file (508 bytes available for records) can hold only one or two records. This can
affect the distribution of records and cause more chaining than is optimal. The
following guidelines for packing factors are recommendations for achieving optimum
keyed file performance. Optimum keyed file performance is considered to be 10%
chaining or less.
Records per                      Recommended
Sector        LRECL              Packing Factor

1             >254 Bytes         50%
2             170-254 Bytes      55%
3             128-169 Bytes      65%
4+            1-127 Bytes        75%
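The table above can be expressed as a lookup keyed on the number of records that fit in one sector. This is a minimal sketch; the function names are hypothetical, and the only inputs are the 508 usable bytes per sector and the percentages from the table.

```python
USABLE_BYTES_PER_SECTOR = 508  # bytes available for records in one sector


def records_per_sector(lrecl: int) -> int:
    """Return how many whole records of length lrecl fit in one sector."""
    if not 1 <= lrecl <= USABLE_BYTES_PER_SECTOR:
        raise ValueError("LRECL must be between 1 and 508 bytes")
    return USABLE_BYTES_PER_SECTOR // lrecl


def recommended_packing_percent(lrecl: int) -> int:
    """Return the recommended packing factor (percent) from the table."""
    per_sector = records_per_sector(lrecl)
    # 1, 2, or 3 records per sector have their own rows; 4 or more share one.
    return {1: 50, 2: 55, 3: 65}.get(per_sector, 75)


for lrecl in (300, 200, 150, 100):
    print(lrecl, records_per_sector(lrecl), recommended_packing_percent(lrecl))
```

The integer division reproduces the LRECL ranges in the table: 255 bytes and above gives one record per sector, 170 to 254 gives two, 128 to 169 gives three, and 127 or below gives four or more.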
Randomizing Divisor
The Randomizing Divisor selected by the system is typically very good. It is not
necessary to try to select a different one.
W798 - Message from adding to a keyed file
This message is a 'lesser' warning. It is an indication that a keyed file may be full or that
it may have records that are poorly distributed within the file.
The message is issued only once per open of the file. It is issued when a record that is
added to the file requires chaining AND the keyed file system has to read at least 50
sectors to find a suitable sector for the new record. "Suitable" requires that (1) the sector
has no chained records from a HOME sector different from that of the new record, and
(2) the sector is not full.
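The two "suitable sector" conditions and the 50-sector threshold can be modeled as below. This is a conceptual sketch only, not the 4690 implementation: the `Sector` class, its fields, and both function names are assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List

W798_SECTOR_THRESHOLD = 50  # sectors read before the warning applies


@dataclass
class Sector:
    """Hypothetical in-memory model of one keyed file sector."""
    capacity: int                      # record slots in the sector
    used: int = 0                      # record slots currently occupied
    # HOME sector number of each chained record stored in this sector.
    chained_homes: List[int] = field(default_factory=list)


def is_suitable(sector: Sector, new_record_home: int) -> bool:
    """Apply conditions (1) and (2) from the text."""
    has_foreign_chain = any(h != new_record_home for h in sector.chained_homes)
    is_full = sector.used >= sector.capacity
    return not has_foreign_chain and not is_full


def should_issue_w798(sectors_read: int, already_issued_this_open: bool) -> bool:
    """The warning is issued once per open, after at least 50 sector reads."""
    return not already_issued_this_open and sectors_read >= W798_SECTOR_THRESHOLD


# A sector holding a record chained from HOME sector 7 is not suitable
# for a new record whose HOME sector is 9.
print(is_suitable(Sector(capacity=5, used=3, chained_homes=[7]), 9))  # False
```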
It would be wise to at least determine whether the file is becoming full, which could mean
anywhere from 80% to 99% of capacity. Keyed File Statistics can sometimes indicate this
if the statistics have never been cleared. A "Keyed File Analysis" tool, called
"KFANALY", is also available; it performs a detailed analysis of keyed files.
A third way to determine the number of records in a keyed file is to create a direct file
from the keyed file and divide the size of the direct file by the record length. The answer
is the number of records in the keyed file.
The maximum number of records that a keyed file can contain is:
    (Number of Sectors in File) * (integer value of (508 / LRECL))
The "percentage full" of the keyed file is:
    100 * (Number of Records in File) / (Maximum Number of Records the File Can Contain)
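The calculations above, together with the record count taken from the size of a direct file, can be sketched as follows. The function names are hypothetical. The sketch assumes that the direct file's record length equals the keyed file's LRECL and that the number of sectors in the keyed file is already known.

```python
USABLE_BYTES_PER_SECTOR = 508  # bytes available for records in one sector


def max_records(sectors_in_file: int, lrecl: int) -> int:
    """Maximum number of records the keyed file can contain."""
    if not 1 <= lrecl <= USABLE_BYTES_PER_SECTOR:
        raise ValueError("LRECL must be between 1 and 508 bytes")
    return sectors_in_file * (USABLE_BYTES_PER_SECTOR // lrecl)


def records_from_direct_file(direct_file_bytes: int, lrecl: int) -> int:
    """Record count: size of the direct file divided by the record length."""
    return direct_file_bytes // lrecl


def percent_full(records_in_file: int, sectors_in_file: int, lrecl: int) -> float:
    """Percentage of the file's maximum record capacity that is in use."""
    return 100.0 * records_in_file / max_records(sectors_in_file, lrecl)


# Made-up figures: 1,000 sectors of 100-byte records, 400,000-byte direct file.
count = records_from_direct_file(400_000, 100)
print(max_records(1_000, 100))          # 5000
print(percent_full(count, 1_000, 100))  # 80.0
```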
If the keyed file is more than 80% full, then it is becoming a candidate for being
rebuilt as a larger keyed file. Eventually the file could become so full that the system
has to search for several seconds for free space. This has the potential
to freeze access to the file system for the duration of the search.
If the keyed file is less than 80% full, then check the hashing algorithm. If algorithm "0"
is used, then changing the algorithm to "1" or "2" will probably better suit this file. See
the notes above on "which algorithm" for guidance.
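The W798 follow-up described above amounts to a two-step check, sketched below. The function name and the returned strings are hypothetical. The final branch (under 80% full and already on algorithm 1 or 2) is not covered by this document and is included only so that the function always returns something.

```python
def w798_next_step(percent_full: float, algorithm: int) -> str:
    """Suggest a follow-up action after a W798 message."""
    if algorithm not in (0, 1, 2):
        raise ValueError("algorithm must be 0, 1, or 2")
    if percent_full > 80:
        # The file is becoming full: rebuild it as a larger keyed file.
        return "rebuild the file as a larger keyed file"
    if algorithm == 0:
        # Not full, so poor distribution is the more likely cause.
        return "change the hashing algorithm to 1 or 2"
    # Assumption: the source gives no guidance for this case.
    return "analyze the file further"


print(w798_next_step(92.0, 0))
print(w798_next_step(60.0, 0))
```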
New Hashing Algorithm for the TOF Item File
This note refers to the Terminal Offline Feature (TOF) of the GSA and SA application
products for 4690 POS. This feature can allow the Item File (or a subset of this file) to be
loaded into the terminal; either to the terminal's RAM disk or to the terminal's hard disk.
Since the terminal item file is usually tightly packed in order to get it into terminals that
are restricted by memory, algorithm 2 is a good choice for that file. Even though
the file may be smaller than 3 MB and may not have ASCII keys, algorithm 2 will provide
a better distribution of records for a tightly packed file.
In order to change the algorithm for the GSA or SA TOF Item File, the logical name of
the internal "work file" must be changed in order to get the EALIMAGE or EAMIMAGE
file set for a new algorithm.
The EAxIMAGE file is actually created as a local WORK FILE. This work file then has
its distribution mode changed and the file is renamed to EAxIMAGE. The WORK FILE
is the file that must have a logical name in order to be created with hashing algorithm 1 or
2. This file is named "WRKIMAGE".
The logical name that should be set to the hashing value is:
WRKIMAGEH
Updated January, 2001
