
Chapter 8:
Data Structures and CAATTs for Data Extraction

IT Auditing, Hall, 3e
2011 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated,
or posted to a publicly accessible website, in whole or in part.

Data Structures
Two fundamental components:
Organization: the way records are physically arranged on the secondary storage device
Access method: technique used to locate records and to navigate through the database or file


[Diagram: organization and access methods]
Data organization (DATA file): SEQUENTIAL, RANDOM
Access, index methods (INDEX file): SEQUENTIAL, ISAM, RANDOM
Access, non-index methods: Hashing, Pointers


File Processing Operations


1. Retrieve a record by key
2. Insert a record
3. Update a record
4. Read a file
5. Find next record
6. Scan a file
7. Delete a record


Individual
Records

Data Structures
Flat file structures
Sequential structure [Figure 8-1]
All records are stored in contiguous storage spaces in a specified sequence (ordered by key field)
Sequential files are simple and easy to process
Application reads from the beginning of the file in sequence
Inefficient method if only a small portion of the file is being processed
Does not permit accessing a record directly
Efficient for operations 4 and 5, sometimes 3 (numbers refer to the File Processing Operations list)


Data Structures

Flat file structures

Indexed structure
In addition to the data file, there is a separate index file
The index contains the physical address in the data file of each indexed record


Data Structures
Flat file structures
Indexed random file [Figure 8-2]

Records are created without regard to physical proximity to other related records
Physical organization of the index file itself may be sequential or random
Random indexes are easier to maintain; sequential indexes are more difficult
Advantage over sequential: rapid searches
Other advantages: processing individual records, efficient usage of disk storage
Efficient for operations 1, 2, 3, 7
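
The index lookup described on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a list stands in for the data file (list position plays the role of the physical address), a dictionary stands in for the index file, and the names `insert_record` and `retrieve_record` are hypothetical.

```python
# Hypothetical stand-ins: list position = "physical address" in the data file.
data_file = []   # records are appended wherever space is free (no key order)
index = {}       # separate index: key -> address of the record in data_file

def insert_record(key, payload):
    """Store the record at the next free address and note it in the index."""
    address = len(data_file)
    data_file.append({"key": key, "payload": payload})
    index[key] = address
    return address

def retrieve_record(key):
    """Search the index for the address, then read that one record directly."""
    address = index.get(key)
    return None if address is None else data_file[address]

insert_record(1876, "bolt")
insert_record(1021, "nut")
insert_record(1956, "washer")
print(retrieve_record(1021))
```

Retrieval touches the index and one data record only, which is why operations on individual records (1, 2, 3, 7) are the efficient ones here.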


Data Structures
Flat file structures
Virtual Storage Access Method (VSAM) [Figure 8-3]
Large files, routine batch processing
Moderate degree of individual record processing
Used for files across cylinders
Uses number of indexes, with summarized content
Access time for a single record is slower than Indexed Sequential or Indexed Random
Disadvantage: does not perform record insertions efficiently; an insertion requires physical relocation of all records beyond that point
SOS
Has 3 physical components: indexes, prime data storage area, overflow area [Figure 8-4]
A search might have to cover the index, prime data area, and overflow area, slowing down access time
ISAM files are reorganized by integrating overflow records into the prime data area and then reconstructing the indexes
Very efficient for operations 4, 5, 6
Moderately efficient for operations 1, 3
Inefficient for operations 2, 7
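
A rough Python sketch of the three components named above (index, prime data area, overflow area) may help show why a single-record search can be slow and what reorganization does. Everything here is an assumption made for illustration: the block size of three records, the sample keys, and the function names `find` and `reorganize` are not taken from any real VSAM or ISAM implementation.

```python
from bisect import bisect_right

BLOCK_SIZE = 3  # assumed number of records per block in the prime data area

# Prime data area: blocks of (key, payload) records kept in key order.
prime_area = [
    [(100, "A"), (120, "B"), (150, "C")],
    [(200, "D"), (230, "E"), (260, "F")],
    [(300, "G"), (340, "H"), (390, "I")],
]
# Summarized index: only the lowest key of each block, not one entry per record.
block_index = [block[0][0] for block in prime_area]
# Overflow area: records added after the file was built, in arrival order.
overflow_area = [(245, "X"), (110, "Y")]

def find(key):
    """Search the index, then one prime block, then the overflow area."""
    pos = bisect_right(block_index, key) - 1  # block whose key range covers key
    if pos >= 0:
        for k, payload in prime_area[pos]:
            if k == key:
                return payload, "prime"
    for k, payload in overflow_area:          # linear scan slows access time
        if k == key:
            return payload, "overflow"
    return None, None

def reorganize():
    """Integrate overflow records into the prime area and rebuild the index."""
    records = sorted([r for block in prime_area for r in block] + overflow_area)
    prime_area[:] = [records[i:i + BLOCK_SIZE]
                     for i in range(0, len(records), BLOCK_SIZE)]
    block_index[:] = [block[0][0] for block in prime_area]
    overflow_area.clear()

print(find(245))   # found in the overflow area before reorganization
reorganize()
print(find(245))   # found in the prime data area afterwards
```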


EVOLUTION OF ORG./ACCESS METHODS
[Figure: timeline with marks at 1960, 1970, 1980, and 1990 showing the Sequential, SAM (label truncated in source), and Random methods, labeled as legacy systems, followed by DBMS etc.]



[Figure: Sequential, VSAM, and Random structures ranked from Efficient to Inefficient for two tasks: accessing single records and accessing entire files]

Hashing Structure
Employs algorithm to convert primary key into physical record storage address [Figure 8-5]
No separate index necessary
Advantage: access speed
Disadvantage:
  Inefficient use of storage
  Different keys may create same address
Efficient for operations 1, 2, 3, 6
Inefficient for operations 4, 5, 7
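
The key-to-address conversion above can be illustrated with a small Python sketch. It assumes division-remainder hashing and resolves collisions by linear probing; both are common textbook choices picked here for brevity, not necessarily the scheme in Figure 8-5. The slot count and key values are made up.

```python
NUM_SLOTS = 7  # deliberately small so that a collision is easy to show

storage = [None] * NUM_SLOTS  # slots left as None illustrate the wasted space

def hash_address(key):
    """Division-remainder hashing: the remainder is the storage address."""
    return key % NUM_SLOTS

def store(key, payload):
    """Place the record at its hashed address, probing forward on a collision."""
    address = hash_address(key)
    for step in range(NUM_SLOTS):
        slot = (address + step) % NUM_SLOTS
        if storage[slot] is None or storage[slot][0] == key:
            storage[slot] = (key, payload)
            return slot
    raise RuntimeError("file is full")

def fetch(key):
    """Recompute the address from the key; no separate index is consulted."""
    address = hash_address(key)
    for step in range(NUM_SLOTS):
        slot = (address + step) % NUM_SLOTS
        if storage[slot] is None:
            return None
        if storage[slot][0] == key:
            return storage[slot][1]
    return None

first_slot = store(15, "record a")    # 15 % 7 == 1
second_slot = store(22, "record b")   # 22 % 7 == 1 as well: a collision
print(first_slot, second_slot, fetch(22))
```

Keys 15 and 22 hash to the same address, so the second record has to be stored elsewhere, which is the "different keys may create same address" disadvantage in miniature.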


Pointer Structure

Stores the address (pointer) of related record in a field with each data record [Figure 8-6]
Records stored randomly
Pointers provide connections between records
Pointers may also provide links of records between files [Figure 8-7]
Types of pointers [Figure 8-8]:
Physical address: actual disk storage location
  Advantage: access speed
  Disadvantage: if the related record moves, the pointer must be changed; and without a logical reference, a pointer could be lost, causing the referenced record to be lost
Relative address: relative position in the file (e.g., the 135th record)
  Must be manipulated to convert to physical address
Logical address: primary key of related record
  Key value is converted by hashing to physical address
Efficient for operations 1, 2, 3, 6
Inefficient for operations 4, 5, 7
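
As a hedged illustration of the pointer types above, the following Python sketch converts a relative address to a physical one and follows a chain of pointers between randomly stored records. The record length, file start address, and record keys are assumed values chosen only for the example.

```python
RECORD_LENGTH = 64   # bytes per record (assumed fixed-length records)
FILE_START = 4096    # physical address of the first record (assumed)

def relative_to_physical(relative_position):
    """Convert a 1-based relative position (e.g., 135) to a byte address."""
    return FILE_START + (relative_position - 1) * RECORD_LENGTH

# Records stored in no particular order. Each carries a pointer field holding
# the relative position of the next related record; None ends the chain.
file_records = {
    1: {"key": "INV-3", "next": None},
    2: {"key": "INV-1", "next": 3},
    3: {"key": "INV-2", "next": 1},
}

def follow_chain(start):
    """Visit related records by following pointers from the starting record."""
    keys = []
    position = start
    while position is not None:
        record = file_records[position]
        keys.append(record["key"])
        position = record["next"]
    return keys

print(relative_to_physical(135))
print(follow_chain(2))
```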


Database Conceptual Models

Refers to the particular method used to organize records in a database.
a.k.a. logical data structures
Objective: develop the database efficiently so that data can be accessed quickly and easily.
There are three main models:
  hierarchical (tree structure)
  network
  relational
Most existing databases are relational. Some legacy systems use hierarchical or network databases.


The Relational Model


The relational model portrays data in the form of two-dimensional tables.
Its strength is the ease with which tables may be linked to one another.
  Difficulty in linking is a major weakness of hierarchical and network databases.
The relational model is based on the relational algebra functions of restrict, project, and join.
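
The three relational algebra functions can be sketched in plain Python over lists of dictionaries. This is a minimal illustration, not how a DBMS implements them: the table contents and column names are hypothetical, and unlike a strict relational project, this version does not remove duplicate rows.

```python
def restrict(table, predicate):
    """Keep only the rows that satisfy the condition."""
    return [row for row in table if predicate(row)]

def project(table, columns):
    """Keep only the named columns (duplicates are not removed here)."""
    return [{c: row[c] for c in columns} for row in table]

def join(left, right, key):
    """Combine rows from two tables that share the same value of `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

customers = [
    {"cust_id": 1, "name": "Acme"},
    {"cust_id": 2, "name": "Bolt Co"},
]
invoices = [
    {"inv_id": 10, "cust_id": 1, "amount": 500},
    {"inv_id": 11, "cust_id": 2, "amount": 75},
    {"inv_id": 12, "cust_id": 1, "amount": 1200},
]

large = restrict(invoices, lambda r: r["amount"] > 100)
result = project(join(large, customers, "cust_id"), ["name", "amount"])
print(result)
```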


The Relational Algebra Functions: Restrict, Project, and Join

[Figure 9-9]


Associations and Cardinality
Association
  Represented by a line connecting two entities
  Described by a verb, such as ships, requests, or receives
Cardinality: the degree of association between two entities
  The number of possible occurrences in one table that are associated with a single occurrence in a related table
  Used to determine primary keys and foreign keys


Examples of Entity Associations


Properly Designed Relational Tables

Each row in the table must be unique in at least one attribute, which is the primary key.
  Tables are linked by embedding the primary key into the related table as a foreign key.
The attribute values in any column must all be of the same class or data type.
Each column in a given table must be uniquely named.
Tables must conform to the rules of normalization, i.e., free from structural dependencies or anomalies.
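
The primary key and foreign key rules above can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumption-laden example: the `customer` and `invoice` tables are hypothetical, and SQLite is used only because it ships with Python. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE invoice ("
    " inv_id INTEGER PRIMARY KEY,"
    " cust_id INTEGER NOT NULL REFERENCES customer(cust_id),"  # foreign key
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO invoice VALUES (10, 1, 500.0)")

def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

# A second row with the same primary key violates row uniqueness.
duplicate_key = try_insert("INSERT INTO customer VALUES (1, 'Other')")
# An invoice pointing at a customer that does not exist breaks the link.
orphan_invoice = try_insert("INSERT INTO invoice VALUES (11, 99, 20.0)")
print(duplicate_key, orphan_invoice)
```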


Three Types of Anomalies


Insertion Anomaly: A new item cannot be added to the table until at least one entity uses a particular attribute item.
Deletion Anomaly: If an attribute item used by only one entity is deleted, all information about that attribute item is lost.
Update Anomaly: A modification on an attribute must be made in each of the rows in which the attribute appears.
Anomalies can be corrected by creating additional relational tables.
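
The update and deletion anomalies can be reproduced with a small unnormalized table. The Python sketch below uses a hypothetical inventory table that embeds supplier details in every row; the part numbers, supplier numbers, and phone values are invented for the example.

```python
# Unnormalized table: supplier data is repeated in each inventory row.
inventory = [
    {"part": 1, "desc": "bolt", "supplier": 22, "supplier_phone": "555-0100"},
    {"part": 2, "desc": "nut", "supplier": 22, "supplier_phone": "555-0100"},
    {"part": 3, "desc": "gear", "supplier": 27, "supplier_phone": "555-0199"},
]

def supplier_phones(table, supplier):
    """Collect every phone value the table holds for one supplier."""
    return {row["supplier_phone"] for row in table if row["supplier"] == supplier}

# Update anomaly: changing the phone in only one of the rows for supplier 22
# leaves the table holding two conflicting values.
inventory[0]["supplier_phone"] = "555-0111"
conflicting = supplier_phones(inventory, 22)

# Deletion anomaly: deleting the only part bought from supplier 27 also
# deletes everything the table knew about supplier 27.
inventory = [row for row in inventory if row["part"] != 3]
remaining = supplier_phones(inventory, 27)

print(conflicting, remaining)
```

The insertion anomaly is the mirror image: with this layout a new supplier cannot be recorded until at least one part row exists to carry it.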


Advantages of Relational Tables


Removes all three types of anomalies.
Various items of interest (customers, inventory, sales) are stored in separate tables.
Space is used efficiently.
Very flexible: users can form ad hoc relationships.

The Normalization Process

A process which systematically splits unnormalized complex tables into smaller tables that meet two conditions:
  all nonkey (secondary) attributes in the table are dependent on the primary key
  all nonkey attributes are independent of the other nonkey attributes
When unnormalized tables are split and reduced to third normal form, they must then be linked together by foreign keys.
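
A minimal Python sketch of the split described above, under stated assumptions: the flat table is hypothetical, `supplier_name` depends on `supplier` rather than on the part key (a dependency between nonkey attributes), and the helper names `normalize` and `rejoin` are invented for the illustration.

```python
flat = [
    {"part": 1, "desc": "bolt", "supplier": 22, "supplier_name": "North Supply"},
    {"part": 2, "desc": "nut", "supplier": 22, "supplier_name": "North Supply"},
    {"part": 3, "desc": "gear", "supplier": 27, "supplier_name": "Gear Works"},
]

def normalize(flat_rows):
    """Split the flat table so each nonkey attribute depends only on its key."""
    parts = {}      # key: part; "supplier" stays behind as the foreign key
    suppliers = {}  # key: supplier; holds what depends on the supplier alone
    for row in flat_rows:
        parts[row["part"]] = {"desc": row["desc"], "supplier": row["supplier"]}
        suppliers[row["supplier"]] = {"supplier_name": row["supplier_name"]}
    return parts, suppliers

def rejoin(parts, suppliers):
    """Link the two tables through the foreign key to rebuild the flat view."""
    return [
        {
            "part": part,
            "desc": attrs["desc"],
            "supplier": attrs["supplier"],
            "supplier_name": suppliers[attrs["supplier"]]["supplier_name"],
        }
        for part, attrs in parts.items()
    ]

parts, suppliers = normalize(flat)
print(len(parts), len(suppliers))
print(rejoin(parts, suppliers) == flat)
```

Each supplier name is now stored once, and the foreign key in the parts table is what allows the original rows to be reconstructed.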


Steps in the Normalization Process


Accountants and Data Normalization

Update anomalies can generate conflicting and obsolete database values.
Insertion anomalies can result in unrecorded transactions and incomplete audit trails.
Deletion anomalies can cause the loss of accounting records and the destruction of audit trails.
Accountants should understand the data normalization process and be able to determine whether a database is properly normalized.


Six Phases in Designing Relational Databases
1. Identify entities
  identify the primary entities of the organization
  construct a data model of their relationships
2. Construct a data model showing entity associations
  determine the associations between entities
  model associations into an ER diagram

Six Phases in Designing Relational Databases
3. Add primary keys and attributes
  assign primary keys to all entities in the model to uniquely identify records
  every attribute should appear in one or more user views
4. Normalize and add foreign keys
  remove repeating groups, partial and transitive dependencies
  assign foreign keys to be able to link tables


Six Phases in Designing Relational Databases
5. Construct the physical database
  create physical tables
  populate tables with data
6. Prepare the user views
  normalized tables should support all required views of system users
  user views restrict users from having access to unauthorized data
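
Phases 5 and 6 can be sketched with Python's built-in `sqlite3` module: create and populate a physical table, then define a user view that exposes only some columns. The `employee` table and `employee_directory` view are hypothetical. SQLite itself has no user accounts, so this shows only the column restriction; in a multi-user DBMS, users would typically be granted access to the view rather than the base table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Phase 5: create the physical table and populate it with data.
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "Lee", "AP", 52000.0), (2, "Kim", "AR", 61000.0)],
)

# Phase 6: a user view that leaves out the sensitive salary column.
conn.execute(
    "CREATE VIEW employee_directory AS"
    " SELECT emp_id, name, dept FROM employee"
)

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
view_columns = [description[0] for description in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```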


Auditors and Data Normalization

Database normalization is a technical matter that is usually the responsibility of systems professionals.
The subject has implications for internal control that make it the concern of auditors also.
Most auditors will never be responsible for normalizing an organization's databases, but they should have an understanding of the process and be able to determine whether a table is properly normalized.
In order to extract data from tables to perform audit procedures, the auditor first needs to know how the data are structured.


Embedded Audit Module (EAM)


Identify important transactions live while they are being processed and extract them [Figure 8-26]
Examples
Errors
Fraud
Compliance
SAS 109, SAS 94, SAS 99 / S-OX
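
The idea of an embedded audit module can be sketched as a hook that runs inside normal transaction processing and copies selected transactions to a separate audit file. The Python below is only a toy under assumptions: the materiality threshold, the transaction fields, and the function names are hypothetical and do not describe any particular product.

```python
MATERIALITY_THRESHOLD = 10_000.00  # assumed audit parameter set by the auditor

audit_file = []  # stands in for the auditor's separate review file

def embedded_audit_module(transaction):
    """Copy the transaction to the audit file if it meets the audit criteria."""
    if transaction["amount"] >= MATERIALITY_THRESHOLD:
        audit_file.append(dict(transaction))

def process_transaction(transaction, ledger):
    """Normal application logic, with the audit hook embedded in the flow."""
    account = transaction["account"]
    ledger[account] = ledger.get(account, 0.0) + transaction["amount"]
    embedded_audit_module(transaction)  # runs live, during processing

ledger = {}
for txn in [
    {"id": 1, "account": "AR", "amount": 250.00},
    {"id": 2, "account": "AR", "amount": 18_500.00},
    {"id": 3, "account": "AP", "amount": 10_000.00},
]:
    process_transaction(txn, ledger)

print([t["id"] for t in audit_file])
```

Because the hook executes for every transaction, it adds work to each one, which is one way to see how extensive testing could affect operational performance.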

Embedded Audit Module


Disadvantages:
  Operational efficiency: EAMs can decrease performance, especially if testing is extensive
  Verifying EAM integrity: a concern in environments with a high level of program maintenance
Status: increasing need, demand, and usage of COA/EAM/CA


Generalized Audit Software (GAS)


Brief history
Most widely used CAATT [Figure 8-19]
Usages include:
1) Footing and balancing entire files or selected data items (e.g., extending inventory)
2) Selecting and reporting detail data
3) Selecting stratified statistical samples from data files
4) Formatting results into audit reports (auto work papers!)
5) Printing confirmations
6) Screening / filtering data
7) Comparing multiple files for differences
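
A few of the usages listed above (footing a file, extending inventory, and comparing two files) can be approximated in plain Python. This is a hedged sketch with invented data and helper names; it is not the syntax or behavior of any GAS product.

```python
inventory = [
    {"item": "A1", "qty": 10, "unit_cost": 2.50, "book_value": 25.00},
    {"item": "B2", "qty": 4, "unit_cost": 10.00, "book_value": 45.00},  # misstated
    {"item": "C3", "qty": 7, "unit_cost": 3.00, "book_value": 21.00},
]

def extend(row):
    """Recompute the extended value as quantity times unit cost."""
    return round(row["qty"] * row["unit_cost"], 2)

# Footing: total the recorded book values for the whole file.
footed_total = round(sum(row["book_value"] for row in inventory), 2)

# Extending inventory: flag rows whose recorded value disagrees with qty * cost.
exceptions = [row["item"] for row in inventory if extend(row) != row["book_value"]]

def compare_files(file_a, file_b, key):
    """Return keys found only in file_a and keys found only in file_b."""
    keys_a = {row[key] for row in file_a}
    keys_b = {row[key] for row in file_b}
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)

count_sheet = [{"item": "A1"}, {"item": "C3"}, {"item": "D4"}]
only_in_books, only_in_count = compare_files(inventory, count_sheet, "item")

print(footed_total, exceptions, only_in_books, only_in_count)
```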


Generalized Audit Software


Popular because:
1. GAS is easy to use and requires little computer background
2. Many products are platform independent and work on both mainframes and PCs
3. Auditors can perform tests independently of IT staff
4. GAS can be used to audit the data currently being stored in most file structures and formats

Generalized Audit Software


Simple structures [Figure 8-27]
Complex structures [Figures 8-28, 8-29]
Auditing issues:

  Auditor must sometimes rely on IT personnel to produce files/data
  Risk that data integrity is compromised by extraction procedures
  Auditors skilled in programming are better prepared to avoid these pitfalls


ACL
ACL is a proprietary version of GAS
Leader in the industry
Designed as an auditor-friendly meta-language (i.e., contains commonly used auditor tests)
Access to data generally easy with ODBC interface

ACL
See ACL tutorial #1
Input file definition
Customizing a view [Figure 8-31]
Filtering data [Figures 8-34 thru 8-35]
Stratifying data [Figure 8-36]
Statistical analysis
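
To give a feel for what stratifying data means, here is a plain Python approximation. It is not ACL syntax and makes no claim about how ACL computes its strata; the `stratify` helper, the boundaries, and the invoice amounts are all hypothetical.

```python
def stratify(records, field, boundaries):
    """Count and total `field` per stratum.

    `boundaries` lists ascending lower bounds, e.g. [0, 100, 1000]; each value
    falls into the highest stratum whose lower bound it reaches. Values below
    the first bound are ignored.
    """
    strata = [{"lower": bound, "count": 0, "total": 0.0} for bound in boundaries]
    for record in records:
        value = record[field]
        for stratum in reversed(strata):
            if value >= stratum["lower"]:
                stratum["count"] += 1
                stratum["total"] += value
                break
    return strata

invoices = [
    {"inv_id": 1, "amount": 50.0},
    {"inv_id": 2, "amount": 150.0},
    {"inv_id": 3, "amount": 999.0},
    {"inv_id": 4, "amount": 1000.0},
    {"inv_id": 5, "amount": 2500.0},
]

strata = stratify(invoices, "amount", [0, 100, 1000])
for stratum in strata:
    print(stratum["lower"], stratum["count"], stratum["total"])
```

The per-stratum counts and totals show where the value in a file is concentrated, which is the kind of profile an auditor might use when deciding where to sample more heavily.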
