Chapter 8: The Physical Data Warehouse

The VLDB (Very Large Database)


If you have only a small window of opportunity to back up your database each day (based
on user inactivity or shut-down times), and the database is too large to be backed up in
that time frame, your database is a VLDB.
Window of Opportunity
The window of opportunity is the amount of time in a 24-hour period that the database is
quiet and within which nightly management tasks can be applied.
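The definition above amounts to a simple comparison between backup time and the window of opportunity. The following is a minimal sketch of that check. The function names and the assumption of a steady backup rate in GB per hour are illustrative and do not come from the source.

```python
def backup_hours(db_size_gb: float, backup_rate_gb_per_hour: float) -> float:
    """Hours needed to back up the whole database at a steady rate (an assumption)."""
    if backup_rate_gb_per_hour <= 0:
        raise ValueError("backup rate must be positive")
    return db_size_gb / backup_rate_gb_per_hour


def is_vldb(db_size_gb: float, backup_rate_gb_per_hour: float, window_hours: float) -> bool:
    """True when a full backup does not fit inside the window of opportunity."""
    return backup_hours(db_size_gb, backup_rate_gb_per_hour) > window_hours
```

For instance, under these assumptions a 500 GB database backed up at 50 GB per hour needs 10 hours, so with a 6-hour window it would count as a VLDB.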
Implementing a VLDB
The single most effective way to manage the VLDB is to break it up into smaller pieces
and exercise care when allocating space. These pieces are called partitions.
Partitions
Partitioning breaks up a single table into smaller pieces. Each of these pieces is an object
in its own right. Therefore, you can query, back up, or recover each partition on its own.
This makes working with large tables much easier.
There are three methods to partition table data in Oracle 8i:
Range partitioning,
Hash partitioning, and
Composite partitioning.
Range Partitioning
This method separates data based on a range of values. For example, data that is distributed
across time (such as sales data) could be divided into partitions based on the month. Each
mini-table or partition would contain data for a particular month.
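The monthly split described above can be modeled as a lookup against a sorted list of upper bounds. This is only a conceptual sketch, not Oracle's implementation. The partition names and boundary dates are hypothetical, and each bound is treated as exclusive (a row belongs to the first partition whose bound is greater than its date).

```python
from bisect import bisect_right
from datetime import date

# Hypothetical layout: each partition holds rows dated before its upper bound.
PARTITION_BOUNDS = [date(2000, 2, 1), date(2000, 3, 1), date(2000, 4, 1)]
PARTITION_NAMES = ["sales_jan", "sales_feb", "sales_mar"]


def range_partition(sale_date: date) -> str:
    """Return the name of the partition whose range contains sale_date."""
    # bisect_right counts the bounds that are <= sale_date, which is the
    # index of the first partition whose (exclusive) bound is above it.
    idx = bisect_right(PARTITION_BOUNDS, sale_date)
    if idx >= len(PARTITION_NAMES):
        raise ValueError(f"{sale_date} is beyond the highest partition bound")
    return PARTITION_NAMES[idx]
```

In this sketch, any date earlier than the first bound also lands in the first partition, and a date past the last bound is rejected.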
Hash Partitioning
Hash partitioning distributes data among partitions via a hash function. This function
takes column input from the incoming data, applies a function to it, and places the data
into a partition based on the resultant value of the function.
This method allows you to partition data that does not easily fall into nice ranges.
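As a rough illustration of the idea, the sketch below hashes a column value and uses the remainder to pick a partition. CRC32 is used here only because it gives the same answer on every run. It is a stand-in and not the hash function Oracle uses, and the function name is hypothetical.

```python
import zlib


def hash_partition(key: str, num_partitions: int) -> int:
    """Map a column value to a partition number in [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("need at least one partition")
    # CRC32 is a deterministic stand-in, not Oracle's hash function.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

The same key always maps to the same partition, so a lookup by that key needs to visit only one partition.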
Composite Partitioning/Subpartitioning
A subpartition is simply a partition within another partition.

Oracle 8i introduced composite partitioning as a combination of range and hash
partitioning. Oracle uses range partitions as a first cut to distribute data. It then applies a
hash function to further divide it.
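The two-step placement can be sketched as follows, assuming a range cut by calendar month and a hash cut on a customer identifier. Both column choices, the naming scheme, and the use of CRC32 are illustrative assumptions rather than anything prescribed by Oracle.

```python
import zlib
from datetime import date
from typing import Tuple


def composite_partition(sale_date: date, customer_id: str,
                        subpartitions: int = 4) -> Tuple[str, int]:
    """Return (range partition, hash subpartition) for one row."""
    # First cut: range partition by the calendar month of the sale date.
    partition = f"sales_{sale_date.year}_{sale_date.month:02d}"
    # Second cut: hash the customer id to pick a subpartition in that month.
    # CRC32 is a deterministic stand-in, not Oracle's hash function.
    subpartition = zlib.crc32(customer_id.encode("utf-8")) % subpartitions
    return partition, subpartition
```

Rows from the same month share a range partition, while the hash spreads them across the subpartitions inside it.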
Nested Table
A nested table is a table in which one of the columns is itself a complete table structure.
It essentially allows you to perform a one-to-many join operation without the join.
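One way to picture this is a parent row that carries its child rows inside one of its own fields. The sketch below is only an analogy in application code. The Order and OrderLine types and their fields are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderLine:
    product_id: int
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    # The "nested table" column: every order row carries its own table of lines.
    lines: List[OrderLine] = field(default_factory=list)


order = Order(1, "Acme", [OrderLine(10, 2), OrderLine(11, 5)])
# Child rows are reached through the parent row, with no join on order_id.
total_quantity = sum(line.quantity for line in order.lines)
```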
Transportable Tablespaces
Many times in a data warehouse environment there is a need to move complete tables (or
a collection of tables, also known as a tablespace) from one database to another.
A copy of data may be necessary to help create a federated data mart, to move data from an
ODS to another data warehouse object, or to archive or purge within the data warehouse.
Oracle has introduced transportable tablespaces that allow the database to export/import
only the metadata for a tablespace and not the actual data itself.
Oracle 8i allows us to use any mechanism available to copy data from one database to
another. High-speed utilities such as operating system copy facilities can be used to move
the actual data.
Systematic Denormalization
As data is moved from operational systems into the warehouse, it goes through a process
called systematic denormalization that violates all the rules that relational database
architects apply when modeling most transactional systems.
Systematic denormalization is done to enhance the performance of the warehouse by
reducing the need to understand every possible join path (thus we are able to pre-index
the database accurately) and by reducing join operations that are resource intensive.
Materialized Views
Materialized Views are very similar to regular database views with one important
difference: they actually store the data, not just the SQL definition of the data. The fact
that the materialized views precompute the results of the query that defines them greatly
aids in the performance of the data warehouse.
In Oracle 8i, materialized views are really meant to store aggregate (presummarized)
information. Materialized views are designed to be completely transparent to end users.

For example, suppose we want to aggregate the REVENUE table. The DDL (Data Definition
Language) to create the view would look similar to this:
create materialized view revenue_summary
build immediate
refresh complete
as
select r.product_id, r.location_id, sum(r.revenue_dollars) sum_revenue_dollars
from revenue r
group by r.product_id, r.location_id;

This statement will define and populate the materialized view revenue_summary
immediately upon executing the script. In addition, when the view is set to refresh on
commit, it is also updated any time information is committed to the REVENUE table.
For Oracle 8i to perform a fast refresh (that is, one that doesn't rebuild the whole table),
the DBA must define a materialized view log for the source table. For the above
revenue_summary example, the log would be created as follows:
create materialized view log on revenue
with rowid (product_id, location_id, revenue_dollars)
including new values;
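The difference between a complete refresh and a fast refresh can be modeled in a few lines. This is a conceptual sketch only. The sample rows are invented, the change log is a plain list standing in for the materialized view log, and only inserted rows are handled.

```python
from collections import defaultdict

# Hypothetical REVENUE rows: (product_id, location_id, revenue_dollars).
revenue = [(1, 10, 100.0), (1, 10, 50.0), (2, 20, 75.0)]


def complete_refresh(rows):
    """Rebuild the whole summary by re-aggregating every source row."""
    summary = defaultdict(float)
    for product_id, location_id, dollars in rows:
        summary[(product_id, location_id)] += dollars
    return dict(summary)


def fast_refresh(summary, log):
    """Apply only the logged new rows to an existing summary."""
    updated = dict(summary)
    for product_id, location_id, dollars in log:
        key = (product_id, location_id)
        updated[key] = updated.get(key, 0.0) + dollars
    return updated


summary = complete_refresh(revenue)
change_log = [(2, 20, 25.0), (3, 30, 10.0)]  # rows inserted since the last refresh
summary_fast = fast_refresh(summary, change_log)
```

Both paths arrive at the same summary, but the fast path touches only the logged rows instead of rescanning the whole source table.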

Parallelism
Parallelism involves the ability of software to take advantage of multi-CPU machines to
reduce response time when queries are passed to the database engine for processing.
The two most commonly used multi-CPU machines are:
Massively Parallel Processor (MPP) and
Symmetric Multi Processor (SMP).

Massively Parallel Processor (MPP)


Massively Parallel Processor (MPP) architecture has many separate nodes hooked
together by a high-speed interconnect mechanism. An MPP node consists of one or more
processor(s), local memory, and, sometimes, a local disk. The operating system runs
separately on each node.
Symmetric Multi Processor (SMP)
Symmetric Multi Processor (SMP) architecture involves more than one CPU utilizing a
common memory and disk. The operating system runs concurrently as one image across
multiple CPUs. The operating system provides scheduling so that tasks execute on all
CPUs in a symmetric fashion.
Parallelism and the Warehouse

A number of operations can be parallelized using Oracle 8i. Some of them are crucial to
the loading, transformation, and population of the data warehouse. Oracle 8i can parallelize
more than 20 operations. Some of the operations are:
i. Table scan
ii. Not in
iii. Group by
iv. Select distinct
v. Aggregation
vi. Order by
vii. Create table as select
viii. Index maintenance
ix. Inserting rows from other tables
x. Enabling constraints
xi. Star optimization

Degree of Parallelism
The degree of parallelism is the number of query processes associated with a single
operation.
It can be set at:
The statement level or
The object level.
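To make the idea of a degree of parallelism concrete, the sketch below splits one aggregation across a chosen number of workers and then combines the partial results. It illustrates the structure only. The function name is hypothetical, and Python threads stand in for Oracle's query processes without giving true multi-CPU speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, degree):
    """Sum values using `degree` workers, each scanning one slice of the data."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    # Split the data into at most `degree` roughly equal slices.
    chunk = max(1, -(-len(values) // degree))  # ceiling division
    slices = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partial_sums = list(pool.map(sum, slices))
    # A coordinating step combines the partial results into one answer.
    return sum(partial_sums)
```

A degree of 1 reduces to an ordinary serial scan, and higher degrees give each worker a smaller slice to process.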
At the Statement Level
This can be accomplished during any phase of data warehouse activity by using hints.
Hints are special keywords used to influence the way the optimizer processes queries.
The optimizer is a set of routines invoked when a query is passed to Oracle; the optimizer
ensures the most efficient processing is performed on the query based on the nature of the
data in the tables the query references.
Using hints, the developer can influence the degree of parallelism to be used on a query
and what structures should be parallelized to what degree.
At the Object Level
This is the best place to define the degree of parallelism. The familiar create table statement
includes a parallel (degree n) clause, where n refers to the optimal number of query
processes that will be used to process queries against the table.
Turning on Parallel Query at the Instance Level

A database administrator places the following entries in the initialization parameter file:

PARALLEL_MIN_SERVERS
PARALLEL_MAX_SERVERS
PARALLEL_SERVER_IDLE_TIME

PARALLEL_MIN_SERVERS
This entry determines the number of query processes to spawn when the database is
started. There is an additional memory requirement to initiate and keep these
processes running.
When run on a UNIX machine, the query server processes will be identified by P000 to
P0XX, where XX equals the setting for the parameter minus one. Thus, a setting of 12
will spawn query processes P000 to P011.
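The naming rule above can be checked with a short sketch. The function name is hypothetical. It simply numbers the processes from zero with three digits, as the P000 to P011 example suggests.

```python
def query_server_names(parallel_min_servers: int) -> list:
    """Names of the query server processes spawned at startup: P000, P001, ..."""
    if parallel_min_servers < 0:
        raise ValueError("setting cannot be negative")
    # Numbering starts at zero, so the last name is the setting minus one.
    return [f"P{i:03d}" for i in range(parallel_min_servers)]
```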
PARALLEL_MAX_SERVERS
This determines the maximum number of query processes that can be initiated if extra
processes are required over and above the number set by the previous parameter.
PARALLEL_MAX_SERVERS is a cumulative number: it specifies the total
number to run, not the number of extra processes to start.
Oracle will spawn processes over the minimum setting up to the maximum setting if
query processing requires more than the minimum. Suppose the former is set to 12 and
the latter to 24; Oracle will spawn up to 12 additional processes when needed.
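The relationship between the two settings can be expressed as a clamp. This is a simplified model of the behavior described above, with a hypothetical function name and a single demand figure standing in for whatever query processing actually requires.

```python
def running_query_servers(min_servers: int, max_servers: int, demand: int) -> int:
    """Number of query server processes running for a given demand."""
    if max_servers < min_servers:
        raise ValueError("maximum must not be below minimum")
    # Never fewer than the minimum, never more than the (cumulative) maximum.
    return max(min_servers, min(demand, max_servers))
```

With a minimum of 12 and a maximum of 24, a demand of 18 leaves 18 processes running, and any demand above 24 is capped at 24, which is 12 more than the minimum.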
PARALLEL_SERVER_IDLE_TIME
This parameter sets an idle time after which the extra query processes (those above
PARALLEL_MIN_SERVERS, up to the number designated by PARALLEL_MAX_SERVERS) are
killed off by Oracle.
It is measured in minutes, and it is very possible that in a warehouse with sporadically
heavy usage, the number of P0XX processes could vary at different monitoring times
of day.
Tablespace Segregation
There is a logical rather than physical association between the files making up each
tablespace. There are two types of tablespaces required in the warehouse:
System Support Tablespaces
Application Tablespaces

System Support Tablespaces


Oracle uses five system support tablespaces.
They are defined as follows:

SYSTEM
The heart of the database, this tablespace contains the data dictionary and the
objects owned by the special users SYS and SYSTEM.
ROLLBACK
This is the tablespace where the rollback segments are stored. These are used to
save pre-update copies of rows changed by users of the database before they
commit or roll back a transaction. Commit is the activity of saving changes to the
database; rollback is the act of returning the data to the state it was in
before the changes were initiated.
TEMPORARY
This is Oracle's scratch pad, where temporary tables are created for the life of the
processing cycle of each query, then cleaned up when no longer required.
TOOLS
This is where Oracle places tables that are used by its own tools delivered with
the database; Oracle suggests you place other vendors' objects here as well, rather
than in the SYSTEM tablespace.
USERS
This area is set aside for any non-system objects required by the application.

Application Tablespace
The tablespaces that contain the warehouse data must be created manually using an
interface like Oracle Enterprise Manager.
Guidelines for setting up Tablespaces
Estimate the space required for your data and indexes. Formulae for these
calculations are in works such as the Oracle 8i Server Administrator's Guide.
Often, the row sizes in the warehouse can be estimated based on those of
their operational counterparts.
Create Oracle accounts to be the keepers of the data for each section of your data
warehouse or for each individual data mart.
Separate the data and index containers.

National Language Support


National Language Support is used to support the storage and display of extended
characters in the Oracle 8i database.
The most commonly used NLS parameters are:
NLS_DATE_FORMAT
NLS_TERRITORY
