By: Mr. Nilesh Magar, Lecturer in Computer Science, MIT-MACS College, Kothrud, Pune-38.
Introduction
Definition:
Simple perception: No more than a collection of key pieces of information used to manage and direct the business toward the most profitable outcome.
Precise definition: It concentrates on the data. The data should be subject-oriented, consistent across sources, and so on.
Pearsons definition: It is more than a vast store of data; it is also the processes involved in getting that data from source to table, and from table to analysts.
In other words: A DWH is the data (meta/fact/dimension/aggregate) and the process managers (load/warehouse/query) that make information available, enabling people to make informed decisions.
Data Warehousing Architecture: A DWH must be architected to support three major driving factors:
1) Populating the DWH.
2) Day-to-day management of the DWH.
3) The ability to cope with requirement evolution.
Processes:
1. Extract and load the data.
2. Clean and transform the data into a form that can cope with large data volumes and provide good query performance.
3. Back up and archive data.
4. Manage queries and direct them to the appropriate data sources.
Operational data: Structured to suit the operational systems; it may have been modified and extended over the years to support performance. It is reconstructed for the DWH.
b. Partition the data in order to speed up queries, optimize hardware performance, and simplify the management of the DWH.
Process Architecture
Load manager function: Extract and load the data, performing simple transformations before and during the load.
Warehouse manager
Query Manager
[Figure: process architecture. Labels: data, information, decision; operational data; load manager; warehouse manager; summary info; query manager; data dipper; OLAP tools.]
Load Manager
The system component that performs all the operations necessary to support the extract and load process.
Implemented using a combination of off-the-shelf tools, bespoke coding, C programs, and shell scripts. Size and complexity vary from one DWH to another; the larger the degree of overlap between source systems, the larger the load manager will be.
1) Extract the data from the source systems.
2) Fast load the extracted data into a temporary data store.
3) Perform simple transformations into a structure similar to the one in the data warehouse.
Each of these functions has to operate automatically and recover from any error it encounters, to a very large extent with no human intervention.
To get hold of the source data, it has to be transferred from the source systems and made available to the DWH. ASCII files are transferred by FTP across the LAN. Current gateway technology operates too slowly to compete with FTP.
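The requirement that each load step recover from errors without an operator can be sketched as a small retry wrapper. This is only an illustration under stated assumptions: `run_step` and its `retries` parameter are hypothetical names, and a real load manager would also need alerting and restart logic.

```python
import logging


def run_step(step, retries=2):
    """Run one load step, retrying on failure without operator input.

    `step` is any zero-argument callable (hypothetical: extract, fast load,
    or simple transform). It is attempted up to `retries + 1` times.
    """
    for attempt in range(1, retries + 2):
        try:
            return step()
        except Exception as exc:  # broad on purpose: any failure is retried
            logging.warning(
                "%s failed (attempt %d): %s", step.__name__, attempt, exc
            )
    raise RuntimeError(f"{step.__name__} failed after {retries + 1} attempts")
```

A step that fails once because a source file has not arrived yet would simply be run again; only after the last attempt does the error surface.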
Fast Load
Data should be loaded into the warehouse in the fastest possible time, in order to minimize the total load window. This becomes critical as the number of data sources increases and the time window shrinks. In practice it is more effective to load the (ASCII) data into a relational database prior to applying transformations and checks.
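As a rough sketch of the fast-load idea, the code below bulk-loads a delimited ASCII extract into an untyped staging table before any checks are applied. It assumes a pipe-delimited file with a header row and uses an in-memory SQLite database as a stand-in for the relational staging area; the table name `stg_sales` and the sample columns are hypothetical.

```python
import csv
import io
import sqlite3


def fast_load(conn, ascii_text, delimiter="|"):
    """Bulk-load delimited ASCII rows into an untyped staging table.

    No checks or type conversions happen here: every column is stored as
    text so that the load itself stays as cheap as possible.
    """
    reader = csv.reader(io.StringIO(ascii_text), delimiter=delimiter)
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f"CREATE TABLE stg_sales ({columns})")
    placeholders = ", ".join("?" for _ in header)
    with conn:  # one transaction for the whole batch
        conn.executemany(
            f"INSERT INTO stg_sales VALUES ({placeholders})", reader
        )
    return conn.execute("SELECT COUNT(*) FROM stg_sales").fetchone()[0]


# Hypothetical extract file contents.
EXTRACT = "sale_id|store|amount\n1|Pune|120.50\n2|Mumbai|80.00\n"
```

Calling `fast_load(sqlite3.connect(":memory:"), EXTRACT)` loads both rows as text; transformations and consistency checks are then applied with SQL inside the database.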
Simple Transformation
Before or during the load there is an opportunity to perform simple transformations on the data. Here we perform those transformations that do not require complex logic or the use of relational set operators. E.g., for a retail management system:
1) Strip out all the columns that are not required in the DWH.
2) Convert all the values to the required data types.
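The two simple transformations above can be sketched as follows. The column names and the `TARGET_COLUMNS` mapping are assumptions made for illustration; they are not taken from a real retail schema.

```python
from datetime import date

# Hypothetical target layout: column name -> converter to the required type.
TARGET_COLUMNS = {
    "sale_id": int,
    "sale_date": date.fromisoformat,
    "amount": float,
}


def simple_transform(row):
    """Keep only the warehouse columns and convert each value's type."""
    return {
        name: convert(row[name]) for name, convert in TARGET_COLUMNS.items()
    }


# A raw source row: every value arrives as text, and till_operator is a
# column the DWH does not need.
raw = {
    "sale_id": "17",
    "sale_date": "2024-03-01",
    "amount": "99.50",
    "till_operator": "A. Kumar",
}
clean = simple_transform(raw)
```

Each row is handled independently, with no joins or lookups, which is what keeps this kind of transformation cheap enough to run during the load.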
[Figure: load manager architecture. Labels: controlling process, stored procedures, copy management tools, file structure, fast loader.]
Warehouse Manager
The system component that performs all the operations necessary to support the warehouse management process.
Implemented using a combination of third-party system management tools, bespoke coding, C programs, and shell scripts.
As with the load manager, the size and complexity of the warehouse manager will vary between specific solutions. Unlike the load manager, the complexity of the warehouse manager is driven by the extent to which the operational management of the DWH has been automated.
1) Analyze the data to perform consistency and referential integrity checks.
2) Transform and merge the source data in the temporary data store into the published DWH.
3) Create indexes, business views, partition views, and so on.
4) Generate denormalizations if appropriate.
[Figure labels: stored procedures, SQL scripts, summary tables.]
Once the data is in the temporary store, the next step is to create a set of tables identical to the destination tables in the DWH.
For example, the data in the DWH may be highly partitioned. As we are about to execute substantial consistency checks, the data should not be loaded until it has been cleaned up.
If a consistency check fails: although relational databases provide some form of rollback, in practice it is easier to load the data into a temporary area, clean it up, and then publish it to the DWH.
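A minimal sketch of this stage-check-publish pattern, using SQLite as a stand-in for the warehouse database. The tables `stg_sales`, `fact_sales`, and `dim_store` are hypothetical, and the only check shown is a referential integrity check against the store dimension.

```python
import sqlite3


def publish_if_consistent(conn):
    """Publish staged sales only if every row references a known store."""
    orphans = conn.execute(
        """
        SELECT COUNT(*) FROM stg_sales AS s
        LEFT JOIN dim_store AS d ON d.store_id = s.store_id
        WHERE d.store_id IS NULL
        """
    ).fetchone()[0]
    if orphans:
        return False  # leave the staged data in place for clean-up
    with conn:  # publish and clear the staging area in one transaction
        conn.execute("INSERT INTO fact_sales SELECT * FROM stg_sales")
        conn.execute("DELETE FROM stg_sales")
    return True


def make_demo_db():
    """Build a tiny in-memory database with one bad staged row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER, store_id INTEGER, amount REAL);
        CREATE TABLE stg_sales (sale_id INTEGER, store_id INTEGER, amount REAL);
        INSERT INTO dim_store VALUES (1, 'Pune');
        INSERT INTO stg_sales VALUES (100, 1, 250.0), (101, 9, 75.0);
        """
    )
    return conn
```

With the demo data, the first call refuses to publish because store 9 does not exist; once the bad row is cleaned up in the staging table, a second call publishes the remaining row. The published fact table is never touched by dirty data, so no rollback of the warehouse is needed.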
Complex Transformation
Reconcile data
One would expect the index creation time to be significant, even if we only need to create indexes against the fact table partition. Because of this, most relational technologies have facilities to create indexes in parallel, distributing the load across the hardware and significantly reducing the elapsed time. Overhead of inserting a row into a table.
The warehouse manager has to create a set of aggregations to speed up query performance. These are generated automatically.
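One way such an aggregation might be generated is sketched below: a summary table rebuilt from the fact table with a GROUP BY. SQLite is used as a stand-in, and `fact_sales` and `sum_sales_by_store` are hypothetical table names.

```python
import sqlite3


def build_summary(conn):
    """(Re)create a per-store summary table from the fact table."""
    with conn:
        conn.execute("DROP TABLE IF EXISTS sum_sales_by_store")
        conn.execute(
            """
            CREATE TABLE sum_sales_by_store AS
            SELECT store_id,
                   COUNT(*)    AS sale_count,
                   SUM(amount) AS total_amount
            FROM fact_sales
            GROUP BY store_id
            """
        )
```

Queries that only need totals per store can then read the small summary table instead of scanning the fact table. The function would be rerun after each load so that the summary stays in step with the detailed data.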
Query manager:
The system component that performs all the operations necessary to support the query management process.
Implemented using a combination of user access tools, specialist data warehousing monitoring tools, native database facilities, bespoke coding, C programs, and shell scripts.
Size and complexity will vary between specific solutions. Unlike the load manager, the complexity of the query manager is driven by the extent to which the facilities are provided by the user access tools or by native database facilities.
1. Direct queries to the appropriate tables.
2. Schedule the execution of user queries.
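Directing a query to the appropriate table can be illustrated with a very small routing function. This is a sketch under simplifying assumptions: the `TABLES` catalogue and its column sets are hypothetical, and a real query manager would also consider aggregation levels, partitions, and scheduling.

```python
# Hypothetical catalogue: the columns each table can answer, cheapest first.
TABLES = [
    ("sum_sales_by_store", {"store_id", "total_amount"}),
    ("fact_sales", {"sale_id", "store_id", "sale_date", "amount"}),
]


def route_query(required_columns):
    """Return the first (cheapest) table that covers all requested columns."""
    needed = set(required_columns)
    for table, columns in TABLES:
        if needed <= columns:
            return table
    raise LookupError(f"no table provides {sorted(needed)}")
```

A request for `store_id` and `total_amount` is sent to the summary table, while a request that needs `sale_date` falls through to the detailed fact table.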