Data Acquisition: It is the process of extracting the relevant business information, transforming the data into the required business format, and loading it into the target system.

Data acquisition consists of the following processes:
1. Data Extraction
2. Data Transformation
3. Data Loading

Data Extraction: It is the process of reading data from various types of source systems. The following are the types of sources:
1. ERP Sources: SAP, Oracle Applications, PeopleSoft
2. File Sources: Flat files, XML files
3. Relational Sources: Oracle, SQL Server, DB2, Sybase
4. Legacy Sources: Mainframes, AS/400, COBOL files

Data Transformation: It is the process of transforming and cleansing the data into the required business format. The following data transformation activities take place in this stage:
1. Data Cleansing
2. Data Scrubbing
3. Data Aggregation
4. Data Merging

Staging Area: The staging area is a temporary storage area where the data transformation activities listed above take place.

Data Cleansing: It is the process of correcting inconsistencies and inaccuracies. (Or) It is the process of removing unwanted data from staging. Ex:
1. Removing duplicates
2. Removing records that contain nulls
3. Removing spaces
4. Rounding the data
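The four cleansing examples above can be sketched in a few lines of Python. This is only an illustration, not how any particular ETL tool implements cleansing; the column names (cust_id, name, amount), the sample rows, and the choice of two decimal places are assumptions made for the example.

```python
def cleanse(records):
    """Apply the four cleansing steps to a list of staging rows (dicts)."""
    cleaned = []
    seen = set()
    for row in records:
        # Step 2: drop records that contain a null (None) in any column.
        if any(value is None for value in row.values()):
            continue
        # Step 3: remove leading/trailing spaces from text columns.
        # Step 4: round floating-point values to two decimals (assumed precision).
        row = {
            key: value.strip() if isinstance(value, str)
            else round(value, 2) if isinstance(value, float)
            else value
            for key, value in row.items()
        }
        # Step 1: remove duplicates, compared after trimming and rounding.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


# Hypothetical staging rows used only for this example.
staging = [
    {"cust_id": 1, "name": " Ann ", "amount": 10.456},
    {"cust_id": 1, "name": "Ann", "amount": 10.456},  # duplicate once trimmed
    {"cust_id": 2, "name": "Bob", "amount": None},    # contains a null
    {"cust_id": 3, "name": "Cid", "amount": 7.0},
]
result = cleanse(staging)
```

Duplicates are checked after trimming and rounding so that rows differing only by stray spaces are treated as the same record.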

Data Scrubbing: It is the process of deriving new attributes. Attributes are nothing but table columns.
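As a small illustration of deriving new attributes, the sketch below adds two derived columns to a row. The row layout and the column names (first_name, last_name, quantity, unit_price, full_name, line_total) are hypothetical and chosen only for the example.

```python
def scrub(row):
    """Return a copy of the row with two derived attributes added."""
    derived = dict(row)
    # New column derived from two existing text columns.
    derived["full_name"] = f"{row['first_name']} {row['last_name']}"
    # New column derived from quantity and unit price.
    derived["line_total"] = row["quantity"] * row["unit_price"]
    return derived


# Hypothetical source row used only for this example.
order = {"first_name": "Ann", "last_name": "Lee", "quantity": 3, "unit_price": 20}
scrubbed = scrub(order)
```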

Data Aggregation: It is a process of calculating the summaries from detailed data.
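Aggregation can be sketched as summing a measure per group. The helper name total_by and the sample sales rows are assumptions for illustration; a real ETL tool would normally do this with its own aggregator component or a SQL GROUP BY.

```python
from collections import defaultdict


def total_by(rows, key, measure):
    """Sum the given measure for each distinct value of the grouping key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[measure]
    return dict(totals)


# Hypothetical detailed sales records.
sales = [
    {"region": "East", "amount": 100},
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 50},
]
summary = total_by(sales, "region", "amount")
```

Here three detail rows are reduced to one summary value per region.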

Data Merging: It is the process of integrating data from multiple source systems. There are two types of data merge operations:
1. Horizontal Merging
2. Vertical Merging

Horizontal Merging: It is the process of merging the records horizontally. Ex: Join

Vertical Merging: It is the process of merging the records vertically when the two sources have the same metadata (Union). Metadata means the data structure (for example, two (or) three column names are the same).

Data Loading: It is the process of inserting the data into a target system. There are two types of data loading:
1. Initial Loading (or) Full Loading
2. Incremental Loading (or) Delta Loading

Note: In Informatica an ETL plan is called a Mapping.

In DataStage it is called a Job.

In Ab Initio it is called a Graph.

Initial Loading: Data is loaded from the source system into the target system for the first time; the records are inserted directly into the target system.

Incremental Loading: After the first load, only the records newly entered in the source system are inserted into the target system, and records that have changed are updated in the target system.

Data Warehouse Database Design:

What is a Data Warehouse? A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources. A common way of introducing data warehousing is to refer to the characteristics of a data warehouse as set forth by William Inmon:

1. Subject Oriented
2. Integrated
3. Nonvolatile
4. Time Variant

Subject Oriented:

Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.

Nonvolatile: Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant: In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant.

A data warehouse is designed with the following types of schemas:

1. Star Schema
2. Snowflake Schema
3. Galaxy Schema (Hybrid Schema, Multi Star Schema, Integrated Schema)

Data Modeling: The process of designing the database is known as data modeling. A database architect (or) data modeler creates database schemas using a GUI-based database design tool called ERwin, a product of Computer Associates.

1. Star Schema: A star schema is a database design that contains a centrally located fact table surrounded by dimension tables.

A fact table contains facts. Facts are numeric, but not every numeric value is a fact. Facts are business measures because they estimate the business performance. A dimension table gives detailed information about the fact table attributes.
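A minimal star schema can be sketched with Python's built-in sqlite3 module. The table and column names (fact_sales, dim_product, dim_store and their columns) and the sample rows are hypothetical; the point is only the shape: one central fact table holding numeric measures and keys, with each key pointing to a dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive details about the facts.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, city TEXT);

-- Fact table in the centre: keys to each dimension plus numeric measures.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    quantity    INTEGER,
    amount      REAL
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Book');
INSERT INTO dim_store   VALUES (1, 'Pune'), (2, 'Delhi');
INSERT INTO fact_sales  VALUES (1, 1, 10, 50.0), (2, 1, 2, 40.0), (1, 2, 5, 25.0);
""")

# Typical star-schema query: join the fact table to a dimension and summarise a measure.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.product_name
    ORDER BY p.product_name
""").fetchall()
```

The query reads the measure (amount) from the fact table and the descriptive label (product_name) from the dimension table, which is the usual division of labour in this design.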

2. Snowflake Schema:
a. In a snowflake schema a dimension may have a parent table.
b. A large dimension table is split into one (or) more normalized tables (decomposed dimension).

3. Galaxy Schema:
a. The data warehouse is designed by integrating multiple star schemas (or) snowflake schemas (or) both.
b. A galaxy schema is also known as a hybrid schema (or) constellation schema.

Fact Constellation: It is a process of joining two fact tables.

Conformed Dimension: A dimension table that is used by more than one fact table. (Or) A dimension table that can be shared by multiple fact tables is known as a conformed dimension.

Factless Fact Table: A fact table without any facts is known as a factless fact table.

Slowly Changing Dimension: A dimension that can change over a period of time is known as a slowly changing dimension. There are three types of slowly changing dimensions.
Type 1: The type 1 dimension table stores only current data in the target; it doesn't maintain any history.
Type 2: The type 2 dimension table maintains the full history in the target; for each update it inserts a new record in the target.
Type 3: The type 3 dimension table maintains partial history (current + previous information).
Refer http://en.wikipedia.org/wiki/Slowly_changing_dimension

Surrogate Key: A surrogate key is a referential key (artificial key) that is treated as the primary key in a dimension table. Every dimension table must have a surrogate key. A surrogate key is a system-generated sequence number.
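The difference between Type 1 and Type 2 dimensions, and the role of the surrogate key, can be sketched as follows. This is an illustration under assumptions, not a description of any specific tool: the column names (customer_key, customer_id, city) are hypothetical, and the is_current flag is one common way to mark the active row that the notes above do not mention.

```python
from itertools import count


def apply_type1(dimension, natural_key, new_city):
    """Type 1: overwrite the value in place, so no history is kept."""
    for row in dimension:
        if row["customer_id"] == natural_key:
            row["city"] = new_city


def apply_type2(dimension, natural_key, new_city, key_sequence):
    """Type 2: keep the old row and insert a new row for the update."""
    for row in dimension:
        if row["customer_id"] == natural_key and row["is_current"]:
            row["is_current"] = False  # assumed flag marking the old version
    dimension.append({
        "customer_key": next(key_sequence),  # surrogate key: generated sequence number
        "customer_id": natural_key,          # business key from the source system
        "city": new_city,
        "is_current": True,
    })


# Type 1 example: one row before, one row after.
dim_type1 = [{"customer_key": 1, "customer_id": "C100", "city": "Pune"}]
apply_type1(dim_type1, "C100", "Delhi")

# Type 2 example: the update adds a second row with a new surrogate key.
surrogate_keys = count(1)
dim_type2 = [{
    "customer_key": next(surrogate_keys),
    "customer_id": "C100",
    "city": "Pune",
    "is_current": True,
}]
apply_type2(dim_type2, "C100", "Delhi", surrogate_keys)
```

In the Type 2 case the same customer_id appears on two rows, which is why the business key cannot serve as the primary key and a surrogate key is needed.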

Data Mart and Types of Data Mart: A data mart is a subject-oriented database that supports the business needs of department-specific business managers. (Or) A data mart is a subset of an enterprise data warehouse. A data mart is also known as a high performance query structure (HPQS). There are two types of data marts:
1. Dependent data mart
2. Independent data mart

[Diagram: Enterprise, made up of the departments HR, Market, Account, Purchase, Sales]

The enterprise is an integration of various departments. The integration of multiple data marts is an enterprise DWH.

Top-Down Data Warehousing Approach (Inmon): According to Inmon, first we need to build an enterprise data warehouse (EDW); from the EDW we design subject-oriented, department-specific databases known as data marts.

Bottom-Up Data Warehousing Approach (Kimball): According to Kimball, first design department-specific, subject-oriented databases known as data marts, then integrate the data marts to define the enterprise data warehouse.

Dependent Data Mart: In the top-down approach, data mart development depends on the enterprise data warehouse; hence such data marts are known as dependent data marts.

Independent Data Mart: In the bottom-up approach, data mart development is independent of the enterprise data warehouse; hence such data marts are known as independent data marts.

ODS (Operational Data Store): An ODS is an integrated view of multiple OLTP databases.

ODS:
Similarity: Integrated database
Differences:
1. Volatile data
2. Current data
3. Detailed data

DSS (Decision Support System):
Similarity: Integrated database
Differences:
1. Non-volatile data
2. History data
3. Summary data

OLAP: It is an interface (or) gateway between the user and the database. Types of OLAP:
1. DOLAP (Desktop OLAP): An OLAP that queries data from desktop databases. Ex: XML files, .txt files, Excel files.
2. ROLAP (Relational OLAP): It is used to query data from relational sources such as SQL Server, Oracle, Sybase, Teradata.
3. MOLAP (Multidimensional OLAP): It is used to query data from multidimensional sources such as cubes and DMR.
4. HOLAP (Hybrid OLAP): It is a combination of ROLAP and MOLAP.
