You are on page 1of 11

New – Added links to each topic, click on each topic to get to appropriate links

1, 2 Denotes more links

learndatamodeling.com
This is the best site where you can learn everything about Data Warehouse technology. Highly recommended.

 Explain in terms of data warehousing terminology

Please refer to Data Warehouse Concepts



Data Warehouse vs. Data Mart
Kimball vs. Inmon 1
3NF vs. Denormalized
Star vs. Snowflake 1
Dimensional Modeling 1
Definition of the following
 Fact, Dimension, Conformed Dimension, Staging Area, Time Dimension 1 and 2,
Junk Dimension, Slowly Changing Dimension, Factless fact, Conformed fact,
Surrogate key, Hierarchy, Time variant, subject oriented, OLTP, ODS, DSS,
OLAP, ROLAP, MOLAP
 Development Methodology (Top down, bottom up) and their relative benefits
 Environment/Process you are working/worked on:-

 Size of your Data Warehouse/Data Mart


 Version of DB, Type(Oracle, DB2, SQL Server, Teradata)
 Transactions per day (Extraction, Loading).
 How many facts and dimensions?
 Error handling and Support
 Data Profiling.
 Data Cleansing tools
 External scripts, PL/SQL, Shell Scripts
 Scheduling software (Informatica, Cron or any other s/w).
 Number of mappings/mapplets, transformation.
 Complexity like looping, external procedures (C++, Java)
 Workflows, Worklets(if any), Sessions
 Performance tuning methods
 Scripting in terms of calling pmcmd, pmrep(for backup etc)
 FTP Methods(ftp, sftp, ssh, rsync etc)
 In case SAP PowerConnect, FTP or Stream, overriding SAP ABAP code(if have
you done so)
 Team based development
Global Repository, Shortcuts
 Configuration management tools like (CCM, Clearcase, PVCS etc)
 Unit, Volume and Integration testing.
 Migration Compliance, Test/Production Rollout
 Metadata Management
 Project Management

 Business Case Assessment


 Cash flow analysis.
 Return of Investment (ROI).
 Net Present Value(NPV)
 Payback period
 Tool Selection
ETL
Reporting
 Internal Rate of Return(IRR)
 Enterprise Deployment Map.
 Potential Benefits Analysis.
 Roles and Responsibilities.
 Risk Analysis, Costs and Benefits Adjustments
Service Level Agreements (SLA)

 Work Breakdown Structure


 Develop task list and description
 Identify any key internal and external project dependencies.
 Identify critical project milestones
 Identify task owner.
 Estimate timelines.
 Create a timeline.
 Resources Identification
 Identify the generic resources required to complete each task.
 Identify the quantity required, the delivery dates and the project tasks in the WBS

 Construct a Project Schedule

 Critical Success Factors


Common data definitions or Metadata
User Training
User Expectations
 Schedule
 Budget
 Scope
 Performance
 Availability
 Simplicity (ease of use)
 Tool functionality
 Data cleanliness
 Users’ roles and responsibilities
User Involvement
Realistic Schedule
QA Process
Change Control Procedures
 Roles and Responsibilities

 Business User
 Database Administrator
 Data Warehouse Project Manager
 DW Architect
 Support Team(Production,Migration)

 Metrics to evaluate the success of the project/Scorecard


 Pre and Post BI Implementation
Operational Metrics
Latency tolerance
Granularity
Availability
BI Related Metrics
Functional quality
Data quality
System performance
Network performance
User satisfaction
ETL Process Time (Extraction and Loading)
Post processing time
Number of Reports/queries
What data is accessed
Satisfies scope agreement
Benefits achieved
Scalability
Enterprise Metrics (e.g. Computer Manufacturing Unit)
Net revenue generated
Net costs related to the production and distribution,
Customer Loyalty
Number of service calls
Total units sold

 Key Performance Indicators(KPI)


 SMART (Specific, Measurable, Achievable, Realistic, Timely)
 Quantitative indicators which can be presented as a number.
 Practical indicators that interface with existing company processes.
 Directional indicators specifying whether an organization is getting better or
not.
 Actionable indicators are sufficiently in an organization's control to effect
change
 Questions for Data Warehouse Architect

 What is dimensional modeling and how it is different from ERD?


 What are the benefits and pitfalls of snowflaking?
 What methodology did you adopt, Kimball or Inmon and why?
 What are the advantages of the Corporate Information Factory 1 2 3(CIF) architecture
vs. the bus architecture with conformed dimensions? What suits best for your
environment and why?
 What is subject area?
 What is difference between normalization and de-normalization? Can we have a
normalized data warehouse schema?
 Why do we need an ODS?
 How will you plan to load your data into your data warehouse from an ODS vs. an
OLTP environment?
 Assume that you are designing a data warehouse for an insurance company. What
type of SCD you will design for an insurance agent table, where agent keeps moving
across many regions/states.
 How will go about designing star schema for your business?
 In your organization business requirement/practices have changed dramatically and
warrants for major schema refactoring of your existing data warehouse. How will you
handle this situation? For e.g., you need to add couple of dimension keys to existing
fact tables.
 You have a data warehouse setup and your company did a major acquisition/merger
and how will you go about handling the new data flow into your setup.
 What are factless fact tables and give a scenario where you will use it?
 What type of optimization techniques applied at the data model level and why did you
choose them?
 How will you handle data cleansing, validation?
 What is Master Data Management?
 How will you handle data rejects in your Data warehouse?
 Describe common techniques for loading from the staging area/OLTP to the
warehouse when you only have a small window.
 How do you load type 1 dimensions, 2 dimensions
 How would you model unbalanced hierarchies
 How would you model cyclic relations
 What major elements would you include in an audit model?
 How would you implement traceability?
 What steps would you take to improve reporting performance?
 Typical Questions
(Most of these questions are meant for experienced professionals. Fresh candidates
please do not get overwhelmed with these questions.)

 General Questions
 Please refer for more help - http://devisql.blogspot.com
 When you make index ‘unusable’ and now issue a truncate command on the table.
What will happen to status of index, Will it usable or unusable?
 User A has a table ‘emp’ and he creates a view, emp_v on emp with a condition where
emp_salary <100000 and creates a public synonym as emp on emp_v.
o What will user A see when runs select * from emp
o What will user B see when he runs select * from emp?
 tableB has 100 rows and when you use in the where clause, create tableA as select *
from tableB
1. 1=1
2. 1=2.
How many rows will be in tableA in each case?
 You have two tables A and B and you run ‘insert into tableA select * from tableB’ and
then run ‘create table tableC as select * from tableA’. Now if you run ‘delete from
tableB’ and then execute a rollback. How many rows will be there in each of these
tables, tableA, tableB and tableC.
 You have a table named ‘customer’ with 100 million rows, and you need to add a
column with a not-null constraint and a predefined default value to this column. What
is the best approach, what will happen if you run this sql in production, ALTER TABLE
customer ADD (cust_credit_rating number default 1 not null);
 Your sql script must exit with an error should anyone else run other the specified
user. It should be a one -line sql? Hint –on sqlerror, how do you generate this error?
 Why do we use dimensional modeling instead of ER modeling for data warehousing
applications?
 Why can’t we use a copy of our transactional system to meet our data warehousing
needs? If we are unable to use a copy of transaction system as DW, then why don’t we
use it as a staging environment?
 What are the three most important themes in a data warehouse? Give examples?
Partial Answer - Drilling Down, Drilling Across and Handling Time
 Why can’t we have just one single fact table instead of more than one fact table?
 How would you go about deleting 10 million records from 100 million records table in
a production after an erroneous load? Hint – Try alternate method other than delete,
like create table xyz as select * from or if it is a partitioned table, try dropping the
partition.
 Primary key update-
Your company XYZ has acquired a new company. You are asked to load their customer
information into the data warehouse. Your company’s cust_id is numeric, whereas
theirs is a string. How can you handle this situation? How can you handle the
customer records that already exist in your data warehouse? Hint – Use of surrogate
key
 What is Changed Data Capture and how will handle them in case of OLTP
environment like SAP, Oracle Apps?
 What is ODS? Why do you need them? Where would you place them?
 What do you mean by Drilling Up and Drilling around? Hint – subtract and sharing of
related fact
 How would you go about deleting 10 million records from 100 million records table in
a production after an erroneous load? Hint – Try alternate method other than delete,
like create table xyz as select * from or if it is a partitioned table, try dropping the
partition.
 When you extract data from source systems during night, your extract process keep
failing with “snapshot too old” error? How will fix this issue? Hint – your DBA has
provided with alter session privilege
 What is the difference between ‘drop index’ and ‘alter index unusable’ and rebuild?
Why do we prefer second method?
 How do you handle duplicate records from Relational Database?
 You are required create one million records (sequence numbers) into a table using
one sql command? Hint:- Take advantage of Cartesian product and union
 How do you perform debugging in Stored Procedure?
 How can you transform a sub-query involving the IN clause to a Join? E.g. select
emp_no, name from emp where dept_id in (select dept_id where dept_name=’HR’)
 How can you transform a statement involving an OR condition to a UNION ALL? Eg.
select emp_no, name, title from emp where title=’MANAGER’ or ‘DIRECTOR’
 How to get count of the different data values in a column?
 How can you get count/sum ‘ranges of data’ values in a column?
 Give me sql for each of the following
1) TOP N Rows from a table
2) EVERY Nth row from a table
3) Rows X to Y from a table
 Informatica PowerCenter related:
 http://learnbi.com/ or http://leandatamodeling.com
 What are key differences over PowerCenter 7 vs 8? Give one significant change
version 8 is offering.
 What changes do we need to make, for to run version 7 ‘pmcmd’ command on version
8.
 What is purpose of ‘infacmd’ command and domains.infa file?
 Can we create a mapping without using a source qualifier? Hint - Normalizer
 Can we use a flat file as a lookup?
 How can you transpose rows to columns and vice versa and what transformations
would you choose?
 In the update strategy, instead of using DD_INSERT I put the value as 1. Will this
work?
 If you check all the ports as group by in the Aggregator transformation, how many
rows would be output?
 How can you limit number of running sessions in a workflow?
 What is the difference between Union and Joiner Transformation?
 Your source data is a flat file and it has empid, deptid and 12 columns for salaries of
each month (jan-dec). Now you requirement is to create one record for each month
and also running total of the salary. What type of transformations you will use?
 What is the use of indirect file list in PowerCenter?
 In your environment, there are several mappings in version 7 and these mappings use
external procedures (AEP). You need to move these mappings to version 8. What
steps do you take?
 Your customer has a requirement that in a day you should not run workflow more
than once. How can you achieve this by means of workflow control?
 You have a requirement to alert you of any long running sessions in your workflow.
How can you create a workflow that will send you email for sessions running more
than 30 minutes. You can use any method, shell script, procedure or Informatica
mapping or workflow control.
 You want to attach a file as an email attachment from a particular directory using
‘email task’ in Informatica, How will you do it? Hint –%a
 You have two sets of sessions (mappings) in a workflow. You wish to run one set and
you want to selectively ‘turn-off’ the second set of mappings by means of ‘just’ one
parameter so that these sessions/mappings would just run and exit with no data. How
can you manage this using a parameter file? Hint 1=2
 One of your source file you load contains a formula column, emp_id, emp_name,
dept_id, salary, commission_formula (formula_column) ,salary_date.
123, King, d01, 4500, 2*3, 01/20/2005
453, Scott, d02, 8500, 4.5/2*1.4, 01/31/2006
You need to evaluate the commission formula and calculate the commission with the
salary (i.e., commission_formula*salary) to calculate commission. How can you
accomplish this?
 Your mapping is loading the target data in a flat file, but there are many new line
characters that cause your application to fail. How would handle these characters
before they are loaded inside flat file? Hint – Ascii equivalent of CR- CHR(13), CHR(10)
 How to delete duplicate records from source database/Flat Files? Can we use post sql
to delete these records. In case of flat file, how can you delete duplicates before it
starts loading?
 Join two or more tables and then pull out two columns from each table into the source
qualifier. Now, pull out one column from the source qualifier into an Expression
transformation and then do a "generate SQL" in the source qualifier, how many
columns will show up in the generated SQL?
 You are asked to document source to target dependency of all Informatica mappings
and you know there are some repository views (REP_TBL_MAPPING) provided by
Informatica can do this job. But these views return no rows. How can you rectify this
issue? Hint - MX Data, what change would do to load OPB_TARG_TBL_EXPR which provides a crucial link
to REP_TBL_MAPPING?
 You are required to perform “bulk loading” using Informatica on Oracle, what action
would perform at Informatica + Oracle level for a successful load? Hint –alter session
privilege, Environment SQL (where is it?)
 Your session failed and when you try to open a log file, it complains that the session
details are not available. How would do trace the error? What log file would you seek
for? Hint – What other types of logs are available in that server?
 Your business user says revenue report does not tally with SAP (source system)
report though the ETL process did not fail today. How will identify the exact issue?
Hint – reject records, primary key
 There are around 100 sessions in a workflow. You have specified variables in the
sessions, worklets and workflows. These variables are all over the place. You are
supposed to change variable names from variable(x) to myvariable(x+1). What
approach would you take? Hint – Export XML file and use of regular expressions?
 When you export a workflow from Repositor Manager, what does this xml contain?
Workflow only? Hint – Question itself contains the hint?
 What is difference between Mapping parameter and variable?
 How do you set a mapping variable during session run? How do you reset them?
 What are the pitfalls of using Sequence Generator transformation or in general, why
do we avoid?
 What precautions do you need take when you use reusable Sequence generator
transformation for concurrent sessions?
 Tell me what are alternative methods of Sequence Generator transformation? How
will go about using the same when you use them in concurrent sessions?
 Is it possible negative increment in Sequence Generator? If yes, how would you
accomplish it? Hint – Expression
 How can you handle sequence generation over 2 billion records?
 Which directory Informatica looks for parameter file and what happens if it is missing
when start the session? Does session stop after it starts? Hint – Does your source qualifier
have dependency on mapping parameter?
 What happens when a particular parameter is missing?
 Informatica is complaining about the server could not be reached? What steps would
you take? Hint – Hosts file
 You observe lately some sluggishness in getting repository objects, what action would
you perform? Hint –Repository objects
 Informatica server suddenly stops after starting? How can you rectify this issue? Hint –
Server variables, path?
 What is a parameter file and how would dynamically create them using a mapping
and what transformation would you choose? Hint – Normalizer
 What is the Difference between connected & unconnected lookup
 If you have more than two sources, how will use the joiner transformation to join
these sources?
 If you have more than one pipeline in your mapping how will change the order of
load?
 What approach would you take so that your source qualifier SQL override is database
independent? Hint – Next question
 What is an ANSI SQL?
 What is a mapplet?
 What are an active and a passive transformation?
 Can we use an active transformation inside a mapplet?
 What is the difference between Source Qualifier and Joiner?
 How do you override SQL in Lookup?
 How do you change order by clause in lookups?
 How many types of lookups are there?
 What is a dynamic lookup and what is the significance of NewLookupRow? How will
use them for rejecting duplicate records?
 What are benefits over connected vs. unconnected or vice versa?
 In an unconnected lookup can it only have an output port and will the mapping work
if you don’t check return port and why? Hint – Try un-checking Return port
 You have more five mappings use the same lookup. How can you manage the lookup?
Hint – Persistence
 What are the type of caches are available in lookups?
 How will you improve the performance of a lookup?
 What is a sorted Input option in Source Qualifier?
 How will you increase the performance of an aggregator transformation other than
using sorted input?
 What is an Incremental Aggregation?
 What is an Incremental Loading?
 What will happen if you copy the mapping from one repository to another repository
and if there is no identical source?
 Difference between router & filter transformation and where do you need to place
filter transformation to get better performance?
 What is joiner?
 How do you perform recovery? What happens internally?
 Which process writes the information into repository tables?
 What is difference between an abort and stop in a session?
 What is DTM?
 What is update strategy? What is data driven?
 An Aggregate transformation has 4 ports (l sum (col 1), group by col 2, col3), which
port should be the output?
 What is Parameter file and explain scenario when u use?
 What is the difference between surrogate key and primary key?
 Oracle 10g New Features:-

1) DB configuration assistant
2) Renaming Tablespaces
3) Drop Database
4) Automatic undo retention
5) Allows you to manually shrink the overall size of a table, removing unused space thus
resetting the high-water mark
6) Automated collection of statistics, collection of data dictionary statistics
7) Automatic SGA Tuning
8) Sorted Hash Clusters
9) Virtual Private Database with New features
Column-level privacy
New VPD policies
Support for parallel query

10) Database recovery


Easier recovery through the resetlogs command
Changes to the alter database archivelog command
New and changed Oracle Database 10g backup commands

11) Flashback Database

12) Oracle Data Pump previously exp and imp


Data Pump provides new methods of moving data between databases
Importing Specific Database Object Types, INCLUDE=TABLE:''LIKE 'EMP%'"
Restartability

13) Bigfile Tablespaces


BFT varies based on the database block size. It can range anywhere from 8
terabytes to 128 terabytes
14) Cross-Platform Transportable Tablespaces
15) Enhanced Merge Functionality
Optional delete Clause
on Condition

16) New Materialized View, Query Rewrite, and Summary Management DDL Features
17) Partitioning New Features
18) SQL and PL/SQL regular expressions
 Oracle 11g New Features:-

1) Automatic Memory Management


a. The memory_target Parameter
b. The memory_max_target Parameter
2) Automatic Diagnostic Repository (ADR) for fault diagnosability
3) Snapshot Databases –
Allow you to open a physical standby database, and change data and structures in that database
4) Datapump New Features
Deprecation of Export utility
Compression of dump file sets
Encryption improvement
5) Oracle Database Advisors
6) Partitioning
a. Interval partitioning
b. Extended composite partitioning
i. Composite range-range partitioning
ii. Composite list-range partitioning
iii. Composite list-hash partitioning
iv. Composite list-list partitioning
c. Reference partitioning
i. form a parent-child relationship and logically equi-partition these
tables
d. System partitioning
e. System-managed domain indexes
7) Virtual Columns
8) The Pivot and Unpivot Clauses
9) Invisible Indexes
An invisible index is an index that the optimizer cannot see and therefore it will not consider while
it generates the execution plans.

You might also like