Informatica Interview Questioner-Ambarish

Informatica Questionnaire


1. What are the components of Informatica, and what is the purpose of each?
Ans: Informatica Designer, Server Manager and Repository Manager.
Designer: creating source and target definitions, mapplets, mappings, etc.
Server Manager: creating sessions and batches, scheduling them, monitoring the triggered
sessions and batches, specifying pre- and post-session commands, creating database
connections to various instances, etc.
Repository Manager: creating and adding repositories, creating and editing folders within a
repository, establishing users, groups, privileges and folder permissions, copying, deleting
and backing up a repository, viewing the history of sessions, and viewing and removing the
locks on various objects.

2. What is a repository, and how do you add it in an Informatica client?
Ans: It is a location where all the mapping- and session-related information is stored.
Basically it is a database where the metadata resides. We can add a repository through the
Repository Manager.

3. Name at least 5 different types of transformations used in mapping design and state the
use of each.
Ans: Source Qualifier - represents all data queries from the source,
Expression - performs simple calculations,
Filter - serves as a conditional filter,
Lookup - looks up values and passes them to other objects,
Aggregator - performs aggregate calculations.

4. How can a transformation be made reusable?
Ans: The edit properties of any transformation include a check box to make it reusable;
checking it makes the transformation reusable. You can also create reusable transformations
in the Transformation Developer.

5. How are source and target definitions imported in Informatica Designer? How do you
create a target definition for flat files?
Ans: In the Source Analyzer, the main menu has an option to import the source
from Database, Flat File, COBOL File or XML File; selecting any one of them imports
a source definition. In the Warehouse Designer, the main menu has an option to import
the target from Database, XML from File or XML from Sources.
There is no way to import a target definition from a file in Informatica Designer. So the
target definition for a file is created in the Warehouse Designer as if it were a table,
and then in the session properties of that mapping it is specified as a file.

6. Explain what an SQL override is for a source table in a mapping.
Ans: The Source Qualifier provides the SQL Query option to override the default query. You
can enter any SQL statement supported by your source database. You might enter your own
SELECT statement, or have the database perform aggregate calculations, or call a stored
procedure or stored function to read the data and perform some tasks.
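
The idea of overriding a default query can be sketched outside Informatica. The following is only an illustration using Python's built-in sqlite3 module: the table `orders`, its columns and its rows are hypothetical, and the "default" query is an assumption about what a generated query roughly looks like (a plain SELECT of the connected columns), not the exact text Informatica generates.

```python
import sqlite3

# Hypothetical source table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "acme", 100.0), (2, "acme", 50.0), (3, "zenith", 75.0)],
)

# Assumed shape of a default query: select every connected column as-is.
default_query = "SELECT order_id, customer, amount FROM orders"

# An override that asks the database to perform the aggregate calculation.
override_query = (
    "SELECT customer, SUM(amount) AS total_amount "
    "FROM orders GROUP BY customer ORDER BY customer"
)

default_rows = conn.execute(default_query).fetchall()
override_rows = conn.execute(override_query).fetchall()
```

With the override, the mapping would receive one pre-aggregated row per customer instead of one row per order.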

7. What is a lookup override?
Ans: This feature is similar to entering a custom query in a Source Qualifier transformation.
When entering a Lookup SQL Override, you can enter the entire override, or generate and
edit the default SQL statement. The lookup query override can include a WHERE clause.

8. What are mapplets? How are they different from a reusable transformation?
Ans: A mapplet is a reusable object that represents a set of transformations. It allows you to
reuse transformation logic and can contain as many transformations as you need. You create
mapplets in the Mapplet Designer. A mapplet differs from a reusable transformation in that it
may contain a set of transformations, while a reusable transformation is a single one.

9. How do you use an Oracle sequence generator in a mapping?
Ans: We have to write a stored procedure that takes the sequence name as input and
dynamically generates a NEXTVAL from that sequence. Then in the mapping we can call that
stored procedure through a Stored Procedure transformation.

10. What is a session and how do you create it?
Ans: A session is a set of instructions that tells the Informatica Server how and when to move
data from sources to targets. You create and maintain sessions in the Server Manager.

11. How do you create the source and target database connections in Server Manager?
Ans: The main menu of Server Manager has a Server Configuration menu, which contains
the Database Connections menu. From here you can create the source and target
database connections.

12. Where are the source flat files kept before running the session?
Ans: The source flat files can be kept in a folder on the Informatica Server or on any other
machine in its domain.

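13. What are the Oracle DML commands possible through an Update Strategy?
Ans: DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT. The first three flag a row for insert,
update or delete; DD_REJECT flags the row for rejection rather than for a DML statement.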
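
How these flags drive what happens to each row can be sketched as follows. This is a minimal illustration, not Informatica code: the constants use the numeric values of the update strategy flags (0 to 3), while the flagging rule, the row layout (`id`, `amount`, `deleted`) and the dictionary standing in for the target table are all hypothetical.

```python
# Numeric values of the update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(row, existing_keys):
    """Return the flag for one incoming row (hypothetical business rule)."""
    if row["amount"] is None:
        return DD_REJECT          # bad data: do not write it to the target
    if row.get("deleted"):
        return DD_DELETE
    if row["id"] in existing_keys:
        return DD_UPDATE
    return DD_INSERT


def apply_rows(target, rows):
    """Apply flagged rows to a dict standing in for the target table."""
    rejected = []
    for row in rows:
        flag = flag_row(row, target.keys())
        if flag in (DD_INSERT, DD_UPDATE):
            target[row["id"]] = row["amount"]
        elif flag == DD_DELETE:
            target.pop(row["id"], None)
        else:
            rejected.append(row)
    return rejected


target = {1: 10.0, 2: 20.0}
rejected = apply_rows(target, [
    {"id": 1, "amount": 15.0},                   # existing key: update
    {"id": 2, "amount": 20.0, "deleted": True},  # marked deleted: delete
    {"id": 3, "amount": 30.0},                   # new key: insert
    {"id": 4, "amount": None},                   # missing amount: reject
])
```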

14. How do you update or delete rows in a target that does not have key fields?
Ans: To update a table that does not have any keys, we can do a SQL override of the target
transformation, specifying the WHERE conditions explicitly. Delete cannot be done this
way. For a delete, you have to explicitly define the key on the target table definition
in the Warehouse Designer and delete the row using the Update Strategy transformation.

15. What is the option by which we can run all the sessions in a batch simultaneously?
Ans: The batch edit box has an option called Concurrent. When it is checked, all the
sessions in that batch run concurrently.

16. Informatica settings are available in which file?
Ans: Informatica settings are available in the file pmdesign.ini in the Windows folder.

17. How can we join the records from two heterogeneous sources in a mapping?
Ans: By using a Joiner transformation.


18. Difference between connected and unconnected Lookup.
Ans: An unconnected Lookup transformation exists separate from the pipeline in the
mapping. You write an expression using the :LKP reference qualifier to call the lookup
within another transformation. A connected Lookup, by contrast, forms part of the data flow
of the mapping.

19. Difference between Lookup Transformation & Unconnected Stored Procedure
Transformation. Which one is faster?

20. Compare Router vs Filter and Source Qualifier vs Joiner.
Ans: A Router transformation has input ports and output ports. Input ports reside in the input
group, and output ports reside in the output groups. It lets you test data against one or
more group filter conditions and sends each row to the output groups whose condition it meets.
A Filter, by contrast, tests rows against a single filter condition (which may combine
several tests) and drops the rows that fail before they are written to targets.
A Source Qualifier can join data coming from the same source database, while a Joiner is used
to combine data from heterogeneous sources. A Joiner can also join data from two tables in the
same database.
A Source Qualifier can join more than two sources, but a Joiner can join only two sources.
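
The Router vs Filter difference can be sketched in a few lines. This is an illustration only, not Informatica code: the function names, the group names and the sample rows are hypothetical, and the `DEFAULT` group models the assumption that rows matching no group condition go to a default output group.

```python
def filter_rows(rows, condition):
    """Filter: one condition; rows that fail it are dropped."""
    return [row for row in rows if condition(row)]


def route_rows(rows, groups):
    """Router: test each row against every group condition.

    `groups` maps a group name to its condition. A row is sent to every
    group whose condition it meets, or to DEFAULT when it meets none.
    """
    output = {name: [] for name in groups}
    output["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                output[name].append(row)
                matched = True
        if not matched:
            output["DEFAULT"].append(row)
    return output


rows = [
    {"region": "east", "amount": 5},
    {"region": "west", "amount": 50},
    {"region": "north", "amount": 500},
]
kept = filter_rows(rows, lambda r: r["amount"] > 10)
routed = route_rows(rows, {
    "EAST": lambda r: r["region"] == "east",
    "BIG": lambda r: r["amount"] >= 100,
})
```

The Filter loses the rows it rejects, whereas the Router keeps every row available in some output group.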

21. How do you join 2 tables connected to a Source Qualifier without any relationship
defined between them?
Ans: By writing an SQL override.

22. In a mapping there are 2 targets, to load header and detail. How do you ensure that the
header table loads first and then the detail table?
Ans: Constraint Based Loading (if no relationship at Oracle level), OR Target Load Plan (if
only 1 Source Qualifier for both tables), OR select the header target table first and then the
detail table while dragging them into the mapping.

23. A mapping takes just 10 seconds to run: it reads a source file and inserts into the target.
But before that there is a Stored Procedure transformation that takes around 5 minutes to
run and returns Y or N. If Y, continue the feed; otherwise stop the feed. How should this be
handled? (Hint: since the SP transformation takes much longer than the mapping, it should
not run row by row.)
Ans: There is an option to run the stored procedure before starting to load the rows.

1. Can 2 fact tables share the same dimension tables? How many dimension tables are
associated with one fact table in your project?
Ans: Yes

2. What are ROLAP, MOLAP and DOLAP?
Ans: ROLAP (Relational OLAP), MOLAP (Multidimensional OLAP) and DOLAP (Desktop
OLAP). In these three OLAP architectures, the interface to the analytic layer is typically
the same; what is quite different is how the data is physically stored.
In MOLAP, the premise is that online analytical processing is best implemented by storing
the data multidimensionally; that is, data must be stored multidimensionally in order to be
viewed in a multidimensional manner.
In ROLAP, the premise is that the data is best stored in the relational model; that is, OLAP
capabilities are best provided directly against the relational database.
DOLAP is a variation that exists to provide portability for the OLAP user. It creates
multidimensional datasets that can be transferred from server to desktop, requiring only the
DOLAP software to exist on the target system. This provides significant advantages to
portable computer users, such as salespeople who are frequently on the road and do not have
direct access to their office server.

3. What is an MDDB? What is the difference between MDDBs and RDBMSs?
Ans: MDDB stands for Multidimensional Database. Two primary technologies are used for
storing the data used in OLAP applications: multidimensional databases (MDDB) and
relational databases (RDBMS). The major difference between MDDBs and RDBMSs is in how
they store data. Relational databases store their data in a series of tables and columns.
Multidimensional databases, on the other hand, store their data in large multidimensional
arrays.
For example, in an MDDB world, you might refer to a sales figure as Sales with Date,
Product and Location coordinates of 12-1-2001, Car and south, respectively.
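
The two storage styles can be contrasted with a small sketch. It is only an illustration under stated assumptions: the sales values are invented, and a nested Python list stands in for the multidimensional array that a real MDDB would manage internally.

```python
# Relational style: one row per fact, identified by column values.
relational_rows = [
    ("12-1-2001", "Car", "south", 120),
    ("12-1-2001", "Car", "north", 80),
    ("12-2-2001", "Truck", "south", 40),
]

# Multidimensional style: each dimension has an ordered list of members,
# and a cell is addressed by its position along every dimension.
dates = ["12-1-2001", "12-2-2001"]
products = ["Car", "Truck"]
locations = ["north", "south"]

# Dense 3-D array (date x product x location), empty cells hold 0.
cube = [[[0 for _ in locations] for _ in products] for _ in dates]
for date, product, location, sales in relational_rows:
    cube[dates.index(date)][products.index(product)][locations.index(location)] = sales


def sales_at(date, product, location):
    """Look up Sales by its Date, Product and Location coordinates."""
    return cube[dates.index(date)][products.index(product)][locations.index(location)]
```

In the relational form the same lookup means scanning or indexing rows; in the array form the coordinates lead directly to the cell.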

Advantages of MDDB:
Retrieval is very fast because:
- The data corresponding to any combination of dimension members can be retrieved with
  a single I/O.
- Data is clustered compactly in a multidimensional array.
- Values are calculated ahead of time.
- The index is small and can therefore usually reside completely in memory.
Storage is very efficient because:
- The blocks contain only data.
- A single index locates the block corresponding to a combination of sparse dimension
  members.

4. What is MDB modeling and RDB Modeling?
Ans:

5. What is a mapplet and how do you create one?
Ans: A mapplet is a reusable object that represents a set of transformations. It allows you
to reuse transformation logic and can contain as many transformations as you need.
Create a mapplet when you want to use a standardized set of transformation logic in several
mappings. For example, if you have several fact tables that require a series of dimension
keys, you can create a mapplet containing a series of Lookup transformations to find each
dimension key. You can then use the mapplet in each fact table mapping, rather than
recreate the same lookup logic in each mapping.
To create a new mapplet:
1. In the Mapplet Designer, choose Mapplets-Create Mapplet.
2. Enter a descriptive mapplet name.
   The recommended naming convention for mapplets is mpltMappletName.
3. Click OK.
   The Mapping Designer creates a new mapplet in the Mapplet Designer.
4. Choose Repository-Save.

6. What are transformations used for?
Ans: Transformations are the manipulation of data from how it appears in the source
system(s) into another form in the data warehouse or mart, in a way that enhances or
simplifies its meaning. In short, you transform data into information.

This includes data merging, cleansing and aggregation:
Data merging: The process of standardizing data types and fields. Suppose one source system
stores integer data as smallint whereas another stores similar data as decimal. The data
from the two source systems needs to be rationalized when moved into the Oracle data
format called number.
Cleansing: This involves identifying and changing any inconsistencies or inaccuracies.
- Eliminating inconsistencies in the data from multiple sources.
- Converting data from different systems into a single consistent data set suitable for
  analysis.
- Meeting a standard for establishing data elements, codes, domains, formats and naming
  conventions.
- Correcting data errors and filling in missing data values.
Aggregation: The process whereby multiple detailed values are combined into a single
summary value, typically a sum of numbers representing dollars spent or units sold.
- Generating summarized data for use in aggregate fact and dimension tables.

Data transformation is an interesting concept in that some transformation can occur
during the extract, some during the transformation, or even, in limited cases, during the
load portion of the ETL process. The type of transformation function you need will most
often determine where it should be performed. Some transformation functions could even be
performed in more than one place, because many of the transformations you will want to
perform already exist in some form in more than one of the three environments (source
database or application, ETL tool, or target database).

7. What is the difference between OLTP and OLAP?
Ans: OLTP stands for Online Transaction Processing. This is a standard, normalized database
structure. OLTP is designed for transactions, which means that inserts, updates and deletes
must be fast. Imagine a call center that takes orders. Call takers are continually taking
calls and entering orders that may contain numerous items. Each order and each item must be
inserted into a database. Since the performance of the database is critical, we want to
maximize the speed of inserts (and updates and deletes). To maximize performance, we
typically try to hold as few records in the database as possible.

OLAP stands for Online Analytical Processing. OLAP is a term that means many things to
many people. Here, we will use the terms OLAP and star schema pretty much
interchangeably. We will assume that a star schema database is an OLAP system. (This is
not the same thing that Microsoft calls OLAP; they extend OLAP to mean the cube
structures built using their product, OLAP Services.) Here, we will assume that any system
of read-only, historical, aggregated data is an OLAP system.

A data warehouse (or mart) is a way of storing data for later retrieval. This retrieval is
almost always used to support decision-making in the organization. That is why many data
warehouses are considered to be DSS (Decision-Support Systems).

Both a data warehouse and a data mart are storage mechanisms for read-only, historical,
aggregated data.
By read-only, we mean that the person looking at the data won't be changing it. If a user
wants to look at yesterday's sales for a certain product, they should not have the ability to
change that number.

The historical part may be just a few minutes old, but usually it is at least a day old. A
data warehouse usually holds data that goes back a certain period in time, such as five
years. In contrast, standard OLTP systems usually only hold data as long as it is current or
active. An order table, for example, may move orders to an archive table once they have been
completed, shipped and received by the customer.

When we say that data warehouses and data marts hold aggregated data, we need to stress
that there are many levels of aggregation in a typical data warehouse.

8. If the data source is an Excel spreadsheet, how do you use it?
Ans: PowerMart and PowerCenter treat a Microsoft Excel source as a relational database, not
a flat file. As with relational sources, the Designer uses ODBC to import a Microsoft Excel
source. You do not need database permissions to import Microsoft Excel sources.
To import an Excel source definition, you need to complete the following tasks:
- Install the Microsoft Excel ODBC driver on your system.
- Create a Microsoft Excel ODBC data source for each source file in the ODBC 32-bit
  Administrator.
- Prepare Microsoft Excel spreadsheets by defining ranges and formatting columns of
  numeric data.
- Import the source definitions in the Designer.
Once you define ranges and format cells, you can import the ranges in the Designer. Ranges
display as source definitions when you import the source.

9. Which databases are RDBMSs and which are MDDBs? Can you name them?
Ans: MDDB examples: Oracle Express Server (OES), Essbase by Hyperion Software, PowerPlay by
Cognos.
RDBMS examples: Oracle, SQL Server, etc.

10. What are the modules/tools in Business Objects? Explain their purpose briefly.
Ans: BO Designer, Business Query for Excel, BO Reporter, InfoView, Explorer, WEBI, BO
Publisher, Broadcast Agent and BO ZABO.

InfoView: IT portal entry into WebIntelligence & Business Objects.
Base module required for all options to view and refresh reports.
Reporter: Upgrade to create/modify reports on LAN or Web.
Explorer: Upgrade to perform OLAP processing on LAN or Web.
Designer: Creates semantic layer between user and database.
Supervisor: Administer and control access for group of users.
WebIntelligence: Integrated query, reporting and OLAP analysis over the Web.
Broadcast Agent: Used to schedule, run, publish, push and broadcast pre-built reports and
spreadsheets, including event notification and response capabilities, event filtering and
calendar-based notification, over the LAN, e-mail, pager, fax, Personal Digital Assistant
(PDA), Short Messaging Service (SMS), etc.
Set Analyzer: Applies set-based analysis to perform functions such as exclusion,
intersections, unions and overlaps visually.
Developer Suite: Build packaged, analytical or customized apps.

11. What are ad hoc queries and canned queries/reports? How do you create them?
(Please check this page: C\:BObjects\Quries\Data Warehouse - About Queries.htm)
Ans: The data warehouse will contain two types of query. There will be fixed queries that are
clearly defined and well understood, such as regular reports, canned queries (standard
reports) and common aggregations. There will also be ad hoc queries that are
unpredictable, both in quantity and frequency.

Ad hoc query: Ad hoc queries are the starting point for any analysis into a database. Any
business analyst wants to know what is inside the database. He then proceeds by calculating
totals, averages, maximum and minimum values for most attributes within the database.
These are the unpredictable element of a data warehouse. It is exactly that ability to run
any query when desired and expect a reasonable response that makes the data warehouse
worthwhile, and makes the design such a significant challenge.
The end-user access tools are capable of automatically generating the database query that
answers any question posed by the user. The user will typically pose questions in terms that
they are familiar with (for example, sales by store last week); this is converted into the
database query by the access tool, which is aware of the structure of information within the
data warehouse.
Canned queries: Canned queries are predefined queries. In most instances, canned queries
contain prompts that allow you to customize the query for your specific needs. For example,
a prompt may ask you for a school, department, term or section ID. In this instance you
would enter the name of the school, department or term, and the query will retrieve the
specified data from the warehouse. You can measure the resource requirements of these
queries, and the results can be used for capacity planning and for database design.
The main reason for using a canned query or report rather than creating your own is that your
chances of misinterpreting data or getting the wrong answer are reduced. You are assured of
getting the right data and the right answer.
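
A canned query with prompts can be pictured as a predefined, parameterized SQL statement. The sketch below assumes Python's built-in sqlite3 module as a stand-in for the warehouse; the table `enrollment`, its columns, its rows and the function `run_canned_query` are hypothetical names chosen to match the school/department/term example above.

```python
import sqlite3

# Hypothetical warehouse table with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollment (school TEXT, department TEXT, term TEXT, students INTEGER)"
)
conn.executemany("INSERT INTO enrollment VALUES (?, ?, ?, ?)", [
    ("Engineering", "Civil", "Fall", 120),
    ("Engineering", "Electrical", "Fall", 150),
    ("Arts", "History", "Fall", 60),
])

# The canned query: its text is fixed in advance, only the prompts vary.
CANNED_ENROLLMENT_BY_SCHOOL = (
    "SELECT department, students FROM enrollment "
    "WHERE school = ? AND term = ? ORDER BY department"
)


def run_canned_query(school, term):
    """Fill the two prompts (school, term) and run the predefined query."""
    return conn.execute(CANNED_ENROLLMENT_BY_SCHOOL, (school, term)).fetchall()


engineering_fall = run_canned_query("Engineering", "Fall")
```

Because the query text never changes, its resource usage can be measured once and reused for capacity planning, which is not possible for an ad hoc query.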

12. How many fact tables and how many dimension tables did you build? Which table precedes
which?

Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp

13. What is the difference between STAR SCHEMA & SNOW FLAKE SCHEMA?
Ans: http://www.ciobriefings.com/whitepapers/StarSchema.asp

14. Why did you choose STAR SCHEMA only? What are the benefits of STAR SCHEMA?
Ans: Because of its denormalized structure, i.e., dimension tables are denormalized. Why
denormalize? The first (and often only) answer is speed. An OLTP structure is designed for
data inserts, updates and deletes, but not for data retrieval. Therefore, we can often
squeeze some speed out of it by denormalizing some of the tables and having queries go
against fewer tables. These queries are faster because they perform fewer joins to retrieve
the same recordset.
Joins are also confusing to many end users. By denormalizing, we can present the user with a
view of the data that is far easier for them to understand.

Benefits of STAR SCHEMA:
- Far fewer tables.
- Designed for analysis across time.
- Simplifies joins.
- Less database space.
- Supports drilling in reports.
- Flexibility to meet business and technical needs.

15. How do you load the data using Informatica?
Ans: Using a session.

16. (i) What is FTP? (ii) How do you connect to a remote machine? (iii) Is there another way
to use FTP without a special utility?
Ans: (i): The FTP (File Transfer Protocol) utility program is commonly used for copying files
to and from other computers. These computers may be at the same site or at different sites
thousands of miles apart. FTP is a general protocol that works on UNIX systems as well as
non-UNIX systems.

(ii): Remote connect commands:
ftp machinename
ex: ftp 129.82.45.181 or ftp iesg
If the remote machine has been reached successfully, FTP responds by asking for a
login name and password. When you enter your own login name and password for the remote
machine, it returns a prompt like the one below
ftp>
and permits you access to your own home directory on the remote machine. You should be able
to move around in your own directory and to copy files to and from your local machine using
the FTP interface commands.
Note: You can set the mode of file transfer to ASCII (the default; it transmits seven bits
per character).
Use the ASCII mode with any of the following:
- Raw data (e.g. *.dat or *.txt, codebooks, or other plain text documents)
- SPSS Portable files
- HTML files

If you set the mode of file transfer to binary, all eight bits per byte are transmitted, which
provides less chance of a transmission error; binary mode must be used to transmit files
other than ASCII files.
For example, use binary mode for the following types of files:
- SPSS System files
- SAS Dataset
- Graphic files (e.g. *.gif, *.jpg, *.bmp, etc.)
- Microsoft Office documents (*.doc, *.xls, etc.)
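
The choice between the two modes can be sketched as a small helper. This is only an illustration: the function `transfer_mode` and the extension set are hypothetical, cover just the plain-text and HTML examples listed above, and fall back to binary for everything else on the assumption that binary is the safer default.

```python
import os

# Extensions treated as plain text here; an illustrative, incomplete set.
ASCII_EXTENSIONS = {".dat", ".txt", ".html", ".htm"}


def transfer_mode(filename):
    """Return "ascii" for known plain-text extensions, otherwise "binary"."""
    extension = os.path.splitext(filename)[1].lower()
    return "ascii" if extension in ASCII_EXTENSIONS else "binary"


modes = {name: transfer_mode(name)
         for name in ["survey.dat", "index.HTML", "photo.jpg", "report.doc"]}
```

In Python's standard ftplib, the ASCII mode corresponds to the `retrlines`/`storlines` methods and the binary mode to `retrbinary`/`storbinary`, so a helper like this could decide which pair to call.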

(iii): Yes. If you are using Windows, you can access a text-based FTP utility from a DOS
prompt. To do this, perform the following steps:
1. Choose Start > Programs > MS-DOS Prompt.
2. Enter ftp ftp.geocities.com. A prompt will appear.
   (or)
   Enter ftp to get the ftp prompt, then ftp> open hostname, ex. ftp> open ftp.geocities.com
   (it connects to the specified host).
3. Enter your Yahoo! GeoCities member name.
4. Enter your Yahoo! GeoCities password.
You can now use standard FTP commands to manage the files in your Yahoo! GeoCities
directory.

17. What command is used to transfer multiple files at a time using FTP?
Ans: mget ==> To copy multiple files from the remote machine to the local machine. You will
be prompted for a y/n answer before transferring each file.
ex: mget * (copies all files in the current remote directory to your current local
directory, using the same file names).
mput ==> To copy multiple files from the local machine to the remote machine.

18. What is a Filter transformation? What options do you have in a Filter transformation?
Ans: The Filter transformation provides the means for filtering records in a mapping. You
pass all the rows from a source transformation through the Filter transformation, then enter
a filter condition for the transformation. All ports in a Filter transformation are
input/output, and only records that meet the condition pass through the Filter
transformation.
Note: Discarded rows do not appear in the session log or reject files.
To maximize session performance, include the Filter transformation as close to the sources
in the mapping as possible. Rather than passing records you plan to discard through the
mapping, you then filter out unwanted data early in the flow of data from sources to
targets.

You cannot concatenate ports from more than one transformation into the Filter
transformation; the input ports for the filter must come from a single transformation.
Filter transformations exist within the flow of the mapping and cannot be unconnected. The
Filter transformation does not allow setting output default values.


19. What are the default sources supported by Informatica PowerMart?
Ans:
- Relational tables, views and synonyms.
- Fixed-width and delimited flat files that do not contain binary data.
- COBOL files.

20. When do you create the source definition? Can I use this source definition with any
transformation?
Ans: When working with a file that contains fixed-width binary data, you must create the
source definition.
The Designer displays the source definition as a table, consisting of names, datatypes
and constraints. To use a source definition in a mapping, connect the source definition to a
Source Qualifier or Normalizer transformation. The Informatica Server uses these
transformations to read the source data.

21. What are active and passive transformations?
Ans: Transformations can be active or passive. An active transformation can change the
number of records passed through it. A passive transformation never changes the record
count. For example, the Filter transformation removes rows that do not meet the filter
condition defined in the transformation.

Active transformations that might change the record count include the following:
- Advanced External Procedure
- Aggregator
- Filter
- Joiner
- Normalizer
- Rank
- Source Qualifier
Note: If you use PowerConnect to access ERP sources, the ERP Source Qualifier is also an
active transformation.
/*
You can connect only one of these active transformations to the same transformation
or target, since the Informatica Server cannot determine how to concatenate data from
different sets of records with different numbers of rows.
*/
Passive transformations that never change the record count include the following:
- Lookup
- Expression
- External Procedure
- Sequence Generator
- Stored Procedure
- Update Strategy

You can connect any number of these passive transformations, or connect one active
transformation with any number of passive transformations, to the same transformation or
target.
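
The active/passive distinction comes down to whether the row count can change. A minimal sketch, with two hypothetical functions standing in for a Filter (active) and an Expression (passive); the sample rows and the conversion factor are illustrative only:

```python
def filter_transformation(rows, condition):
    """Active: rows failing the condition are removed, so the count may shrink."""
    return [row for row in rows if condition(row)]


def expression_transformation(rows, calculate):
    """Passive: every input row yields exactly one output row."""
    return [calculate(row) for row in rows]


rows = [{"qty": 1}, {"qty": 2}, {"qty": 3}, {"qty": 4}]

# Keep only even quantities: 4 rows in, 2 rows out.
even_rows = filter_transformation(rows, lambda r: r["qty"] % 2 == 0)

# Add a derived column: 4 rows in, 4 rows out.
priced_rows = expression_transformation(rows, lambda r: {**r, "total": r["qty"] * 10})
```

This also suggests why two active outputs cannot be concatenated: `even_rows` and `rows` no longer line up one-to-one, whereas `priced_rows` and `rows` do.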


22. What is staging Area and Work Area?
Ans: Staging Area : -
- Holding Tables on DW Server.
- Loaded from Extract Process
- Input for Integration/Transformation
- May function as Work Areas
- Output to a work area or Fact Table
Work Area: -
- Temporary Tables
- Memory

23. What is metadata? (Please refer to the book DATA WAREHOUSING IN THE REAL WORLD, page # 125)
Ans: Definition: data about data.
Metadata contains descriptive data for end users. In a data warehouse the term metadata is
used in a number of different situations.
Metadata is used for:
- Data transformation and load
- Data management
- Query management
Data transformation and load:
Metadata may be used during data transformation and load to describe the source data and
any changes that need to be made. The advantage of storing metadata about the data being
transformed is that as source data changes, the changes can be captured in the metadata and
the transformation programs automatically regenerated.
For each source data field the following information is required:
Source field:
- Unique identifier (to avoid any confusion between 2 fields of the same name
  from different sources).
- Name (local field name).
- Type (storage type of the data, such as character, integer, floating point and so on).
- Location
  - system (the system it comes from, ex. accounting system).
  - object (the object that contains it, ex. Account table).
The destination field needs to be described in a similar way to the source:
Destination:
- Unique identifier
- Name
- Type (database data type, such as Char, Varchar, Number and so on).
- Table name (name of the table the field will be part of).

The other information that needs to be stored is the transformation or transformations that
need to be applied to turn the source data into the destination data:
Transformation:
- Transformation(s)
  - Name
  - Language (name of the language that the transformation is written in).
  - Module name
  - Syntax

The Name is the unique identifier that differentiates this from any other similar
transformations.
The Language attribute contains the name of the language that the transformation is
written in.
The other attributes are module name and syntax. Generally these will be mutually
exclusive, with only one being defined. For simple transformations such as simple SQL
functions the syntax will be stored. For complex transformations the name of the
module that contains the code is stored instead.
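
The three metadata records described above can be sketched as simple data structures. This is an illustration under stated assumptions: the class and attribute names are hypothetical, and the check in `Transformation` enforces "exactly one of module name or syntax", which is stricter than the "generally mutually exclusive" wording of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourceField:
    identifier: str      # unique across all sources
    name: str            # local field name
    data_type: str       # storage type, e.g. "character"
    system: str          # system it comes from
    source_object: str   # object that contains it, e.g. a table


@dataclass
class DestinationField:
    identifier: str
    name: str
    data_type: str       # database data type, e.g. "Varchar"
    table_name: str      # table the field will be part of


@dataclass
class Transformation:
    name: str                          # unique identifier
    language: str                      # language it is written in
    module_name: Optional[str] = None  # for complex transformations
    syntax: Optional[str] = None       # for simple ones, e.g. a SQL function

    def __post_init__(self):
        # Assumption: require exactly one of module_name / syntax.
        if (self.module_name is None) == (self.syntax is None):
            raise ValueError("define exactly one of module_name or syntax")


source = SourceField("ACC.ACCOUNT.NAME", "NAME", "character", "Accounting system", "Account Table")
destination = DestinationField("DW.CUSTOMER.NAME", "CUSTOMER_NAME", "Varchar", "CUSTOMER")
upper_name = Transformation(name="upper_name", language="SQL", syntax="UPPER(NAME)")
```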
Data management:
Metadata is required to describe the data as it resides in the data warehouse. This is needed
by the warehouse manager to allow it to track and control all data movements. Every object
in the database needs to be described.
Metadata is needed for all the following:
Tables
- Columns
  - name
  - type
Indexes
- Columns
  - name
  - type
Views
- Columns
  - name
  - type
Constraints
- name
- type
- table
- columns
Aggregations and partition information also need to be stored in metadata (for details refer
to page # 30).
Query generation:
Metadata is also required by the query manager to enable it to generate queries. The same
metadata that the warehouse manager uses to describe the data in the data warehouse is
also required by the query manager.
The query manager will also generate metadata about the queries it has run. This metadata
can be used to build a history of all queries run and to generate a query profile for each
user, group of users and the data warehouse as a whole.
The metadata that is required for each query is:
- query
- tables accessed
- columns accessed
  - name
  - reference identifier
- restrictions applied
  - column name
  - table name
  - reference identifier
  - restriction
- join criteria applied
- aggregate functions used
- group by criteria
- sort criteria
- syntax
- execution plan
- resources

24. Which Unix flavours are you experienced with?
Ans: Solaris 2.5 / SunOS 5.5 (Operating System)
Solaris 2.6 / SunOS 5.6 (Operating System)
Solaris 2.8 / SunOS 5.8 (Operating System)
AIX 4.0.3

SunOS   Solaris  Released  Platforms
5.5.1   2.5.1    May 96    sun4c, sun4m, sun4d, sun4u, x86, ppc
5.6     2.6      Aug. 97   sun4c, sun4m, sun4d, sun4u, x86
5.7     7        Oct. 98   sun4c, sun4m, sun4d, sun4u, x86
5.8     8        2000      sun4m, sun4d, sun4u, x86

25. What are the tasks that are done by Informatica Server?
Ans: The Informatica Server performs the following tasks:
Manages the scheduling and execution of sessions and batches
Executes sessions and batches
Verifies permissions and privileges
Interacts with the Server Manager and pmcmd.
The Informatica Server moves data from sources to targets based on metadata stored in a
repository. For instructions on how to move and transform data, the Informatica Server reads
a mapping (a type of metadata that includes transformations and source and target
definitions). Each mapping uses a session to define additional information and to optionally
override mapping-level options. You can group multiple sessions to run as a single unit,
known as a batch.

26. What are the two programs that communicate with the Informatica Server?
Ans: Informatica provides Server Manager and pmcmd programs to communicate with the
Informatica Server:

Server Manager. A client application used to create and manage sessions and batches, and to
monitor and stop the Informatica Server. You can use information provided through the
Server Manager to troubleshoot sessions and improve session performance.
pmcmd. A command-line program that allows you to start and stop sessions and batches,
stop the Informatica Server, and verify if the Informatica Server is running.

27. When do you reinitialize the aggregate cache?
Ans: Reinitializing the aggregate cache overwrites historical aggregate data with new
aggregate data. When you reinitialize the aggregate cache, instead of using the captured
changes in source tables, you typically need to use the entire source table.
For example, you can reinitialize the aggregate cache if the source for a session changes
incrementally every day and completely changes once a month. When you receive the new
monthly source, you might configure the session to reinitialize the aggregate cache,
truncate the existing target, and use the new source table during the session.

/? Note: To be clarified when Server Manager works for the following ?/
To reinitialize the aggregate cache:
1.In the Server Manager, open the session property sheet.
2.Click the Transformations tab.
3.Check Reinitialize Aggregate Cache.
4.Click OK three times to save your changes.
5.Run the session.

The Informatica Server creates a new aggregate cache, overwriting the existing aggregate
cache.
/? To be checked: step 6 & step 7 after a successful run of the session ?/

6.After running the session, open the property sheet again.
7.Click the Data tab.
8.Clear Reinitialize Aggregate Cache.
9.Click OK.
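
The effect of the Reinitialize Aggregate Cache option can be sketched in a few lines of
Python. This is only an illustration of the idea, not how the Informatica Server stores its
cache: the function name run_session, the dictionary cache and the sample rows are all
assumptions made for this example.

```python
def run_session(cache, rows, reinitialize=False):
    """Aggregate (key, amount) rows into the cache and return the totals.

    cache stands in for the aggregate cache kept between session runs.
    """
    if reinitialize:
        cache.clear()  # throw away historical aggregate data
    for key, amount in rows:
        cache[key] = cache.get(key, 0) + amount
    return dict(cache)


cache = {}
daily_changes_1 = [("east", 100), ("west", 50)]   # day 1: captured changes only
daily_changes_2 = [("east", 25)]                  # day 2: captured changes only
monthly_source = [("east", 10), ("west", 20)]     # new month: entire source table

totals_day1 = run_session(cache, daily_changes_1)
totals_day2 = run_session(cache, daily_changes_2)
totals_month = run_session(cache, monthly_source, reinitialize=True)
print(totals_day2)
print(totals_month)
```

On the daily runs the new rows are added to the existing totals. On the monthly run the old
totals are discarded first, which is why the whole source has to be read again.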

28. (i) What is Target Load Order in Designer?
Ans: Target Load Order: - In the Designer, you can set the order in which the Informatica
Server sends records to various target definitions in a mapping. This feature is crucial if
you want to maintain referential integrity when inserting, deleting, or updating records in
tables that have the primary key and foreign key constraints applied to them. The Informatica
Server writes data to all the targets connected to the same Source Qualifier or Normalizer
simultaneously, to maximize performance.

28. (ii) What are the minimum conditions that you need to meet to use the Target Load Order
option in Designer?
Ans: You need to have multiple Source Qualifier transformations.
To specify the order in which the Informatica Server sends data to targets, create one Source
Qualifier or Normalizer transformation for each target within a mapping. To set the target
load order, you then determine the order in which each Source Qualifier sends data to
connected targets in the mapping.
When a mapping includes a Joiner transformation, the Informatica Server sends all records to
targets connected to that Joiner at the same time, regardless of the target load order.

28(iii). How do you set the Target load order?
Ans: To set the target load order:
1. Create a mapping that contains multiple Source Qualifier transformations.
2. After you complete the mapping, choose Mappings-Target Load Plan.
A dialog box lists all Source Qualifier transformations in the mapping, as well as the
targets that receive data from each Source Qualifier.
3. Select a Source Qualifier from the list.
4. Click the Up and Down buttons to move the Source Qualifier within the load order.
5. Repeat steps 3 and 4 for any other Source Qualifiers you wish to reorder.
6. Click OK and choose Repository-Save.
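
Why the load order matters for referential integrity can be shown with a small Python sketch.
It uses an in-memory SQLite database as a stand-in for the target database; the tables dept
and emp, the sample rows and the function load_in_order are assumptions made for this
example and have nothing to do with Informatica's own internals.

```python
import sqlite3


def load_in_order(load_order):
    """Load the two targets in the given order and report what happened."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT)")
    conn.execute(
        "CREATE TABLE emp (empno INTEGER PRIMARY KEY, "
        "deptno INTEGER NOT NULL REFERENCES dept(deptno))"
    )
    loaders = {
        "dept": lambda: conn.execute("INSERT INTO dept VALUES (10, 'SALES')"),
        "emp": lambda: conn.execute("INSERT INTO emp VALUES (7001, 10)"),
    }
    try:
        for target in load_order:
            loaders[target]()
        return "loaded"
    except sqlite3.IntegrityError as exc:
        return f"failed: {exc}"
    finally:
        conn.close()


parent_first = load_in_order(["dept", "emp"])   # parent table before child table
child_first = load_in_order(["emp", "dept"])    # child row arrives before its parent
print(parent_first)
print(child_first)
```

Loading the parent (primary key) table first succeeds, while loading the child (foreign key)
table first is rejected by the constraint.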

29. What can you do with Repository Manager?
Ans: We can do the following tasks using Repository Manager: -
To create usernames, you must have one of the following sets of privileges:
- Administer Repository privilege
- Super User privilege
To create a user group, you must have one of the following privileges:
To assign or revoke privileges, you must have one of the following privileges:
Note: You cannot change the privileges of the default user groups or the default repository
users.

30. What can you do with Designer?
Ans: The Designer client application provides five tools to help you create mappings:
Source Analyzer. Use to import or create source definitions for flat file, Cobol, ERP, and
relational sources.
Warehouse Designer. Use to import or create target definitions.
Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.

Note: The Designer allows you to work with multiple tools at one time. You can also work
in multiple folders and repositories.

31. What are the different types of Tracing Levels you have in Transformations?
Ans: Tracing Levels in Transformations :-
----------------------   ---------------------------------------------------------------------
Level                    Description
----------------------   ---------------------------------------------------------------------
Terse                    Indicates when the Informatica Server initializes the session and its
                         components. Summarizes session results, but not at the level of
                         individual records.
Normal                   Includes initialization information as well as error messages and
                         notification of rejected data.
Verbose initialization   Includes all information provided with the Normal setting plus
                         more extensive information about initializing transformations in the
                         session.
Verbose data             Includes all information provided with the Verbose initialization
                         setting.

Note: By default, the tracing level for every transformation is Normal.

To add a slight performance boost, you can also set the tracing level to Terse, writing the
minimum of detail to the session log when running a session containing the transformation.

31(i). What is the difference between a database, a data warehouse and a data mart?
Ans: -- A database is an organized collection of information.
-- A data warehouse is a very large database with special sets of tools to extract and
cleanse data from operational systems and to analyze data.
-- A data mart is a focused subset of a data warehouse that deals with a single area of
data and is organized for quick analysis.

32. What are a Data Mart, a Data Warehouse and a Decision Support System? Explain briefly.
Ans: Data Mart:
A data mart is a repository of data gathered from operational data and other sources that is
designed to serve a particular community of knowledge workers. In scope, the data may derive
from an enterprise-wide
database or data warehouse or be more specialized. The emphasis of a data mart is on
meeting the specific demands of a particular group of knowledge users in terms of analysis,
content, presentation, and ease-of-use. Users of a data mart can expect to have data presented
in terms that are familiar.
In practice, the terms data mart and data warehouse each tend to imply the presence of the
other in some form. However, most writers using the term seem to agree that the design of a
data mart tends to start from an analysis of user needs and that a data warehouse tends
to start from an analysis of what data already exists and how it can be collected in such a
way that the data can later be used. A data warehouse is a central aggregation of data
(which can be distributed physically); a data mart is a data repository that may derive from a
data warehouse or not and that emphasizes ease of access and usability for a particular
designed purpose. In general, a data warehouse tends to be a strategic but somewhat
unfinished concept; a data mart tends to be tactical and aimed at meeting an immediate need.

Data Warehouse:
A data warehouse is a central repository for all or significant parts of the data that an
enterprise's various business systems collect. The term was coined by W. H. Inmon. IBM
sometimes uses the term "information warehouse."
Typically, a data warehouse is housed on an enterprise mainframe server. Data from various
online transaction processing (OLTP) applications and other sources is selectively extracted
and organized on the data warehouse database for use by analytical applications and user
queries. Data warehousing emphasizes the capture of data from diverse sources for useful
analysis and access, but does not generally start from the point-of-view of the end user or
knowledge worker who may need access to specialized, sometimes local databases. The latter
idea is known as the data mart.
Data mining, Web mining, and a decision support system (DSS) are three kinds of
applications that can make use of a data warehouse.

Decision Support System:
A decision support system (DSS) is a computer program application that analyzes business
data and presents it so that users can make business decisions more easily. It is an
"informational application" (in distinction to an "operational application" that collects the
data in the course of normal business operation).

Typical information that a decision support application might gather and present would
be:
Comparative sales figures between one week and the next
Projected revenue figures based on new product sales assumptions
The consequences of different decision alternatives, given past experience in a context that is
described

A decision support system may present information graphically and may include an expert
system or artificial intelligence (AI). It may be aimed at business executives or some other
group of knowledge workers.

33. What are the differences between Heterogeneous and Homogeneous?
Ans:
----------------------------------------   -----------------------------------
Heterogeneous                              Homogeneous
----------------------------------------   -----------------------------------
Stored in different schemas                Common structure
Stored in different file or DB types       Same database type
Spread across several countries            Same data center
Different platform and H/W configuration   Same platform and H/W configuration

34. How do you use DDL commands in a PL/SQL block, e.g. accept a table name from the user and
drop it if it exists, else display a message?
Ans: To invoke DDL commands in PL/SQL blocks we have to use Dynamic SQL; the package used is
DBMS_SQL.

35. What are the steps to work with Dynamic SQL?
Ans: Open a dynamic cursor, parse the SQL statement, bind input variables (if any), execute
the SQL statement of the dynamic cursor, and close the cursor.
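
The open / parse / bind / execute / close sequence, applied to the "drop the table if it
exists, else display a message" case from question 34, can be sketched as follows. Note the
assumptions: this uses Python's sqlite3 module against an in-memory SQLite database, not
Oracle's DBMS_SQL package, and the function name drop_table_if_exists and the table
emp_backup are invented for the example. In the Python DB-API, parsing, binding and executing
happen inside a single execute() call.

```python
import re
import sqlite3


def drop_table_if_exists(conn, table_name):
    """Drop table_name if it exists; otherwise return a message saying it does not."""
    # Identifiers cannot be bound as parameters, so validate before building the DDL text.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"invalid table name: {table_name!r}")
    cur = conn.cursor()                       # open a cursor
    try:
        cur.execute(                          # parse, bind the input variable, execute
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (table_name,),
        )
        if cur.fetchone() is None:
            return f"Table {table_name} does not exist"
        cur.execute(f'DROP TABLE "{table_name}"')  # DDL statement built at run time
        return f"Table {table_name} dropped"
    finally:
        cur.close()                           # close the cursor


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_backup (empno INTEGER)")
first_call = drop_table_if_exists(conn, "emp_backup")
second_call = drop_table_if_exists(conn, "emp_backup")
print(first_call)
print(second_call)
```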

36. Which package and procedure are used to find/check the free space available for db objects
like tables/procedures/views/synonyms etc.?
Ans: The Package is DBMS_SPACE
The Procedure is UNUSED_SPACE
The Table is DBA_OBJECTS

Note: See the script to find free space @ c:\informatica\tbl_free_space


37. Does Informatica allow the load if EmpId is the primary key in the target table and the
source data has 2 rows with the same EmpID? If you use a lookup in the same situation, does
it load 2 rows or only 1?
Ans: => No, it does not; it generates a primary key constraint violation (it loads 1 row).
=> Even then no, if EmpId is the primary key.
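
The database side of this behaviour can be reproduced with a short Python sketch. It uses an
in-memory SQLite table as a stand-in for the target; the table emp_target and the sample rows
are assumptions made for the example, and it does not show how the Informatica Server itself
logs or rejects the second row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp_target (empid INTEGER PRIMARY KEY, ename TEXT)")

# Two source rows carrying the same EmpID.
source_rows = [(101, "SMITH"), (101, "SMYTHE")]

loaded, rejected = [], []
for row in source_rows:
    try:
        conn.execute("INSERT INTO emp_target VALUES (?, ?)", row)
        loaded.append(row)
    except sqlite3.IntegrityError:
        # The primary key constraint refuses the duplicate EmpID.
        rejected.append(row)

print("loaded:", loaded)
print("rejected:", rejected)
```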

38. If Ename is varchar2(40) in one source (Siebel) and Ename is char(100) in another source
(Oracle), and the target has Name varchar2(50), how does Informatica handle this situation?
How does Informatica handle string and number datatypes from sources?

39. How do you debug mappings? I mean, where do you attack?

40. How do you query the Metadata tables for Informatica?

41(i). When do you use a connected lookup and when do you use an unconnected lookup?
Ans:
Connected Lookups : -
A connected Lookup transformation is part of the mapping data flow. With connected
lookups, you can have multiple return values. That is, you can pass multiple values from
the same row in the lookup table out of the Lookup transformation.
Common uses for connected lookups include:
=> Finding a name based on a number ex. Finding a Dname based on deptno
=> Finding a value based on a range of dates
=> Finding a value based on multiple conditions
Unconnected Lookups : -
An unconnected Lookup transformation exists separate from the data flow in the mapping.
You write an expression using the :LKP reference qualifier to call the lookup within another
transformation.
Some common uses for unconnected lookups include:
=> Testing the results of a lookup in an expression
=> Filtering records based on the lookup results
=> Marking records for update based on the result of a lookup (for example, updating slowly
changing dimension tables)
=> Calling the same lookup multiple times in one mapping

41(ii). What are the differences between Connected lookups and Unconnected lookups?
Ans: Although both types of lookups perform the same basic task, there are some
important differences:
-------------------------------------------------   ---------------------------------------------
Connected Lookup                                    Unconnected Lookup
-------------------------------------------------   ---------------------------------------------
Part of the mapping data flow.                      Separate from the mapping data flow.

Can return multiple values from the same row.       Returns one value from each row.

You link the lookup/output ports to another         You designate the return value with the
transformation.                                     Return port (R).

Supports default values.                            Does not support default values.

If there's no match for the lookup condition,       If there's no match for the lookup condition,
the server returns the default value for all        the server returns NULL.
output ports.

More visible. Shows the data passing in and out     Less visible. You write an expression using
of the lookup.                                      :LKP to tell the server when to perform the
                                                    lookup.

Cache includes all lookup columns used in the       Cache includes lookup/output ports in the
mapping (that is, lookup table columns included     Lookup condition and lookup/return port.
in the lookup condition and lookup table columns
linked as output ports to other transformations).
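
The two calling styles can be pictured with a small Python sketch. It is only an analogy: the
dictionary DEPT_LOOKUP, the function names and the default values are assumptions made for
the example and do not reproduce Informatica's transformation language.

```python
# Hypothetical lookup table: deptno -> (dname, location)
DEPT_LOOKUP = {10: ("ACCOUNTING", "NEW YORK"), 20: ("RESEARCH", "DALLAS")}


def connected_lookup(deptno, default=("UNKNOWN", "UNKNOWN")):
    """Sits in the data flow: returns several columns, default values on no match."""
    return DEPT_LOOKUP.get(deptno, default)


def unconnected_lookup(deptno):
    """Called on demand: returns only the single return value, None (NULL) on no match."""
    row = DEPT_LOOKUP.get(deptno)
    return row[0] if row is not None else None


def flag_row(row):
    """An expression that calls the unconnected lookup and tests its result."""
    return "UPDATE" if unconnected_lookup(row["deptno"]) is not None else "INSERT"


print(connected_lookup(10))        # both columns from the matching row
print(connected_lookup(99))        # no match: default values
print(unconnected_lookup(20))      # a single return value
print(flag_row({"deptno": 99}))    # no match: lookup returned None
```

The last call mirrors the "marking records for update based on the result of a lookup" use
listed for unconnected lookups.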

42. What do you need to concentrate on after getting an explain plan?
Ans: The 3 most significant columns in the plan table are named OPERATION, OPTIONS, and
OBJECT_NAME. For each step, these tell you which operation is going to be performed and which
object is the target of that operation.
Ex:-
**************************
TO USE EXPLAIN PLAN FOR A QRY...
**************************
SQL> EXPLAIN PLAN
2 SET STATEMENT_ID = 'PKAR02'
3 FOR
4 SELECT JOB,MAX(SAL)
5 FROM EMP
6 GROUP BY JOB
7 HAVING MAX(SAL) >= 5000;

Explained.

**************************
TO QUERY THE PLAN TABLE :-
**************************
SQL> SELECT RTRIM(ID)||' '||
2 LPAD(' ', 2*(LEVEL-1))||OPERATION
3 ||' '||OPTIONS
4 ||' '||OBJECT_NAME STEP_DESCRIPTION
5 FROM PLAN_TABLE
6 START WITH ID = 0 AND STATEMENT_ID = 'PKAR02'
7 CONNECT BY PRIOR ID = PARENT_ID
8 AND STATEMENT_ID = 'PKAR02'
9 ORDER BY ID;


STEP_DESCRIPTION
----------------------------------------------------
0 SELECT STATEMENT
1   FILTER
2     SORT GROUP BY
3       TABLE ACCESS FULL EMP
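
How the plan query turns the ID / PARENT_ID pairs into an indented tree can be sketched in
Python. The rows below are typed in by hand to match the example plan; the function
describe_plan is an invention for this illustration and simply imitates what CONNECT BY,
LEVEL and LPAD do in the query (it skips empty OPTIONS and OBJECT_NAME values instead of
emitting trailing spaces).

```python
def describe_plan(rows, root_id=0):
    """Build indented step descriptions from (id, parent_id, operation, options, object)."""
    steps = {}
    children = {}
    for step_id, parent_id, operation, options, object_name in rows:
        steps[step_id] = (operation, options, object_name)
        children.setdefault(parent_id, []).append(step_id)

    lines = []

    def walk(step_id, level):
        # Two spaces of indentation per level below the root, as LPAD does in the query.
        parts = [part for part in steps[step_id] if part]
        lines.append(f"{step_id} " + "  " * (level - 1) + " ".join(parts))
        for child_id in sorted(children.get(step_id, [])):
            walk(child_id, level + 1)

    walk(root_id, 1)
    return lines


plan_rows = [
    (0, None, "SELECT STATEMENT", "", ""),
    (1, 0, "FILTER", "", ""),
    (2, 1, "SORT", "GROUP BY", ""),
    (3, 2, "TABLE ACCESS", "FULL", "EMP"),
]
plan_lines = describe_plan(plan_rows)
print("\n".join(plan_lines))
```

The deepest step (the full scan of EMP) is the first work performed, and each parent consumes
the rows produced by its child.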

43. How components are interfaced in Psoft?
Ans:

44. How do you do the analysis of an ETL?
Ans:

==============================================================

45. What is a Standard Transformation, a Reusable Transformation and a Mapplet?
Ans: Mappings contain two types of transformations, standard and reusable. Standard
transformations exist within a single mapping. You cannot reuse a standard transformation you
created in another mapping, nor can you create a shortcut to that transformation. However,
often you want to create transformations that perform common tasks, such as calculating the
average salary in a department. Since a standard transformation cannot be used by more than
one mapping, you have to set up the same transformation each time you want to calculate the
average salary in a department. A reusable transformation, by contrast, can be used in more
than one mapping; you create it in the Transformation Developer or by checking the reusable
option in the transformation's properties.
Mapplet: A mapplet is a reusable object that represents a set of transformations. It allows
you to reuse transformation logic and can contain as many transformations as you need. A
mapplet can contain transformations, reusable transformations, and shortcuts to
transformations.

46. How do you copy a Mapping, a Repository and Sessions?
Ans: To copy an object (such as a mapping or reusable transformation) from a shared folder,
press the Ctrl key and drag and drop the mapping into the destination folder.

To copy a mapping from a non-shared folder, drag and drop the mapping into the destination
folder.
In both cases, the destination folder must be open with the related tool active.
For example, to copy a mapping, the Mapping Designer must be active. To copy a Source
Definition, the Source Analyzer must be active.

Copying Mapping:
To copy the mapping, open a workbook.
In the Navigator, click and drag the mapping slightly to the right, not dragging it to the
workbook.
When asked if you want to make a copy, click Yes, then enter a new name and click OK.
Choose Repository-Save.


Repository Copying: You can copy a repository from one database to another. You use this
feature before upgrading, to preserve the original repository. Copying repositories provides
a quick way to copy all metadata you want to use as a basis for a new repository.
If the database into which you plan to copy the repository contains an existing repository, the
Repository Manager deletes the existing repository. If you want to preserve the old
repository, cancel the copy. Then back up the existing repository before copying the new
repository.
To copy a repository, you must have one of the following privileges:
Administer Repository privilege
Super User privilege

To copy a repository:
1. In the Repository Manager, choose Repository-Copy Repository.
2. Select a repository you wish to copy, then enter the following information:
---------------------   -----------------   -----------------------------------------------------
Copy Repository Field   Required/Optional   Description
---------------------   -----------------   -----------------------------------------------------
Repository              Required            Name for the repository copy. Each repository name
                                            must be unique within the domain and should be easily
                                            distinguished from all other repositories.
Database Username       Required            Username required to connect to the database. This
                                            login must have the appropriate database permissions
                                            to create the repository.
Database Password       Required            Password associated with the database username.
                                            Must be in US-ASCII.
ODBC Data Source        Required            Data source used to connect to the database.
Native Connect String   Required            Connect string identifying the location of the
                                            database.
Code Page               Required            Character set associated with the repository. Must be
                                            a superset of the code page of the repository you
                                            want to copy.


If you are not connected to the repository you want to copy, the Repository Manager
asks you to log in.
3. Click OK.
5. If asked whether you want to delete an existing repository data in the second repository,
click OK to delete it. Click Cancel to preserve the existing repository.

Copying Sessions:
In the Server Manager, you can copy stand-alone sessions within a folder, or copy sessions in
and out of batches.
To copy a session, you must have one of the following:
Create Sessions and Batches privilege with read and write permission
Super User privilege
To copy a session:
1. In the Server Manager, select the session you wish to copy.
2. Click the Copy Session button or choose Operations-Copy Session.
The Server Manager makes a copy of the session. The Informatica Server names the copy
after the original session, appending a number, such as session_name1.

47. What are shortcuts, and what are their advantages?
Ans: Shortcuts allow you to use metadata across folders without making copies, ensuring
uniform metadata. A shortcut inherits all properties of the object to which it points. Once
you create a shortcut, you can configure the shortcut name and description.

When the object the shortcut references changes, the shortcut inherits those changes. By
using a shortcut instead of a copy, you ensure each use of the shortcut exactly matches the
original object. For example, if you have a shortcut to a target definition, and you add a
column to the definition, the shortcut automatically inherits the additional column.

Shortcuts allow you to reuse an object without creating multiple objects in the repository.
For example, you use a source definition in ten mappings in ten different folders. Instead of
creating 10 copies of the same source definition, one in each folder, you can create 10
shortcuts to the original source definition.
You can create shortcuts to objects in shared folders. If you try to create a shortcut to an
object in a non-shared folder, the Designer creates a copy of the object instead.

You can create shortcuts to the following repository objects:
Source definitions
Reusable transformations
Mapplets
Mappings
Target definitions
Business components

You can create two types of shortcuts:
Local shortcut. A shortcut created in the same repository as the original object.

Global shortcut. A shortcut created in a local repository that references an object in a
global repository.

Advantages: One of the primary advantages of using a shortcut is maintenance. If you
need to change all instances of an
object, you can edit the original repository object. All shortcuts accessing the object
automatically inherit the changes.
Shortcuts have the following advantages over copied repository objects:
You can maintain a common repository object in a single location. If you need to edit
the object, all shortcuts immediately inherit the changes you make.
You can restrict repository users to a set of predefined metadata by asking users to
incorporate the shortcuts into their work instead of developing repository objects
independently.
You can develop complex mappings, mapplets, or reusable transformations, then
reuse them easily in other folders.
You can save space in your repository by keeping a single repository object and
using shortcuts to that object, instead of creating copies of the object in multiple
folders or multiple repositories.

48. What are Pre-session and Post-session Options?
(Please refer to Help: Using Shell Commands and Post-Session Commands and Email)
Ans: The Informatica Server can perform one or more shell commands before or after the
session runs. Shell commands are operating system commands. You can use pre- or post-session
shell commands, for example, to delete a reject file or session log, or to archive target
files before the session begins.

The status of the shell command, whether it completed successfully or failed,
appears in the session log file.
To call a pre- or post-session shell command you must:
1. Use any valid UNIX command or shell script for UNIX servers, or any valid DOS or
batch file for Windows NT servers.
2. Configure the session to execute the pre- or post-session shell commands.

You can configure a session to stop if the Informatica Server encounters an error while
executing pre-session shell commands.

For example, you might use a shell command to copy a file from one directory to another.
For a Windows NT server you would use the following shell command to copy the SALES_ADJ
file from the target directory, L, to the source, H:
copy L:\sales\sales_adj H:\marketing\

For a UNIX server, you would use the following command line to perform a similar
operation:
cp sales/sales_adj marketing/

Tip: Each shell command runs in the same environment (UNIX or Windows NT) as the
Informatica Server. Environment settings in one shell command script do not carry over to
other scripts. To run all shell commands in the same environment, call a single shell script
that in turn invokes other scripts.
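
The point of the tip can be demonstrated on a UNIX-like machine with a short Python sketch
that starts each command in its own shell, the way separate shell commands would run. The
variable name PM_DEMO_STAGE and the helper run_shell are assumptions made for this example;
the sketch needs an sh executable on the PATH and says nothing about how the Informatica
Server launches commands internally.

```python
import os
import subprocess

# Start from the current environment, minus the demo variable (in case it is already set).
base_env = {k: v for k, v in os.environ.items() if k != "PM_DEMO_STAGE"}


def run_shell(command):
    """Run one command in its own shell process and return what it printed."""
    result = subprocess.run(
        ["sh", "-c", command],
        env=base_env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two separate shell commands: the setting made by the first is gone in the second.
run_shell("PM_DEMO_STAGE=ready; export PM_DEMO_STAGE")
separate = run_shell('echo "${PM_DEMO_STAGE:-unset}"')

# One wrapper command that sets the variable and then invokes the next step itself.
wrapped = run_shell(
    "PM_DEMO_STAGE=ready; export PM_DEMO_STAGE; sh -c 'echo \"$PM_DEMO_STAGE\"'"
)

print(separate)
print(wrapped)
```

The wrapper form is the "single shell script that in turn invokes other scripts" approach:
everything it calls inherits the same environment.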


49. What are Folder Versions?
Ans: In the Repository Manager, you can create different versions within a folder to help you
archive work in development. You can copy versions to other folders as well. When you save
a version, you save all metadata at a particular point in development. Later versions contain
new or modified metadata, reflecting work that you have completed since the last version.

Maintaining different versions lets you revert to earlier work when needed. By archiving the
contents of a folder into a version each time you reach a development landmark, you can
access those versions if later edits prove unsuccessful.

You create a folder version after completing a version of a difficult mapping, then continue
working on the mapping. If you are unhappy with the results of subsequent work, you can
revert to the previous version, then create a new version to continue development. Thus you
keep the landmark version intact, but available for regression.

Note: You can only work within one version of a folder at a time.

50. How do you automate/schedule sessions/batches, and did you use any tool for automating
sessions/batches?
Ans: We scheduled our sessions/batches using Server Manager.
You can either schedule a session to run at a given time or interval, or you can manually
start the session.
You need to have the Create Sessions and Batches privilege with Read and Execute permissions,
or the Super User privilege.
If you configure a batch to run only on demand, you cannot schedule it.

Note: We did not use any tool for the automation process.

51. What are the differences between the 4.7 and 5.1 versions?
Ans: New transformations were added, such as the XML Transformation and the MQ Series
Transformation, and PowerMart and PowerCenter are the same from version 5.1 onward.

52. What procedure do you need to follow before moving Mappings/Sessions from
Testing/Development to Production?
Ans:

53. How many values does the Informatica Server return when it passes through a Connected
Lookup and an Unconnected Lookup?
Ans: A Connected Lookup can return multiple values, whereas an Unconnected Lookup returns
only one value, that is, the Return Value.

54. What is the difference between PowerMart and PowerCenter in 4.7.2?
Ans: If You Are Using PowerCenter
PowerCenter allows you to register and run multiple Informatica Servers against the same
repository. Because you can run these servers at the same time, you can distribute the
repository session load across available servers to improve overall performance.
With PowerCenter, you receive all product functionality, including distributed metadata, the
ability to organize repositories into a data mart domain and share metadata across
repositories.
A PowerCenter license lets you create a single repository that you can configure as a global
repository, the core component of a data warehouse.
If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata and multiple
registered servers. Also, the various options available with PowerCenter (such as PowerCenter
Integration Server for BW, PowerConnect for IBM DB2, PowerConnect for SAP R/3, and
PowerConnect for PeopleSoft) are not available with PowerMart.

55. What kind of modifications can you perform with each Transformation?
Ans: Using transformations, you can modify data in the following ways:
-----------------------------------------------   ------------------------
Task                                              Transformation
-----------------------------------------------   ------------------------
Calculate a value                                 Expression
Perform aggregate calculations                    Aggregator
Modify text                                       Expression
Filter records                                    Filter, Source Qualifier
Order records queried by the Informatica Server   Source Qualifier
Call a stored procedure                           Stored Procedure
Call a procedure in a shared library or in the    External Procedure
COM layer of Windows NT
Generate primary keys                             Sequence Generator
Limit records to a top or bottom range            Rank
Normalize records, including those read           Normalizer
from COBOL sources
Look up values                                    Lookup
Determine whether to insert, delete, update,      Update Strategy
or reject records
Join records from different databases             Joiner
or flat file systems

56. Expressions in Transformations: explain briefly how you use them.
Ans: Expressions in Transformations
To transform data passing through a transformation, you can write an expression. The most
obvious examples of these are the Expression and Aggregator transformations, which perform
calculations on either single values or an entire range of values within a port.
Transformations that use expressions include the following:
---------------   ------------------------------------------------------------------
Transformation    How It Uses Expressions
---------------   ------------------------------------------------------------------
Expression        Calculates the result of an expression for each row passing
                  through the transformation, using values from one or more ports.
Aggregator        Calculates the result of an aggregate expression, such as a sum or
                  average, based on all data passing through a port or on groups
                  within that data.
Filter            Filters records based on a condition you enter using an
                  expression.
Rank              Filters the top or bottom range of records, based on a condition
                  you enter using an expression.
Update Strategy   Assigns a numeric code to each record based on an expression,
                  indicating whether the Informatica Server should use the
                  information in the record to insert, delete, or update the target.

In each transformation, you use the Expression Editor to enter the expression. The
Expression Editor supports the transformation language for building expressions. The
transformation language uses SQL-like functions, operators, and other components to build
the expression. For example, as in SQL, the transformation language includes the functions
COUNT and SUM. However, the PowerMart/PowerCenter transformation language includes
additional functions not found in SQL.

When you enter the expression, you can use values available through ports. For example,
if the transformation has two input ports representing a price and sales tax rate, you can
calculate the final sales tax using these two values. The ports used in the expression can
appear in the same transformation, or you can use output ports in other transformations.
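
The price and sales tax example can be written out as a small Python sketch. It is an
analogy for a row-by-row expression, not transformation-language syntax: the port names
PRICE, TAX_RATE and SALES_TAX, the rounding to two decimals and the sample rows are
assumptions made for the example.

```python
from decimal import Decimal


def sales_tax_expression(price, tax_rate):
    """Expression evaluated once per row from the two input port values."""
    return (price * tax_rate).quantize(Decimal("0.01"))


input_rows = [
    {"PRICE": Decimal("19.99"), "TAX_RATE": Decimal("0.08")},
    {"PRICE": Decimal("100.00"), "TAX_RATE": Decimal("0.0725")},
]

# Each row keeps its input ports and gains one output port holding the result.
output_rows = [
    dict(row, SALES_TAX=sales_tax_expression(row["PRICE"], row["TAX_RATE"]))
    for row in input_rows
]

for row in output_rows:
    print(row["PRICE"], row["TAX_RATE"], row["SALES_TAX"])
```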

57. If a flat file (which comes through FTP as a source) has not arrived, what happens? Where
do you set this option?
Ans: You get a fatal error, which causes the server to fail/stop the session.
You can set the Event-Based Scheduling option in Session Properties under General tab-->
Advanced options.
--------------------------   -----------------   ------------------------------------------------
Event-Based                  Required/Optional   Description
--------------------------   -----------------   ------------------------------------------------
Indicator File to Wait For   Optional            Required to use event-based scheduling. Enter
                                                 the indicator file (or directory and file) whose
                                                 arrival schedules the session. If you do not
                                                 enter a directory, the Informatica Server
                                                 assumes the file appears in the server variable
                                                 directory $PMRootDir.
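
The idea behind waiting for an indicator file can be sketched in Python as a simple polling
loop. This is only an illustration of event-based scheduling in general: the function
wait_for_indicator, the file name sales.done, the timeout and the polling interval are all
assumptions, and the sketch does not describe how the Informatica Server detects the file.

```python
import os
import tempfile
import time


def wait_for_indicator(path, timeout_seconds, poll_seconds=0.05):
    """Return True as soon as the indicator file exists, False if the timeout passes."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)


with tempfile.TemporaryDirectory() as source_dir:
    indicator = os.path.join(source_dir, "sales.done")

    # The FTP transfer has not finished: no indicator file yet, so the session would not start.
    arrived_before = wait_for_indicator(indicator, timeout_seconds=0.2)

    # The transfer job drops the indicator file once the flat file is complete.
    open(indicator, "w").close()
    arrived_after = wait_for_indicator(indicator, timeout_seconds=0.2)

print(arrived_before, arrived_after)
```

Waiting on a separate indicator file, rather than on the data file itself, avoids starting
while the data file is still being written.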


58. What is the Test Load Option and when do you use it in Server Manager?
Ans: When testing sessions in development, you may not need to process the entire source.
If this is true, use the Test Load Option (Session Properties --> General tab --> Target
Options: choose Normal as the target load option (option button), check Test Load (check
box), and enter the number of rows to test, e.g. 2000 (text box with scroll arrows)).
You can also click the Start button.

==============================================================
59. SCD Type 2 and SGT difference?

60. Differences between 4.7 and 5.1?

61. Tuning Informatica Server for improving performance? Performance Issues?
Ans: See /* C:\pkar\Informatica\Performance Issues.doc */

62. What is the Override Option? Which is better?

63. What will happen if you increase the buffer size?

64. What will happen if you increase the commit interval? And if you decrease it?

65. What kind of complex mappings did you build? And what sort of problems did you face?

66. If you have 10 mappings designed and you need to implement some changes (maybe in an
existing mapping, or a new mapping needs to be designed), how much time does it take, from
the easier cases to the complex ones?

67. Can you refresh the Repository in 4.7 and 5.1? And can you refresh pieces (partially) of
the repository in 4.7 and 5.1?

68. What is BI?
Ans: http://www.visionnet.com/bi/index.shtml

69. Benefits of BI?
Ans: http://www.visionnet.com/bi/bi-benefits.shtml

70. BI Faq
Ans: http://www.visionnet.com/bi/bi-faq.shtml

71. What is the difference between data scrubbing and data cleansing?
Ans: Scrubbing data is the process of cleaning up the junk in legacy data and making it
accurate and useful for the next generations of automated systems. This is perhaps the most
difficult of all conversion activities. Very often, this is made more difficult when the
customer wants to make good data out of bad data. This is the dog work. It is also the most
important and cannot be done without the active participation of the user.
DATA CLEANING - a two-step process including DETECTION and then CORRECTION of errors in a
data set.
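
The two-step shape of data cleaning (detect, then correct) can be sketched in Python. The
validation rule, the set VALID_STATES, the function names and the sample records are all
assumptions made for this example; real cleansing rules depend on the data and, as noted
above, on input from the user.

```python
VALID_STATES = {"CA", "NY", "TX"}  # hypothetical reference list of valid codes


def detect(records):
    """Step 1 (detection): indexes of records whose state code is not valid as-is."""
    return [i for i, rec in enumerate(records) if rec["state"] not in VALID_STATES]


def correct(records, flagged):
    """Step 2 (correction): fix what can be fixed mechanically, return what cannot."""
    unresolved = []
    for i in flagged:
        candidate = records[i]["state"].strip().upper()
        if candidate in VALID_STATES:
            records[i]["state"] = candidate
        else:
            unresolved.append(i)  # needs a decision from the user
    return unresolved


records = [{"state": "CA"}, {"state": " ny "}, {"state": "Texas"}]
flagged = detect(records)
unresolved = correct(records, flagged)
print(flagged, unresolved, records)
```

The record that cannot be corrected mechanically is left untouched and reported, which is
where the user's participation comes in.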


72. What is Metadata and Repository?
Ans:
Metadata. Data about data .
It contains descriptive data for end users.
Contains data that controls the ETL processing.
Contains data about the current state of the data warehouse.
ETL updates metadata, to provide the most current state.

Repository. The place where you store the metadata is called a repository. The more
sophisticated your repository, the more complex and detailed metadata you can store in it.
PowerMart and PowerCenter use a relational database as the repository.

73. SQL*Loader?
Ans: http://download-west.oracle.com/otndoc/oracle9i/901_doc/server.901/a90192/ch03.htm#1004678

74. Debugger in Mapping?

75. Parameter passing in version 5.1: any exposure?

76. What is the filename which you need to configure in Unix while installing Informatica?

77. How do you select duplicate rows using Informatica, i.e., how do you use
Max(Rowid)/Min(Rowid) in Informatica?

What are the perceptions to use ER and Normalization?

What is ER model and Dimensional Model?
ER Model - Relational
Dimensional - Star Schema (central fact table with numeric data; all other tables are
linked to the central table; faster, but denormalised), Snowflake Schema (one fact table,
with normalized dimension tables), Fact Constellation (different fact tables, combined
from one data mart to another)

What is Metadata?
Information about domain structure of data warehouse

What are the different types of Dimensional Modeling?

Dimensional - Star Schema (central fact table with numeric data; all other tables are
linked to the central table; faster, but denormalised), Snowflake Schema (one fact table,
with normalized dimension tables), Fact Constellation (different fact tables, combined
from one data mart to another)

1. What is dimensional modelling ? what is called a dimension ?
What are the different types of dimensional modelling ?
Have you done any ER modelling ? If so, how does it differ from dimensional modelling ?
Which type do you prefer ? Why wouldn't you use the other type ?

2. What is snowflaking ? Example ?
Why do you use snowflaking ? How is it different from star organization ?
What are the advantages or disadvantages of snowflaking ?
What type of data organization do you prefer ? Why ?

4. What RDBMS are you most comfortable in ?
How does it support data warehousing needs ?

5. In data modelling, how do you implement a many-to-many relationship with respect to E-R
modelling ?

6. Do you have any experience in data loading ?
What tools or methods have you used for data loading ?

10. Why do you use dimensional modelling instead of ER modelling for data warehousing
applications ?

1) Erwin - Is it possible to reverse engineer two different schemas into a single data model?

2) Suppose there is a star schema where a fact table has 3 dimension tables and this system is in
production. Is it possible to add one more dimension table to the fact table? What is the impact
at all the stages?

Difference between Star & Snowflake Schema
Snowflaking is a star schema design technique that stores logical attributes, usually of low
cardinality, in separate tables using a loose form of normalization. For example, you could
snowflake the gender of your customers in order to track changes to that attribute if your
customer dimension is too large to handle as an SCD.

The technique is not really recommended if you are going to use OLAP tools for your front end,
due to speed issues.

Snowflaking allows for easy update and load of data, as redundancy of data is avoided to some
extent, but browsing capabilities are greatly compromised. Still, sometimes it may become a
necessary evil.

To add a little to this, snowflaking often becomes necessary when you need data for which there
is a one-to-many relationship with a dimension table. To try to consolidate this data into the
dimension table would necessarily lead to redundancy (this is a violation of
second normal form, which will produce a Cartesian product). This sort of redundancy can cause
misleading results in queries, since the count of rows is artificially large (due to the Cartesian
product). A simple example of such a situation might be a "customer" dimension for which there
is a need to store multiple contacts. If the contact information is brought in to the customer table,
there would be one row for each contact (i.e., one for each customer/contact combination). In this
situation, it is better just to create a "contact" snowflake table with a FK to the customer. In
general, it is better to avoid snowflaking if possible, but sometimes the consequences of avoiding
it are much worse.
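
The row inflation described above can be seen in a small Python sketch. The customer and
contact names are invented for illustration; the point is only that folding contacts into the
customer table makes a naive row count overstate the number of customers.

```python
# Hypothetical customer dimension and a separate "contact" snowflake table.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Zenith"},
]
contacts = [
    {"customer_id": 1, "contact": "Ann"},
    {"customer_id": 1, "contact": "Bob"},
    {"customer_id": 1, "contact": "Cy"},
    {"customer_id": 2, "contact": "Di"},
]

# Folding contacts into the customer table gives one row per
# customer/contact combination.
flattened = [
    {**cust, "contact": con["contact"]}
    for cust in customers
    for con in contacts
    if con["customer_id"] == cust["customer_id"]
]

naive_customer_count = len(flattened)   # counts customer/contact rows
true_customer_count = len(customers)    # counts customers
```

With the contact table kept separate, a count on the customer dimension stays at two, while the
flattened version reports four rows.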

In a star schema, all your dimensions are linked directly to your fact table. In a snowflake
schema, on the other hand, dimensions may be interlinked or may have one-to-many relationships
with other tables. As previous mails said, this isn't a desirable situation, but you can make
the best choice once you have gathered all the requirements.

A snowflake is a design like a star, but with connecting tables among the dimension tables,
i.e. a relation between two dimensions.

3. Q: Which is better, Star or Snowflake?
A: Strict data warehousing rules would have you use a Star schema but in reality most designs
tend to become Snowflakes. They each have their pros and cons but both are far better than trying
to use a transactional system third-normal form design.
4. Q: Why can't I use a copy of my transactional system for my data warehouse?
A: This is one of the absolute worst things you can do. A lot of people initially go down this road
because a tool vendor will support the idea when making their sales pitch. Many of these
attempts will even experience success for a short period of time. It's not until your data sets grow
and your business questions begin to be complex that this design mistake will really come out to
bite you.

Q. What are the responsibilities of a data warehouse consultant/professional?

The basic responsibility of a data warehouse consultant is to publish the right data.

Some of the other responsibilities of a data warehouse consultant are:
1. Understand the end users by their business area, job responsibilities, and computer
tolerance
2. Find out the decisions the end users want to make with the help of the data warehouse
3. Identify the best users who will make effective decisions using the data warehouse
4. Find the potential new users and make them aware of the data warehouse
5. Determine the grain of the data
6. Make the end user screens and applications much simpler and more template driven

Q. Stars and Cubes (Polaris)

The star schema and OLAP cube are intimately related. Star schemas are most appropriate for
very large data sets. OLAP cubes are most appropriate for smaller data sets where analytic tools
can perform complex data comparisons and calculations. In almost all OLAP cube environments,
it's recommended that you originally source data into a star schema structure, and then use
wizards to transform the data into the OLAP cube.


Q. What is the necessity of having dimensional modeling instead of an ER modeling?

Compared to entity/relation modeling, it's less rigorous (allowing the designer more discretion in
organizing the tables) but more practical because it accommodates database complexity and
improves performance.

Q. Dimensions and Facts.
Dimensional modeling begins by dividing the world into measurements and context.
Measurements are usually numeric and taken repeatedly. Numeric measurements are facts. Facts
are always surrounded by mostly textual context that's true at the moment the fact is recorded.
Facts are very specific, well-defined numeric attributes. By contrast, the context surrounding the
facts is open-ended and verbose. It's not uncommon for the designer to add context to a set of
facts partway through the implementation.
Dimensional modeling divides the world of data into two major types: Measurements and
Descriptions of the context surrounding those measurements. The measurements, which are
typically numeric, are stored in fact tables, and the descriptions of the context, which are typically
textual, are stored in the dimension tables.

A fact table in a pure star schema consists of multiple foreign keys, each paired with a primary
key in a dimension, together with the facts containing the measurements.
Every foreign key in the fact table has a match to a unique primary key in the respective
dimension (referential integrity). This allows the dimension table to possess primary keys that
aren't found in the fact table. Therefore, a product dimension table might be paired with a sales
fact table in which some of the products are never sold.
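
A minimal Python sketch of this referential integrity rule follows. The product names and keys
are hypothetical. It checks that every foreign key in the fact table has a match in the
dimension, and then lists dimension keys that never appear in the fact table.

```python
# Hypothetical product dimension: primary key -> product name.
product_dim = {1: "soap", 2: "tea", 3: "rice"}

# Hypothetical sales fact rows, each carrying a foreign key to the dimension.
sales_fact = [
    {"product_key": 1, "units": 3},
    {"product_key": 2, "units": 1},
    {"product_key": 1, "units": 2},
]

# Referential integrity: no fact row may point at a missing dimension key.
orphans = [row for row in sales_fact if row["product_key"] not in product_dim]

# The reverse is allowed: a product can exist without ever being sold.
sold_keys = {row["product_key"] for row in sales_fact}
never_sold = sorted(set(product_dim) - sold_keys)
```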

Dimensional models are full-fledged relational models, where the fact table is in third normal
form and the dimension tables are in second normal form.
The main difference between second and third normal form is that repeated entries are removed
from a second normal form table and placed in their own snowflake. Thus the act of removing
the context from a fact record and creating dimension tables places the fact table in third normal
form.

E.g. for Fact tables: Sales, Cost, Profit
E.g. for Dimensions: Customer, Product, Store, Time

Q. What are Additive Facts? Or what is meant by Additive Fact?

Fact tables are mostly very large, and we almost never fetch a single record into our answer set.
We fetch a very large number of records on which we then do adding, counting, averaging, or
taking the min or max. The most common of these is adding. Applications are simpler if they
store facts in an additive format as often as possible. Thus, in the grocery example, we don't need
to store the unit price. We compute the unit price by dividing the dollar sales by the unit sales
whenever necessary.
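
The unit price calculation can be sketched in Python as follows. The fact rows are made-up
numbers for illustration; only the two additive facts (dollar sales and unit sales) are stored,
and the price is derived after summing.

```python
# Hypothetical grocery fact rows: (product, dollar_sales, unit_sales).
sales_facts = [
    ("cereal", 12.00, 4),
    ("cereal", 9.00, 3),
    ("milk", 6.00, 4),
]

def unit_price(rows, product):
    """Derive the average unit price from the two additive facts."""
    dollars = sum(d for p, d, _ in rows if p == product)
    units = sum(u for p, _, u in rows if p == product)
    return dollars / units

cereal_price = unit_price(sales_facts, "cereal")  # 21.00 / 7
milk_price = unit_price(sales_facts, "milk")      # 6.00 / 4
```

Storing a unit price per row instead would tempt queries to sum or average prices directly,
which does not give a meaningful result across rows with different unit counts.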

Q. What is meant by averaging over time?

Some facts, like bank balances and inventory levels, represent intensities that are awkward to
express in an additive format. We can treat these semi-additive facts as if they were additive,
but just before presenting the results to the end user, divide the answer by the number of time
periods to get the right result. This technique is called averaging over time.
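
A small Python sketch of averaging over time, assuming made-up month-end balances for two
accounts. Balances add up correctly across accounts within one period, but across periods the
sum has to be divided by the number of periods.

```python
# Hypothetical month-end balances: account -> one balance per time period.
balances = {
    "acct_1": [100.0, 200.0, 300.0],
    "acct_2": [50.0, 50.0, 50.0],
}

def average_over_time(per_period_values):
    """Sum as if the fact were additive, then divide by the number of periods."""
    return sum(per_period_values) / len(per_period_values)

# Adding across accounts inside a single period is valid.
total_per_period = [sum(values) for values in zip(*balances.values())]

# Adding across periods is not, so the total is averaged over time.
avg_total_balance = average_over_time(total_per_period)
```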

Q. What is a Conformed Dimension?

When the enterprise decides to create a set of common labels across all the sources of data, the
separate data mart teams (or, single centralized team) must sit down to create master dimensions
that everyone will use for every data source. These master dimensions are called Conformed
Dimensions.
Two dimensions are conformed if the fields that you use as row headers have the same domain.

Q. What is a Conformed Fact?

If the definitions of measurements (facts) are highly consistent, we call them Conformed Facts.

Q. What are the 3 important fundamental themes in a data warehouse?

The 3 most important fundamental themes are:
1. Drilling Down
2. Drilling Across and
3. Handling Time

Q. What is meant by Drilling Down?

Drilling down means nothing more than "give me more detail".
Drilling Down in a relational database means adding a row header to an existing SELECT
statement. For instance, if you are analyzing the sales of products at a manufacturer level, the
select list of the query reads:
SELECT MANUFACTURER, SUM(SALES).
If you wish to drill down on the list of manufacturers to show the brand sold, you add the
BRAND row header:
SELECT MANUFACTURER, BRAND, SUM(SALES).
Now each manufacturer row expands into multiple rows listing all the brands sold. This is the
essence of drilling down.

We often call a row header a "grouping column" because everything in the list that's not
aggregated with an operator such as SUM must be mentioned in the SQL GROUP BY clause. So
the GROUP BY clause in the second query reads, GROUP BY MANUFACTURER, BRAND.
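
The two queries can be tried out with Python's built-in sqlite3 module. This is only a sketch:
the sales_fact table and its rows are invented, and a real warehouse would join the fact table
to a product dimension rather than keep manufacturer and brand in the fact table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_fact (manufacturer TEXT, brand TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?)",
    [
        ("Acme", "Alpha", 10.0),
        ("Acme", "Alpha", 2.0),
        ("Acme", "Beta", 5.0),
        ("Zenith", "Zed", 7.0),
    ],
)

# Manufacturer level: one row per manufacturer.
by_manufacturer = conn.execute(
    "SELECT manufacturer, SUM(sales) FROM sales_fact "
    "GROUP BY manufacturer ORDER BY manufacturer"
).fetchall()

# Drill down: add BRAND as a row header and as a grouping column.
by_brand = conn.execute(
    "SELECT manufacturer, brand, SUM(sales) FROM sales_fact "
    "GROUP BY manufacturer, brand ORDER BY manufacturer, brand"
).fetchall()
```

The Acme row in the first result expands into an Alpha row and a Beta row in the second, and
the brand totals still add up to the manufacturer total.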

Q. What is meant by Drilling Across?

Drilling Across adds more data to an existing row. If drilling down is requesting ever finer
and more granular data from the same fact table, then drilling across is the process of linking
two or more fact tables at the same granularity, or, in other words, tables with the same set of
grouping columns and dimensional constraints.

A drill across report can be created by using grouping columns that apply to all the fact tables
used in the report.


The new fact table called for in the drill-across operation must share certain dimensions with the
fact table in the original query. All fact tables in a drill-across query must use conformed
dimensions.
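
A drill-across report can be sketched in Python with sqlite3 as below. The two fact tables and
their contents are hypothetical. Each fact table is queried separately with the same grouping
column (product, standing in for a conformed product dimension), and the answer sets are then
merged row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE shipments_fact (product TEXT, units_shipped INTEGER);
    CREATE TABLE returns_fact (product TEXT, units_returned INTEGER);
    INSERT INTO shipments_fact VALUES ('widget', 10), ('widget', 5), ('gadget', 8);
    INSERT INTO returns_fact VALUES ('widget', 2);
    """
)

# One query per fact table, both grouped by the shared column.
shipped = dict(conn.execute(
    "SELECT product, SUM(units_shipped) FROM shipments_fact GROUP BY product"
))
returned = dict(conn.execute(
    "SELECT product, SUM(units_returned) FROM returns_fact GROUP BY product"
))

# Merge the answer sets like an outer join: keep products missing from either side.
report = [
    (product, shipped.get(product, 0), returned.get(product, 0))
    for product in sorted(set(shipped) | set(returned))
]
```

Querying each fact table on its own and merging afterwards avoids joining the fact tables to
each other directly, which would multiply rows.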

Q. What is the significance of handling time?

Example, when a customer moves from a property, we might want to know:
1. who the new customer is
2. when did the old customer move out
3. when did the new customer move in
4. how long was the property empty etc

Q. What is meant by Drilling Up?

If drilling down is adding grouping columns from the dimension tables, then drilling up is
subtracting grouping columns.

Q. What is meant by Drilling Around?

The final variant of drilling is drilling around a value circle. This is similar to the linear value
chain that I showed in the previous example, but occurs in a data warehouse where the related
fact tables that share common dimensions are not arranged in a linear order. The best example is
from health care, where as many as 10 separate entities are processing patient encounters, and are
sharing this information with one another.
E.g. a typical health care value circle with 10 separate entities surrounding the patient.

When the common dimensions are conformed and the requested grouping columns are drawn
from dimensions that tie to all the fact tables in a given report, you can generate really powerful
drill around reports by performing separate queries on each fact table and outer joining the
answer sets in the client tool.

Q. What are the important fields in a recommended Time dimension table?

Time_key
Day_of_week
Day_number_in_month
Day_number_overall
Month
Month_number_overall
Quarter
Fiscal_period
Season
Holiday_flag
Weekday_flag
Last_day_in_month_flag
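
As a rough illustration, most of these fields can be derived from a calendar date in Python.
The calendar start date and the holiday set below are assumptions made for the sketch.
Fiscal_period and Season are left out because they depend on business rules that the list
above does not define.

```python
import calendar
from datetime import date

# Assumptions for illustration: the dimension starts on 2000-01-01 (a January
# start keeps the month arithmetic simple) and the holiday set is made up.
CALENDAR_START = date(2000, 1, 1)
HOLIDAYS = {date(2002, 1, 1)}
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def time_dimension_row(d):
    """Build one time dimension row for the calendar date d."""
    day_number = (d - CALENDAR_START).days + 1
    last_day = calendar.monthrange(d.year, d.month)[1]
    return {
        "Time_key": day_number,  # integer surrogate key, not a real date
        "Day_of_week": DAY_NAMES[d.weekday()],
        "Day_number_in_month": d.day,
        "Day_number_overall": day_number,
        "Month": d.month,
        "Month_number_overall": (d.year - CALENDAR_START.year) * 12 + d.month,
        "Quarter": (d.month - 1) // 3 + 1,
        "Holiday_flag": d in HOLIDAYS,
        "Weekday_flag": d.weekday() < 5,
        "Last_day_in_month_flag": d.day == last_day,
    }

row = time_dimension_row(date(2002, 1, 31))
```

The table is normally populated once for the whole date range, so end users can constrain on
flags such as Holiday_flag without any date arithmetic in their queries.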

Q. Why have timestamp as a surrogate key rather than a real date?

The time stamp in a fact table should be a surrogate key instead of a real date because:

1. the rare timestamp that is inapplicable, corrupted, or hasn't happened yet needs a
value that cannot be a real date
2. most end-user calendar navigation constraints, such as fiscal periods, end-of-periods,
holidays, day numbers and week numbers, aren't supported by database timestamps
3. integer time keys take up much less disk space than full dates

Q. Why have more than one fact table instead of a single fact table?

We cannot combine all of the business processes into a single fact table because:
1. the separate fact tables in the value chain do not share all the dimensions. You
simply can't put the customer ship-to dimension on the finished goods inventory
data
2. each fact table possesses different facts, and the fact table records are recorded at
different times along the value chain

Q. What is meant by Slowly Changing Dimensions and what are the different types of
SCDs? (Mascot)

Dimensions don't change in predictable ways. Individual customers and products evolve slowly
and episodically. Some of the changes are true physical changes. Customers change their
addresses because they move. A product is manufactured with different packaging. Other changes
are actually corrections of mistakes in the data. And finally, some changes are changes in how we
label a product or customer and are more a matter of opinion than physical reality. We call these
variations Slowly Changing Dimensions (SCDs).

The 3 fundamental choices for handling the slowly changing dimension are:

1. Overwrite the changed attribute, thereby destroying previous history
(e.g. useful when correcting an error)
2. Issue a new record for the customer, keeping the customer natural key, but creating a new
surrogate primary key
3. Create an additional field in the existing customer record, and store the old value of the
attribute in the additional field. Overwrite the original attribute field

A Type 1 SCD is an overwrite of a dimensional attribute. History is definitely lost. We overwrite
when we are correcting an error in the data or when we truly don't want to save history.

A Type 2 SCD creates a new dimension record and requires a generalized or surrogate key for the
dimension. We create surrogate keys when a true physical change occurs in a dimension entity at
a specific point in time, such as the customer address change or the product packing change. We
often add a timestamp and a reason code in the dimension record to precisely describe the change.
The Type 2 SCD records changes of values of dimensional entity attributes over time. The
technique requires adding a new row to the dimension each time there's a change in the value of
an attribute (or group of attributes) and assigning a unique surrogate key to the new row.

A Type 3 SCD adds a new field in the dimension record but does not create a new record. We
might change the designation of the customer's sales territory because we redraw the sales
territory map, or we arbitrarily change the category of the product from confectionery to candy.
In both cases, we augment the original dimension attribute with an "old" attribute so we can
switch between these alternate realities.
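
The three types can be compared side by side in a small Python sketch. Everything here is
illustrative: the dimension is a plain list of dictionaries, the "current" flag and the
"old_" prefix are naming assumptions, and a real Type 2 row would usually also carry the
timestamp and reason code mentioned above.

```python
from itertools import count

_next_key = count(1)  # hypothetical surrogate key generator

def new_row(natural_key, **attrs):
    """Create a dimension row with a fresh surrogate key."""
    return {"surrogate_key": next(_next_key), "natural_key": natural_key,
            "current": True, **attrs}

def scd_type1(dim, natural_key, attr, value):
    """Type 1: overwrite the attribute in place; history is lost."""
    for row in dim:
        if row["natural_key"] == natural_key:
            row[attr] = value

def scd_type2(dim, natural_key, attr, value):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    for row in dim:
        if row["natural_key"] == natural_key and row["current"]:
            row["current"] = False
            copy = {k: v for k, v in row.items()
                    if k not in ("surrogate_key", "current")}
            copy[attr] = value
            dim.append(new_row(**copy))
            return

def scd_type3(dim, natural_key, attr, value):
    """Type 3: keep the previous value in an extra 'old' field on the current row."""
    for row in dim:
        if row["natural_key"] == natural_key and row["current"]:
            row["old_" + attr] = row[attr]
            row[attr] = value

customers = [new_row("C1", name="Jon Smth", city="Pune", territory="West")]
scd_type1(customers, "C1", "name", "Jon Smith")       # correcting an error
scd_type2(customers, "C1", "city", "Mumbai")          # true physical change
scd_type3(customers, "C1", "territory", "Coastal")    # relabelled territory
```

After these calls the dimension holds two rows for customer C1: the expired Pune row with
surrogate key 1, and the current Mumbai row with surrogate key 2, which also remembers the
old territory.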


Q. What are the techniques for handling SCDs?
Overwriting
Creating another dimension record
Creating a current value field

Q. What is a Surrogate Key and where do you use it? (Mascot)

A surrogate key is an artificial or synthetic key that is used as a substitute for a natural key. It is
just a unique identifier or number for each row that can be used for the primary key to the table.

It is useful because the natural primary key (i.e. Customer Number in Customer table) can change
and this makes updates more difficult.

Some tables have columns such as AIRPORT_NAME or CITY_NAME which are stated as the
primary keys (according to the business users) but ,not only can these change, indexing on a
numerical value is probably better and you could consider creating a surrogate key called, say,
AIRPORT_ID. This would be internal to the system and as far as the client is concerned you may
display only the AIRPORT_NAME.

Another benefit you can get from surrogate keys (SID) is in Tracking the SCD - Slowly Changing
Dimension.

A classical example:
On the 1st of January 2002, Employee 'E1' belongs to Business Unit 'BU1' (that's what would be
in your Employee Dimension). This employee has a turnover allocated to him on the Business
Unit 'BU1'. But on the 2nd of June the Employee 'E1' is moved from Business Unit 'BU1' to
Business Unit 'BU2.' All the new turnover has to belong to the new Business Unit 'BU2' but the
old one should belong to the Business Unit 'BU1.'

If you used the natural business key 'E1' for your employee within your data warehouse
everything would be allocated to Business Unit 'BU2' even what actually belongs to 'BU1.'

If you use surrogate keys, you could create on the 2nd of June a new record for the Employee 'E1'
in your Employee Dimension with a new surrogate key.

This way, in your fact table, you have your old data (before 2nd of June) with the SID of the
Employee 'E1' + 'BU1.' All new data (after 2nd of June) would take the SID of the employee 'E1'
+ 'BU2.'

You could consider Slowly Changing Dimension as an enlargement of your natural key: natural
key of the Employee was Employee Code 'E1' but for you it becomes
Employee Code + Business Unit - 'E1' + 'BU1' or 'E1' + 'BU2.' But the difference with the natural
key enlargement process is that you might not have all parts of your new key within your fact
table, so you might not be able to do the join on the new enlarged key, so you need another id.

Every join between dimension tables and fact tables in a data warehouse environment should be
based on surrogate keys, not natural keys.
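
The employee example can be worked through in Python as a sketch. The turnover amounts and
dates are invented; the surrogate ids (SID 1 and 2) stand for the two versions of employee
'E1' described above.

```python
# Employee dimension after the change on the 2nd of June: two rows for 'E1'.
employee_dim = [
    {"sid": 1, "employee_code": "E1", "business_unit": "BU1"},
    {"sid": 2, "employee_code": "E1", "business_unit": "BU2"},
]

# Hypothetical fact rows; each one carries the SID that was valid when it was loaded.
turnover_fact = [
    {"employee_sid": 1, "date": "2002-03-15", "turnover": 100},
    {"employee_sid": 1, "date": "2002-05-20", "turnover": 50},
    {"employee_sid": 2, "date": "2002-07-01", "turnover": 70},
]

def turnover_by_business_unit(dim, facts):
    """Join facts to the dimension on the surrogate key and total by business unit."""
    bu_by_sid = {row["sid"]: row["business_unit"] for row in dim}
    totals = {}
    for fact in facts:
        bu = bu_by_sid[fact["employee_sid"]]
        totals[bu] = totals.get(bu, 0) + fact["turnover"]
    return totals

totals = turnover_by_business_unit(employee_dim, turnover_fact)
```

Joining on the natural key 'E1' instead would match every fact row to both dimension rows, so
the old and new turnover could no longer be told apart.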

Q. What is the necessity of having surrogate keys?

1. Production may reuse keys that it has purged but that you are still maintaining
2. Production might legitimately overwrite some part of a product description or a
customer description with new values but not change the product key or the customer
key to a new value. We might be wondering what to do about the revised attribute
values (slowly changing dimension crisis)
3. Production may generalize its key format to handle some new situation in the
transaction system. E.g. changing the production keys from integers to alphanumeric,
or the 12-byte keys you are used to may become 20-byte keys
4. Acquisition of companies

Q. What are the advantages of using Surrogate Keys?

We can save substantial storage space with integer valued surrogate keys
Eliminate administrative surprises coming from production
Potentially adapt to big surprises like a merger or an acquisition
Have a flexible mechanism for handling slowly changing dimensions

Q. What are Factless Fact tables?

Fact tables which do not have any facts are called factless fact tables. They may consist of
nothing but keys.

There are two kinds of fact tables that do not have any facts at all.

The first type of factless fact table is a table that records an event. Many event-tracking tables in
dimensional data warehouses turn out to be factless.
E.g. A student tracking system that detects each student attendance event each day.

The second type of factless fact table is called a coverage table. Coverage tables are frequently
needed when a primary fact table in a dimensional data warehouse is sparse.
E.g. A sales fact table that records the sales of products in stores on particular days under each
promotion condition. The sales fact table does answer many interesting questions but cannot
answer questions about things that did not happen. For instance, it cannot answer the question,
"which products were in promotion that did not sell?" because it contains only the records of
products that did sell. In this case the coverage table comes to the rescue. A record is placed in
the coverage table for each product in each store that is on promotion in each time period.
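
In Python, the question comes down to a set difference between the coverage table and the
sales fact table. The products, stores and dates below are made up for the sketch, and both
tables are reduced to their keys.

```python
# Hypothetical coverage table: every (product, store, day) that was on promotion.
coverage = {
    ("soap", "store_1", "2002-06-01"),
    ("tea", "store_1", "2002-06-01"),
    ("tea", "store_2", "2002-06-01"),
}

# Hypothetical sales fact keys: only the combinations that actually sold.
sales = {
    ("soap", "store_1", "2002-06-01"),
    ("tea", "store_2", "2002-06-01"),
}

# Products on promotion that did not sell: in coverage but not in sales.
promoted_but_unsold = sorted(coverage - sales)
```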

Q. What is a Causal dimension?

A causal dimension is a kind of advisory dimension that should not change the fundamental grain
of a fact table.
E.g. why did the customer buy the product? It can be due to promotion, sales etc.

Q. What is meant by Drill Through? (Mascot)

Operating Data Source - directly connects to application database

Q. What is Operational Data Store? (Mascot)

Q. What is BI? And why do we need BI?

Business Intelligence is an ongoing process that uses various integrated packages to analyze data.


Q. What is Slicing and Dicing? How can we do it in Impromptu? (We cannot; it is done only in
PowerPlay.)

GENERAL

Q. Explain the Project. (Polaris)

Explain about the various projects (MIDAS2/VIP).
Why was MIDAS2 or VIP or SCI developed.

Q. What is the size of the database in your project? (Polaris)

Approximately 900GB.

Q. What is the daily data volume (in GB/records)? Or What is the size of the data extracted
in the extraction process? (Polaris)

Q. How many Data marts are there in your project?

Q. How many Fact and Dimension tables are there in your project?

Q. What is the size of Fact table in your project?

Q. How many dimension tables did you have in your project and name some dimensions
(columns)? (Mascot)

Q. Name some measures in your fact table? (Mascot)

Q. Why couldn't you go for Snowflake schema? (Mascot)

Q. How many Measures have you created? (Mascot)

Q. How many Facts & Dimension Tables are there in your Project? (Mascot)

Q. Have you created Datamarts? (Mascot)

Q. What is the difference between OLTP and OLAP?

OLAP - Online Analytical Processing; mainly required for DSS. Data is denormalized, mainly
non-volatile, and highly indexed to improve query response time.

OLTP - Online Transaction Processing; DML-heavy, highly normalized to reduce deadlocks and
increase concurrency.

Q. What is the difference between OLTP and data warehouse?

Operational System                                    Data Warehouse
Transaction Processing                                Query Processing
Time Sensitive                                        History Oriented
Operator View                                         Managerial View
Organized by transactions (Order, Input, Inventory)   Organized by subject (Customer, Product)
Relatively smaller database                           Large database size
Many concurrent users                                 Relatively few concurrent users
Volatile Data                                         Non Volatile Data
Stores all data                                       Stores relevant data
Not Flexible                                          Flexible

Q. Explain the DW life cycle

Data warehouses can have many different types of life cycles with independent data marts. The
following is an example of a data warehouse life cycle.
In the life cycle of this example, four important steps are involved.

Extraction - As a first step, heterogeneous data from different online transaction processing
systems is extracted. This data becomes the data source for the data warehouse.
Cleansing/transformation - The source data is sent into the populating systems where the data is
cleansed, integrated, consolidated, secured and stored in the corporate or central data warehouse.
Distribution - From the central data warehouse, data is distributed to independent data marts
specifically designed for the end user.
Analysis - From these data marts, data is sent to the end users who access the data stored in the
data mart depending upon their requirement.

Q. What is the life cycle of DW?
Getting data from OLTP systems from different data sources
Analysis & staging - Putting in a staging layer: cleaning, purging, putting surrogate keys, SCM,
dimensional modeling
Loading
Writing of metadata

Q. What are the different Reporting and ETL tools available in the market?

Q. What is a data warehouse?

A data warehouse is a database designed to support a broad range of decision tasks in a specific
organization. It is usually batch updated and structured for rapid online queries and managerial
summaries. Data warehouses contain large amounts of historical data which are derived from
transaction data, but it can include data from other sources also. It is designed for query and
analysis rather than for transaction processing.

It separates analysis workload from transaction workload and enables an organization to
consolidate data from several sources.

The term data warehousing is often used to describe the process of creating, managing and using
a data warehouse.


Q. What is a data mart?

A data mart is a selected part of the data warehouse which supports specific decision support
application requirements of a company's department or geographical region. It usually contains
simple replicates of warehouse partitions or data that has been further summarized or derived
from base warehouse data. Instead of running ad hoc queries against a huge data warehouse, data
marts allow the efficient execution of predicted queries over a significantly smaller database.

Q. How do I differentiate between a data warehouse and a data mart? (KPIT Infotech Pune,
Mascot)

A data warehouse is for very large databases (VLDBs) and a data mart is for smaller databases.
The difference lies in the scope of the things with which they deal.
A data mart is an implementation of a data warehouse with a small and more tightly restricted
scope of data and data warehouse functions. A data mart serves a single department or part of an
organization. In other words, the scope of a data mart is smaller than the data warehouse. It is a
data warehouse for a smaller group of end users.

Q. What is the aim/objective of having a data warehouse? And who needs a data warehouse? Or
what is the use of Data Warehousing? (Polaris)

Data warehousing technology comprises a set of new concepts and tools which support the
executives, managers and analysts with information material for decision making.
The fundamental reason for building a data warehouse is to improve the quality of information in
the organization.

The main goal of data warehouse is to report and present the information in a very user friendly
form.

Q. What approach to be followed for creation of Data Warehouse?

Top Down Approach (data warehousing first), Bottom Up (data marts), Enterprise Data Model
(combines both)

Q. Explain the methodology of Data Warehousing? (Polaris)

Q. What are the important concerns of OLTP and DSS systems?

Q. What is the Architecture of a data warehouse?

A data warehouse system (DWS) comprises the data warehouse and all components used for
building, accessing and maintaining the DWH (illustrated in Figure 1). The center of a data
warehouse system is the data warehouse itself. The data import and preparation component is
responsible for data acquisition. It includes all programs, applications and legacy systems
interfaces that are responsible for extracting data from operational sources, preparing and loading
it into the warehouse. The access component includes all different applications (OLAP or data
mining applications) that make use of the information stored in the warehouse.


Additionally, a metadata management component (not shown in Figure 1) is responsible for the
management, definition and access of all different types of metadata. In general, metadata is
defined as "data about data" or "data describing the meaning of data". In data warehousing, there
are various types of metadata, e.g., information about the operational sources, the structure and
semantics of the DWH data, the tasks performed during the construction, the maintenance and
access of a DWH, etc. The need for metadata is well known. Statements like "A data warehouse
without adequate metadata is like a filing cabinet stuffed with papers, but without any folders or
labels" characterize the situation. Thus, the quality of metadata and the resulting quality of
information gained using a data warehouse solution are tightly linked.

Implementing a concrete DWS is a complex task comprising two major phases. In the DWS
configuration phase, a conceptual view of the warehouse is first specified according to user
requirements (data warehouse design). Then, the involved data sources and the way data will be
extracted and loaded into the warehouse (data acquisition) is determined. Finally, decisions about
persistent storage of the warehouse using database technology and the various ways data will be
accessed during analysis are made.

After the initial load (the first load of the DWH according to the DWH configuration), during the
DWS operation phase, warehouse data must be regularly refreshed, i.e., modifications of
operational data since the last DWH refreshment must be propagated into the warehouse such that
data stored in the DWH reflect the state of the underlying operational systems. Besides DWH
refreshment, DWS operation includes further tasks like archiving and purging of DWH data or
DWH monitoring.

Q. What are the functional requirements for a data warehouse?

A data warehouse must be able to support various types of information applications.
Decision support processing is the principal type of information application in a data warehouse,
but the use of a data warehouse is not restricted to a decision support system.
It is possible that each information application has its own set of requirements in terms of data,
the way that data is modeled, and the way it is used.
The data warehouse is where these applications get their "consolidated data."
A data warehouse must consolidate primitive data and it must provide all facilities to derive
information from it, as required by the end-users. Detailed primitive data is of prime importance,
but data volumes tend to be big and users usually require information derived from the primitive
data. Data in a data warehouse must be organized such that it can be analyzed or explored from
different angles.

Analysis of the historical context (the time dimension) is of prime importance.
Examples of other important contextual dimensions are geography, organization, products,
suppliers, customers, and so on.

Q. What are the characteristics of a data warehouse?

Data in a data warehouse is organized as subject oriented rather than application oriented. It is
designed and constructed as a non-volatile store of business data, transactions and events. Data
warehouse is a logically integrated store of data originating from disparate operational sources.
It is the only source for deriving information needed by the end users. Several temporal modeling
styles are usually used in different areas of the data warehouse.

Q. What are the characteristics of the data in a data warehouse?

Data in the DWH is integrated from various, heterogeneous operational systems (like database
systems, flat files, etc.) and further external data sources (like demographic and statistical
databases, WWW, etc.). Before the integration, structural and semantic differences have to be
reconciled, i.e., data have to be homogenized according to a uniform data model. Furthermore,
data values from operational systems have to be cleaned in order to get correct data into the data
warehouse.

The need to access historical data (i.e., histories of warehouse data over a prolonged period of
time) is one of the primary incentives for adopting the data warehouse approach. Historical data
are necessary for business trend analysis which can be expressed in terms of understanding the
differences between several views of the real-time data (e.g., profitability at the end of each
month). Maintaining historical data means that periodical snapshots of the corresponding
operational data are propagated and stored in the warehouse without overriding previous
warehouse states. However, the potential volume of historical data and the associated storage
costs must always be considered in relation to their potential business benefits.
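
As a minimal sketch of the snapshot idea described above, the following Python example uses the standard-library sqlite3 module. The table name balance_snapshot, its columns, and the sample values are hypothetical and chosen only for illustration. Each load appends rows tagged with a snapshot date instead of updating existing rows, so earlier warehouse states stay available for trend analysis.

```python
import sqlite3


def load_snapshot(conn, snapshot_date, operational_rows):
    """Append one periodic snapshot; rows from earlier snapshots are never updated."""
    conn.executemany(
        "INSERT INTO balance_snapshot (snapshot_date, account_id, balance) "
        "VALUES (?, ?, ?)",
        [(snapshot_date, account_id, balance) for account_id, balance in operational_rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE balance_snapshot ("
    " snapshot_date TEXT NOT NULL,"
    " account_id INTEGER NOT NULL,"
    " balance REAL NOT NULL,"
    " PRIMARY KEY (snapshot_date, account_id))"
)

# Month-end states of a hypothetical operational system.
load_snapshot(conn, "2001-01-31", [(1, 100.0), (2, 250.0)])
load_snapshot(conn, "2001-02-28", [(1, 140.0), (2, 200.0)])

# History of account 1 across snapshots, oldest first.
history = conn.execute(
    "SELECT snapshot_date, balance FROM balance_snapshot "
    "WHERE account_id = 1 ORDER BY snapshot_date"
).fetchall()
print(history)
```

Because the snapshot date is part of the primary key, loading the same period twice fails instead of silently overwriting history.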

Furthermore, warehouse data is mostly non-volatile, i.e., access to the DWH is typically
read-oriented. Modifications of the warehouse data take place only when modifications of the
source data are propagated into the warehouse.

Finally, a data warehouse usually contains additional data, not explicitly stored in the operational
sources, but derived through some process from operational data (also called derived data). For
example, operational sales data could be stored in several aggregation levels (weekly, monthly,
quarterly sales) in the warehouse.
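
The aggregation levels mentioned above can be sketched as follows. This is an illustration only, using Python's sqlite3 module; the table names sales, sales_monthly and sales_quarterly and the sample rows are assumptions, not part of any particular warehouse design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2001-01-05", 10.0),
        ("2001-01-20", 15.0),
        ("2001-02-03", 20.0),
        ("2001-04-11", 5.0),
    ],
)

# Derived data: monthly totals computed from the detailed rows.
conn.execute(
    "CREATE TABLE sales_monthly AS "
    "SELECT strftime('%Y-%m', sale_date) AS month, SUM(amount) AS total "
    "FROM sales GROUP BY month"
)

# Derived data: quarterly totals; quarter number = (month number + 2) / 3
# using integer division.
conn.execute(
    "CREATE TABLE sales_quarterly AS "
    "SELECT strftime('%Y', sale_date) AS year, "
    "       (CAST(strftime('%m', sale_date) AS INTEGER) + 2) / 3 AS quarter, "
    "       SUM(amount) AS total "
    "FROM sales GROUP BY year, quarter"
)

monthly = conn.execute(
    "SELECT month, total FROM sales_monthly ORDER BY month"
).fetchall()
quarterly = conn.execute(
    "SELECT year, quarter, total FROM sales_quarterly ORDER BY year, quarter"
).fetchall()
print(monthly)
print(quarterly)
```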

Q. When should a company consider implementing a data warehouse?

A data warehouse, or a more focused database called a data mart, should be considered when a
significant number of potential users are requesting access to a large amount of related historical
information for analysis and reporting purposes. So-called active or real-time data warehouses
can provide advanced decision support capabilities.

Q. What data is stored in a data warehouse?

In general, organized data about business transactions and business operations is stored in a data
warehouse. But, any data used to manage a business or any type of data that has value to a
business should be evaluated for storage in the warehouse. Some static data may be compiled
for initial loading into the warehouse. Any data that comes from mainframe, client/server, or
web-based systems can then be periodically loaded into the warehouse. The idea behind a data
warehouse is to capture and maintain useful data in a central location. Once data is organized,
managers and analysts can use software tools like OLAP to link different types of data together
and potentially turn that data into valuable information that can be used for a variety of business
decision support needs, including analysis, discovery, reporting and planning.

Q. Database administrators (DBAs) have always said that having non-normalized or de-
normalized data is bad. Why is de-normalized data now okay when it's used for Decision
Support?

Normalization of a relational database for transaction processing avoids processing anomalies and
results in the most efficient use of database storage. A data warehouse for Decision Support is not
intended to achieve these same goals. For Data-driven Decision Support, the main concern is to
provide information to the user as fast as possible. Because of this, storing data in a
de-normalized fashion, including storing redundant data and pre-summarizing data, provides the best
retrieval results. Also, data warehouse data is usually static, so anomalies will not occur from
operations such as adding, deleting or updating a record or field.

Q. How often should data be loaded into a data warehouse from transaction processing and other
source systems?

It all depends on the needs of the users, how fast data changes and the volume of information that
is to be loaded into the data warehouse. It is common to schedule daily, weekly or monthly
dumps from operational data stores during periods of low activity (for example, at night or on
weekends). The longer the gap between loads, the longer the processing times for the load when it
does run. A technical IS/IT staffer should make some calculations and consult with potential
users to develop a schedule to load new data.

Q. What are the benefits of data warehousing?

Some of the potential benefits of putting data into a data warehouse include:
1. Improving turnaround time for data access and reporting;
2. Standardizing data across the organization so there will be one view of the "truth";
3. Merging data from various source systems to create a more comprehensive information
source;
4. Lowering costs to create and distribute information and reports;
5. Sharing data and allowing others to access and analyze the data;
6. Encouraging and improving fact-based decision making.

Q. What are the limitations of data warehousing?

The major limitations associated with data warehousing are related to user expectations, lack of
data and poor data quality. Building a data warehouse creates some unrealistic expectations that
need to be managed. A data warehouse doesn't meet all decision support needs. If needed data is
not currently collected, transaction systems need to be altered to collect the data. If data quality is
a problem, the problem should be corrected in the source system before the data warehouse is
built. Software can provide only limited support for cleaning and transforming data. Missing and
inaccurate data cannot be "fixed" using software. Historical data can be collected manually,
coded and "fixed", but at some point source systems need to provide quality data that can be
loaded into the data warehouse without manual clerical intervention.

Q. How does my company get started with data warehousing?

Build one! The easiest way to get started with data warehousing is to analyze some existing
transaction processing systems and see what type of historical trends and comparisons might be
interesting to examine to support decision making. See if there is a "real" user need for
integrating the data. If there is, then IS/IT staff can develop a data model for a new schema and
load it with some current data and start creating a decision support data store using a database
management system (DBMS). Find some software for query and reporting and build a decision
support interface that's easy to use. Although the initial data warehouse/data-driven DSS may
seem to meet only limited needs, it is a "first step". Start small and build more sophisticated
systems based upon experience and successes.

Q. What is the difference between an OLTP database and a data warehouse database?

Q. Why should the OLTP database be different from the data warehouse database?

OLTP and data warehousing require two very differently configured systems.
The production system is isolated from the business intelligence system.
The resource demands of the data warehouse are significant and highly variable.
The cost of disk space is no longer a concern.
Production systems are not designed for query processing.

A data warehouse usually contains historical data that is derived from transaction data, but it can
include data from other sources. Having separate databases separates the analysis workload from
the transaction workload and enables an organization to consolidate data from several sources.

Q. What is the main difference between Data Warehousing and Business Intelligence?

The differences are:

DW - is a way of storing data and creating information through leveraging data marts. Data marts
(DMs) are segments or categories of information and/or data that are grouped together to provide
'information' into that segment or category. DW does not require BI to work. Reporting tools can
generate reports from the DW.

BI - is the leveraging of DW to help make business decisions and recommendations. Information
and data rules engines are leveraged here to help make these decisions along with statistical
analysis tools and data mining tools.

Q. What is data modeling?

Q. What are the different steps for data modeling?

Q. What are the data modeling tools you have used? (Polaris)

Q. What is a Physical data model?


During the physical design process, you convert the data gathered during the logical design phase
into a description of the physical database, including tables and constraints.

Q. What is a Logical data model?

A logical design is a conceptual and abstract design. We do not deal with the physical
implementation details yet; we deal only with defining the types of information that we need.
The process of logical design involves arranging data into a series of logical relationships called
entities and attributes.

Q. What are an Entity, Attribute and Relationship?

An entity represents a chunk of information. In relational databases, an entity often maps to a
table.

An attribute is a component of an entity and helps define the uniqueness of the entity. In
relational databases, an attribute maps to a column.

The entities are linked together using relationships.

Q. What are the different types of Relationships?

Entity-Relationship.

Q. What is the difference between Cardinality and Nullability?

Q. What is Forward, Reverse and Re-engineering?

Q. What is meant by Normalization and De-normalization?

Q. What are the different forms of Normalization?

Q. What is an ETL or ETT? And what are the different types?

ETL is the data warehouse acquisition process of Extracting, Transforming and Loading data from
source systems into the data warehouse. ETT (Extracting, Transforming and Transporting) is another
name for the same process.
Examples of ETL tools are Oracle Warehouse Builder and PowerMart.
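
The three ETL stages can be illustrated without any ETL tool. The following is a minimal hand-written sketch in Python, not how Oracle Warehouse Builder or PowerMart work internally; the source layout, the cleaning rules, and the target table customer_sales are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file source with untidy names and one bad amount.
SOURCE_CSV = "id,name,amount\n1, alice ,10.50\n2,BOB,n/a\n3,carol,7\n"


def extract(text):
    """Extract: read raw rows from a delimited source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and normalize names, skip rows whose amount is not numeric."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real job would log or route rejected rows
        clean.append((int(row["id"]), row["name"].strip().title(), amount))
    return clean


def load(conn, rows):
    """Load: write the transformed rows to the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(id INTEGER, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT id, name, amount FROM customer_sales ORDER BY id"
).fetchall()
print(loaded)
```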

Q. Explain the Extraction process? (Polaris, Mascot)

Q. How do you extract data from different data sources explain with an example? (Polaris)

Q. What are the reporting tools you have used? What is the difference between them? (Polaris)

Q. How do you automate Extraction process? (Polaris)

Q. Without using ETL tool can u prepare a Data Warehouse and maintain? (Polaris)

Q. How do you identify the changed records in operational data (Polaris)


Q. What is a Star Schema?
A star schema is a set of tables comprised of a single, central fact table surrounded by
de-normalized dimensions. Each dimension is represented in a single table. Star schemas implement
dimensional data structures with de-normalized dimensions. The snowflake schema is an alternative
to the star schema. A star schema is a relational database schema for representing
multidimensional data. The data is
stored in a central fact table, with one or more tables holding information on each dimension.
Dimensions have levels, and all levels are usually shown as columns in each dimension table.
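
A small star schema can be sketched with Python's sqlite3 module as shown here. The table and column names (fact_sales, dim_product, dim_date) and the sample rows are hypothetical. Note that each dimension is a single table and that the levels of the date dimension (day, month, year) appear as columns of that one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        day TEXT,
        month TEXT,
        year INTEGER
    );
    -- Central fact table: one foreign key per dimension plus the measure.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        amount REAL
    );
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES
        (10, '2001-01-15', '2001-01', 2001),
        (11, '2001-02-15', '2001-02', 2001);
    INSERT INTO fact_sales VALUES (1, 10, 100.0), (1, 11, 50.0), (2, 10, 20.0);
    """
)

# A typical star query: one join per dimension, then group by dimension levels.
rows = conn.execute(
    """
    SELECT p.category, d.month, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
    """
).fetchall()
print(rows)
```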

Q. What is a Snowflake Schema?
A snowflake schema is a set of tables comprised of a single, central fact table surrounded by
normalized dimension hierarchies. Each dimension level is represented in a table. Snowflake
schema implements dimensional data structures with fully normalized dimensions. Star schema is
an alternative to snowflake schema.
An example would be to break down the Time dimension and create tables for each level: years,
quarters, months, weeks and days. These additional branches on the ERD create more of a
snowflake shape than a star.
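
The Time dimension breakdown described above might look like the following sketch, again using Python's sqlite3 module. Only three levels (year, quarter, month) are shown to keep it short; all table names and rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One table per level of the Time hierarchy; each level points to its parent.
    CREATE TABLE dim_year (year_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_quarter (
        quarter_key INTEGER PRIMARY KEY,
        quarter TEXT,
        year_key INTEGER REFERENCES dim_year(year_key)
    );
    CREATE TABLE dim_month (
        month_key INTEGER PRIMARY KEY,
        month TEXT,
        quarter_key INTEGER REFERENCES dim_quarter(quarter_key)
    );
    CREATE TABLE fact_sales (
        month_key INTEGER REFERENCES dim_month(month_key),
        amount REAL
    );
    INSERT INTO dim_year VALUES (1, 2001);
    INSERT INTO dim_quarter VALUES (1, 'Q1', 1), (2, 'Q2', 1);
    INSERT INTO dim_month VALUES (1, 'Jan', 1), (2, 'Feb', 1), (3, 'Apr', 2);
    INSERT INTO fact_sales VALUES (1, 10.0), (2, 20.0), (3, 5.0);
    """
)

# Reaching a higher level means walking the chain of normalized tables.
by_quarter = conn.execute(
    """
    SELECT y.year, q.quarter, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_month m ON m.month_key = f.month_key
    JOIN dim_quarter q ON q.quarter_key = m.quarter_key
    JOIN dim_year y ON y.year_key = q.year_key
    GROUP BY y.year, q.quarter
    ORDER BY y.year, q.quarter
    """
).fetchall()
print(by_quarter)
```

Compared with a star schema, the query needs one extra join per hierarchy level, which is the usual trade-off for the reduced redundancy.
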
Q. What is Very Large Database?

Q. What are SMP and MPP?

Symmetric multi-processors (SMP)

Q. What is data mining?

Data Mining is the process of automated extraction of predictive information from large
databases. It predicts future trends and finds behaviour that the experts may miss as it lies beyond
their expectations. Data Mining is part of a larger process called knowledge discovery;
specifically, the step in which advanced statistical analysis and modeling techniques are applied
to the data to find useful patterns and relationships.

Data mining can be defined as "a decision support process in which we search for patterns of
information in data." This search may be done just by the user, i.e. just by performing queries, in
which case it is quite hard and in most of the cases not comprehensive enough to reveal intricate
patterns. Data mining uses sophisticated statistical analysis and modeling techniques to uncover
such patterns and relationships hidden in organizational databases, patterns that ordinary
methods might miss. Once found, the information needs to be presented in a suitable form, with
graphs, reports, etc.

Q. What is an OLAP? (Mascot)

OLAP is software for manipulating multidimensional data from a variety of sources. The data is
often stored in a data warehouse. OLAP software helps a user create queries, views, representations
and reports. OLAP tools can provide a "front-end" for a data-driven DSS.
On-Line Analytical Processing (OLAP) is a category of software technology that enables
analysts, managers and executives to gain insight into data through fast, consistent,
interactive access to a wide variety of possible views of information that has been
transformed from raw data to reflect the real dimensionality of the enterprise as
understood by the user.
OLAP functionality is characterized by dynamic multi-dimensional analysis of consolidated
enterprise data supporting end user analytical and navigational activities.
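
To make "multi-dimensional analysis" concrete, here is a small Python sketch of a roll-up: the same fact records are summarized along whichever dimensions the user picks. The function name roll_up, the dimension names, and the figures are hypothetical; real OLAP servers use far more elaborate storage and indexing.

```python
from collections import defaultdict

# Hypothetical consolidated facts with three dimensions and one measure.
FACTS = [
    {"region": "East", "product": "Widget", "year": 2001, "sales": 100.0},
    {"region": "East", "product": "Gadget", "year": 2001, "sales": 50.0},
    {"region": "West", "product": "Widget", "year": 2001, "sales": 30.0},
    {"region": "West", "product": "Widget", "year": 2002, "sales": 40.0},
]


def roll_up(facts, dimensions, measure="sales"):
    """Sum the measure for every combination of the chosen dimensions."""
    totals = defaultdict(float)
    for fact in facts:
        key = tuple(fact[dimension] for dimension in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


# The same data viewed from different angles.
print(roll_up(FACTS, ["region"]))
print(roll_up(FACTS, ["product", "year"]))
```
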
Q. What are the Different types of OLAP's? What are their differences? (Mascot)

OLAP - Desktop OLAP(Cognos), ROLAP, MOLAP(Oracle Discoverer)

ROLAP, MOLAP and HOLAP are specialized OLAP (Online Analytical Processing) applications.
ROLAP stands for Relational OLAP. Users see their data organized in cubes with dimensions,
but the data is really stored in a relational database (RDBMS) like Oracle. The RDBMS
stores data at a fine-grained level, and response times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in cubes with
dimensions, but the data is stored in a multi-dimensional database (MDBMS) like Oracle Express
Server. In a MOLAP system many queries have a finite answer, and performance is usually
critical and fast.
HOLAP stands for Hybrid OLAP; it is a combination of both worlds. Seagate Software's Holos
is an example of a HOLAP environment. In a HOLAP system one will find queries on aggregated
data as well as on detailed data.
DOLAP stands for Desktop OLAP.

Q. What is the difference between data warehousing and OLAP?

The terms data warehousing and OLAP are often used interchangeably. As the definitions
suggest, warehousing refers to the organization and storage of data from a variety of sources so
that it can be analyzed and retrieved easily. OLAP deals with the software and the process of
analyzing data, managing aggregations, and partitioning information into cubes for in-depth
analysis, retrieval and visualization. Some vendors are replacing the term OLAP with the terms
analytical software and business intelligence.

Q. What are the facilities provided by data warehouse to analytical users?

Q. What are the facilities provided by OLAP to analytical users?

Q. What is a Histogram? How to generate statistics?

Q. In Erwin what are the different types of models (Honeywell)

Q. Many Suppliers, Many Products: model this scenario in Erwin. How many tables are needed and
what do they contain? (Honeywell)

Q. What are the options available in Erwin Tool box (Honeywell)

Q. Aggregate navigation


Q. What are the Data Warehouse Center administration functions?

The Data Warehouse Center administration functions are:

Creating Data Warehouse Center security groups.
Defining Data Warehouse Center privileges for that group.
Registering Data Warehouse Center users.
Adding Data Warehouse Center users to security groups.
Registering data sources.
Registering warehouses (targets).
Creating subjects.
Registering agents.
Registering Data Warehouse Center programs.

Q. How do I set the log level higher for more detailed information within Data Warehouse Center
7.2?

Within DWC, log level capability can be set from 0 to 4. There is a log level 5, yet it cannot be
turned on using the GUI, but must be turned on manually. A command line trace can be used for
any trace level, and this is the only way to turn on a level 5 trace:

Go to start, programs, IBM DB2, command line processor.

Connect to the control database:
db2 => connect to Control_Database_name

Update the configuration table:
db2 => update iwh.configuration set value_int = 5 where name = 'TRACELVL' and (component
= '<component name>')

Valid components are:

Logger trace = log
Agent trace = agent
Server trace = RTK
DDD = DDD
ODBC = VWOdbc

For multiple traces the format is:
db2 => update iwh.configuration set value_int = 5 where name = 'TRACELVL' and (component
= '<component name>' or component = '<component name>')

Reset the connection:
db2 => connect reset

Stop and restart the Warehouse server and logger.

Perform the failing operation.

Be sure to reset the trace level to 0 using the command line when you are done:

db2 => update iwh.configuration set value_int = 0 where name = 'TRACELVL' and (component
= '<component name>')

When you run a trace, the Data Warehouse Center writes information to text files. Data
Warehouse Center programs that are called from steps also write any trace information to this
directory. These files are located in the directory specified by the VWS_LOGGING environment
variable.

The default value of VWS_LOGGING is:

Windows and OS/2 = x:\sqllib\logging
UNIX = /var/IWH
AS/400 = /QIBM/UserData/IWH
For additional information, see basic logging function in the Data Warehouse Center
administration guide.
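
The update statements shown above follow a fixed pattern, so they can be generated rather than typed by hand. The Python helper below is a hypothetical convenience, not part of Data Warehouse Center; it only builds the statement text from the component names listed earlier and does not connect to DB2. The resulting text still has to be entered in the command line processor as described.

```python
# Component names taken from the "Valid components" list above.
VALID_COMPONENTS = ("log", "agent", "RTK", "DDD", "VWOdbc")


def trace_update_statement(level, components):
    """Build the UPDATE statement that sets TRACELVL for the given components."""
    if not 0 <= level <= 5:
        raise ValueError("trace level must be between 0 and 5")
    if not components:
        raise ValueError("at least one component is required")
    unknown = [name for name in components if name not in VALID_COMPONENTS]
    if unknown:
        raise ValueError("unknown component(s): " + ", ".join(unknown))
    condition = " or ".join(f"component = '{name}'" for name in components)
    return (
        f"update iwh.configuration set value_int = {level} "
        f"where name = 'TRACELVL' and ({condition})"
    )


# Turn on a level 5 trace for two components, then the matching reset.
print(trace_update_statement(5, ["log", "agent"]))
print(trace_update_statement(0, ["log", "agent"]))
```

Generating the reset statement from the same component list makes it harder to forget a component when setting the trace level back to 0.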

Q. What types of data sources does Data Warehouse Center support?

The Data Warehouse Center supports a wide variety of relational and non relational data sources.
You can populate your Data Warehouse Center warehouse with data from the following
databases and files:
Any DB2 family database
Oracle
Sybase
Informix
Microsoft SQL Server
IBM DataJoiner
Multiple Virtual Storage (OS/390), Virtual Machine (VM), and local area network (LAN) files
IMS and Virtual Storage Access Method (VSAM) (with DataJoiner Classic Connect)

Q. What is the Data Warehouse Center control database?

When you install the warehouse server, the warehouse control database that you specify during
installation is initialized. Initialization is the process in which the Data Warehouse Center creates
the control tables that are required to store Data Warehouse Center metadata. If you have more
than one warehouse control database, you can use the Data Warehouse Center -->
Control Database Management window to initialize the second warehouse control database.
However, only one warehouse control database can be active at a time.

Q. What databases need to be registered as system ODBC data sources for the Data Warehouse
Center?

The databases that need to be registered as system ODBC data sources for the Data Warehouse
Center are:
source databases
target databases
control databases

1. What was the original business problem that led you to do this project?

Whether the consultant is being hired to gather requirements or to customize an OLAP
application, this question indicates that she's interested in the big picture. She'll keep the
answer in mind as she does her work, which is a measure of quality assurance.
2. Where are you in your current implementation process?
A consultant who asks this question knows not to make any assumptions about how much
progress you've made. She probably also understands that you might be wrong. There are
plenty of clients who have begun application development without having gathered
requirements. Understanding where the client thinks he is is just as important as
understanding where he wants to be. It also helps the consultant in making improvement
suggestions or recommendations for additional skills or technologies.
3. How long do you see this position being filled by an external resource?
While the question might seem self-serving at first, a good consultant is ever mindful of
his responsibility to render himself dispensable over time. Your answer will give him a
good idea of how much time he has to perform the work as well as to cross train
permanent staff within your organization. A variation on this question is: "Is there a
dedicated person or group targeted for knowledge transfer in this area?"
4. What deliverables do you expect from this engagement?
The consultant who doesn't ask about deliverables is the consultant who expects to sit
around giving advice. Beware of the "ivory tower" consultants, who are too light for
heavy work and too heavy for light work. Every consultant you talk to should expect to
produce some sort of deliverable, be it a requirements document, a data model, HTML, a
project plan, test procedures or a mission statement.
5. Would you like to talk to a past client or two?
The fact that a consultant would offer references is testimony that she knows her stuff.
Many do not. Those consultants who hide behind nondisclosures for not giving
references should be avoided. While it's often valid to deny prospective clients work
samples because of confidentiality agreements, there's no good reason not to offer the
name and phone number of someone who will sing the consultant's praises. Don't be
satisfied with a reference for the entire firm. Many good firms can employ below-average
consultants. Ask to talk to someone who's worked with the person or team you're
considering. Once you've hired that consultant and are happy with his work, offer to be a
reference. It comes around.


Repository related Questions

Q. What is the difference between PowerCenter and PowerMart?
With PowerCenter, you receive all product functionality, including the ability to register multiple
servers, share metadata across repositories, and partition data.

repository, the core component of a data warehouse.
PowerMart includes all features except distributed metadata, multiple registered servers, and data
partitioning. Also, the various options available with PowerCenter (such as PowerCenter
Integration Server for BW, PowerConnect for IBM DB2, PowerConnect for IBM MQSeries,
PowerConnect for SAP R/3, PowerConnect for Siebel, and PowerConnect for PeopleSoft) are not
available with PowerMart.

Q. What are the new features and enhancements in PowerCenter 5.1?
The major features and enhancements to PowerCenter 5.1 are:
a) Performance Enhancements
High precision decimal arithmetic. The Informatica Server optimizes data throughput
to increase performance of sessions using the Enable Decimal Arithmetic option.
To_Decimal and Aggregate functions. The Informatica Server uses improved
algorithms to increase performance of To_Decimal and all aggregate functions such as
percentile, median, and average.
Cache management. The Informatica Server uses better cache management to increase
performance of Aggregator, Joiner, Lookup, and Rank transformations.
Partition sessions with sorted aggregation. You can partition sessions with Aggregator
transformation that use sorted input. This improves memory usage and increases
performance of sessions that have sorted data.
b) Relaxed Data Code Page Validation
When enabled, the Informatica Client and Informatica Server lift code page selection and
validation restrictions. You can select any supported code page for source, target, lookup,
and stored procedure data.
c) Designer Features and Enhancements
Debug mapplets. You can debug a mapplet within a mapping in the Mapping Designer.
You can set breakpoints in transformations in the mapplet.
Support for slash character (/) in table and field names. You can use the Designer to
import source and target definitions with table and field names containing the slash
character (/). This allows you to import SAP BW source definitions by connecting
directly to the underlying database tables.
d) Server Manager Features and Enhancements
Continuous sessions. You can schedule a session to run continuously. A continuous
session starts automatically when the Load Manager starts. When the session stops, it
restarts immediately without rescheduling. Use continuous sessions when reading real
time sources, such as IBM MQSeries.
Partition sessions with sorted aggregators. You can partition sessions with sorted
aggregators in a mapping.
Register multiple servers against a local repository. You can register multiple
PowerCenter Servers against a local repository.

Q. What is a repository?

The Informatica repository is a relational database that stores information, or metadata, used by
the Informatica Server and Client tools. The repository also stores administrative information
such as usernames and passwords, permissions and privileges, and product version.
We create and maintain the repository with the Repository Manager client tool. With the
Repository Manager, we can also create folders to organize metadata and groups to organize
users.
Q. What are the different kinds of repository objects? And what do they contain?

Repository objects displayed in the Navigator can include sources, targets, transformations,
mappings, mapplets, shortcuts, sessions, batches, and session logs.

Q. What is metadata?

Designing a data mart involves writing and storing a complex set of instructions. You need to
know where to get data (sources), how to change it, and where to write the information (targets).
PowerMart and PowerCenter call this set of instructions metadata. Each piece of metadata (for
example, the description of a source table in an operational database) can contain comments
about it.

In summary, Metadata can include information such as mappings describing how to transform
source data, sessions indicating when you want the Informatica Server to perform the
transformations, and connect strings for sources and targets.
Q. What are folders?
Folders let you organize your work in the repository, providing a way to separate different types
of metadata or different projects into easily identifiable areas.
Q. What is a Shared Folder?
A shared folder is one whose contents are available to all other folders in the same repository. If
you plan on using the same piece of metadata in several projects (for example, a description of the
CUSTOMERS table that provides data for a variety of purposes), you might put that metadata in
the shared folder.
Q. What are mappings?
A mapping specifies how to move and transform data from sources to targets. Mappings include
source and target definitions and transformations. Transformations describe how the Informatica
Server transforms data. Mappings can also include shortcuts, reusable transformations, and
mapplets. Use the Mapping Designer tool in the Designer to create mappings.
Q. What are mapplets?

You can design a mapplet to contain sets of transformation logic to be reused in multiple
mappings within a folder, a repository, or a domain. Rather than recreate the same set of
transformations each time, you can create a mapplet containing the transformations, then add
instances of the mapplet to individual mappings. Use the Mapplet Designer tool in the Designer
to create mapplets.

Q. What are Transformations?

A transformation generates, modifies, or passes data through ports that you connect in a mapping
or mapplet. When you build a mapping, you add transformations and configure them to handle
data according to your business purpose. Use the Transformation Developer tool in the Designer
to create transformations.

Q. What are Reusable transformations?

You can design a transformation to be reused in multiple mappings within a folder, a repository,
or a domain. Rather than recreate the same transformation each time, you can make the
transformation reusable, then add instances of the transformation to individual mappings. Use the
Transformation Developer tool in the Designer to create reusable transformations.

Q. What are Sessions and Batches?

Sessions and batches store information about how and when the Informatica Server moves data
through mappings. You create a session for each mapping you want to run. You can group several
sessions together in a batch. Use the Server Manager to create sessions and batches.

Q. What are Shortcuts?

We can create shortcuts to objects in shared folders. Shortcuts provide the easiest way to reuse
objects. We use a shortcut as if it were the actual object, and when we make a change to the
original object, all shortcuts inherit the change.

Shortcuts to folders in the same repository are known as local shortcuts. Shortcuts to the global
repository are called global shortcuts.

We use the Designer to create shortcuts.

Q. What are Source definitions?

Detailed descriptions of database objects (tables, views, synonyms), flat files, XML files, or
Cobol files that provide source data. For example, a source definition might be the complete
structure of the EMPLOYEES table, including the table name, column names and datatypes, and
any constraints applied to these columns, such as NOT NULL or PRIMARY KEY. Use the
Source Analyzer tool in the Designer to import and create source definitions.

Q. What are Target definitions?

Detailed descriptions for database objects, flat files, Cobol files, or XML files to receive
transformed data. During a session, the Informatica Server writes the resulting data to session
targets. Use the Warehouse Designer tool in the Designer to import or create target definitions.

Q. What is Dynamic Data Store?

The need to share data is just as pressing as the need to share metadata. Often, several data marts
in the same organization need the same information. For example, several data marts may need to
read the same product data from operational sources, perform the same profitability calculations,
and format this information to make it easy to review.
If each data mart reads, transforms, and writes this product data separately, the throughput for the
entire organization is lower than it could be. A more efficient approach would be to read,
transform, and write the data to one central data store shared by all data marts. Transformation is
a processing-intensive task, so performing the profitability calculations once saves time.
Therefore, this kind of dynamic data store (DDS) improves throughput at the level of the entire
organization, including all data marts. To improve performance further, you might want to
capture incremental changes to sources. For example, rather than reading all the product data each
time you update the DDS, you can improve performance by capturing only the inserts, deletes,
and updates that have occurred in the PRODUCTS table since the last time you updated the DDS.
The DDS has one additional advantage beyond performance: when you move data into the DDS,
you can format it in a standard fashion. For example, you can prune sensitive employee data that
should not be stored in any data mart. Or you can display date and time values in a standard
format. You can perform these and other data cleansing tasks when you move data into the DDS
instead of performing them repeatedly in separate data marts.
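
The incremental approach mentioned above (capturing only inserts, deletes, and updates since the last load) can be sketched in Python as a comparison of two keyed snapshots. This illustrates the idea only and is not how PowerCenter captures changes; the function names and the PRODUCTS sample rows are hypothetical.

```python
def capture_changes(previous, current):
    """Compare two keyed snapshots and return (inserts, updates, deletes)."""
    inserts = {key: row for key, row in current.items() if key not in previous}
    updates = {
        key: row
        for key, row in current.items()
        if key in previous and previous[key] != row
    }
    deletes = [key for key in previous if key not in current]
    return inserts, updates, deletes


def apply_changes(store, inserts, updates, deletes):
    """Bring the data store up to date by applying only the captured changes."""
    store.update(inserts)
    store.update(updates)
    for key in deletes:
        store.pop(key, None)


# PRODUCTS rows keyed by product id: state at the last load and state now.
last_load = {1: ("Widget", 9.99), 2: ("Gadget", 19.99), 3: ("Manual", 4.50)}
source_now = {1: ("Widget", 9.99), 2: ("Gadget", 17.99), 4: ("Cable", 2.00)}

dds = dict(last_load)  # the data store as of the last update
inserts, updates, deletes = capture_changes(last_load, source_now)
apply_changes(dds, inserts, updates, deletes)
print(inserts, updates, deletes)
```

Only three of the rows are touched here, while the unchanged Widget row is never re-read into the store, which is where the performance gain comes from on large tables.
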
Q. When should you create the dynamic data store? Do you need a DDS at all?
To decide whether you should create a dynamic data store (DDS), consider the following issues:
How much data do you need to store in the DDS? One principal advantage of data
marts is the selectivity of the information they include. Instead of a copy of everything
potentially relevant from the OLTP database and flat files, data marts contain only the
information needed to answer specific questions for a specific audience (for example,
sales performance data used by the sales division). A dynamic data store is a hybrid of
the galactic warehouse and the individual data mart, since it includes all the data needed
for all the data marts it supplies. If the dynamic data store contains nearly as much
information as the OLTP source, you might not need the intermediate step of the dynamic
data store. However, if the dynamic data store includes substantially less than all the data
in the source databases and flat files, you should consider creating a DDS staging area.

What kind of standards do you need to enforce in your data marts? Creating a DDS
is an important technique in enforcing standards. If data marts depend on the DDS for
information, you can provide that data in the range and format you want everyone to use.
For example, if you want all data marts to include the same information on customers,
you can put all the data needed for this standard customer profile in the DDS. Any data
mart that reads customer data from the DDS should include all the information in this
profile.

How often do you update the contents of the DDS? If you plan to frequently update
data in data marts, you need to update the contents of the DDS at least as often as you
update the individual data marts that the DDS feeds. You may find it easier to read data
directly from source databases and flat file systems if it becomes burdensome to update
the DDS fast enough to keep up with the needs of individual data marts. Or, if particular
data marts need updates significantly faster than others, you can bypass the DDS for
these fast-update data marts.

Is the data in the DDS simply a copy of data from source systems, or do you plan to
reformat this information before storing it in the DDS? One advantage of the dynamic
data store is that, if you plan on reformatting information in the same fashion for several
data marts, you only need to format it once for the dynamic data store. Part of this
question is whether you keep the data normalized when you copy it to the DDS.

How often do you need to join data from different systems? On occasion, you may
need to join records queried from different databases or read from different flat file
systems. The more frequently you need to perform this type of heterogeneous join, the
more advantageous it would be to perform all such joins within the DDS, then make the
results available to all data marts that use the DDS as a source.
Q. What is a Global repository?

A global repository is the centralized repository in a domain, a group of connected repositories. Each domain can
contain one global repository. The global repository can contain common objects to be shared
throughout the domain through global shortcuts. Once created, you cannot change a global
repository to a local repository. You can promote an existing local repository to a global
repository.

Q. What is Local Repository?

Each local repository in the domain can connect to the global repository and use objects in its
shared folders. A folder in a local repository can be copied to other local repositories while
keeping all local and global shortcuts intact.

Q. What are the different types of locks?
There are five kinds of locks on repository objects:
Read lock. Created when you open a repository object in a folder for which you do not
have write permission. Also created when you open an object with an existing write lock.
Write lock. Created when you create or edit a repository object in a folder for which you
have write permission.
Execute lock. Created when you start a session or batch, or when the Informatica Server
starts a scheduled session or batch.
Fetch lock. Created when the repository reads information about repository objects from
the database.
Save lock. Created when you save information to the repository.
Q. After creating users and user groups, and granting different sets of privileges, I find that
none of the repository users can perform certain tasks, even the Administrator.
Repository privileges are limited by the database privileges granted to the database user who
created the repository. If the database user (one of the default users created in the Administrators
group) does not have full database privileges in the repository database, you need to edit the
database user to allow all privileges in the database.

Q. I created a new group and removed the Browse Repository privilege from the group.
Why does every user in the group still have that privilege?
Privileges granted to individual users take precedence over any group restrictions. Browse
Repository is a default privilege granted to all new users and groups. Therefore, to remove the
privilege from users in a group, you must remove the privilege from the group, and every user in
the group.
Q. I do not want a user group to create or edit sessions and batches, but I need them to
access the Server Manager to stop the Informatica Server.
To permit a user to access the Server Manager to stop the Informatica Server, you must grant
them both the Create Sessions and Batches, and Administer Server privileges. To restrict the user
from creating or editing sessions and batches, you must restrict the user's write permissions on a
folder level.
Alternatively, the user can use pmcmd to stop the Informatica Server with the Administer Server
privilege alone.
Q. How does read permission affect the use of the command line program, pmcmd?
To use pmcmd, you do not need to view a folder before starting a session or batch within the
folder. Therefore, you do not need read permission to start sessions or batches with pmcmd. You
must, however, know the exact name of the session or batch and the folder in which it exists.
With pmcmd, you can start any session or batch in the repository if you have the Session Operator
privilege or execute permission on the folder.
Q. My privileges indicate I should be able to edit objects in the repository, but I cannot edit
any metadata.
You may be working in a folder with restrictive permissions. Check the folder permissions to see
if you belong to a group whose privileges are restricted by the folder owner.
Q. I have the Administer Repository Privilege, but I cannot access a repository using the
Repository Manager.
To perform administration tasks in the Repository Manager with the Administer Repository
privilege, you must also have the default privilege Browse Repository. You can assign Browse
Repository directly to a user login, or you can inherit Browse Repository from a group.

Questions related to Server Manager

Q. What is Event-Based Scheduling?

When you use event-based scheduling, the Informatica Server starts a session when it locates the
specified indicator file. To use event-based scheduling, you need a shell command, script, or
batch file to create an indicator file when all sources are available. The file must be created or
sent to a directory local to the Informatica Server. The file can be of any format recognized by the
Informatica Server operating system. The Informatica Server deletes the indicator file once the
session starts.
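The indicator-file mechanism can be sketched in a few lines of Python. This is only an illustration of the idea, not how the Informatica Server is implemented; the function name, the callback, and the file name sources_ready.ind are all hypothetical.

```python
import os
import tempfile


def start_if_indicator_present(indicator_path, start_session):
    """Start the session only when the indicator file exists.

    Returns True if the session was started, False otherwise.
    """
    if not os.path.exists(indicator_path):
        return False
    start_session()            # hypothetical callback standing in for the session start
    os.remove(indicator_path)  # the indicator file is deleted once the session starts
    return True


# Usage: an upstream script creates the indicator file when all sources are ready.
with tempfile.TemporaryDirectory() as workdir:
    indicator = os.path.join(workdir, "sources_ready.ind")
    started = []

    first_try = start_if_indicator_present(indicator, lambda: started.append("run"))
    open(indicator, "w").close()   # upstream job signals that sources are available
    second_try = start_if_indicator_present(indicator, lambda: started.append("run"))
    indicator_left_behind = os.path.exists(indicator)
```

The first call does nothing because the file is missing; the second call starts the session and removes the file, so the same indicator cannot trigger a second run.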
Use the following syntax to ping the Informatica Server on a UNIX system:
pmcmd ping [{user_name | %user_env_var} {password | %password_env_var}]
[hostname:]portno
Use the following syntax to start a session or batch on a UNIX system:
pmcmd start {user_name | %user_env_var} {password | %password_env_var} [hostname:]portno
[folder_name:]{session_name | batch_name} [:pf=param_file] session_flag wait_flag
Use the following syntax to stop a session or batch on a UNIX system:
pmcmd stop {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno [folder_name:]{session_name | batch_name} session_flag
Use the following syntax to stop the Informatica Server on a UNIX system:
pmcmd stopserver {user_name | %user_env_var} {password | %password_env_var}
[hostname:]portno
Q. What are the different types of Commit intervals?

The different commit intervals are:
Target-based commit. The Informatica Server commits data based on the number of
target rows and the key constraints on the target table. The commit point also depends on
the buffer block size and the commit interval.
Source-based commit. The Informatica Server commits data based on the number of
source rows. The commit point is the commit interval you configure in the session
properties.

Designer Questions
Q. What are the tools provided by Designer?
The Designer provides the following tools:
Source Analyzer. Use to import or create source definitions for flat file, XML, Cobol,
ERP, and relational sources.
Warehouse Designer. Use to import or create target definitions.

Transformation Developer. Use to create reusable transformations.
Mapplet Designer. Use to create mapplets.
Mapping Designer. Use to create mappings.
Q. What is a transformation?
A transformation is a repository object that generates, modifies, or passes data. You configure
logic in a transformation that the Informatica Server uses to transform data. The Designer
provides a set of transformations that perform specific functions. For example, an Aggregator
transformation performs calculations on groups of data.
Each transformation has rules for configuring and connecting in a mapping. For more information
about working with a specific transformation, refer to the Informatica documentation for that
particular transformation.
You can create transformations to use once in a mapping, or you can create reusable
transformations to use in multiple mappings.
Q. What are the different types of Transformations? (Mascot)

a) Aggregator transformation: The Aggregator transformation allows you to perform aggregate
calculations, such as averages and sums. The Aggregator transformation is unlike the Expression
transformation, in that you can use the Aggregator transformation to perform calculations on
groups. The Expression transformation permits you to perform calculations on a row-by-row
basis only. (Mascot)

b) Expression transformation: You can use the Expression transformations to calculate values
in a single row before you write to the target. For example, you might need to adjust employee
salaries, concatenate first and last names, or convert strings to numbers. You can use the
Expression transformation to perform any non-aggregate calculations. You can also use the
Expression transformation to test conditional statements before you output the results to target
tables or other transformations.

c) Filter transformation: The Filter transformation provides the means for filtering rows in a
mapping. You pass all the rows from a source transformation through the Filter transformation,
and then enter a filter condition for the transformation. All ports in a Filter transformation are
input/output, and only rows that meet the condition pass through the Filter transformation.

d) Joiner transformation: While a Source Qualifier transformation can join data originating
from a common source database, the Joiner transformation joins two related heterogeneous
sources residing in different locations or file systems.
e) Lookup transformation: Use a Lookup transformation in your mapping to look up data in a
relational table, view, or synonym. Import a lookup definition from any relational database to
which both the Informatica Client and Server can connect. You can use multiple Lookup
transformations in a mapping.
The Informatica Server queries the lookup table based on the lookup ports in the transformation.
It compares Lookup transformation port values to lookup table column values based on the
lookup condition. Use the result of the lookup to pass to other transformations and the target.

Q. What is the difference between Aggregate and Expression Transformation? (Mascot)

Q. What is Update Strategy?

When we design our data warehouse, we need to decide what type of information to store in
targets. As part of our target table design, we need to determine whether to maintain all the
historic data or just the most recent changes.
The model we choose constitutes our update strategy: how to handle changes to existing records.

Update strategy flags a record for update, insert, delete, or reject. We use this transformation
when we want to exert fine control over updates to a target, based on some condition we apply.
For example, we might use the Update Strategy transformation to flag all customer records for
update when the mailing address has changed, or flag all employee records for reject for people
no longer working for the company.
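The flagging logic described above can be sketched in Python. The numeric values are assumed to mirror Informatica's DD_INSERT, DD_UPDATE, DD_DELETE and DD_REJECT constants; the function flag_customer, the column name address, and the sample rows are hypothetical.

```python
# Numeric flags assumed to mirror Informatica's DD_* update strategy constants.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_customer(incoming, existing):
    """Return the flag for one incoming customer row.

    existing is the current target row for the same customer, or None.
    """
    if existing is None:
        return DD_INSERT
    if incoming["address"] != existing["address"]:
        return DD_UPDATE
    return DD_REJECT   # nothing changed, so do not write the row


# Hypothetical target table keyed by customer id
target = {101: {"address": "1 Main St"}}

new_customer = flag_customer({"id": 202, "address": "9 Elm St"}, target.get(202))
moved_customer = flag_customer({"id": 101, "address": "5 Oak Ave"}, target.get(101))
unchanged_customer = flag_customer({"id": 101, "address": "1 Main St"}, target.get(101))
```

In a real mapping the same decision would be written as an expression inside the Update Strategy transformation rather than as a Python function.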

Q. Where do you define update strategy?

We can set the Update strategy at two different levels:
Within a session. When you configure a session, you can instruct the Informatica Server
to either treat all records in the same way (for example, treat all records as inserts), or use
instructions coded into the session mapping to flag records for different database
operations.
Within a mapping. Within a mapping, you use the Update Strategy transformation to
flag records for insert, delete, update, or reject.
Q. What are the advantages of having the Update strategy at Session Level?
Q. What is a lookup table? (KPIT Infotech, Pune)
The lookup table can be a single table, or we can join multiple tables in the same database using a
lookup query override. The Informatica Server queries the lookup table or an in-memory cache of
the table for all incoming rows into the Lookup transformation.

If our mapping includes heterogeneous joins, we can use any of the mapping sources or mapping
targets as the lookup table.

Q. What is a Lookup transformation and what are its uses?

We use a Lookup transformation in our mapping to look up data in a relational table, view or
synonym.

We can use the Lookup transformation for the following purposes:

Get a related value. For example, our source table includes the employee ID, but
we want to include the employee name in our target table to make our summary
data easier to read.
Perform a calculation. Many normalized tables include values used in a
calculation, such as gross sales per invoice or sales tax, but not the calculated
value (such as net sales).

Update slowly changing dimension tables. We can use a Lookup transformation
to determine whether records already exist in the target.
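The "get a related value" use can be pictured as a dictionary lookup against a cached copy of the lookup table. The sketch below is a simplified Python analogy, not Informatica code; the helper names, column names, and the default value UNKNOWN are hypothetical.

```python
def build_lookup_cache(lookup_rows, key_column, return_column):
    """Read the lookup table once and cache key -> return value in memory."""
    return {row[key_column]: row[return_column] for row in lookup_rows}


def lookup(cache, key, default=None):
    """Return the related value, or a default when no row matches."""
    return cache.get(key, default)


# Hypothetical lookup table and source rows
employees = [{"emp_id": 1, "name": "BLAKE"}, {"emp_id": 2, "name": "KING"}]
cache = build_lookup_cache(employees, "emp_id", "name")

source_rows = [{"emp_id": 2, "sales": 500}, {"emp_id": 7, "sales": 120}]
target_rows = [
    dict(row, emp_name=lookup(cache, row["emp_id"], "UNKNOWN"))
    for row in source_rows
]
```

The same lookup result could also answer the slowly changing dimension question: a key that is missing from the cache has no existing row in the target.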
Q. What are connected and unconnected Lookup transformations?
We can configure a connected Lookup transformation to receive input directly from the mapping
pipeline, or we can configure an unconnected Lookup transformation to receive input from the
result of an expression in another transformation.

An unconnected Lookup transformation exists separate from the pipeline in the mapping. We
write an expression using the :LKP reference qualifier to call the lookup within another
transformation.

A common use for unconnected Lookup transformations is to update slowly changing dimension
tables.
Q. What is the difference between connected lookup and unconnected lookup?
Differences between Connected and Unconnected Lookups:

Connected Lookup                         Unconnected Lookup
---------------------------------------  ---------------------------------------
Receives input values directly from      Receives input values from the result
the pipeline.                            of a :LKP expression in another
                                         transformation.
We can use a dynamic or static cache.    We can use a static cache.
Supports user-defined default values.    Does not support user-defined default
                                         values.
Q. What is Sequence Generator Transformation? (Mascot)
The Sequence Generator transformation generates numeric values. We can use the Sequence
Generator to create unique primary key values, replace missing primary keys, or cycle through a
sequential range of numbers.

The Sequence Generator transformation is a connected transformation. It contains two output
ports that we can connect to one or more transformations.

Q. What are the uses of a Sequence Generator transformation?

We can perform the following tasks with a Sequence Generator transformation:
o Create keys
o Replace missing values
o Cycle through a sequential range of numbers

Q. What are the advantages of Sequence generator? Is it necessary, if so why?


We can make a Sequence Generator reusable, and use it in multiple mappings. We might reuse a
Sequence Generator when we perform multiple loads to a single target.

For example, if we have a large input file that we separate into three sessions running in parallel,
we can use a Sequence Generator to generate primary key values. If we use different Sequence
Generators, the Informatica Server might accidentally generate duplicate key values. Instead, we
can use the same reusable Sequence Generator for all three sessions to provide a unique value for
each target row.
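The duplicate-key risk can be shown with a small Python sketch. It runs the three loads one after another rather than in parallel, and the load function and sample chunks are hypothetical, but the key collision it demonstrates is the same.

```python
import itertools


def load(rows, sequence):
    """Pair each row with the next value from the given sequence."""
    return [(next(sequence), row) for row in rows]


# One large input split into three hypothetical loads
chunks = [["a", "b"], ["c"], ["d", "e"]]

# Each load has its own generator starting at 1: keys collide.
keys_separate = [key for chunk in chunks
                 for key, _ in load(chunk, itertools.count(1))]

# All loads share one generator: every key is unique.
shared = itertools.count(1)
keys_shared = [key for chunk in chunks for key, _ in load(chunk, shared)]
```

With separate generators the keys come out as 1, 2, 1, 1, 2; with the shared generator they are 1 through 5.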

Q. How is the Sequence Generator transformation different from other transformations?

The Sequence Generator is unique among all transformations because we cannot add, edit, or
delete its default ports (NEXTVAL and CURRVAL).

Unlike other transformations, we cannot override the Sequence Generator transformation
properties at the session level. This protects the integrity of the sequence values generated.

Q. What does Informatica do? How is it useful?

Q. What is the difference between Informatica version 1.7.2 and 1.7.3?

Q. What are the complex filters used till now in your applications?

Q. What are the features of Informatica?

Q. Have you used Informatica? Which version?

Q. How do you set up a schedule for data loading from scratch? Describe step-by-step.

Q. How do you use a mapplet?

Q. What are the different data source types you have used with Informatica?

Q. Is it possible to run one loading session with one particular target and multiple types of data
sources?

Q. What is a Join?

A join is a query that combines rows from two or more tables, views, or materialized views
("snapshots"). Oracle performs a join whenever multiple tables appear in the query's FROM
clause. The query's select list can select any columns from any of these tables. If any two of these
tables have a column name in common, you must qualify all references to these columns
throughout the query with table names to avoid ambiguity.

Q. What are join conditions?

Most join queries contain WHERE clause conditions that compare two columns, each from a
different table. Such a condition is called a join condition. To execute a join, Oracle combines
pairs of rows, each containing one row from each table, for which the join condition evaluates to
TRUE. The columns in the join conditions need not also appear in the select list.


Q. What is an equijoin?

An equijoin is a join with a join condition containing an equality operator. An equijoin combines
rows that have equivalent values for the specified columns.

Eg:
Select ename, job, dept.deptno, dname
From emp, dept
Where emp.deptno = dept.deptno;
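To try the equijoin without an Oracle instance, the sketch below runs the same query through Python's sqlite3 module. SQLite stands in for Oracle here, and the table contents (including the employee DOE in a non-existent department 99) are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE emp  (ename TEXT, job TEXT, deptno INTEGER);
INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
INSERT INTO emp  VALUES ('KING', 'PRESIDENT', 10),
                        ('SMITH', 'CLERK', 20),
                        ('DOE', 'TEMP', 99);
""")

# Only rows whose deptno values are equal in both tables are combined.
equijoin_rows = conn.execute("""
    SELECT ename, job, dept.deptno, dname
    FROM emp, dept
    WHERE emp.deptno = dept.deptno
    ORDER BY ename
""").fetchall()
```

DOE does not appear in the result because no department has deptno 99, so the equality condition never evaluates to TRUE for that row.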

Q. What are self joins?

A self join is a join of a table to itself. This table appears twice in the FROM clause and is
followed by table aliases that qualify column names in the join condition.

Eg:
SELECT e1.ename || ' works for ' || e2.ename "Employees and their Managers"
FROM emp e1, emp e2
WHERE e1.mgr = e2.empno;

ENAME EMPNO MGR
BLAKE 12345 67890
KING 67890 22446

Result: BLAKE works for KING
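The self join can be reproduced with Python's sqlite3 module, using the two sample rows shown above. SQLite is used only as a convenient stand-in for Oracle; the concatenation operator || behaves the same way in this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (ename TEXT, empno INTEGER, mgr INTEGER);
INSERT INTO emp VALUES ('BLAKE', 12345, 67890), ('KING', 67890, 22446);
""")

# emp appears twice under the aliases e1 (employee) and e2 (manager).
self_join_rows = conn.execute("""
    SELECT e1.ename || ' works for ' || e2.ename
    FROM emp e1, emp e2
    WHERE e1.mgr = e2.empno
""").fetchall()
```

KING produces no row because no employee in the sample has empno 22446.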

Q. What is an Outer Join?

An outer join extends the result of a simple join. An outer join returns all rows that satisfy the join
condition and those rows from one table for which no rows from the other satisfy the join
condition. Such rows are not returned by a simple join. To write a query that performs an outer
join of tables A and B and returns all rows from A, apply the outer join operator (+) to all
columns of B in the join condition.
For all rows in A that have no matching rows in B, Oracle returns null for any select list
expressions containing columns of B.
Outer join queries are subject to the following rules and restrictions:
o The (+) operator can appear only in the WHERE clause or, in the context of left correlation
  (that is, when specifying the TABLE clause), in the FROM clause, and can be applied only to
  a column of a table or view.
o If A and B are joined by multiple join conditions, you must use the (+) operator in all of these
  conditions. If you do not, Oracle will return only the rows resulting from a simple join, but
  without a warning or error to advise you that you do not have the results of an outer join.
o The (+) operator can be applied only to a column, not to an arbitrary expression. However, an
  arbitrary expression can contain a column marked with the (+) operator.
o A condition containing the (+) operator cannot be combined with another condition using the
  OR logical operator.
o A condition cannot use the IN comparison operator to compare a column marked with the (+)
  operator with an expression.
o A condition cannot compare any column marked with the (+) operator with a subquery.


If the WHERE clause contains a condition that compares a column from table B with a constant,
the (+) operator must be applied to the column so that Oracle returns the rows from table A for
which it has generated NULLs for this column. Otherwise Oracle will return only the results of a
simple join.
In a query that performs outer joins of more than two pairs of tables, a single table can be the
null-generated table for only one other table. For this reason, you cannot apply the (+) operator to
columns of B in the join condition for A and B and the join condition for B and C.
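The behaviour of an outer join, and the pitfall of filtering on a column of the null-generated table, can be seen in the sketch below. It uses Python's sqlite3 module and ANSI LEFT OUTER JOIN syntax, because SQLite does not accept Oracle's (+) operator; the tables and rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (deptno INTEGER, dname TEXT);
CREATE TABLE emp  (ename TEXT, deptno INTEGER);
INSERT INTO dept VALUES (10, 'ACCOUNTING'), (40, 'OPERATIONS');
INSERT INTO emp  VALUES ('KING', 10);
""")

# All departments are returned; OPERATIONS has no employee, so ename is NULL.
outer_rows = conn.execute("""
    SELECT d.dname, e.ename
    FROM dept d LEFT OUTER JOIN emp e ON e.deptno = d.deptno
    ORDER BY d.deptno
""").fetchall()

# Comparing an emp column with a constant outside the join condition
# discards the NULL-extended row, leaving only the simple join result.
filtered_in_where = conn.execute("""
    SELECT d.dname, e.ename
    FROM dept d LEFT OUTER JOIN emp e ON e.deptno = d.deptno
    WHERE e.ename = 'KING'
    ORDER BY d.deptno
""").fetchall()

# Putting the same comparison inside the join condition keeps every department.
filtered_in_on = conn.execute("""
    SELECT d.dname, e.ename
    FROM dept d LEFT OUTER JOIN emp e
         ON e.deptno = d.deptno AND e.ename = 'KING'
    ORDER BY d.deptno
""").fetchall()
```

Moving the constant comparison into the ON clause plays roughly the role that marking the column with (+) plays in the Oracle syntax described above.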

Set Operators: UNION [ALL], INTERSECT, MINUS
Set operators combine the results of two component queries into a single result. Queries
containing set operators are called compound queries.
The number and datatypes of the columns selected by each component query must be the same,
but the column lengths can be different.

If you combine more than two queries with set operators, Oracle evaluates adjacent queries from
left to right. You can use parentheses to specify a different order of evaluation.

Restrictions:
o These set operators are not valid on columns of type BLOB, CLOB, BFILE, varray, or nested
  table.
o The UNION, INTERSECT, and MINUS operators are not valid on LONG columns.
o To reference a column, you must use an alias to name the column.
o You also cannot specify the for_update_clause with these set operators.
o You cannot specify the order_by_clause in the subquery of these operators.

All set operators have equal precedence. If a SQL statement contains multiple set operators,
Oracle evaluates them from the left to right if no parentheses explicitly specify another order.

The corresponding expressions in the select lists of the component queries of a compound query
must match in number and datatype. If component queries select character data, the datatype of
the return values is determined as follows:
If both queries select values of datatype CHAR, the returned values have datatype CHAR.
If either or both of the queries select values of datatype VARCHAR2, the returned values
have datatype VARCHAR2.

Q. What is a UNION?

The UNION operator returns the rows selected by either query and eliminates duplicate rows
from the result. We must match datatypes (using the TO_DATE and TO_NUMBER functions) when
columns do not exist in one or the other table.


Q. What is UNION ALL?

The UNION ALL operator does not eliminate duplicate selected rows.

Note: The UNION operator returns only distinct rows that appear in either result, while the
UNION ALL operator returns all rows.

Q. What is an INTERSECT?

The INTERSECT operator returns only those rows returned by both queries. It shows only the
distinct values from the rows returned by both queries.

Q. What is MINUS?

The MINUS operator returns only rows returned by the first query but not by the second. It also
eliminates the duplicates from the first query.

Note: For compound queries (containing set operators UNION, INTERSECT, MINUS, or
UNION ALL), the ORDER BY clause must use positions, rather than explicit expressions. Also,
the ORDER BY clause can appear only in the last component query. The ORDER BY clause
orders all rows returned by the entire compound query.
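The four set operators can be compared side by side with Python's sqlite3 module. Two assumptions apply: SQLite is standing in for Oracle, and SQLite spells MINUS as EXCEPT. The tables a and b and the helper run are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (x INTEGER);
CREATE TABLE b (x INTEGER);
INSERT INTO a VALUES (1), (2), (2), (3);
INSERT INTO b VALUES (2), (3), (4);
""")


def run(set_operator):
    """Combine the two queries with the given operator, ordered by position."""
    sql = f"SELECT x FROM a {set_operator} SELECT x FROM b ORDER BY 1"
    return [row[0] for row in conn.execute(sql)]


union_rows = run("UNION")          # distinct rows from either query
union_all_rows = run("UNION ALL")  # all rows, duplicates kept
intersect_rows = run("INTERSECT")  # distinct rows returned by both queries
minus_rows = run("EXCEPT")         # SQLite's spelling of Oracle's MINUS
```

Note that the ORDER BY appears once, after the last component query, and refers to the column by position.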

Q. How many types of Sql Statements are there in Oracle?

There are basically 6 types of sql statements. They are:

a) Data Definition Language (DDL)
The DDL statements define and maintain objects and drop objects.

b) Data Manipulation Language (DML)
The DML statements manipulate database data.

c) Transaction Control Statements
Manage the changes made by DML statements.

d) Session Control
Used to control the properties of the current session, such as enabling and disabling roles and
changing session settings. E.g. Alter Session, Set Role

e) System Control Statements
Change the properties of the Oracle instance. E.g. Alter System

f) Embedded Sql
Incorporate DDL, DML and transaction control statements in a programming language. E.g. using
SQL statements in languages such as 'C', with Open, Fetch, Execute and Close

Q. What is a Transaction in Oracle?

A transaction is a logical unit of work that comprises one or more SQL statements executed
by a single user. According to ANSI, a transaction begins with the first executable statement and
ends when it is explicitly committed or rolled back.
A transaction is an atomic unit.

Q. What are some of the Key Words Used in Oracle?

Some of the Key words that are used in Oracle are:

a) Committing: A transaction is said to be committed when the changes resulting from its
SQL statements are made permanent.

b) Rollback: Rolling back a transaction retracts any of the changes resulting from the SQL
statements in the transaction.

c) SavePoint: For long transactions that contain many SQL statements, we can declare
intermediate markers called savepoints. Savepoints divide a long transaction into smaller parts,
and we can arbitrarily mark our work at any point within the transaction. We then have the
option later of rolling back work performed before the current point in the transaction but
after a declared savepoint within the transaction.
For example, we can use savepoints throughout a long complex series of updates so that if we
make an error, we do not need to resubmit every statement.
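A savepoint in action can be sketched with Python's sqlite3 module, whose SAVEPOINT and ROLLBACK TO SAVEPOINT statements behave much like Oracle's for this purpose. The table accounts, the savepoint name, and the values are hypothetical; the connection is opened with isolation_level=None so that the transaction is controlled explicitly.

```python
import sqlite3

# isolation_level=None: we issue BEGIN and COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("SAVEPOINT after_first_insert")

conn.execute("INSERT INTO accounts VALUES (2, -999)")   # a mistake
# Undo only the work done after the savepoint; the first insert survives.
conn.execute("ROLLBACK TO SAVEPOINT after_first_insert")

conn.execute("INSERT INTO accounts VALUES (2, 200)")    # the corrected statement
conn.execute("COMMIT")

final_rows = conn.execute(
    "SELECT id, balance FROM accounts ORDER BY id"
).fetchall()
```

Only the faulty insert had to be resubmitted; the work done before the savepoint stayed in the transaction and was committed with the rest.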

d) Rolling Forward: The process of applying the redo log during recovery is called rolling forward.

e) Cursor: A cursor is a handle (name or a pointer) for the memory associated with a specific
statement. A cursor is basically an area allocated by Oracle for executing the SQL statement.
Oracle uses an implicit cursor for a single-row query and an explicit cursor for a multi-row
query.

f) System Global Area (SGA): The SGA is a shared memory region allocated by Oracle that
contains data and control information for one Oracle instance. It includes the Database Buffer
Cache and the Redo Log Buffer. (KPIT Infotech, Pune)

g) Program Global Area (PGA): The PGA is a memory buffer that contains data and control
information for a server process.

h) Database Buffer Cache: The database buffers of the SGA store the most recently used blocks of
database data. The set of database buffers in an instance is called the Database Buffer Cache.

i) Redo Log Buffer: The redo log buffer of the SGA stores all the redo log entries.

j) Redo Log Files: Redo log files are a set of files that protect altered database data in memory that
has not been written to the data files. They are basically used for recovery when a database crashes.

k) Process: A process is a 'thread of control' or mechanism in the operating system that executes
a series of steps.


Q. What are Procedures, Functions and Packages?

Procedures and functions consist of a set of PL/SQL statements that are grouped together as a unit
to solve a specific problem or perform a set of related tasks.
Procedures do not return values, while functions return one value.

Packages: Packages provide a method of encapsulating and storing related procedures, functions,
variables and other package contents.

Q. What are Database Triggers and Stored Procedures?

Database Triggers: Database triggers are procedures that are automatically executed as a result
of an insert into, update to, or delete from a table.
Database triggers have the values old and new: old denotes the value in the table before the
change, and new denotes the value that will be used. Database triggers are useful for implementing
complex business rules which cannot be enforced using the integrity rules. We can have the
trigger as a Before trigger or an After trigger, and at statement or row level.
e.g.:
Operations: insert, update, delete = 3
Timing: before, after = 3 * 2 = 6 combinations
Level: statement level (once for the triggering statement) or row level (once for every affected row) = 6 * 2 = 12 combinations
Thus there are 12 combinations in total, and the restriction to 12 triggers has been
lifted from Oracle 7.3 onwards.
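The count of 12 can be checked by enumerating every combination of timing, operation, and level. The short Python sketch below does just that; the variable names are arbitrary.

```python
import itertools

timings = ["BEFORE", "AFTER"]
operations = ["INSERT", "UPDATE", "DELETE"]
levels = ["STATEMENT", "ROW"]

# Every (timing, operation, level) triple is one kind of trigger.
trigger_types = list(itertools.product(timings, operations, levels))
trigger_count = len(trigger_types)   # 2 * 3 * 2
```

Each tuple, for example ("BEFORE", "INSERT", "ROW"), corresponds to one of the trigger types described above.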

Stored Procedures: Stored Procedures are Procedures that are stored in Compiled form in the
database. The advantage of using the stored procedures is that many users can use the same
procedure in compiled and ready to use format.

Q. How many Integrity Rules are there and what are they?

There are Three Integrity Rules. They are as follows:

a) Entity Integrity Rule: The Entity Integrity Rule enforces that the Primary key cannot
be Null

b) Foreign Key Integrity Rule: The FKIR denotes that the relationship between the
foreign key and the primary key has to be enforced. When there is data in Child Tables
the Master tables cannot be deleted.

c) Business Integrity Rules: The Third Integrity rule is about the complex business
processes which cannot be implemented by the above 2 rules.

Q. What are the Various Master and Detail Relationships?

The various Master and Detail Relationship are

a) Non-Isolated : The Master cannot be deleted when a child exists
b) Isolated : The Master can be deleted when a child exists
c) Cascading : The child gets deleted when the Master is deleted.

Q. What are the Various Block Coordination Properties?


The various Block Coordination Properties are:

a) Immediate
Default setting. The detail records are shown when the master record is shown.

b) Deferred with Auto Query
Oracle Forms defers fetching the detail records until the operator navigates to the detail block.

c) Deferred with No Auto Query
The operator must navigate to the detail block and explicitly execute a query.

Q. What are the Different Optimization Techniques?

The Various Optimization techniques are:

a) Explain Plan: we can see the plan of the query and change the query accordingly, based on the
indexes

b) Optimizer_hint: set_item_property
('DeptBlock',OPTIMIZER_HINT,'FIRST_ROWS');
Select /*+ First_Rows */ Deptno,Dname,Loc,Rowid from dept where (Deptno > 25)

c) Optimize_Sql: By setting Optimize_Sql = No, Oracle Forms assigns a single cursor
for all SQL statements. This slows down the processing because the SQL statements
must be parsed every time they are executed.
f45run module = my_firstform userid = scott/tiger optimize_sql = No

d) Optimize_Tp:
By setting Optimize_Tp = No, Oracle Forms assigns a separate cursor only for each
query SELECT statement. All other SQL statements reuse the cursor.
f45run module = my_firstform userid = scott/tiger optimize_Tp = No

Q. How do you implement the If statement in the Select Statement?

We can implement the if statement in the select statement by using the DECODE function.
e.g. select DECODE (EMP_CAT,'1','First','2','Second', Null);
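The if/else behaviour of DECODE can be mimicked in a few lines of Python. This is an analogy for illustration only, not Oracle code: the function decode below is hypothetical, takes the value followed by search/result pairs and an optional trailing default, and returns None (standing in for NULL) when nothing matches and no default is given.

```python
def decode(value, *args):
    """Return the result paired with the first matching search value.

    args is search1, result1, search2, result2, ... [, default].
    """
    if len(args) % 2:                    # odd count: the last item is the default
        pairs, default = args[:-1], args[-1]
    else:
        pairs, default = args, None
    for search, result in zip(pairs[0::2], pairs[1::2]):
        if value == search:
            return result
    return default


first = decode("1", "1", "First", "2", "Second", None)
second = decode("2", "1", "First", "2", "Second", None)
no_match = decode("3", "1", "First", "2", "Second", None)
with_default = decode("3", "1", "First", "2", "Second", "Other")
```

Read this way, the example above maps an EMP_CAT of '1' to 'First', '2' to 'Second', and anything else to NULL.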

Q. How many types of Exceptions are there?

There are 2 types of exceptions. They are:

a) System Exceptions
e.g. When no_data_found, When too_many_rows
b) User Defined Exceptions

e.g. My_exception exception
When My_exception then

Q. What are the inline and the precompiler directives?


The inline and precompiler directives detect the values directly.

Q. How do you use the same lov for 2 columns?

We can use the same lov for 2 columns by passing the return values in global values and using
the global values in the code.

Q. How many minimum groups are required for a matrix report?

The minimum number of groups in a matrix report is 4.

Q. What is the difference between static and dynamic lov?

The static lov contains the predetermined values while the dynamic lov contains values that
come at run time.

Q. What are the OOPS concepts in Oracle?

Oracle does implement the OOPS concepts. The best example is the Property Classes. We can
categorize the properties by setting the visual attributes and then attach the property classes for
the objects. OOPS supports the concepts of objects and classes and we can consider the property
classes as classes and the items as objects

Q. What is the difference between candidate key, unique key and primary key?

Candidate keys are the columns in the table that could be the primary keys and the primary key is
the key that has been selected to identify the rows. Unique key is also useful for identifying the
distinct rows in the table.

Q. What is concurrency?

Concurrency is allowing simultaneous access of the same data by different users.
Locks useful for accessing the database are:
a) Exclusive lock
The exclusive lock is useful for locking the row when an insert, update or delete is being
done. This lock should not be applied when we only select from the row.
b) Share lock
A table can be locked in share mode, and any number of share locks can be held on the same
resource at the same time.

Q. What are Privileges and Grants?

A privilege is the right to execute a particular type of SQL statement.
E.g. Right to Connect, Right to create, Right to resource

Grants are given to the objects so that the object might be accessed accordingly. The grant has to
be given by the owner of the object.

Q. What are Table Space, Data Files, Parameter File and Control Files?

Table Space: The table space is useful for storing the data in the database.


When a database is created two table spaces are created.
a) System Table space: This table space stores all the tables related to the system and dba
tables
b) User Table space: This table space stores all the user related tables

We should have separate table spaces for storing the tables and indexes so that the access is fast.

Data Files: Every Oracle Data Base has one or more physical data files. They store the data for
the database. Every data file is associated with only one database. In early Oracle releases the
size of a data file cannot change once it is created, so to store more data we have to add a
data file. Later releases also allow a data file to be resized or to extend automatically.
Parameter Files: A parameter file is needed to start an instance. It contains the list of
instance configuration parameters.
e.g.
db_block_buffers = 500
db_name = ORA7
db_domain = u.s.acme lang

Control Files: Control files record the physical structure of the database. They contain the
database name, the names and locations of the data files and redo log files, and the time stamp
of database creation.

Q. Some of the terms related to Physical Storage of the Data.

The finest level of granularity of the data base is the data blocks.

Data Block : One Data Block corresponds to a specific number of bytes of physical database space

Extent : An Extent is a specific number of contiguous data blocks.

Segments : A Segment is the set of Extents allocated for a logical structure. There are three
types of Segments.

a) Data Segment: Each non clustered table has a data segment; the data of every table in a
cluster is stored in the cluster's data segment
b) Index Segment: Each Index has an index segment that stores its data
c) Roll Back Segment: Temporarily stores 'undo' information

Q. What are the Pct Free and Pct Used?

Pct Free is the percentage of space in each data block that is kept free for future updates to
the rows already in the block. Pct Used is the percentage of used space below which a block must
fall before new rows can be inserted into it again. Both are specified when creating a table.
E.g. Pctfree 20, Pctused 40

Q. What is Row Chaining?

The data of a row in a table may not fit in a single data block. The data for the row is then
stored in a chain of data blocks.

Q. What is a 2 Phase Commit?


Two Phase commit is used in distributed data base systems. It maintains the integrity of the
database so that all the users see the same values. It applies to transactions containing DML
statements or Remote Procedure Calls that reference a remote object.
There are basically 2 phases in a 2 phase commit.
a) Prepare Phase: The global coordinator asks the participants to prepare. Each participant
replies Prepared, Read only or Abort.
b) Commit Phase: If all the participants replied Prepared (or Read only), the coordinator asks
them all to commit; otherwise it asks them all to roll back.

A two-phase commit mechanism guarantees that all database servers participating in a distributed
transaction either all commit or all roll back the statements in the transaction. A two-phase
commit mechanism also protects implicit DML operations performed by integrity constraints,
remote procedure calls, and triggers.
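
The all-or-nothing behaviour can be sketched in a few lines of Python. This is only a toy model
under simplifying assumptions (no network, no failures between the phases, no read-only
participants); the Participant class and the two_phase_commit function are hypothetical names
invented for the illustration and are not part of any Oracle interface.

```python
class Participant:
    """A hypothetical node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: promise to commit, or refuse.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.state

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return True if every participant committed, False if all rolled back."""
    # Prepare phase: the coordinator collects a vote from every participant.
    votes = [p.prepare() for p in participants]
    if all(vote == "prepared" for vote in votes):
        # Commit phase: everyone voted yes, so everyone commits.
        for p in participants:
            p.commit()
        return True
    # At least one participant refused: roll back everywhere.
    for p in participants:
        p.rollback()
    return False
```

A single participant that cannot prepare is enough to make every participant roll back, which is
the guarantee described above.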

Q. What is the difference between deleting and truncating of tables?
DELETE is a DML statement: it removes the selected rows, generates rollback information, and can
be rolled back until the transaction is committed. TRUNCATE is a DDL statement: it removes all
the rows at once, commits implicitly, and cannot be rolled back. In both cases the table
definition remains in the data dictionary.

Q. What are mutating tables?

A table that is being modified by an INSERT, UPDATE or DELETE statement is said to be mutating.
E.g. while a row is being deleted the table is mutating, and a row level trigger on that table
cannot query or modify it.

Q. What are Codd Rules?

Codd's Rules describe the ideal nature of an RDBMS. No RDBMS is generally considered to satisfy
all the 12 Codd rules; Oracle is often said to satisfy 11 of the 12 rules.

Q. What is Normalization?

Normalization is the process of organizing the tables to remove redundancy. There are mainly
5 normal forms; the first three are:
1st Normal Form - A table is said to be in 1st Normal Form when all the
attributes are atomic
2nd Normal Form - A table is said to be in 2nd Normal Form when it is in 1st Normal Form
and every non-key attribute depends on the whole primary key
3rd Normal Form - A table is said to be in 3rd Normal Form when it is in 2nd Normal Form
and no non-key attribute depends transitively on the primary key

Q. What is the Difference between a post query and a pre query?

A post query will fire for every row that is fetched but the pre query will fire only once.

Q. How can we delete the duplicate rows in the table?

We can delete the duplicate rows in the table by using the Rowid.
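
As a hedged illustration of the rowid technique, the sketch below keeps the row with the lowest
rowid in each group of duplicates and deletes the rest. It runs against SQLite through Python's
sqlite3 module purely so that it is self-contained; SQLite also exposes a rowid, but it is not
Oracle's ROWID, and the emp table with its emp_no and ename columns is sample data assumed for
the example.

```python
import sqlite3


def delete_duplicates(conn):
    """Keep the row with the lowest rowid for each emp_no; delete the rest."""
    cur = conn.execute(
        """
        DELETE FROM emp
        WHERE rowid > (SELECT MIN(x.rowid)
                       FROM emp x
                       WHERE x.emp_no = emp.emp_no)
        """
    )
    return cur.rowcount  # number of duplicate rows removed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, ename TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, "SMITH"), (2, "JONES"), (1, "SMITH"), (2, "JONES"), (3, "KING")],
)
removed = delete_duplicates(conn)
remaining = conn.execute("SELECT emp_no, ename FROM emp ORDER BY emp_no").fetchall()
print(removed, remaining)
```

If several columns make up the key, all of them would have to appear in the correlated WHERE
clause.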

Q. Can you disable a database trigger? How?

Yes. With respect to a table:

ALTER TABLE table_name DISABLE ALL TRIGGERS;

A single trigger can be disabled with ALTER TRIGGER trigger_name DISABLE;

Q. What are pseudocolumns? Name them?

A pseudocolumn behaves like a table column, but is not actually stored in the table. You can select from
pseudocolumns, but you cannot insert, update, or delete their values. The pseudocolumns are:
* CURRVAL
* NEXTVAL
* LEVEL
* ROWID
* ROWNUM

Q. How many columns can table have?

In Oracle 7 the number of columns in a table can range from 1 to 254. Oracle 8 and later allow up to 1000.

Q. Is space acquired in blocks or extents?

In extents.

Q. What is clustered index?

In an indexed cluster, rows are stored together based on their cluster key values, and a cluster index on
the cluster key is used to locate them. A cluster index is not used for a HASH cluster.

Q. What are the datatypes supported By oracle (INTERNAL)?

Varchar2, Number, Char, MLSLABEL.

Q. What are attributes of cursor?

%FOUND , %NOTFOUND , %ISOPEN,%ROWCOUNT

Q. Can you use select in FROM clause of SQL select ?

Yes.

Q. Describe the difference between a procedure, function and anonymous pl/sql block.

The candidate should mention the use of the DECLARE statement in anonymous blocks, and that a
function must return a value while a procedure doesn't have to.

Q. What is a mutating table error and how can you get around it?

This happens with triggers. It occurs because the trigger is trying to modify a row it is currently
using. The usual fix involves either use of views or temporary tables so the database is selecting
from one while updating the other.


Q. Describe the use of %ROWTYPE and %TYPE in PL/SQL.

%ROWTYPE allows you to associate a variable with an entire table row. The %TYPE associates
a variable with a single column type.

Q. What packages (if any) has Oracle provided for use by developers?

Oracle provides the DBMS_ series of packages. There are many which developers should be
aware of such as DBMS_SQL, DBMS_PIPE, DBMS_TRANSACTION, DBMS_LOCK,
DBMS_ALERT, DBMS_OUTPUT, DBMS_JOB, DBMS_UTILITY, DBMS_DDL, UTL_FILE.
If they can mention a few of these and describe how they used them, even better. If they include
the SQL routines provided by Oracle, great, but not really what was asked.

Q. Describe the use of PL/SQL tables.

PL/SQL tables are scalar arrays that can be referenced by a binary integer. They can be used to
hold values for use in later queries or calculations. In Oracle 8 they will be able to be of the
%ROWTYPE designation, or RECORD.

Q. When is a declare statement needed?

The DECLARE statement is used in PL/SQL anonymous blocks such as with stand alone, non-
stored PL/SQL procedures. It must come first in a PL/SQL standalone file if it is used.

Q. In what order should a open/fetch/loop set of commands in a PL/SQL block be
implemented if you use the %NOTFOUND cursor variable in the exit when statement?
Why?

OPEN then FETCH then LOOP followed by the exit when. If the commands are not in this order, the
final row will be processed twice because of the way %NOTFOUND is handled by
PL/SQL.
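
The effect of the ordering can be imitated in Python. The Cursor class below is a hypothetical
toy model, assumed for illustration only: like a PL/SQL FETCH, a fetch that finds no row leaves
the previously fetched value in place and merely sets the notfound flag.

```python
class Cursor:
    """Toy model of a PL/SQL cursor (hypothetical helper, for illustration)."""

    def __init__(self, rows):
        self._rows = iter(rows)
        self.notfound = False
        self.row = None  # plays the role of the INTO variable

    def fetch(self):
        try:
            self.row = next(self._rows)
            self.notfound = False
        except StopIteration:
            # A failed fetch leaves the variable unchanged.
            self.notfound = True


def check_after_processing(rows):
    """Wrong order: process the row, then test NOTFOUND."""
    cur, seen = Cursor(rows), []
    while True:
        cur.fetch()
        seen.append(cur.row)  # runs once more after the last real row
        if cur.notfound:
            break
    return seen


def check_right_after_fetch(rows):
    """Right order: test NOTFOUND immediately after each fetch."""
    cur, seen = Cursor(rows), []
    cur.fetch()
    while not cur.notfound:
        seen.append(cur.row)
        cur.fetch()
    return seen
```

With the rows 10, 20, 30 the first function handles 30 twice, while the second handles each row
exactly once.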

Q. What are SQLCODE and SQLERRM and why are they important for PL/SQL
developers?

SQLCODE returns the value of the error number for the last error encountered. The SQLERRM
returns the actual error message for the last error encountered. They can be used in exception
handling to report, or, store in an error log table, the error that occurred in the code. These are
especially useful for the WHEN OTHERS exception.

Q. How can you find within a PL/SQL block, if a cursor is open?

Use the %ISOPEN cursor status variable.


Q. How can you generate debugging output from PL/SQL?

Use the DBMS_OUTPUT package. Another possible method is to just use the SHOW ERROR
command, but this only shows errors. The DBMS_OUTPUT package can be used to show
intermediate results from loops and the status of variables as the procedure is executed. The new
package UTL_FILE can also be used.

Q. What are the types of triggers?

There are 12 types of triggers in PL/SQL that consist of combinations of the BEFORE, AFTER,
ROW, TABLE, INSERT, UPDATE, DELETE and ALL key words:

BEFORE ALL ROW INSERT
AFTER ALL ROW INSERT
BEFORE INSERT
AFTER INSERT
etc.

Q. How can variables be passed to a SQL routine?

By use of the & or double && symbol. For passing in variables numbers can be used (&1,
&2,...,&8) to pass the values after the command into the SQLPLUS session. To be prompted for a
specific variable, place the ampersanded variable in the code itself:
select * from dba_tables where owner = '&owner_name';
Use of double ampersands tells SQLPLUS to reuse the value for each subsequent use of the
variable. A single ampersand will cause a reprompt for the value each time, unless an ACCEPT
statement is used to get the value from the user.

Q. You want to include a carriage return/linefeed in your output from a SQL script, how
can you do this?

The best method is to use the CHR() function (CHR(10) is a linefeed, CHR(13) is a carriage
return) and the concatenation operator ||. Another method, although it is hard to document and
isn't always portable, is to use the return/linefeed as a part of a quoted string.

Q. How can you call a PL/SQL procedure from SQL?

By use of the EXECUTE (short form EXEC) command. You can also wrap the call in a BEGIN
END block and treat it as an anonymous PL/SQL block.

Q. How do you execute a host operating system command from within SQL?

By use of the exclamation point ! (in UNIX and some other OS) or the HOST (HO) command.

Q. You want to use SQL to build SQL, what is this called and give an example?


This is called dynamic SQL. An example would be:

set lines 90 pages 0 termout off feedback off verify off
spool drop_all.sql
select 'drop user '||username||' cascade;' from dba_users
where username not in ('SYS','SYSTEM');
spool off

Essentially you are looking to see that they know to include a command (in this case DROP
USER...CASCADE;) and that you need to concatenate using the || the values selected from the
database.
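
The same idea of a SELECT that writes other SQL statements can be tried without an Oracle
instance. The sketch below is an assumption-laden analogue, not the script above: it uses
Python's sqlite3 module, reads SQLite's sqlite_master catalog instead of dba_users, and
generates DROP TABLE instead of DROP USER. The table names and the build_drop_script helper are
hypothetical.

```python
import sqlite3


def build_drop_script(conn, keep=("audit_log",)):
    """Use a SELECT to generate one DROP TABLE statement per table not in keep."""
    placeholders = ",".join("?" for _ in keep)
    rows = conn.execute(
        "SELECT 'DROP TABLE ' || name || ';' FROM sqlite_master "
        f"WHERE type = 'table' AND name NOT IN ({placeholders}) ORDER BY name",
        tuple(keep),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
for table in ("audit_log", "tmp_a", "tmp_b"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER)")

script = build_drop_script(conn)
print("\n".join(script))
conn.executescript("\n".join(script))  # run the generated SQL
left = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
```

As in the spooled script, the literal command text and the selected names are joined with the
|| operator.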

Q. What SQLPlus command is used to format output from a select?

This is best done with the COLUMN command.

Q. You want to group the following set of select returns, what can you group on?
Max(sum_of_cost), min(sum_of_cost), count(item_no), item_no

The only column that can be grouped on is the item_no column, the rest have aggregate
functions associated with them.
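
A small runnable check of this rule, using SQLite through Python's sqlite3 module as a stand-in
for Oracle. The table name item_costs and its rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_costs (item_no INTEGER, sum_of_cost REAL)")
conn.executemany(
    "INSERT INTO item_costs VALUES (?, ?)",
    [(1, 10.0), (1, 30.0), (2, 5.0), (2, 7.0), (2, 9.0)],
)
# item_no is the only non-aggregated column, so it is the only GROUP BY column.
rows = conn.execute(
    """
    SELECT MAX(sum_of_cost), MIN(sum_of_cost), COUNT(item_no), item_no
    FROM item_costs
    GROUP BY item_no
    ORDER BY item_no
    """
).fetchall()
print(rows)
```

Each output row holds the aggregates for one item_no value.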

Q. What special Oracle feature allows you to specify how the cost based system treats a
SQL statement?

The COST based system allows the use of HINTs to control the optimizer path selection. If they
can give some example hints such as FIRST_ROWS, ALL_ROWS, INDEX, STAR, even
better.

Q. You want to determine the location of identical rows in a table before attempting to place
a unique index on the table, how can this be done?

Oracle tables always have one guaranteed unique column, the rowid column. If you use a
min/max function against your rowid and then select against the proposed primary key you can
squeeze out the rowids of the duplicate rows pretty quick. For example:

select rowid from emp e
where e.rowid > (select min(x.rowid)
from emp x
where x.emp_no = e.emp_no);

In the situation where multiple columns make up the proposed key, they must all be used in the
where clause.


Q. What is a Cartesian product?

A Cartesian product is the result of an unrestricted join of two or more tables. The result set of a
three table Cartesian product will have x * y * z number of rows where x, y, z correspond to the
number of rows in each table involved in the join. This occurs if there are not at least n-1 joins
where n is the number of tables in a SELECT.
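
The row arithmetic can be verified with a short sketch. It uses SQLite through Python's sqlite3
module only for convenience; the three tables a, b and c with 2, 3 and 4 rows are assumptions
made for the example.

```python
import sqlite3


def make_table(conn, name, n_rows):
    """Create a one-column table holding the ids 0 .. n_rows - 1."""
    conn.execute(f"CREATE TABLE {name} (id INTEGER)")
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(i,) for i in range(n_rows)])


conn = sqlite3.connect(":memory:")
make_table(conn, "a", 2)
make_table(conn, "b", 3)
make_table(conn, "c", 4)

# No join conditions at all: every row pairs with every other row.
cartesian = conn.execute("SELECT COUNT(*) FROM a, b, c").fetchone()[0]
# n - 1 = 2 join conditions for 3 tables restrict the result.
joined = conn.execute(
    "SELECT COUNT(*) FROM a, b, c WHERE a.id = b.id AND b.id = c.id"
).fetchone()[0]
print(cartesian, joined)
```

The unrestricted query returns 2 * 3 * 4 rows, while the version with two join conditions
returns only the rows whose ids match in all three tables.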

Q. You are joining a local and a remote table, the network manager complains about the
traffic involved, how can you reduce the network traffic?

Push the processing of the remote data to the remote instance by using a view to pre-select the
information for the join. This will result in only the data required for the join being sent across.

Q. What is the default ordering of an ORDER BY clause in a SELECT statement?

Ascending

Q. What is tkprof and how is it used?

The tkprof tool is a tuning tool used to determine cpu and execution times for SQL statements.
You use it by first setting timed_statistics to true in the initialization file and then turning on
tracing for either the entire database via the sql_trace parameter or for the session using the
ALTER SESSION command. Once the trace file is generated you run the tkprof tool against the
trace file and then look at the output from the tkprof tool. This can also be used to generate
explain plan output.

Q. What is explain plan and how is it used?

The EXPLAIN PLAN command is a tool to tune SQL statements. To use it you must have a plan table
(PLAN_TABLE by default) in the schema of the user you are running the explain plan for. This is
created using the utlxplan.sql script. Once the plan table exists you run the EXPLAIN PLAN
command giving as its argument the SQL statement to be explained. The plan table is then queried
to see the execution plan of the statement. Explain plans can also be generated using tkprof.

Q. How do you set the number of lines on a page of output? The width?

The SET command in SQLPLUS is used to control the number of lines generated per page and
the width of those lines, for example SET PAGESIZE 60 LINESIZE 80 will generate reports that
are 60 lines long with a line width of 80 characters. The PAGESIZE and LINESIZE options can
be shortened to PAGES and LINES.

Q. How do you prevent output from coming to the screen?

The SET option TERMOUT controls output to the screen. Setting TERMOUT OFF turns off
screen output. This option can be shortened to TERM.


Q. How do you prevent Oracle from giving you informational messages during and after a
SQL statement execution?

The SET options FEEDBACK and VERIFY can be set to OFF.

Q. How do you generate file output from SQL?

By use of the SPOOL command.

Data Modeler:

Q. Describe third normal form?

Expected answer: Something like: In third normal form all attributes in an entity are related to the
primary key and only to the primary key

Q. Is the following statement true or false? Why or why not?

All relational databases must be in third normal form

False. While 3NF is good for logical design, most databases, if they have more than just a few
tables, will not perform well using full 3NF. Usually some entities will be denormalized in the
logical to physical transfer process.

Q. What is an ERD?

An ERD is an Entity-Relationship-Diagram. It is used to show the entities and relationships for a
database logical model.

Q. Why are recursive relationships bad? How do you resolve them?

A recursive relationship (one where a table relates to itself) is bad when it is a hard relationship
(i.e. neither side is a "may", both are "must") as this can result in it not being possible to put in a
top or perhaps a bottom of the table (for example in the EMPLOYEE table you couldn't put in the
PRESIDENT of the company because he has no boss, or the junior janitor because he has no
subordinates). These types of relationships are usually resolved by adding a small intersection
entity.

Q. What does a hard one-to-one relationship mean (one where the relationship on both ends
is "must")?

This means the two entities should probably be made into one entity.

Q. How should a many-to-many relationship be handled?

By adding an intersection entity table


Q. What is an artificial (derived) primary key? When should an artificial (or derived)
primary key be used?

A derived key comes from a sequence. Usually it is used when a concatenated key becomes too
cumbersome to use as a foreign key.

Q. When should you consider denormalization?

Whenever performance analysis indicates it would be beneficial to do so without compromising
data integrity.

Q. What is a Schema?

Associated with each database user is a schema. A schema is a collection of schema objects.
Schema objects include tables, views, sequences, synonyms, indexes, clusters, database links,
snapshots, procedures, functions, and packages.

Q. What do you mean by table?

Tables are the basic unit of data storage in an Oracle database. Data is stored in rows and
columns.
A row is a collection of column information corresponding to a single record.

Q. Is there an alternative of dropping a column from a table? If yes, what?

Dropping a column in a large table takes a considerable amount of time. A quicker alternative is
to mark a column as unused with the SET UNUSED clause of the ALTER TABLE statement.
This makes the column data unavailable, although the data remains in each row of the table. After
marking a column as unused, you can add another column that has the same name to the table.
The unused column can then be dropped at a later time when you want to reclaim the space
occupied by the column data.

Q. What is a rowid?

The rowid identifies each row piece by its location or address. Once assigned, a given row piece
retains its rowid until the corresponding row is deleted, or exported and imported using the
Export and Import utilities.

Q. What is a view? (KPIT Infotech, Pune)

A view is a tailored presentation of the data contained in one or more tables or other views. A
view takes the output of a query and treats it as a table. Therefore, a view can be thought of as a
stored query or a virtual table.

Unlike a table, a view is not allocated any storage space, nor does a view actually contain data.
Rather, a view is defined by a query that extracts or derives data from the tables that the view
references. These tables are called base tables. Base tables can in turn be actual tables or can be
views themselves (including snapshots). Because a view is based on other objects, a view requires
no storage other than storage for the definition of the view (the stored query) in the data
dictionary.
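
A minimal sketch of a view acting as a stored query, run against SQLite through Python's
sqlite3 module as a stand-in for Oracle. The emp table, its columns and the emp_public view are
hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, ename TEXT, deptno INTEGER, sal REAL)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [(1, "SMITH", 10, 800.0), (2, "JONES", 20, 2975.0)],
)
# The view stores only this query; it hides the sal column from its users.
conn.execute("CREATE VIEW emp_public AS SELECT emp_no, ename, deptno FROM emp")

before = conn.execute("SELECT COUNT(*) FROM emp_public").fetchone()[0]
conn.execute("INSERT INTO emp VALUES (3, 'KING', 10, 5000.0)")
after = conn.execute("SELECT COUNT(*) FROM emp_public").fetchone()[0]
columns = [d[0] for d in conn.execute("SELECT * FROM emp_public").description]
print(before, after, columns)
```

The new base table row is visible through the view straight away, because the view holds no
data of its own and its query is evaluated against the base table each time.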

Q. What are the advantages of having a view?

The advantages of having a view are:
To provide an additional level of table security by restricting access to a predetermined set of
rows or columns of a table
To hide data complexity
To simplify statements for the user
To present the data in a different perspective from that of the base table
To isolate applications from changes in definitions of base tables
To save complex queries
For example, a query can perform extensive calculations with table information.
By saving this query as a view, you can perform the calculations each time the view is
queried.

Q. What is a Materialized View? (Honeywell, KPIT Infotech, Pune)

Materialized views, also called snapshots, are schema objects that can be used to summarize,
precompute, replicate, and distribute data. They are suitable in various computing environments
especially for data warehousing.
From a physical design point of view, Materialized Views resemble tables or partitioned tables
and behave like indexes.

Q. What is the significance of Materialized Views in data warehousing?

In data warehouses, materialized views are used to precompute and store aggregated data such as
sums and averages. Materialized views in these environments are typically referred to as
summaries because they store summarized data. They can also be used to precompute joins with
or without aggregations.

Cost-based optimization can use materialized views to improve query performance by
automatically recognizing when a materialized view can and should be used to satisfy a request.
The optimizer transparently rewrites the request to use the materialized view. Queries are then
directed to the materialized view and not to the underlying detail tables or views.

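Q. Differentiate between Views and Materialized Views? (KPIT Infotech, Pune)

A view stores only its defining query in the data dictionary and derives its data from the base
tables each time it is queried. A materialized view stores the result of its query as data, so it
occupies storage and has to be refreshed after its master tables change.
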
Q. What is the major difference between an index and Materialized view?

Unlike indexes, materialized views can be accessed directly using a SELECT statement.

Q. What are the procedures for refreshing Materialized views?

Oracle maintains the data in materialized views by refreshing them after changes are made to
their master tables.
The refresh method can be:
a) incremental (fast refresh) or
b) complete

For materialized views that use the fast refresh method, a materialized view log or direct loader
log keeps a record of changes to the master tables.

Materialized views can be refreshed either on demand or at regular time intervals.
Alternatively, materialized views in the same database as their master tables can be refreshed
whenever a transaction commits its changes to the master tables.

Q. What are materialized view logs?

A materialized view log is a schema object that records changes to a master table's data so that a
materialized view defined on the master table can be refreshed incrementally. Another name for
materialized view log is snapshot log.
Each materialized view log is associated with a single master table. The materialized view log
resides in the same database and schema as its master table.
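
The difference between a complete and a fast (incremental) refresh can be sketched with plain
Python data structures. This is a conceptual toy under stated assumptions, not how Oracle
implements it: the SalesSummary class, its master list, its log list and the per-region totals
are all hypothetical, and only inserts are modelled.

```python
from collections import defaultdict


class SalesSummary:
    """Toy materialized view: total amount per region (hypothetical names)."""

    def __init__(self):
        self.master = []                  # master table rows: (region, amount)
        self.log = []                     # changes recorded since the last refresh
        self.totals = defaultdict(float)  # the stored, precomputed result

    def insert(self, region, amount):
        self.master.append((region, amount))
        self.log.append((region, amount))  # record the change for fast refresh

    def complete_refresh(self):
        # Recompute everything from the master table.
        self.totals = defaultdict(float)
        for region, amount in self.master:
            self.totals[region] += amount
        self.log.clear()

    def fast_refresh(self):
        # Apply only the logged changes, then empty the log.
        applied = len(self.log)
        for region, amount in self.log:
            self.totals[region] += amount
        self.log.clear()
        return applied
```

Between refreshes the stored totals are stale; a fast refresh touches only the logged rows,
while a complete refresh rereads the whole master table and reaches the same result.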

Q. What is a synonym?

A synonym is an alias for any table, view, snapshot, sequence, procedure, function, or package.
Because a synonym is simply an alias, it requires no storage other than its definition in the data
dictionary.

Q. What are the advantages of having synonyms?

Synonyms are often used for security and convenience.
For example, they can do the following:
1. Mask the name and owner of an object
2. Provide location transparency for remote objects of a distributed database
3. Simplify SQL statements for database users

Q. What are the advantages of having an index? Or What is an index?

The purpose of an index is to provide pointers to the rows in a table that contain a given key
value. In a regular index, this is achieved by storing a list of rowids for each key corresponding to
the rows with that key value. Oracle stores each key value repeatedly with each stored rowid.

Q. What are the different types of indexes supported by Oracle?

The different types of indexes are:
a. B-tree indexes
b. B-tree cluster indexes
c. Hash cluster indexes
d. Reverse key indexes
e. Bitmap indexes

Q. Can we have function based indexes?

Yes, we can create indexes on functions and expressions that involve one or more columns in the
table being indexed. A function-based index precomputes the value of the function or expression
and stores it in the index.
You can create a function-based index as either a B-tree or a bitmap index.
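
For a quick hands-on feel, the sketch below builds an index on an expression in SQLite, which
supports a comparable feature, through Python's sqlite3 module. It is an analogue under
assumptions rather than Oracle syntax or behaviour: the emp table, the emp_upper_ename index and
the sample names are hypothetical, and SQLite has no bitmap variant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (emp_no INTEGER, ename TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)", [(1, "Smith"), (2, "JONES"), (3, "king")]
)
# Index the value of the expression UPPER(ename), not the column itself.
conn.execute("CREATE INDEX emp_upper_ename ON emp (UPPER(ename))")

query = "SELECT emp_no FROM emp WHERE UPPER(ename) = ?"
matches = conn.execute(query, ("SMITH",)).fetchall()
# Ask SQLite how it would run the query; the last column describes each step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, ("SMITH",))]
print(matches, plan)
```

The printed plan shows whether the optimizer chose the expression index for the
case-insensitive lookup.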


Q. What are the restrictions on function based indexes?

The function used for building the index can be an arithmetic expression or an expression that
contains a PL/SQL function, package function, C callout, or SQL function. The expression cannot
contain any aggregate functions, and it must be DETERMINISTIC. For building an index on a
column containing an object type, the function can be a method of that object, such as a map
method. However, you cannot build a function-based index on a LOB column, REF, or nested
table column, nor can you build a function-based index if the object type contains a LOB, REF, or
nested table.

Q. What are the advantages of having a B-tree index?

The major advantages of having a B-tree index are:
1. B-trees provide excellent retrieval performance for a wide range of queries, including
exact match and range searches.
2. Inserts, updates, and deletes are efficient, maintaining key order for fast retrieval.
3. B-tree performance is good for both small and large tables, and does not degrade as the
size of a table grows.

Q. What is a bitmap index? (KPIT Infotech, Pune)

The purpose of an index is to provide pointers to the rows in a table that contain a given key
value. In a regular index, this is achieved by storing a list of rowids for each key corresponding to
the rows with that key value. Oracle stores each key value repeatedly with each stored rowid. In a
bitmap index, a bitmap for each key value is used instead of a list of rowids.
Each bit in the bitmap corresponds to a possible rowid. If the bit is set, then it means that the row
with the corresponding rowid contains the key value. A mapping function converts the bit
position to an actual rowid, so the bitmap index provides the same functionality as a regular index
even though it uses a different representation internally. If the number of different key values is
small, then bitmap indexes are very space efficient.
Bitmap indexing efficiently merges indexes that correspond to several conditions in a WHERE
clause. Rows that satisfy some, but not all, conditions are filtered out before the table itself is
accessed. This improves response time, often dramatically.
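
The bitmap representation and the merging of conditions can be sketched in Python, using one
integer per key value as the bitmap. This is a simplified model with assumed sample data (a
six-row table with region and status columns); real bitmap indexes are compressed and map bit
positions to rowids rather than to list positions.

```python
def build_bitmap_index(values):
    """Map each distinct key value to an int whose bit i is set if row i has it."""
    index = {}
    for row_number, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_number)
    return index


def rows_from_bitmap(bitmap):
    """Convert set bit positions back to row numbers (the mapping function)."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two low-cardinality columns of a hypothetical six-row table.
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]
region_idx = build_bitmap_index(region)
status_idx = build_bitmap_index(status)

# WHERE region = 'east' AND status = 'open' becomes a bitwise AND.
hits = rows_from_bitmap(region_idx["east"] & status_idx["open"])
print(hits)
```

Only the rows that survive the AND would have to be read from the table, which is the filtering
effect described above.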

Q. What are the advantages of having bitmap index for data warehousing applications?
(KPIT Infotech, Pune)

Bitmap indexing benefits data warehousing applications which have large amounts of data and ad
hoc queries but a low level of concurrent transactions. For such applications, bitmap indexing
provides:
1. Reduced response time for large classes of ad hoc queries
2. A substantial reduction of space usage compared to other indexing techniques
3. Dramatic performance gains even on very low end hardware
4. Very efficient parallel DML and loads

Q. What is the advantage of bitmap index over B-tree index?

Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in
terms of space since the index can be several times larger than the data in the table. Bitmap
indexes are typically only a fraction of the size of the indexed data in the table.


Q. What is the limitation/drawback of a bitmap index?

Bitmap indexes are not suitable for OLTP applications with large numbers of concurrent
transactions modifying the data. These indexes are primarily intended for decision support in data
warehousing applications where users typically query the data rather than update it.

Bitmap indexes are not suitable for high-cardinality data.

Q. How do you choose between B-tree index and bitmap index?

The advantages of using bitmap indexes are greatest for low cardinality columns: that is, columns
in which the number of distinct values is small compared to the number of rows in the table. If the
values in a column are repeated more than a hundred times, then the column is a candidate for a
bitmap index. Even columns with a lower number of repetitions and thus higher cardinality, can
be candidates if they tend to be involved in complex conditions in the WHERE clauses of queries.

For example, on a table with one million rows, a column with 10,000 distinct values is a
candidate for a bitmap index. A bitmap index on this column can out-perform a B-tree index,
particularly when this column is often queried in conjunction with other columns.

B-tree indexes are most effective for high-cardinality data: that is, data with many possible
values, such as CUSTOMER_NAME or PHONE_NUMBER. A regular B-tree index can be
several times larger than the indexed data. Used appropriately, bitmap indexes can be
significantly smaller than a corresponding B-tree index.

Q. What are clusters?

Clusters are an optional method of storing table data. A cluster is a group of tables that share the
same data blocks because they share common columns and are often used together.
For example, the EMP and DEPT table share the DEPTNO column. When you cluster the EMP
and DEPT tables, Oracle physically stores all rows for each department from both the EMP and
DEPT tables in the same data blocks.

Q. What is partitioning? (KPIT Infotech, Pune)

Partitioning addresses the key problem of supporting very large tables and indexes by allowing
you to decompose them into smaller and more manageable pieces called partitions. Once
partitions are defined, SQL statements can access and manipulate the partitions rather than entire
tables or indexes. Partitions are especially useful in data warehouse applications, which
commonly store and analyze large amounts of historical data.

Q. What are the different partitioning methods?

Two primary methods of partitioning are available:
1. range partitioning, which partitions the data in a table or index according to a
range of values, and
2. hash partitioning, which partitions the data according to a hash function.

Another method, composite partitioning, partitions the data by range and further subdivides the
data into sub partitions using a hash function.


Q. What is the necessity to have table partitions?

The need to partition large tables is driven by:
Data Warehouse and Business Intelligence demands for ad hoc analysis on great
quantities of historical data
Cheaper disk storage
Application performance failure due to use of traditional techniques

Q. What are the advantages of storing each partition in a separate tablespace?

The major advantages are:
1. You can contain the impact of data corruption.
2. You can back up and recover each partition or subpartition independently.
3. You can map partitions or subpartitions to disk drives to balance the I/O load.

Q. What are the advantages of partitioning?

Partitioning is useful for:
1. Very Large Databases (VLDBs)
2. Reducing Downtime for Scheduled Maintenance
3. Reducing Downtime Due to Data Failures
4. DSS Performance
5. I/O Performance
6. Disk Striping: Performance versus Availability
7. Partition Transparency

Q. What is Range Partitioning? (KPIT Infotech, Pune)

Range partitioning maps rows to partitions based on ranges of column values. Range partitioning
is defined by the partitioning specification for a table or index:

PARTITION BY RANGE ( column_list )

and by the partitioning specifications for each individual partition:

VALUES LESS THAN ( value_list )
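
How VALUES LESS THAN bounds route a row to a partition can be sketched as follows. The bounds
and partition names are invented for the example; the lookup simply finds the first partition
whose upper bound is greater than the partitioning value.

```python
from bisect import bisect_right

# Upper bounds play the role of VALUES LESS THAN (...) for each partition.
BOUNDS = [1998, 1999, 2000]
NAMES = ["sales_1997", "sales_1998", "sales_1999"]


def range_partition(year):
    """Return the first partition whose upper bound is greater than year."""
    position = bisect_right(BOUNDS, year)
    if position == len(BOUNDS):
        raise ValueError(f"{year} is beyond the highest partition bound")
    return NAMES[position]


if __name__ == "__main__":
    for year in (1990, 1997, 1998, 1999):
        print(year, range_partition(year))
```

A value equal to a bound goes to the next partition, because each bound is exclusive, and a
value at or above the highest bound has no partition to go to in this sketch.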

Q. What is Hash Partitioning?

Hash partitioning uses a hash function on the partitioning columns to stripe data into partitions.
Hash partitioning allows data that does not lend itself to range partitioning to be easily partitioned
for performance reasons such as parallel DML, partition pruning, and partition-wise joins.
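
A rough sketch of hash striping, with loud assumptions: zlib.crc32 is used only because it is
deterministic across runs and is not the hash function Oracle uses, and the four partitions and
the integer customer ids are made up for the illustration.

```python
import zlib
from collections import Counter


def hash_partition(key, n_partitions=4):
    """Pick a partition number from a hash of the partitioning key."""
    # crc32 is a stand-in hash chosen for repeatability, nothing more.
    return zlib.crc32(str(key).encode("utf-8")) % n_partitions


# Count how many of 10,000 consecutive ids land in each partition.
counts = Counter(hash_partition(customer_id) for customer_id in range(10_000))
print(sorted(counts.items()))
```

The same key always lands in the same partition, and no knowledge of the value ranges is needed
to spread the rows across the partitions.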

Q. What are the advantages of Hash partitioning over Range Partitioning?

Hash partitioning is a better choice than range partitioning when:
a) You do not know beforehand how much data will map into a given range
b) Sizes of range partitions would differ quite substantially
c) Partition pruning and partition-wise joins on a partitioning key are important

Q. What are the rules for partitioning a table?


A table can be partitioned if:
It is not part of a cluster
It does not contain LONG or LONG RAW datatypes

Q. What is a global partitioned index?

In a global partitioned index, the keys in a particular index partition may refer to rows stored in
more than one underlying table partition or subpartition. A global index can only be range-
partitioned, but it can be defined on any type of partitioned table.

Q. What is a local index?

In a local index, all keys in a particular index partition refer only to rows stored in a single
underlying table partition. A local index is created by specifying the LOCAL attribute.

Q. What are CLOB and NCLOB datatypes? (Mascot)

The CLOB and NCLOB datatypes store up to four gigabytes of character data in the database.
CLOBs store single-byte character set data and NCLOBs store fixed-width and varying-width
multibyte national character set data (NCHAR data).

Q. What is PL/SQL?

PL/SQL is Oracle's procedural language extension to SQL. PL/SQL enables you to mix SQL
statements with procedural constructs. With PL/SQL, you can define and execute PL/SQL
program units such as procedures, functions, and packages.

PL/SQL program units generally are categorized as anonymous blocks and stored procedures.

Q. What is an anonymous block?

An anonymous block is a PL/SQL block that appears within your application and is neither named
nor stored in the database.

Q. What is a Stored Procedure?

A stored procedure is a PL/SQL block that Oracle stores in the database and can be called by
name from an application. When you create a stored procedure, Oracle parses the procedure and
stores its parsed representation in the database.

Q. What is a distributed transaction?

A distributed transaction is a transaction that includes one or more statements that update data on
two or more distinct nodes of a distributed database.

Q. What are packages? (KPIT Infotech, Pune)

A package is a group of related procedures and functions, together with the cursors and variables
they use, stored together in the database for continued use as a unit.


While packages allow the administrator or application developer the ability to organize such
routines, they also offer increased functionality (for example, global package variables can be
declared and used by any procedure in the package) and performance (for example, all objects of
the package are parsed, compiled, and loaded into memory once).

Q. What are procedures and functions? (KPIT Infotech, Pune)

A procedure or function is a schema object that consists of a set of SQL statements and other
PL/SQL constructs, grouped together, stored in the database, and executed as a unit to solve a
specific problem or perform a set of related tasks. Procedures and functions permit the caller to
provide parameters that can be input only, output only, or input and output values.

Q. What is the difference between Procedure and Function?

Procedures and functions are identical except that functions always return a single value to the
caller, while procedures do not return values to the caller.

Q. What is DML and what does it do?

Data manipulation language (DML) statements query or manipulate data in existing schema
objects. They enable you to:
1. Retrieve data from one or more tables or views (SELECT)
2. Add new rows of data into a table or view (INSERT)
3. Change column values in existing rows of a table or view (UPDATE)
4. Remove rows from tables or views (DELETE)
5. See the execution plan for a SQL statement (EXPLAIN PLAN)
6. Lock a table or view, temporarily limiting other users' access (LOCK TABLE)

Q. What is DDL and what does it do?

Data definition language (DDL) statements define, alter the structure of, and drop schema objects.
DDL statements enable you to:
1. Create, alter, and drop schema objects and other database structures, including the
database itself and database users (CREATE, ALTER, DROP)
2. Change the names of schema objects (RENAME)
3. Delete all the data in schema objects without removing the objects' structure
(TRUNCATE)
4. Gather statistics about schema objects, validate object structure, and list chained rows
within objects (ANALYZE)
5. Grant and revoke privileges and roles (GRANT, REVOKE)
6. Turn auditing options on and off (AUDIT, NOAUDIT)
7. Add a comment to the data dictionary (COMMENT)

Q. What is shared SQL?

Oracle automatically notices when applications send identical SQL statements to the database.
The SQL area used to process the first occurrence of the statement is shared; that is, it is used for
processing subsequent occurrences of that same statement. Therefore, only one shared SQL area
exists for a unique statement. Since shared SQL areas are shared memory areas, any Oracle
process can use a shared SQL area. The sharing of SQL areas reduces memory usage on the
database server, thereby increasing system throughput.

Q. What are triggers?

Oracle allows you to define procedures called triggers that execute implicitly when an INSERT,
UPDATE, or DELETE statement is issued against the associated table or, in some cases, against a
view, or when database system actions occur. These procedures can be written in PL/SQL or Java
and stored in the database, or they can be written as C callouts.

Q. What is Cost-based Optimization?

Using the cost-based approach, the optimizer determines which execution plan is most efficient
by considering available access paths and factoring in information based on statistics for the
schema objects (tables or indexes) accessed by the SQL statement.

Q. What is Rule-Based Optimization?

Using the rule-based approach, the optimizer chooses an execution plan based on the access paths
available and the ranks of these access paths.

Q. What is meant by degree of parallelism?

The number of parallel execution servers associated with a single operation is known as the
degree of parallelism.

Q. What is meant by data consistency?

Data consistency means that each user sees a consistent view of the data, including visible
changes made by the user's own transactions and transactions of other users.

Q. What are Locks?
Locks are mechanisms that prevent destructive interaction between transactions accessing the
same resource, either user objects such as tables and rows or system objects not visible to users,
such as shared data structures in memory and data dictionary rows.

Q. What are the locking modes used in Oracle?
Oracle uses two modes of locking in a multiuser database:

Exclusive lock mode: Prevents the associated resource from being shared. This lock mode is
obtained to modify data. The first transaction to lock a resource exclusively is the only transaction
that can alter the resource until the exclusive lock is released.

Share lock mode: Allows the associated resource to be shared, depending on the operations
involved. Multiple users reading data can share the data, holding share locks to prevent
concurrent access by a writer (who needs an exclusive lock). Several transactions can acquire
share locks on the same resource.

Q. What is a deadlock?
A deadlock can occur when two or more users are waiting for data locked by each other.


Q. How can you avoid deadlocks?
Multitable deadlocks can usually be avoided if transactions accessing the same tables lock those
tables in the same order, either through implicit or explicit locks.
For example, all application developers might follow the rule that when both a master and detail
table are updated, the master table is locked first and then the detail table. If such rules are
properly designed and then followed in all applications, deadlocks are very unlikely to occur.
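
The lock-ordering rule is not specific to databases. A minimal Python sketch, using threads and in-memory locks as stand-ins for transactions and table locks (all names are hypothetical), shows the pattern: every worker takes the master lock first and the detail lock second, so no worker can hold one lock while waiting for the other in the opposite order.

```python
import threading

master_lock = threading.Lock()  # stands in for a lock on the master table
detail_lock = threading.Lock()  # stands in for a lock on the detail table
updates = []


def update_master_and_detail(name):
    # Every transaction takes the locks in the same order: master, then detail.
    with master_lock:
        with detail_lock:
            updates.append(name)


threads = [
    threading.Thread(target=update_master_and_detail, args=(f"txn{i}",))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If some workers took detail_lock first instead, two of them could each hold one lock and wait forever for the other.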

Q. What is redo log?
The redo log, present for every Oracle database, records all changes made in an Oracle database.
The redo log of a database consists of at least two redo log files that are separate from the
datafiles (which actually store a database's data). As part of database recovery from an instance
or media failure, Oracle applies the appropriate changes in the database's redo log to the datafiles,
which updates database data to the instant that the failure occurred.
A database's redo log can consist of two parts: the online redo log and the archived redo log.

Q. What are Rollback Segments?
Rollback segments are used for a number of functions in the operation of an Oracle database. In
general, the rollback segments of a database store the old values of data changed by ongoing,
uncommitted transactions.
Among other things, the information in a rollback segment is used during database recovery to
undo any uncommitted changes applied from the redo log to the datafiles. Therefore, if database
recovery is necessary, then the data is in a consistent state after the rollback segments are used to
remove all uncommitted data from the datafiles.

Q. What is SGA?
The System Global Area (SGA) is a shared memory region that contains data and control
information for one Oracle instance. An SGA and the Oracle background processes constitute an
Oracle instance.
Oracle allocates the system global area when an instance starts and deallocates it when the
instance shuts down. Each instance has its own system global area.

Users currently connected to an Oracle server share the data in the system global area. For
optimal performance, the entire system global area should be as large as possible (while still
fitting in real memory) to store as much data in memory as possible and minimize disk I/O.

The information stored within the system global area is divided into several types of memory
structures, including the database buffers, redo log buffer, and the shared pool. These areas have
fixed sizes and are created during instance startup.

Q. What is PCTFREE?
The PCTFREE parameter sets the minimum percentage of a data block to be reserved as free
space for possible updates to rows that already exist in that block.

Q. What is PCTUSED?
The PCTUSED parameter sets the minimum percentage of a block that can be used for row data
plus overhead before new rows will be added to the block. After a data block is filled to the limit
determined by PCTFREE, Oracle considers the block unavailable for the insertion of new rows
until the used percentage of that block falls below the parameter PCTUSED. Until this value is
achieved, Oracle uses the free space of the data block only for updates to rows already contained
in the data block.
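
The interplay of the two parameters can be sketched as a small state rule. This Python fragment is only an illustration with hypothetical settings (PCTFREE 20, PCTUSED 40); it models the description above, not Oracle's actual free-list code.

```python
PCTFREE = 20  # hypothetical: keep 20% of each block free for updates
PCTUSED = 40  # hypothetical: reopen the block for inserts below 40% used


def accepts_inserts(used_pct, currently_accepting):
    """Return whether a block is available for new rows after a change."""
    if currently_accepting:
        # Inserts stop once used space reaches the PCTFREE limit.
        return used_pct < 100 - PCTFREE
    # A closed block stays closed until used space falls below PCTUSED.
    return used_pct < PCTUSED


state = True
history = []
for used in (30, 79, 80, 60, 41, 39, 50):
    state = accepts_inserts(used, state)
    history.append((used, state))
```

In this trace the block closes at 80% used, stays closed at 60% and 41%, and reopens only once usage drops to 39%.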


Notes:
Nulls are stored in the database if they fall between columns with data values. In these cases they
require one byte to store the length of the column (zero).
Trailing nulls in a row require no storage because a new row header signals that the remaining
columns in the previous row are null. For example, if the last three columns of a table are null, no
information is stored for those columns. In tables with many columns, the columns more likely to
contain nulls should be defined last to conserve disk space.

Two rows can both contain all nulls without violating a unique index.

NULL values in indexes are considered to be distinct except when all the non-NULL values in
two or more rows of an index are identical, in which case the rows are considered to be identical.
Therefore, UNIQUE indexes prevent rows containing NULL values from being treated as
identical.

Bitmap indexes include rows that have NULL values, unlike most other types of indexes.
Indexing of nulls can be useful for some types of SQL statements, such as queries with the
aggregate function COUNT.

Bitmap indexes on partitioned tables must be local indexes.

PL/SQL is Oracle's procedural language extension to SQL. PL/SQL combines the
ease and flexibility of SQL with the procedural functionality of a structured
programming language, such as IF ... THEN, WHILE, and LOOP.
When designing a database application, a developer should consider the
advantages of using stored PL/SQL:
- Because PL/SQL code can be stored centrally in a database, network traffic
between applications and the database is reduced, so application and system
performance increases.
- Data access can be controlled by stored PL/SQL code. In this case, the users of
PL/SQL can access data only as intended by the application developer (unless
another access route is granted).
- PL/SQL blocks can be sent by an application to a database, executing complex
operations without excessive network traffic.
Even when PL/SQL is not stored in the database, applications can send blocks of
PL/SQL to the database rather than individual SQL statements, thereby again
reducing network traffic.
The following sections describe the different program units that can be defined and
stored centrally in a database.

Committing and Rolling Back Transactions
The changes made by the SQL statements that constitute a transaction can be either committed or
rolled back. After a transaction is committed or rolled back, the next transaction begins with the
next SQL statement.
Committing a transaction makes permanent the changes resulting from all SQL statements in the
transaction. The changes made by the SQL statements of a transaction become visible to other
user sessions' transactions only if those transactions start after the transaction is committed.

Rolling back a transaction retracts any of the changes resulting from the SQL statements in the
transaction. After a transaction is rolled back, the affected data is left unchanged as if the SQL
statements in the transaction were never executed.
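
Commit and rollback behave the same way in most SQL databases, so the effect can be tried without an Oracle instance. The sketch below uses Python's built-in sqlite3 module as a stand-in (an assumption for illustration; the table and values are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()  # the insert is now permanent

conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.rollback()  # retract the uncommitted update
after_rollback = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]

conn.execute("UPDATE accounts SET balance = balance - 30 WHERE id = 1")
conn.commit()  # make the update permanent
after_commit = conn.execute(
    "SELECT balance FROM accounts WHERE id = 1"
).fetchone()[0]
conn.close()
```

After the rollback the balance is still 100, as if the UPDATE had never run; after the commit it is 70.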

Introduction to the Data Dictionary
One of the most important parts of an Oracle database is its data dictionary, which is
a read-only set of tables that provides information about its associated database. A
data dictionary contains:
- The definitions of all schema objects in the database (tables, views, indexes,
clusters, synonyms, sequences, procedures, functions, packages, triggers,
and so on)
- How much space has been allocated for, and is currently used by, the
schema objects
- Default values for columns
- Integrity constraint information
- The names of Oracle users
- Privileges and roles each user has been granted
- Auditing information, such as who has accessed or updated various
schema objects
- Other general database information
The data dictionary is structured in tables and views, just like other database data.
All the data dictionary tables and views for a given database are stored in that
database's SYSTEM tablespace.
Not only is the data dictionary central to every Oracle database, it is an important
tool for all users, from end users to application designers and database
administrators. To access the data dictionary, you use SQL statements. Because the
data dictionary is read-only, you can issue only queries (SELECT statements)
against the tables and views of the data dictionary.

Q. What is the function of the DUAL table?

The table named DUAL is a small table in the data dictionary that Oracle and user written
programs can reference to guarantee a known result. This table has one column called DUMMY
and one row containing the value "X".

Databases, tablespaces, and datafiles are closely related, but they have important differences:

Databases and tablespaces: An Oracle database consists of one or more logical storage units
called tablespaces, which collectively store all of the database's data.

Tablespaces and datafiles: Each tablespace in an Oracle database consists of one or more files called
datafiles, which are physical structures that conform with the operating system in which Oracle is
running.

Databases and datafiles: A database's data is collectively stored in the datafiles that
constitute each tablespace of the database. For example, the simplest Oracle database would
have one tablespace and one datafile. Another database might have three tablespaces, each
consisting of two datafiles (for a total of six datafiles).


Nulls
A null is the absence of a value in a column of a row. Nulls indicate missing,
unknown, or inapplicable data. A null should not be used to imply any other value,
such as zero. A column allows nulls unless a NOT NULL or PRIMARY KEY
integrity constraint has been defined for the column, in which case no row can be
inserted without a value for that column.

Most comparisons between nulls and other values are by definition neither true nor
false, but unknown. To identify nulls in SQL, use the IS NULL predicate. Use the
SQL function NVL to convert nulls to non-null values.
Nulls are not indexed, except when the cluster key column value is null or the index
is a bitmap index.
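
The comparison rules can be tried in any SQL engine. The sketch below uses Python's built-in sqlite3 module purely as a stand-in for Oracle. The table is hypothetical, and COALESCE is used where Oracle code would typically call NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, commission INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [("ann", 500), ("bob", None), ("cy", None)],
)

# "= NULL" evaluates to unknown, so no row qualifies.
eq_null = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE commission = NULL"
).fetchone()[0]
# IS NULL is the predicate that actually finds nulls.
is_null = conn.execute(
    "SELECT COUNT(*) FROM emp WHERE commission IS NULL"
).fetchone()[0]
# COALESCE plays the role of Oracle's NVL here: nulls become 0.
total = conn.execute(
    "SELECT SUM(COALESCE(commission, 0)) FROM emp"
).fetchone()[0]
conn.close()
```

The first query matches no rows even though two commissions are null, which is why IS NULL must be used instead of an equality test.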

Q. What are the different types of locks?

Q. Master table and Child table performances and comparisons in Oracle?

Q. What are the different types of Cursors? Explain. (Honeywell)

Q. What are the different types of Deletes?

Q. Can a View be updated?

Interview Questions from Honeywell

1. What is pragma?
2. Can you write commit in triggers?
3. Can you call user-defined functions in SELECT statements?
4. Can you call INSERT/UPDATE/DELETE in SELECT statements? If yes, how? If no, what is the other
way?
5. After an update, how do you know how many records were updated?
6. Select statement does not retrieve any records. What exception is raised?

Interview Questions from Shreesoft

1. How many columns can a PL/SQL table have?

Interview Questions from Mascot
1. What is load balancing, and what have you used to do it? (SQL*Loader)
2. What are Routers?



PL/SQL

1. What are different types of joins?
2. Difference between Packages and Procedures
3. Difference between Function and Procedures
4. How many types of triggers are there? When do you use Triggers
5. Can you write DDL statements in Triggers? (No)
6. What is Hint?
7. How do you tune a SQL query?

Interview Questions from KPIT Infotech, Pune

1. Package body
2. What is molar query?
3. What is row level security

General:
Why is Oracle the best database for data warehousing?
For data loading in Oracle, what are conventional loading and direct-path loading?
If you use Oracle SQL*Loader, how do you transform data with it during loading? Give an example.
SQL*Loader can load data in three ways. What are those three types?
What are the contents of "bad files" and "discard files" when using SQL*Loader?
How do you use commit frequencies? How do they affect loading performance?
What are the other database factors on which loading performance depends?
* WHAT IS PARALLELISM ?
* WHAT IS A PARALLEL QUERY ?
* WHAT ARE DIFFERENT WAYS OF LOADING DATA TO DATAWAREHOUSE USING
ORACLE?
* WHAT IS TABLE PARTITIONING? HOW IT IS USEFUL TO WAREHOUSE DATABASE?
* WHAT ARE DIFFERENT TYPES OF PARTITIONING IN ORACLE?
* WHAT IS A MATERIALIZED VIEW? HOW IT IS DIFFERENT FROM NORMAL AND
INLINE VIEWS?
* WHAT IS INDEXING? WHAT ARE DIFFERENT TYPES OF INDEXES SUPPORTED BY
ORACLE?
* WHAT ARE DIFFERENT STORAGE OPTIONS SUPPORTED BY ORACLE?
* WHAT IS QUERY OPTIMIZER? WHAT ARE DIFFERENT TYPES OF OPTIMIZERS
SUPPORTED BY ORACLE?
* EXPLAIN ROLLUP,CUBE,RANK AND DENSE_RANK FUNCTIONS OF ORACLE 8i.

The advantages of using bitmap indexes are greatest for low cardinality columns: that is, columns
in which the number of distinct values is small compared to the number of rows in the table. A
gender column, which only has two distinct values (male and female), is ideal for a bitmap index.
However, data warehouse administrators will also choose to build bitmap indexes on columns
with much higher cardinalities.
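
To see why low cardinality suits bitmaps, here is a minimal Python sketch (not how Oracle stores them internally). The column data is hypothetical. Each distinct value gets one bitmap with a bit per row, and a two-column predicate becomes a bitwise AND.

```python
from collections import defaultdict

# Hypothetical low-cardinality column values, one per table row.
gender = ["M", "F", "F", "M", "F", "M"]
region = ["east", "east", "west", "west", "east", "west"]


def build_bitmap_index(values):
    """Return {distinct value: int bitmap}, with bit i set for row i."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        index[value] |= 1 << row_id
    return dict(index)


gender_idx = build_bitmap_index(gender)
region_idx = build_bitmap_index(region)

# WHERE gender = 'F' AND region = 'east' becomes a bitwise AND.
match_bits = gender_idx["F"] & region_idx["east"]
matching_rows = [i for i in range(len(gender)) if (match_bits >> i) & 1]
```

With only two or three distinct values per column, the whole index is a handful of bitmaps regardless of how many rows the table has.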


Local vs global: A B-tree index on a partitioned table can be local or global. Global indexes must
be fully rebuilt after a direct load, which can be very costly when loading a relatively
small number of rows into a large table. For this reason, it is strongly recommended
that indexes on partitioned tables should be defined as local indexes unless there is
a well-justified performance requirement for a global index. Bitmap indexes on
partitioned tables are always local.

Why Constraints are Useful in a Data Warehouse
Constraints provide a mechanism for ensuring that data conforms to guidelines
specified by the database administrator. The most common types of constraints
include unique constraints (ensuring that a given column is unique), not-null
constraints, and foreign-key constraints (which ensure that two keys share a
primary key-foreign key relationship).

Materialized Views for Data Warehouses
In data warehouses, materialized views can be used to precompute and store
aggregated data such as the sum of sales. Materialized views in these environments
are typically referred to as summaries, because they store summarized data. They
can also be used to precompute joins with or without aggregations. A materialized
view eliminates the overhead associated with expensive joins or aggregations for a
large or important class of queries.

The Need for Materialized Views
Materialized views are used in data warehouses to increase the speed of queries on
very large databases. Queries to large databases often involve joins between tables
or aggregations such as SUM, or both. These operations are very expensive in terms
of time and processing power.

How do MVs work?
The query optimizer can use materialized views by
automatically recognizing when an existing materialized view can and should be
used to satisfy a request. It then transparently rewrites the request to use the
materialized view. Queries are then directed to the materialized view and not to the
underlying detail tables. In general, rewriting queries to use materialized views
rather than detail tables results in a significant performance gain.
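
The idea of answering a query from precomputed results can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions: SQLite has no materialized views or query rewrite, so a plain summary table is built by hand and queried explicitly. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10), ("east", 15), ("west", 7), ("west", 3)],
)

detail_query = (
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)

# Precompute the aggregate once and store it, as a summary would.
conn.execute(
    "CREATE TABLE sales_summary AS "
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region"
)
summary_query = "SELECT region, total FROM sales_summary ORDER BY region"

from_detail = conn.execute(detail_query).fetchall()
from_summary = conn.execute(summary_query).fetchall()
conn.close()
```

Both queries return the same rows, but the second reads the small stored summary instead of aggregating the detail table again. In Oracle the optimizer makes that substitution itself through query rewrite.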

If a materialized view is to be used by query rewrite, it must be stored in the same
database as its fact or detail tables. A materialized view can be partitioned, and you
can define a materialized view on a partitioned table and one or more indexes on
the materialized view.

The types of materialized views are:
Materialized Views with Joins and Aggregates
Single-Table Aggregate Materialized Views
Materialized Views Containing Only Joins

Some Useful system tables:

user_tab_partitions
user_tab_columns

INFORMATICA TRANSFORMATIONS

Aggregator
Expression
External Procedure
Advanced External Procedure
Filter
Joiner
Lookup
Normalizer
Rank
Router
Sequence Generator
Stored Procedure
Source Qualifier
Update Strategy
XML source qualifier

Expression Transformation

- You can use ET to calculate values in a single row before you write to the target
- You can use ET to perform any non-aggregate calculation
- To perform calculations involving multiple rows, such as sums or averages, use the
Aggregator. Unlike ET, the Aggregator Transformation allows you to group and sort
data

Calculation

To use the Expression Transformation to calculate values for a single row, you must include
the following ports.

- Input port for each value used in the calculation
- Output port for the expression

NOTE

You can enter multiple expressions in a single ET. As long as you enter only one expression
for each port, you can create any number of output ports in the Expression Transformation.
In this way, you can use one expression transformation rather than creating separate
transformations for each calculation that requires the same set of data.

Sequence Generator Transformation

- Create keys

- Replace missing values

- This contains two output ports that you can connect to one or more transformations.
The server generates a value each time a row enters a connected transformation, even
if that value is not used.

- The two output ports are NEXTVAL and CURRVAL

- The SGT can be reusable

- You cannot edit the default ports (NEXTVAL, CURRVAL)

SGT Properties

- Start value
- Increment By
- End value
- Current value
- Cycle (If selected, the server cycles through the sequence range. Otherwise, it
stops at the configured end value)
- Reset
- Number of cached values


NOTE

- Reset is disabled for Reusable SGT
- Unlike other transformations, you cannot override SGT properties at session level.
This protects the integrity of sequence values generated.
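
The effect of the Start value, Increment By, End value and Cycle properties can be sketched with a Python generator. This is only an illustration of the behaviour described above, not Informatica code, and the parameter names are hypothetical.

```python
def sequence_generator(start=1, increment=1, end=5, cycle=False):
    """Yield NEXTVAL-style values from start up to end, optionally cycling."""
    value = start
    while True:
        if value > end:
            if not cycle:
                return  # stop at the configured end value
            value = start  # wrap around to the start value
        yield value
        value += increment
```

For example, list(sequence_generator(1, 2, 7)) yields 1, 3, 5, 7 and then stops, while the same call with cycle=True would start again from 1.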

Aggregator Transformation

Difference between Aggregator and Expression Transformation

We can use the Aggregator to perform calculations on groups, whereas the Expression
transformation permits you to perform calculations on a row-by-row basis only.

The server performs aggregate calculations as it reads, and stores the necessary group and row
data in an aggregator cache.
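
The row-by-row versus per-group distinction can be shown with a short Python sketch. The rows and column meanings are hypothetical, and the code only mimics what the two transformations compute.

```python
from collections import defaultdict

# Hypothetical input rows: (department, price, quantity).
rows = [("toys", 5.0, 2), ("toys", 3.0, 1), ("books", 10.0, 3)]

# Expression-style: one output value per input row.
line_totals = [price * qty for _, price, qty in rows]

# Aggregator-style: one output value per group (group-by port = department).
dept_totals = defaultdict(float)
for dept, price, qty in rows:
    dept_totals[dept] += price * qty
dept_totals = dict(dept_totals)
```

The first result has as many values as there are input rows. The second has one value per department, which is why the Aggregator has to hold group data in a cache until all rows are read.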

When Incremental aggregation occurs, the server passes new source data through the mapping
and uses historical cache data to perform new calculation incrementally.

Components

- Aggregate Expression
- Group by port
- Aggregate cache

When a session is being run using aggregator transformation, the server creates Index and
data caches in memory to process the transformation. If the server requires more space, it
stores overflow values in cache files.

NOTE

The performance of aggregator transformation can be improved by using Sorted Input option.
When this is selected, the server assumes all data is sorted by group.

Incremental Aggregation

- Using this, you apply captured changes in the source to aggregate calculation in a
session. If the source changes only incrementally and you can capture changes, you
can configure the session to process only those changes

- This allows the server to update the target incrementally, rather than forcing it to
process the entire source and recalculate the same calculations each time you run the
session.

Steps:

- The first time you run a session with incremental aggregation enabled, the server
processes the entire source.
- At the end of the session, the server stores aggregate data from that session run in two
files, the index file and the data file. The server creates the files in a local directory.

- The second time you run the session, use only changes in the source as source data
for the session. The server then performs the following actions:

(1) For each input record, the session checks the historical information in the index
file for a corresponding group, then:

If it finds a corresponding group

The server performs the aggregate operation incrementally, using the
aggregate data for that group, and saves the incremental changes.

Else

The server creates a new group and saves the record data

(2) When writing to the target, the server applies the changes to the existing target.

o Updates modified aggregate groups in the target
o Inserts new aggregate data
o Deletes removed aggregate data
o Ignores unchanged aggregate data
o Saves modified aggregate data in Index/Data files to be used as historical data the
next time you run the session.

Each subsequent time you run the session with incremental aggregation, you use only the
incremental source changes in the session.
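
The steps above can be sketched in Python. A dictionary stands in for the index and data files, and the group names and amounts are hypothetical. It is a simplified model of a SUM aggregate, not the server's actual algorithm.

```python
def incremental_aggregate(cache, new_rows):
    """Fold only the new source rows into the saved per-group totals."""
    for group, amount in new_rows:
        if group in cache:
            cache[group] += amount  # existing group: update incrementally
        else:
            cache[group] = amount  # no match in the cache: create a new group
    return cache


# First run: the entire source is processed and the result is saved.
cache = incremental_aggregate({}, [("east", 10), ("west", 5), ("east", 2)])
first_run = dict(cache)

# Second run: only the captured changes are passed through.
cache = incremental_aggregate(cache, [("east", 3), ("north", 4)])
```

After the second run, "east" has been updated from 12 to 15, "north" has been added, and "west" is left untouched, without re-reading the original three rows.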

If the source changes significantly, and you want the server to continue saving the aggregate
data for the future incremental changes, configure the server to overwrite existing aggregate
data with new aggregate data.

Use incremental aggregation only if:

- Mapping includes an aggregate function
- Source changes only incrementally
- You can capture incremental changes. You might do this by filtering source data by
timestamp.


External Procedure Transformation

- When Informatica's transformations do not provide the exact functionality we need,
we can develop complex functions within a dynamic link library or Unix shared
library.
- To obtain this kind of extensibility, we can use the Transformation Exchange (TX)
dynamic invocation interface built into PowerMart/PowerCenter.
- Using TX, you can create an External Procedure Transformation and bind it to an
External Procedure that you have developed.
- Two types of External Procedures are available

COM External Procedure (Only for WIN NT/2000)
Informatica External Procedure ( available for WINNT, Solaris, HPUX etc)

Components of TX:

(a) External Procedure
This exists separately from the Informatica Server. It consists of C++ or VB code
written by the developer. The code is compiled and linked into a DLL or shared library,
which is loaded by the Informatica Server at runtime.

(b) External Procedure Transformation
This is created in Designer and it is an object that resides in the Informatica
Repository. It serves two purposes:

o It contains metadata describing the External Procedure
o It allows an External Procedure to be referenced in a mapping by adding an
instance of an External Procedure transformation.

All External Procedure Transformations must be defined as reusable transformations.
Therefore you cannot create an External Procedure transformation in the Mapping Designer. You can
create one only within the Transformation Developer of Designer and add instances of the
transformation to mappings.

Difference Between Advanced External Procedure And External Procedure Transformation

Advanced External Procedure Transformation


- The Input and Output functions occur separately
- The output function is a separate callback function provided by Informatica that can
be called from Advanced External Procedure Library.
- The Output callback function is used to pass all the output port values from the
Advanced External Procedure library to the informatica Server.
- Multiple Outputs (Multiple row Input and Multiple rows output)
- Supports Informatica procedure only
- Active Transformation
- Connected only

External Procedure Transformation

- In the External Procedure Transformation, an External Procedure function does both
input and output, and its parameters consist of all the ports of the transformation.
- Single return value ( One row input and one row output )
- Supports COM and Informatica Procedures
- Passive transformation
- Connected or Unconnected

By default, the Advanced External Procedure Transformation is an active transformation.
However, we can configure it to be passive by clearing the IS ACTIVE option on the
properties tab.

LOOKUP Transformation

- We use this to look up data in a related table, view or synonym
- You can use multiple lookup transformations in a mapping
- The server queries the Lookup table based on the Lookup ports in the transformation.
It compares lookup port values to lookup table column values, based on the lookup
condition.

Types:

(a) Connected (or) unconnected.
(b) Cached (or) uncached.

If you cache the LKP table, you can choose to use a dynamic or static
cache. By default, the LKP cache remains static and doesn't change during the
session. With a dynamic cache, the server inserts rows into the cache during the
session. Informatica recommends that you cache the target table as the Lookup. This
enables you to look up values in the target and insert them if they don't exist.


You can configure a connected LKP to receive input directly from the
mapping pipeline, or you can configure an unconnected LKP to receive input
from the result of an expression in another transformation.
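
The practical difference between a static and a dynamic cache shows up when the same new key arrives twice in one session. The Python sketch below is a simplified model with hypothetical keys, in which a set stands in for the lookup cache built from the target table.

```python
def run_lookup(source_keys, target_rows, dynamic):
    """Return the keys the session would insert, given a cache of the target."""
    cache = set(target_rows)  # built once from the lookup (target) table
    to_insert = []
    for key in source_keys:
        if key not in cache:
            to_insert.append(key)
            if dynamic:
                cache.add(key)  # dynamic cache: updated during the session
    return to_insert


source = [101, 102, 103, 103]
static_result = run_lookup(source, [101], dynamic=False)
dynamic_result = run_lookup(source, [101], dynamic=True)
```

With the static cache, key 103 is flagged for insert twice because the cache never learns about the first insert. With the dynamic cache it is flagged only once.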

Differences Between Connected and Unconnected Lookup:
Connected

o Receives input values directly from the pipeline.
o Uses a dynamic or static cache.
o Returns multiple values.
o Supports user-defined default values.

Unconnected

o Receives input values from the result of an LKP expression in another
transformation.
o Uses a static cache only.
o Returns only one value.
o Doesn't support user-defined default values.

NOTES

o Common use of unconnected LKP is to update slowly changing dimension
tables.
o Lookup components are
(a) Lookup table.
(b) Ports
(c) Properties
(d) condition.

Lookup tables: This can be a single table, or you can join multiple tables in the
same Database using a Lookup query override.

You can improve Lookup initialization time by adding an index to the Lookup
table.

Lookup ports: There are 3 port types in a connected LKP transformation
(I/P, O/P, LKP) and 4 in an unconnected LKP (I/P, O/P, LKP and return ports).

o If you are certain that a mapping doesn't use a Lookup port, you can delete it from
the transformation. This reduces the amount of memory the server needs.


Lookup Properties: You can configure properties such as an SQL override for the
Lookup, the Lookup table name, and the tracing level for the transformation.

Lookup condition: You can enter the conditions you want the server to use to
determine whether input data qualifies values in the Lookup table or cache.

When you configure a LKP condition for the transformation, you compare
transformation input values with values in the Lookup table or cache, which are
represented by LKP ports. When you run the session, the server queries the LKP table or
cache for all incoming values based on the condition.

NOTE

- If you configure a LKP to use a static cache, you can use the following operators:
=, >, <, >=, <=, !=.

But if you use a dynamic cache, only = can be used.

- When you do not configure the LKP for caching, the server queries the LKP table
for each input row. The result will be the same, regardless of using a cache.

However, caching the Lookup table can increase session performance
when the source table is large.

Performance tips:

- Add an index to the columns used in a Lookup condition.
- Place conditions with an equality operator (=) first.
- Cache small Lookup tables.
- Do not use an ORDER BY clause in the SQL override.
- Call unconnected Lookups with the :LKP reference qualifier.

Normalizer Transformation

- Normalization is the process of organizing data.


- In database terms, this includes creating normalized tables and establishing
relationships between those tables according to rules designed to both protect
the data and make the database more flexible by eliminating redundancy and
inconsistent dependencies.

- The NT normalizes records from COBOL and relational sources, allowing you to
organize the data according to your own needs.

- A NT can appear anywhere in a data flow when you normalize a relational
source.

- Use a Normalizer transformation instead of a Source Qualifier transformation when
you normalize a COBOL source.

- The OCCURS statement in a COBOL file nests multiple records of information in a
single record.

- Using the NT, you break out repeated data within a record into
separate records.

For each new record it creates, the NT generates a unique identifier. You can use
this key value to join the normalized records.
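As a rough illustration of what breaking out an OCCURS group means, the Python sketch below turns one record holding four repeated values into four rows, each with a generated key. It is not Informatica code: the normalize function, the field names and the sample store record are all hypothetical.

```python
# Illustrative sketch only: splits a record with a repeating group (like a
# COBOL OCCURS clause) into one output row per occurrence.
from itertools import count

def normalize(records, repeating_field):
    key_gen = count(1)  # stands in for the generated unique identifier
    out = []
    for rec in records:
        # Columns that are not part of the repeating group are copied as-is.
        fixed = {k: v for k, v in rec.items() if k != repeating_field}
        for occurrence, value in enumerate(rec[repeating_field], start=1):
            out.append({**fixed,
                        "gen_key": next(key_gen),
                        "occurrence": occurrence,
                        "value": value})
    return out

# Hypothetical source record: one store with four quarterly sales figures.
store_sales = [{"store": "S1", "quarterly_sales": [100, 120, 90, 150]}]

rows = normalize(store_sales, "quarterly_sales")
for r in rows:
    print(r)
```

Each output row keeps the store column, so the generated key and the occurrence number are enough to join the normalized rows back to their parent record.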

Stored Procedure Transformation

- DBAs create stored procedures to automate time-consuming tasks that are too
complicated for standard SQL statements.
- A stored procedure is a precompiled collection of Transact-SQL statements and
optional flow control statements, similar to an executable script.
- Stored procedures are stored and run within the database. You can run a stored
procedure with the EXECUTE SQL statement in a database client tool, just as you run SQL
statements. But unlike standard SQL, stored procedures allow user-defined variables,
conditional statements and other programming features.

Usages of Stored Procedure

- Drop and recreate indexes.
- Check the status of target database before moving records into it.
- Determine database space.
- Perform a specialized calculation.

NOTE


- The Stored Procedure must exist in the database before creating a Stored Procedure
Transformation, and the Stored procedure can exist in a source, target or any database
with a valid connection to the server.

TYPES

- Connected Stored Procedure Transformation (Connected directly to the mapping)
- Unconnected Stored Procedure Transformation (Not connected directly to the flow of
the mapping. Can be called from an Expression Transformation or other
transformations)

Running a Stored Procedure

The options for running a Stored Procedure Transformation:
- Normal
- Pre load of the source
- Post load of the source
- Pre load of the target
- Post load of the target

You can run several Stored Procedure transformations in different modes in the same mapping.

Stored Procedure Transformations are created as normal type by default, which means that
they run during the mapping, not before or after the session. They are also not created as
reusable transformations.

If you want to:                                           Use below mode
Run a SP before/after the session                         Unconnected
Run a SP once during a session                            Unconnected
Run a SP for each row in data flow                        Unconnected/Connected
Pass parameters to SP and receive a single return value   Connected

A normal connected SP has input and output ports. The return port is also an output port,
and it is marked as R.

Error Handling


- This can be configured in server manager (Log & Error handling)
- By default, the server stops the session

Rank Transformation

- This allows you to select only the top or bottom rank of data. You can have it return
the largest or smallest numeric value in a port or group.
- You can also use the Rank Transformation to return the strings at the top or the bottom
of a session sort order. During the session, the server caches input data until it can
perform the rank calculations.
- The Rank Transformation differs from the MAX and MIN functions in that it allows you to
select a group of top/bottom values, not just one value.
- As an active transformation, the Rank Transformation might change the number of rows
passed through it.

Rank Transformation Properties

- Cache directory
- Top or Bottom rank
- Input/Output ports that contain values used to determine the rank.

Different ports in Rank Transformation

I - Input
O - Output
V - Variable
R - Rank

Rank Index

The designer automatically creates a RANKINDEX port for each rank transformation. The
server uses this Index port to store the ranking position for each row in a group.

The RANKINDEX is an output port only. You can pass the RANKINDEX to another
transformation in the mapping or directly to a target.
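A small Python sketch of the idea, for illustration only: it keeps the top N rows per group and attaches a RANKINDEX value to each. The rank_rows function, its parameters and the sample sales rows are hypothetical and do not represent how the server implements ranking.

```python
# Illustrative sketch only: top-N per group with a rank position per row.
from collections import defaultdict

def rank_rows(rows, group_key, rank_key, n, top=True):
    groups = defaultdict(list)
    for row in rows:                      # cache input rows per group
        groups[row[group_key]].append(row)
    ranked = []
    for group in groups.values():
        ordered = sorted(group, key=lambda r: r[rank_key], reverse=top)
        for index, row in enumerate(ordered[:n], start=1):
            ranked.append({**row, "RANKINDEX": index})
    return ranked

sales = [
    {"region": "East", "rep": "A", "amount": 500},
    {"region": "East", "rep": "B", "amount": 900},
    {"region": "East", "rep": "C", "amount": 700},
    {"region": "West", "rep": "D", "amount": 300},
]

top2 = rank_rows(sales, "region", "amount", n=2)
for r in top2:
    print(r["region"], r["rep"], r["RANKINDEX"])
```

Four rows go in and three come out, which is why Rank counts as an active transformation. Unlike MAX, each group returns up to N rows rather than a single value.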

Filter Transformation

- As an active transformation, the Filter Transformation may change the no of rows
passed through it.
- A filter condition returns TRUE/FALSE for each row that passes through the
transformation, depending on whether a row meets the specified condition.

- Only rows that return TRUE pass through this filter and discarded rows do not appear
in the session log/reject files.
- To maximize the session performance, include the Filter Transformation as close to
the source in the mapping as possible.
- The filter transformation does not allow setting output default values.
- To filter out rows with NULL values, use the ISNULL and IS_SPACES functions.

Joiner Transformation

Source Qualifier: Can join data originating from a common source database.

Joiner Transformation: Joins two related heterogeneous sources residing in different locations
or file systems.

To join more than two sources, we can add additional joiner transformations.

SESSION LOGS

Information that resides in a session log:

- Allocation of system shared memory
- Execution of Pre-session commands/ Post-session commands
- Session Initialization
- Creation of SQL commands for reader/writer threads
- Start/End timings for target loading
- Error encountered during session
- Load summary of Reader/Writer/ DTM statistics

Other Information

- By default, the server generates log files based on the server code page.

Thread Identifier

Ex: CMN_1039

Reader and Writer thread codes have 3 digits and Transformation codes have 4 digits.


The number following a thread name indicates the following:
(a) Target load order group number
(b) Source pipeline number
(c) Partition number
(d) Aggregate/ Rank boundary number

Log File Codes

Error Codes Description

BR - Related to reader process, including ERP, relational and flat file.
CMN - Related to database, memory allocation
DBGR - Related to debugger
EP - External Procedure
LM - Load Manager
TM - DTM
REP - Repository
WRT - Writer
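When scanning a session log it can help to map a message code such as CMN_1039 back to the component in the table above. The Python sketch below does only that. The describe_code helper is hypothetical, and the descriptions are copied from the table rather than from any official message reference.

```python
# Illustrative sketch only: split a log message code into prefix and number
# and look the prefix up in the table from these notes.
LOG_CODE_PREFIXES = {
    "BR": "Reader process (ERP, relational, flat file)",
    "CMN": "Database, memory allocation",
    "DBGR": "Debugger",
    "EP": "External Procedure",
    "LM": "Load Manager",
    "TM": "DTM",
    "REP": "Repository",
    "WRT": "Writer",
}

def describe_code(message_code):
    prefix, _, number = message_code.partition("_")
    return prefix, number, LOG_CODE_PREFIXES.get(prefix, "unknown prefix")

print(describe_code("CMN_1039"))
```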

Load Summary

(a) Inserted
(b) Updated
(c) Deleted
(d) Rejected

Statistics details

(a) Requested rows shows the number of rows the writer actually received for the specified
operation
(b) Applied rows shows the number of rows the writer successfully applied to the target
(without error)
(c) Rejected rows shows the number of rows the writer could not apply to the target
(d) Affected rows shows the number of rows affected by the specified operation

Detailed transformation statistics


The server reports the following details for each transformation in the mapping

(a) Name of Transformation
(b) No of I/P rows and name of the Input source
(c) No of O/P rows and name of the output target
(d) No of rows dropped

Tracing Levels

Normal - Initialization and status information, errors encountered, transformation
errors, rows skipped, and summarized session details (not at the level of
individual rows).

Terse - Initialization information as well as error messages, and notification of
rejected data.

Verbose Init - In addition to normal tracing, names of the index and data files used and
detailed transformation statistics.

Verbose Data - In addition to Verbose Init, each row that passes into the mapping, with
detailed transformation statistics.

NOTE

When you enter tracing level in the session property sheet, you override tracing levels
configured for transformations in the mapping.

MULTIPLE SERVERS

With Power Center, we can register and run multiple servers against a local or global
repository. Hence you can distribute the repository session load across available servers to
improve overall performance. (You can use only one Power Mart server in a local repository)

Issues in Server Organization

- Moving the target database onto the appropriate server machine may improve efficiency.
- All sessions/batches using data from other sessions/batches need to use the same
server and be incorporated into the same batch.
- Servers with different speeds/sizes can be used for handling the most complicated sessions.


Session/Batch Behavior

- By default, every session/batch runs on its associated Informatica server, which is
selected in the property sheet.
- In batches that contain sessions assigned to different servers, the server of the
outermost batch is used.

Session Failures and Recovering Sessions

Two types of errors occur in the server
- Non-Fatal
- Fatal

(a) Non-Fatal Errors

It is an error that does not force the session to stop on its first occurrence. Establish the
error threshold in the session property sheet with the stop on option. When you enable
this option, the server counts Non-Fatal errors that occur in the reader, writer and
transformations.

Reader errors can include alignment errors while running a session in Unicode mode.

Writer errors can include key constraint violations, loading NULL into the NOT-NULL
field and database errors.

Transformation errors can include conversion errors and any condition set up as an
ERROR, such as NULL input.

(b) Fatal Errors

This occurs when the server can not access the source, target or repository. This can
include loss of connection or target database errors, such as lack of database space to load
data.

If the session uses Normalizer or Sequence Generator transformations and the server cannot
update the sequence values in the repository, a fatal error occurs.

Others
- Use of the ABORT function in mapping logic, to abort a session when the server
encounters a transformation error.
- Stopping the server using pmcmd or the Server Manager.

Performing Recovery

- When the server starts a recovery session, it reads the OPB_SRVR_RECOVERY
table and notes the rowid of the last row committed to the target database. The server
then reads all sources again and starts processing from the next rowid.
- By default, Perform Recovery is disabled in setup. Hence the server does not make entries in
the OPB_SRVR_RECOVERY table.
- The recovery session moves through the states of a normal session: scheduled, waiting to
run, initializing, running, completed and failed. If the initial recovery fails, you can
run recovery as many times as needed.
- The normal reject loading process can also be done in the session recovery process.
- The performance of recovery might be low if
o The mapping contains mapping variables
o The commit interval is high

Unrecoverable Sessions

Under certain circumstances, when a session does not complete, you need to truncate the
target and run the session from the beginning.

Commit Intervals

A commit interval is the interval at which the server commits data to relational targets during
a session.

(a) Target based commit

- Server commits data based on the no of target rows and the key constraints on the
target table. The commit point also depends on the buffer block size and the commit
interval.
- During a session, the server continues to fill the writer buffer, after it reaches the
commit interval. When the buffer block is full, the Informatica server issues a
commit command. As a result, the amount of data committed at the commit point
generally exceeds the commit interval.
- The server commits data to each target based on primary key-foreign key constraints.
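The interaction between the commit interval and the buffer block can be sketched with a simplified Python model. This is an assumption-laden illustration, not the server's algorithm: it measures the buffer block in rows (the rows_per_block parameter is hypothetical, real buffer blocks are sized in bytes) and it ignores the final commit at the end of the session.

```python
# Illustrative sketch only: a commit is issued when a buffer block fills
# after the commit interval has been reached, so commit points land on block
# boundaries and usually exceed the configured interval.

def target_based_commit_points(total_rows, commit_interval, rows_per_block):
    commit_points = []
    since_commit = 0   # rows written since the last commit
    in_block = 0       # rows in the current buffer block
    for row_number in range(1, total_rows + 1):
        since_commit += 1
        in_block += 1
        if in_block == rows_per_block:        # the block is full
            if since_commit >= commit_interval:
                commit_points.append(row_number)
                since_commit = 0
            in_block = 0
    return commit_points

points = target_based_commit_points(
    total_rows=40000, commit_interval=10000, rows_per_block=7500)
print(points)
```

In this model a 10,000-row interval with 7,500-row blocks produces commits after 15,000 and 30,000 rows, which matches the statement that the amount of data committed generally exceeds the commit interval.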

(b) Source based commit


- Server commits data based on the number of source rows. The commit point is the
commit interval you configure in the session properties.
- During a session, the server commits data to the target based on the number of rows
from an active source in a single pipeline. The rows are referred to as source rows.
- A pipeline consists of a source qualifier and all the transformations and targets that
receive data from source qualifier.
- Although the Filter, Router and Update Strategy transformations are active
transformations, the server does not use them as active sources in a source based
commit session.
- When a server runs a session, it identifies the active source for each pipeline in the
mapping. The server generates a commit row from the active source at every commit
interval.
- When each target in the pipeline receives the commit rows the server performs the
commit.

Reject Loading

During a session, the server creates a reject file for each target instance in the mapping. If the
writer or the target rejects data, the server writes the rejected row into the reject file.

You can correct the rejected data and reload it to relational targets using the reject
loading utility. (You cannot load rejected data into a flat file target.)

Each time you run a session, the server appends rejected data to the reject file.

Locating the BadFiles

$PMBadFileDir
Filename.bad

When you run a partitioned session, the server creates a separate reject file for each partition.

Reading Rejected data

Ex: 3,D,1,D,D,0,D,1094345609,D,0,0.00

To help us in finding the reason for rejecting, there are two main things.


(a) Row indicator

Row indicator tells the writer, what to do with the row of wrong data.

Row indicator   Meaning   Rejected By
0               Insert    Writer or target
1               Update    Writer or target
2               Delete    Writer or target
3               Reject    Writer

If a row indicator is 3, the writer rejected the row because an update strategy expression
marked it for reject.

(b) Column indicator

A column indicator is followed by the first column of data, and then another column indicator.
Column indicators appear after every column of data and define the type of the data preceding them.

Column Indicator   Meaning      Writer Treats as
D                  Valid Data   Good Data. The target accepts it unless a database
                                error occurs, such as finding a duplicate key.
O                  Overflow     Bad Data.
N                  Null         Bad Data.
T                  Truncated    Bad Data.
NOTE

NULL columns appear in the reject file with commas marking their column.
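The two indicator tables can be applied to a reject line with a few lines of Python. The sketch below is illustrative only and rests on an assumed layout: the first field is the row indicator, the second field is the column indicator that follows it, and every data column after that is followed by its own indicator. The sample line and the parse_reject_line helper are hypothetical.

```python
# Illustrative sketch only: decode one reject-file line under the layout
# assumption stated above.
import csv
import io

ROW_INDICATORS = {"0": "Insert", "1": "Update", "2": "Delete", "3": "Reject"}
COLUMN_INDICATORS = {"D": "Valid data", "O": "Overflow",
                     "N": "Null", "T": "Truncated"}

def parse_reject_line(line):
    fields = next(csv.reader(io.StringIO(line)))
    row_indicator = ROW_INDICATORS[fields[0]]
    # fields[1] is the column indicator that follows the row indicator.
    data_fields = fields[2:]
    columns = []
    # Pair each data value with the indicator that comes right after it.
    for value, indicator in zip(data_fields[0::2], data_fields[1::2]):
        columns.append((value, COLUMN_INDICATORS[indicator]))
    return row_indicator, columns

sample = "0,D,1921,D,Nelson,D,,N"   # hypothetical line; last column is NULL
row_action, decoded = parse_reject_line(sample)
print(row_action)
for value, meaning in decoded:
    print(repr(value), meaning)
```

The empty value paired with N shows how a NULL column appears as nothing between two commas.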

Correcting Reject File

Use the reject file and the session log to determine the cause for rejected data.

Keep in mind that correcting the reject file does not necessarily correct the source of the
reject.
Correct the mapping and target database to eliminate some of the rejected data when you run
the session again.

Trying to correct target-rejected rows before correcting writer-rejected rows is not
recommended, since they may contain misleading column indicators.

For example, a series of N indicators might lead you to believe the target database does not
accept NULL values, so you decide to change those NULL values to zero.

However, if those rows also had a 3 in the row indicator column, the row was rejected by the
writer because of an update strategy expression, not because of a target database restriction.

If you try to load the corrected file to the target, the writer will again reject those rows, and they
will contain inaccurate 0 values in place of NULL values.

Why can the writer reject?

- Data overflowed column constraints
- An update strategy expression

Why can the target database reject?

- Data contains a NULL column
- Database errors, such as key violations

Steps for loading reject file:

- After correcting the rejected data, rename the reject file to reject_file.in
- The reject loader uses the data movement mode configured for the server. It also uses
the code page of the server/OS. Hence do not change these in the middle of reject
loading.
- Use the reject loader utility:
pmrejldr pmserver.cfg [folder name] [session name]




Other points

The server does not perform the following options when using the reject loader:

(a) Source-based commit
(b) Constraint-based loading
(c) Truncate target table
(d) FTP targets
(e) External loading

Multiple reject loaders

You can run the session several times and correct rejected data from the several sessions at
once. You can correct and load all of the reject files at once, or work on one or two reject
files, load them and work on the others at a later time.

External Loading

You can configure a session to use Sybase IQ, Teradata and Oracle external loaders to load
session target files into the respective databases.

The External Loader option can increase session performance since these databases can load
information directly from files faster than they can run the SQL commands to insert the same data
into the database.

Method:

When a session uses an External Loader, the session creates a control file and a target flat file. The
control file contains information about the target flat file, such as the data format and loading
instructions for the External Loader. The control file has an extension of *.ctl and you can
view the file in $PMTargetFileDir.

For using an External Loader:

The following must be done:

- Configure an external loader connection in the server manager
- Configure the session to write to a target flat file local to the server.
- Choose an external loader connection for each target file in session property sheet.


Issues with External Loader:

- Disable constraints
- Performance issues
o Increase commit intervals
o Turn off database logging

- Code page requirements
- The server can use multiple External Loaders within one session (Ex: a session
with two target files, one using the Oracle External Loader and the other the
Sybase External Loader)

Other Information:

- The External Loader performance depends upon the platform of the server
- The server loads data at different stages of the session
- The server writes External Loader initialization and completion messages in the
session log. However, details about EL performance are written to the EL log, which
is stored in the same target directory.
- If the session contains errors, the server continues the EL process. If the session fails,
the server loads partial target data using EL.
- The EL creates a reject file for data rejected by the database. The reject file has an
extension of *.ldrreject.
- The EL saves the reject file in the target file directory
- You can load corrected data from the file, using database reject loader, and not
through Informatica reject load utility (For EL reject file only)

Configuring EL in session

- In the server manager, open the session property sheet
- Select File target, and then click flat file options

Caches

- The server creates index and data caches in memory for Aggregator, Rank, Joiner and
Lookup transformations in a mapping.
- The server stores key values in index caches and output values in data caches. If the
server requires more memory, it stores overflow values in cache files.
- When the session completes, the server releases cache memory and, in most
circumstances, deletes the cache files.



Cache storage overflow:

Transformation   Index cache                              Data cache
Aggregator       Stores group values as configured in     Stores calculations based on the
                 the Group-by ports.                      Group-by ports.
Rank             Stores group values as configured in     Stores ranking information based on
                 the Group-by ports.                      the Group-by ports.
Joiner           Stores index values for the master       Stores master source rows.
                 source table as configured in the
                 Joiner condition.
Lookup           Stores Lookup condition information.     Stores lookup data that is not stored
                                                          in the index cache.

Determining cache requirements

To calculate the cache size, you need to consider column and row requirements as well
as processing overhead.

- The server requires processing overhead to cache data and index information.
Column overhead includes a null indicator, and row overhead can include row-to-key
information.

Steps:

- First, add the total column size in the cache to the row overhead.
- Multiply the result by the number of groups (or rows) in the cache. This gives the minimum
cache requirement.
- For the maximum requirement, multiply the minimum requirement by 2.
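The three steps above amount to a short calculation, sketched here in Python. The cache_size_range function is hypothetical, and the column sizes, overhead and group count in the example are made-up values chosen only to show the arithmetic.

```python
# Illustrative sketch only: minimum and maximum cache size from the steps above.

def cache_size_range(column_sizes, row_overhead, groups_or_rows):
    per_row = sum(column_sizes) + row_overhead   # step 1
    minimum = per_row * groups_or_rows           # step 2
    maximum = minimum * 2                        # step 3
    return minimum, maximum

# Example values (assumed): two cached columns of 10 and 20 bytes,
# 7 bytes of row overhead, 1,000 groups.
minimum, maximum = cache_size_range([10, 20], row_overhead=7, groups_or_rows=1000)
print(minimum, maximum)   # 37000 74000
```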


Location:

- By default, the server stores the index and data files in the directory $PMCacheDir.

- The server names the index files PMAGG*.idx and the data files PMAGG*.dat. If the size
exceeds 2GB, you may find multiple index and data files in the directory. The server appends
a number to the end of the filename (PMAGG*.idx1, PMAGG*.idx2, etc).

Aggregator Caches

- When the server runs a session with an Aggregator transformation, it stores data in memory
until it completes the aggregation.

- When you partition a source, the server creates one memory cache and one disk cache
for each partition. It routes data from one partition to another
based on group key values of the transformation.

- The server uses memory to process an Aggregator transformation with sorted ports. It does not
use cache memory, so you do not need to configure cache memory for Aggregators that use sorted ports.

Index cache:

#Groups ((Σ column size) + 7)

Aggregate data cache:


Rank Cache

- When the server runs a session with a Rank transformation, it compares an input row
with rows in the data cache. If the input row out-ranks a stored row, the
Informatica server replaces the stored row with the input row.
- If the Rank transformation is configured to rank across multiple groups, the server
ranks incrementally for each group it finds.


Index Cache :


Rank Data Cache:

#Groups [(#Ranks * (Σ column size + 10)) + 20]

Joiner Cache:

- When server runs a session with joiner transformation, it reads all rows from the
master source and builds memory caches based on the master rows.
- After building these caches, the server reads rows from the detail source and
performs the joins
- Server creates the Index cache as it reads the master source into the data cache. The
server uses the Index cache to test the join condition. When it finds a match, it
retrieves rows values from the data cache.
- To improve joiner performance, the server aligns all data for the joiner cache on an
eight-byte boundary.

Index Cache :

#Master rows [(Σ column size) + 16]

Joiner Data Cache:

#Master rows [(Σ column size) + 8]

Lookup cache:

- When the server runs a Lookup transformation, it builds a cache in memory
when it processes the first row of data in the transformation.
- The server builds the cache and queries it for each row that enters the transformation.
- If you partition the source pipeline, the server allocates the configured amount of
memory for each partition. If two Lookup transformations share the cache, the server
does not allocate additional memory for the second Lookup transformation.
- The server creates index and data cache files in the lookup cache directory and uses
the server code page to create the files.



Index Cache :

#Rows in lookup table [(Σ column size) + 16]

Lookup Data Cache:

#Rows in lookup table [(Σ column size) + 8]

Transformations

A transformation is a repository object that generates, modifies or passes data.

(a) Active Transformation:
a. Can change the number of rows that pass through it (Filter, Normalizer, Rank
..)

(b) Passive Transformation:
a. Does not change the number of rows that pass through it (Expression, Lookup ..)

NOTE:

- Transformations can be connected to the data flow or they can be unconnected
- An unconnected transformation is not connected to other transformation in the
mapping. It is called with in another transformation and returns a value to that
transformation

Reusable Transformations:

When you add a reusable transformation to a mapping, the definition of the transformation
exists outside the mapping, while an instance of it appears within the mapping.

All the changes you make to the transformation are immediately reflected in its instances.

You can create a reusable transformation by two methods:

(a) Designing it in the Transformation Developer
(b) Promoting a standard transformation


Changes such as edited expressions are reflected in mappings. Changes such as renamed ports
are not reflected.

Handling High-Precision Data:

- The server processes decimal values as doubles or decimals.
- When you create a session, you choose to enable the decimal data type or let the
server process the data as double (precision of 15).

Example:
- You may have a mapping with decimal (20,0) that passes through. The value may be
40012030304957666903.

If you enable decimal arithmetic, the server passes the number as it is. If you do not
enable decimal arithmetic, the server passes 4.00120303049577 x 10^19.

If you want to process a decimal value with a precision greater than 28 digits, the
server automatically treats it as a double value.
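The loss of precision in the example can be reproduced in Python, used here only as a stand-in: Decimal plays the role of the decimal data type and float the role of a double. This shows the general behaviour of the two number types, not the server's internal arithmetic.

```python
# Illustrative sketch only: exact decimal versus double for the example value.
from decimal import Decimal

value = "40012030304957666903"   # the decimal(20,0) value from the example

as_decimal = Decimal(value)      # keeps all 20 digits
as_double = float(value)         # keeps roughly 15 significant digits

print(as_decimal)                # 40012030304957666903
print(f"{as_double:.14e}")       # 4.00120303049577e+19
print(int(as_double) == int(as_decimal))   # False: low-order digits are lost
```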
Mapplets

When the server runs a session using a mapplet, it expands the mapplet. The server then runs
the session as it would any other session, passing data through each transformation in the
mapplet.
If you use a reusable transformation in a mapplet, changes to it can invalidate the mapplet and
every mapping using the mapplet.

You can create a non-reusable instance of a reusable transformation.

Mapplet Objects:
(a) Input transformation
(b) Source qualifier
(c) Transformations, as you need
(d) Output transformation

Mapplets Do Not Support:

- Joiner
- Normalizer
- Pre/Post session stored procedure

- Target definitions
- XML source definitions

Types of Mapplets:
(a) Active Mapplets - Contains one or more active transformations
(b) Passive Mapplets - Contains only passive transformation

Copied mapplets are not an instance of original mapplets. If you make changes to the original, the
copy does not inherit your changes

You can use a single mapplet, even more than once on a mapping.

Ports

Default value for I/P port - NULL
Default value for O/P port - ERROR
Default value for variables - Does not support default values

Session Parameters

Session parameters represent values you might want to change between sessions, such as a DB
connection or a source file.

We can use a session parameter in a session property sheet, then define the parameter in a session
parameter file.

The user-defined session parameters are:
(a) DB Connection
(b) Source File directory
(c) Target file directory
(d) Reject file directory

Description:

Use session parameter to make sessions more flexible. For example, you have the same type of
transactional data written to two different databases, and you use the database connections
TransDB1 and TransDB2 to connect to the databases. You want to use the same mapping for both
tables.


Instead of creating two sessions for the same mapping, you can create a database connection
parameter, like $DBConnectionSource, and use it as the source database connection for the
session.

When you create a parameter file for the session, you set $DBConnectionSource to TransDB1
and run the session. After it completes set the value to TransDB2 and run the session again.

NOTE:

You can use several parameters together to make session management easier.
Session parameters do not have default values. When the server cannot find a value for a session
parameter, it fails to initialize the session.

Session Parameter File

- A parameter file is created with a text editor.

- In it, we specify the folder and session name, then list the parameters and
variables used in the session and assign each a value.

- Save the parameter file in any directory local to the server.

- We can define the following values in a parameter file:
o Mapping parameters
o Mapping variables
o Session parameters

- You can include parameter and variable information for more than one session in a
single parameter file by creating separate sections for each session within the
parameter file.

- You can override the parameter file for sessions contained in a batch by using a batch
parameter file. A batch parameter file has the same format as a session parameter file.
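To make the idea of one section per session concrete, the Python sketch below reads a small parameter file and returns the values for each session. The file layout shown (a [folder.session] heading followed by name=value lines) is an assumption for illustration, the folder and session names are hypothetical, and only $DBConnectionSource, TransDB1 and TransDB2 come from the description above.

```python
# Illustrative sketch only: read a parameter file with one section per session.
import configparser

# Assumed layout and hypothetical folder/session names.
PARAM_FILE_TEXT = """\
[SalesFolder.s_load_transactions]
$DBConnectionSource=TransDB1

[SalesFolder.s_load_returns]
$DBConnectionSource=TransDB2
"""

def read_parameters(text):
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str   # keep parameter names exactly as written
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}

params = read_parameters(PARAM_FILE_TEXT)
for session, values in params.items():
    print(session, values)
```

The same mapping can then run against either database simply by pointing the session at a different section or by editing the value between runs.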

Locale

The Informatica server can transform character data in two modes:


(a) ASCII
a. Default one
b. Passes 7-bit, US-ASCII character data

(b) UNICODE
a. Passes 8-bit and multibyte character data
b. It uses 2 bytes for each character to move data and performs additional checks at
session level, to ensure data integrity.

Code pages contain the encoding to specify characters in a set of one or more languages. We can
select a code page, based on the type of character data in the mappings.

Compatibility between code pages is essential for accurate data movement.

The various code page components are

- Operating system Locale settings
- Operating system code page
- Informatica server data movement mode
- Informatica server code page
- Informatica repository code page

Locale

(a) System Locale - System Default
(b) User locale - setting for date, time, display
(c) Input locale

Mapping Parameter and Variables

These represent values in mappings/mapplets.

If you declare mapping parameters and variables in a mapping, you can reuse the mapping by
altering the parameter and variable values of the mapping in the session.
This can reduce the overhead of creating multiple mappings when only certain attributes of a
mapping need to be changed.

Use a mapping parameter when you want to use the same value each time you run the session.


Unlike a mapping parameter, a mapping variable represents a value that can change through
the session. The server saves the value of a mapping variable to the repository at the end of
each successful run and uses that value the next time you run the session.

Mapping objects:

Source, Target, Transformation, Cubes, Dimension

Debugger

We can run the Debugger in two situations

(a) Before Session: After saving mapping, we can run some initial tests.
(b) After Session: real Debugging process

Metadata Reporter:

- Web-based application that allows you to run reports against repository metadata
- Reports including executed sessions, lookup table dependencies, mappings and
source/target schemas.

Repository

Types of Repository

(a) Global Repository
a. This is the hub of the domain. Use the GR to store common objects that multiple
developers can use through shortcuts. These may include operational or
application source definitions, reusable transformations, mapplets and mappings.

(b) Local Repository
a. A Local Repository is any repository within a domain that is not the global repository. Use the
Local Repository for development.

(c) Standard Repository
a. A repository that functions individually, unrelated and unconnected to other
repositories.

NOTE:
- Once you create a global repository, you cannot change it to a local repository.
- However, you can promote a local repository to a global repository.

Batches

- Provide a way to group sessions for either serial or parallel execution by server
- Batches
o Sequential (Runs session one after another)
o Concurrent (Runs sessions at same time)

Nesting Batches
Each batch can contain any number of session/batches. We can nest batches several levels
deep, defining batches within batches

Nested batches are useful when you want to control a complex series of sessions that must
run sequentially or concurrently

Scheduling
When you place sessions in a batch, the batch schedule overrides the session schedule by
default. However, we can configure a batched session to run on its own schedule by selecting
the Use Absolute Time Session Option.

Server Behavior
The server configured to run a batch overrides the server configured to run the sessions within the
batch. If you have multiple servers, all sessions within a batch run on the Informatica server
that runs the batch.

The server marks a batch as failed if one of its sessions is configured to run only if the previous
session completes and that previous session fails.

Sequential Batch
If you have sessions with dependent source/target relationships, you can place them in a
sequential batch, so that the Informatica server can run them in consecutive order.

There are two ways of running sessions in this category:
(a) Run the session only if the previous session completes successfully
(b) Always run the session (this is the default)

Concurrent Batch
In this mode, the server starts all of the sessions within the batch at the same time.
Concurrent batches take advantage of the resources of the Informatica server, reducing the
time it takes to run the sessions separately or in a sequential batch.
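
The two batch modes can be pictured with a small simulation. This is only a sketch in plain Python, not Informatica code: the session names, the `run_sequential` and `run_concurrent` helpers and the "skipped" status are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sequential(sessions, only_if_previous_succeeds=True):
    """Run sessions one after another and return (name, status) pairs."""
    results = []
    previous_ok = True
    for name, task in sessions:
        if only_if_previous_succeeds and not previous_ok:
            # Models "run the session only if the previous completes successfully".
            results.append((name, "skipped"))
            continue
        try:
            task()
            previous_ok = True
            results.append((name, "succeeded"))
        except Exception:
            previous_ok = False
            results.append((name, "failed"))
    return results


def run_concurrent(sessions):
    """Start all sessions at the same time and wait for every one to finish."""
    def attempt(task):
        try:
            task()
            return "succeeded"
        except Exception:
            return "failed"

    with ThreadPoolExecutor(max_workers=max(1, len(sessions))) as pool:
        futures = [(name, pool.submit(attempt, task)) for name, task in sessions]
        return [(name, future.result()) for name, future in futures]


def ok():
    pass


def boom():
    raise RuntimeError("load failed")


# Hypothetical session names, used only for this example.
sessions = [("s_load_dim", ok), ("s_load_fact", boom), ("s_aggregate", ok)]
sequential_results = run_sequential(sessions)
always_run_results = run_sequential(sessions, only_if_previous_succeeds=False)
concurrent_results = run_concurrent(sessions)
```

In the sequential run with the dependency option, the session after the failed one does not run; with "always run" or in the concurrent batch it runs regardless.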


Concurrent batch in a Sequential batch

If you have concurrent batches with source-target dependencies that benefit from running
those batches in a particular order, just like sessions, place them into a sequential batch.

Server Concepts
The Informatica server uses three system resources:
(a) CPU
(b) Shared Memory
(c) Buffer Memory

Informatica server uses shared memory, buffer memory and cache memory for session
information and to move data between session threads.

LM Shared Memory

The Load Manager (LM) uses both process and shared memory. The LM keeps the Informatica server
list of sessions and batches, and the schedule queue, in process memory.

Once a session starts, the LM uses shared memory to store session details for the duration of
the session run or session schedule. This shared memory appears as the configurable
parameter (LMSharedMemory) and the server allots 2,000,000 bytes as default.

This allows you to schedule or run approximately 10 sessions at one time.

DTM Buffer Memory

The DTM process allocates buffer memory to the session based on the DTM buffer pool size
setting in the session properties. By default, it allocates 12,000,000 bytes of memory to the
session.

DTM divides memory into buffer blocks as configured in the buffer block size settings.
(Default: 64,000 bytes per block)
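
As a quick worked example of the default figures quoted above, the arithmetic can be checked as follows. The constant names are made up for this sketch; only the numbers come from the text.

```python
DTM_BUFFER_POOL_BYTES = 12_000_000  # default DTM buffer pool size quoted in the text
BUFFER_BLOCK_BYTES = 64_000         # default buffer block size quoted in the text

# Number of whole buffer blocks that fit in the default pool.
whole_blocks = DTM_BUFFER_POOL_BYTES // BUFFER_BLOCK_BYTES
leftover_bytes = DTM_BUFFER_POOL_BYTES % BUFFER_BLOCK_BYTES

LM_SHARED_MEMORY_BYTES = 2_000_000  # default LMSharedMemory quoted in the text
APPROX_SESSIONS = 10                # approximate session capacity quoted in the text

# Rough shared-memory budget per session implied by those two figures.
bytes_per_session = LM_SHARED_MEMORY_BYTES // APPROX_SESSIONS

print(whole_blocks, leftover_bytes, bytes_per_session)
```

With the defaults this gives 187 whole blocks (32,000 bytes left over) and roughly 200,000 bytes of LM shared memory per session.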

Running a Session

The following tasks are performed during a session run:

1. LM locks the session and reads session properties
2. LM reads parameter file
3. LM expands server/session variables and parameters
4. LM verifies permission and privileges
5. LM validates source and target code page
6. LM creates session log file
7. LM creates DTM process
8. DTM process allocates DTM process memory
9. DTM initializes the session and fetches mapping
10. DTM executes pre-session commands and procedures
11. DTM creates reader, writer, transformation threads for each pipeline
12. DTM executes post-session commands and procedures
13. DTM writes historical incremental aggregation/lookup to repository
14. LM sends post-session emails

Stopping and aborting a session

- If the session you want to stop is a part of batch, you must stop the batch
- If the batch is part of nested batch, stop the outermost batch
- When you issue the stop command, the server stops reading data. It continues
processing and writing data and committing data to targets
- If the server cannot finish processing and committing data, you can issue the ABORT
command. It is similar to stop command, except it has a 60 second timeout. If the
server cannot finish processing and committing data within 60 seconds, it kills the
DTM process and terminates the session.
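
The difference between the two commands can be summarised as a tiny decision function. This is a simplified model, not server behaviour reproduced exactly: the `end_session` function and its return strings are assumptions for illustration, and only the 60 second timeout comes from the text.

```python
ABORT_TIMEOUT_SECONDS = 60  # timeout quoted in the text


def end_session(command, seconds_needed_to_commit):
    """Return how the session ends for a 'stop' or an 'abort' command."""
    if command == "stop":
        # Stop: reading ends, but processing, writing and committing run to completion.
        return "stopped after committing"
    if command == "abort":
        if seconds_needed_to_commit <= ABORT_TIMEOUT_SECONDS:
            return "stopped after committing"
        # Abort: the DTM process is killed once the timeout expires.
        return "DTM killed, session terminated"
    raise ValueError(f"unknown command: {command}")


slow_stop = end_session("stop", 300)
fast_abort = end_session("abort", 30)
slow_abort = end_session("abort", 90)
```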

Recovery:

- After a session is stopped or aborted, the session results can be recovered. When
recovery is performed, the session continues from the point at which it stopped.
- If you do not recover the session, the server runs the entire session the next time.
- Hence, after stopping or aborting, you may need to manually delete the loaded target rows
before the session runs again.

NOTE:

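The ABORT command and the ABORT function are different: the command is issued against a running
session, while ABORT is a transformation language function called from within a mapping expression.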

When can a Session Fail

- Server cannot allocate enough system resources
- Session exceeds the maximum number of sessions the server can run concurrently
- Server cannot obtain an execute lock for the session (the session is already locked)
- Server is unable to execute post-session shell commands or post-load stored procedures
- Server encounters database errors
- Server encounters transformation row errors (Ex: NULL value in non-null fields)
- Network related errors

When Pre/Post Shell Commands are useful

- To delete a reject file
- To archive target files before session begins

Session Performance

- Minimum log (Terse)
- Partitioning source data.
- Performing ETL for each partition, in parallel. (For this, multiple CPUs are needed)
- Adding indexes.
- Changing commit Level.
- Using a Filter transformation to remove unwanted data movement.
- Increasing buffer memory when processing large volumes of data.
- Multiple lookups can reduce performance. Verify the largest lookup table and
tune the expressions.
- At session level, the usual causes are a small cache size, low buffer memory and a small
commit interval.
- At system level, monitor resource usage:
o Windows NT/2000: use the Task Manager.
o UNIX: vmstat, iostat.

Hierarchy of optimization

- Target.
- Source.
- Mapping
- Session.
- System.

Optimizing Target Databases:

- Drop indexes /constraints
- Increase checkpoint intervals.

- Use bulk loading /external loading.
- Turn off recovery.
- Increase database network packet size.

Source level

- Optimize the query (for example, queries using GROUP BY or ORDER BY).
- Use conditional filters.
- Connect to RDBMS using IPC protocol.

Mapping
- Optimize data type conversions.
- Eliminate transformation errors.
- Optimize transformations/ expressions.

Session:
- Use concurrent batches.
- Partition sessions.
- Reduce error tracing.
- Remove staging area.
- Tune session parameters.

System:
- Improve network speed.
- Use multiple Informatica Servers (pmserver) on separate systems.
- Reduce paging.

Session Process

The Informatica Server uses both process memory and system shared memory to perform the
ETL process.
It runs as a daemon on UNIX and as a service on Windows NT.

The following processes are used to run a session:
(a) Load Manager process: - starts the session
- creates the DTM process, which runs the session.

(b) DTM process: - creates threads to initialize the session
- reads, writes and transforms data.
- handles pre/post-session operations.

Load manager processes:
- manages session/batch scheduling.
- Locks session.
- Reads parameter file.
- Expands server/session variables, parameters .
- Verifies permissions/privileges.
- Creates session log file.

DTM process:
The primary purpose of the DTM is to create and manage threads that carry out the
session tasks.

The DTM allocates process memory for the session and divides it into buffers. This is
known as buffer memory. The default memory allocation is 12,000,000 bytes. The DTM creates
the main thread, called the master thread, which manages all other threads.

Various threads functions

Master thread - handles stop and abort requests from the Load Manager.

Mapping thread - one thread for each session. Fetches session and mapping
information, compiles the mapping, and cleans up after execution.

Reader thread - one thread for each partition. Relational sources use relational
threads and flat files use file threads.

Writer thread - one thread for each partition; writes to the target.

Transformation thread - one or more transformation threads for each partition.

Note:

When you run a session, the threads for a partitioned source execute concurrently.
The threads use buffers to move/transform data.

This section describes new features and enhancements to PowerCenter 6.0 and PowerMart 6.0.
Designer
Compare objects. The Designer allows you to compare two repository objects of the
same type to identify differences between them. You can compare sources, targets,
transformations, mapplets, mappings, instances, or mapping/mapplet dependencies in
detail. You can compare objects across open folders and repositories.
Copying objects. In each Designer tool, you can use the copy and paste functions to copy
objects from one workspace to another. For example, you can select a group of
transformations in a mapping and copy them to a new mapping.
Custom tools. The Designer allows you to add custom tools to the Tools menu. This
allows you to start programs you use frequently from within the Designer.
Flat file targets. You can create flat file target definitions in the Designer to output data
to flat files. You can create both fixed-width and delimited flat file target definitions.
Heterogeneous targets. You can create a mapping that outputs data to multiple database
types and target types. When you run a session with heterogeneous targets, you can
specify a database connection for each relational target. You can also specify a file name
for each flat file or XML target.
Link paths. When working with mappings and mapplets, you can view link paths. Link
paths display the flow of data from a column in a source, through ports in
transformations, to a column in the target.
Linking ports. You can now specify a prefix or suffix when automatically linking ports
between transformations based on port names.
Lookup cache. You can use a dynamic lookup cache in a Lookup transformation to
insert and update data in the cache and target when you run a session.
Mapping parameter and variable support in lookup SQL override. You can use
mapping parameters and variables when you enter a lookup SQL override.
Mapplet enhancements. Several mapplet restrictions are removed. You can now include
multiple Source Qualifier transformations in a mapplet, as well as Joiner transformations
and Application Source Qualifier transformations for IBM MQSeries. You can also
include both source definitions and Input transformations in one mapplet. When you
work with a mapplet in a mapping, you can expand the mapplet to view all
transformations in the mapplet.
Metadata extensions. You can extend the metadata stored in the repository by creating
metadata extensions for repository objects. The Designer allows you to create metadata
extensions for source definitions, target definitions, transformations, mappings, and
mapplets.
Numeric and datetime formats. You can define formats for numeric and datetime
values in flat file sources and targets. When you define a format for a numeric or
datetime value, the Informatica Server uses the format to read from the file source or to
write to the file target.
Pre- and post-session SQL. You can specify pre- and post-session SQL in a Source
Qualifier transformation and in a mapping target instance when you create a mapping in
the Designer. The Informatica Server issues pre-SQL commands to the database once
before it runs the session. Use pre-session SQL to issue commands to the database such
as dropping indexes before extracting data. The Informatica Server issues post-session
SQL commands to the database once after it runs the session. Use post-session SQL to
issue commands to a database such as re-creating indexes.
Renaming ports. If you rename a port in a connected transformation, the Designer
propagates the name change to expressions in the transformation.
Sorter transformation. The Sorter transformation is an active transformation that allows
you to sort data from relational or file sources in ascending or descending order according
to a sort key. You can increase session performance when you use the Sorter
transformation to pass data to an Aggregator transformation configured for sorted input in
a mapping.
Tips. When you start the Designer, it displays a tip of the day. These tips help you use the
Designer more efficiently. You can display or hide the tips by choosing Help-Tip of the
Day.
Tool tips for port names. Tool tips now display for port names. To view the full contents
of the column, position the mouse over the cell until the tool tip appears.
View dependencies. In each Designer tool, you can view a list of objects that depend on a
source, source qualifier, transformation, or target. Right-click an object and select the
View Dependencies option.
Working with multiple ports or columns. In each Designer tool, you can move multiple
ports or columns at the same time.
Informatica Server
Add timestamp to workflow logs. You can configure the Informatica Server to add a
timestamp to messages written to the workflow log.
Expanded pmcmd capability. You can use pmcmd to issue a number of commands to the
Informatica Server. You can use pmcmd in either an interactive or command line mode.
The interactive mode prompts you to enter information when you omit parameters or
enter invalid commands. In both modes, you can enter a command followed by its
command options in any order. In addition to commands for starting and stopping
workflows and tasks, pmcmd now has new commands for working in the interactive
mode and getting details on servers, sessions, and workflows.
Error handling. The Informatica Server handles the abort command like the stop
command, except it has a timeout period. You can specify when and how you want the
Informatica Server to stop or abort a workflow by using the Control task in the workflow.
After you start a workflow, you can stop or abort it through the Workflow Monitor or
pmcmd.
Export session log to external library. You can configure the Informatica Server to
write the session log to an external library.
Flat files. You can specify the precision and field length for columns when the
Informatica Server writes to a flat file based on a flat file target definition, and when it
reads from a flat file source. You can also specify the format for datetime columns that
the Informatica Server reads from flat file sources and writes to flat file targets.
Write Informatica Windows Server log to a file. You can now configure the
Informatica Server on Windows to write the Informatica Server log to a file.
Metadata Reporter
List reports for jobs, sessions, workflows, and worklets. You can run a list report that
lists all jobs, sessions, workflows, or worklets in a selected repository.
Details reports for sessions, workflows, and worklets. You can run a details report to
view details about each session, workflow, or worklet in a selected repository.

Completed session, workflow, or worklet detail reports. You can run a completion
details report, which displays details about how and when a session, workflow, or worklet
ran, and whether it ran successfully.
Installation on WebLogic. You can now install the Metadata Reporter on WebLogic and
run it as a web application.
Repository Manager
Metadata extensions for repository objects. The Repository Manager allows you to create
metadata extensions for source definitions, target definitions, transformations, mappings,
mapplets, sessions, workflows, and worklets.
pmrep security commands. You can use pmrep to create or delete repository users and
groups. You can also use pmrep to modify repository privileges assigned to users and
groups.
Tips. When you start the Repository Manager, it displays a tip of the day. These tips help
you use the Repository Manager more efficiently. You can display or hide the tips by
choosing Help-Tip of the Day.
Repository Server
The Informatica Client tools and the Informatica Server now connect to the repository database
over the network through the Repository Server.
Repository Server. The Repository Server manages the metadata in the repository
database. It accepts and manages all repository client connections and ensures repository
consistency by employing object locking. The Repository Server can manage multiple
repositories on different machines on the network.

Repository connectivity changes. When you connect to the repository, you must specify
the host name of the machine hosting the Repository Server and the port number the
Repository Server uses to listen for connections. You no longer have to create an ODBC
data source to connect a repository client application to the repository.
Transformation Language
New functions. The transformation language includes two new functions, ReplaceChr
and ReplaceStr. You can use these functions to replace or remove characters or strings in
text data.
SETVARIABLE. The SETVARIABLE function now executes for rows marked as insert
or update.
Workflow Manager
The Workflow Manager and Workflow Monitor replace the Server Manager. Instead of creating a
session, you now create a process called a workflow in the Workflow Manager. A workflow is a
set of instructions on how to execute tasks such as sessions, emails, and shell commands. A
session is now one of the many tasks you can execute in the Workflow Manager.
The Workflow Manager provides other tasks such as Assignment, Decision, and Event-Wait
tasks. You can also create branches with conditional links. In addition, you can batch workflows
by creating worklets in the Workflow Manager.
DB2 external loader. You can use the DB2 EE external loader to load data to a DB2 EE
database. You can use the DB2 EEE external loader to load data to a DB2 EEE database.
The DB2 external loaders can insert data, replace data, restart load operations, or
terminate load operations.

Environment SQL. For relational databases, you may need to execute some SQL
commands in the database environment when you connect to the database. For example,
you might want to set isolation levels on the source and target systems to avoid
deadlocks. You configure environment SQL in the database connection. You can use
environment SQL for source, target, lookup, and stored procedure connections.
Email. You can create email tasks in the Workflow Manager to send emails when you
run a workflow. You can configure a workflow to send an email anywhere in the
workflow logic, including after a session completes or after a session fails. You can also
configure a workflow to send an email when the workflow suspends on error.
Flat file targets. In the Workflow Manager, you can output data to a flat file from either
a flat file target definition or a relational target definition.
Heterogeneous targets. You can output data to different database types and target types
in the same session. When you run a session with heterogeneous targets, you can specify
a database connection for each relational target. You can also specify a file name for each
flat file or XML target.
Metadata extensions for repository objects. The Workflow Manager allows you to create
metadata extensions for sessions, workflows, and worklets.
Oracle 8 direct path load support. You can load data directly to Oracle 8i in bulk mode
without using an external loader. You can load data directly to an Oracle client database
version 8.1.7.2 or higher.
Partitioning enhancements. To improve session performance, you can set partition
points at multiple transformations in a pipeline. You can also specify different partition
types at each partition point.
Server variables. You can use new server variables to define the workflow log directory
and workflow log count.
Teradata TPump external loader. You can use the Teradata TPump external loader to load
data to a Teradata database. You can use TPump in sessions that contain multiple
partitions.
Tips. When you start the Workflow Manager, it displays a tip of the day. These tips help
you use the Workflow Manager more efficiently. You can display or hide the tips by
choosing Help-Tip of the Day.
Workflow log. In addition to session logs, you can configure the Informatica Server to
create a workflow log to record details about workflow runs.
Workflow Monitor. You use a tool called the Workflow Monitor to monitor workflows,
worklets, and tasks. The Workflow Monitor displays information about workflow runs in
two views: Gantt Chart view or Task view. You can run, stop, abort, and resume
workflows from the Workflow Monitor.



1. What was the most complex transformation you created?

a. Normalizer transformation to create multiple rows out of a single row by
manipulating the occurrences of the key block of the row.

e.g. input:
employee_id  dept  tasks
112          HR    admin, interview, payroll

output:
112  HR  admin
112  HR  interview
112  HR  payroll
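
The effect of this example can be imitated in a few lines of Python. This is only an illustration of the row multiplication: the real Normalizer works on repeating (OCCURS) columns rather than by splitting a string, and `normalize_tasks` is a hypothetical helper.

```python
def normalize_tasks(row):
    """Split the multi-valued 'tasks' field into one output row per task."""
    employee_id, dept, tasks = row
    return [(employee_id, dept, task.strip()) for task in tasks.split(",")]


rows = normalize_tasks((112, "HR", "admin, interview, payroll"))
for output_row in rows:
    print(output_row)
```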

b. Used Informatica Metadata Exchange (MX) views (Rep_Session_tbl_log, Rep_targ_tbls,
Rep_targ_Mapping, Rep_Src_Mapping etc.) to extract source, target, mapping and session
information.

c. Used a parameter file (.txt file stored in the server file system)
with input values for the batch
e.g. [s_m_Map1]
$$ACC_YEAR=2003
$$ACC_PERIOD=12
[s_m_Map2]
$$ACC_YEAR=2003
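
To make the layout of such a parameter file concrete, here is a minimal reader sketch. It assumes one bracketed session header per line followed by one `name=value` pair per line; `parse_parameter_file` is a hypothetical helper, not part of Informatica.

```python
def parse_parameter_file(text):
    """Return {section: {parameter: value}} for a simple parameter file."""
    sections = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new session header such as [s_m_Map1].
            current = line[1:-1]
            sections[current] = {}
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            sections[current][name.strip()] = value.strip()
    return sections


sample = """\
[s_m_Map1]
$$ACC_YEAR=2003
$$ACC_PERIOD=12
[s_m_Map2]
$$ACC_YEAR=2003
"""
params = parse_parameter_file(sample)
print(params)
```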

2. How did you implement multiple Update Strategies?
Not sure. Is it the following?

Target Update Override
By default, the Informatica Server updates targets based on
key values. However, you can override the default UPDATE statement for each
target in a mapping. You might want to update the target based on non-key
columns.
For a mapping without an Update Strategy transformation,
configure the session to mark source records as update.
If your mapping includes an Update Strategy transformation,
the Target Update option only affects source records marked as update. The
Informatica Server processes all records marked as insert, delete, or reject
normally. When you configure the session, mark source records as
data-driven. The Target Update Override only affects source rows marked as
update by the Update Strategy transformation.

Overriding the WHERE Clause
You can override the WHERE clause to include non-key columns. For example,
you might want to update records for employees named Mike Smith only. To do
this, you edit the WHERE clause as follows:
UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED,
TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME and
EMP_NAME = 'MIKE SMITH'
Entering a Target Update Statement
Follow these instructions to create an update statement.
To enter a target update statement:
1. Double-click the title bar of a target instance.
2. Click Properties.
3. Click the arrow button in the Update Override field.
The SQL Editor displays.
4. Select Generate SQL.
The default UPDATE statement appears.
5. Modify the update statement.
You can override the WHERE clause to include non-key columns.
6. Click OK.
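
The effect of the overridden WHERE clause can be tried out against an in-memory SQLite table. This is a sketch under assumptions: SQLite stands in for the target database, the named parameters (`:tu_emp_name` and so on) stand in for the `:TU` ports, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T_SALES (EMP_NAME TEXT, DATE_SHIPPED TEXT, TOTAL_SALES REAL)")
conn.executemany(
    "INSERT INTO T_SALES VALUES (?, ?, ?)",
    [("MIKE SMITH", "2003-01-01", 100.0), ("ANN LEE", "2003-01-01", 200.0)],
)

# Named parameters play the role of the :TU ports (values arriving from the mapping).
update_sql = """
UPDATE T_SALES
SET DATE_SHIPPED = :tu_date_shipped,
    TOTAL_SALES = :tu_total_sales
WHERE EMP_NAME = :tu_emp_name
  AND EMP_NAME = 'MIKE SMITH'
"""

incoming_rows = [
    {"tu_emp_name": "MIKE SMITH", "tu_date_shipped": "2003-12-31", "tu_total_sales": 150.0},
    {"tu_emp_name": "ANN LEE", "tu_date_shipped": "2003-12-31", "tu_total_sales": 999.0},
]
for row in incoming_rows:
    conn.execute(update_sql, row)

result = conn.execute(
    "SELECT EMP_NAME, DATE_SHIPPED, TOTAL_SALES FROM T_SALES ORDER BY EMP_NAME"
).fetchall()
print(result)
```

Only the MIKE SMITH row changes; the extra non-key condition filters out the update for every other employee.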

3. How did you handle error handling?
Use a Filter to remove the existing records so that only the
non-existing records flow through.

Basic error handling can be done using an Expression transformation to
check for the known possible errors. If any are found, give an appropriate label to the
rows, then pass them through a Router transformation and direct the rows to
the error tables or the target.

Some of the generic checks which are common to most
mappings, e.g. checks for zero, null, length etc., can be defined as a reusable
Expression transformation which can then be used in different mappings.
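
The label-then-route pattern described above can be sketched in plain Python. The column names, the error labels and the ten-character length limit are assumptions chosen for the example; in a mapping the same logic would live in an Expression and a Router transformation.

```python
def label_row(row):
    """Expression step: return an error label, or None when the row is clean."""
    if row.get("customer_id") is None:
        return "NULL_CUSTOMER_ID"
    if row.get("quantity") == 0:
        return "ZERO_QUANTITY"
    if len(row.get("name") or "") > 10:
        return "NAME_TOO_LONG"
    return None


def route(rows):
    """Router step: send labelled rows to the error list, the rest to the target."""
    target, errors = [], []
    for row in rows:
        label = label_row(row)
        if label is None:
            target.append(row)
        else:
            errors.append({**row, "error": label})
    return target, errors


rows = [
    {"customer_id": 1, "quantity": 5, "name": "Ann"},
    {"customer_id": None, "quantity": 2, "name": "Bob"},
    {"customer_id": 3, "quantity": 0, "name": "Cid"},
]
target_rows, error_rows = route(rows)
print(target_rows)
print(error_rows)
```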

4. Strengths and weaknesses of Informatica

1. Rows are processed one at a time, so calculations and
checks across rows are difficult, even though to some extent values can be
stored in variables for further processing. This increases the number of
steps required to accomplish the job. E.g. first load the rows into a
temporary table with some calculation, and then make this temp table the
source for another table for further row manipulations.
2. Very good for load and extraction, however not so flexible for implementing procedural
programming logic.
3. Every mapping needs a source and target, so sometimes you
just end up using a dummy source and target (dual tables) to build the logic.
4. No Replace character function in 5.1 (6.0 has it).
5. Ease of use, graphical representation of the ETL
process, mapplets, and reusable Expression transformations help to standardize the
ETL process across the organization.
6. Easy for knowledge transfer and maintenance.

Give an example of a complex transformation you created using Informatica

ETL process in your recent project and error handling
How many mappings did you have in your recent project?

What are dashboards in BO?
For example, a CEO wants to see the company's performance in a one- or two-page report with
consolidated data: this quarter's revenues, the growth rate from last quarter to this quarter, and
a couple of graphs. Who are my top ten customers? How many new customers did we gain this
quarter or year?
What is a repository?
o A repository is a metadata store kept in a centralized location. It can be
imported and exported to various systems, and it stores information about all the
components of Informatica, such as sessions, mappings,
transformations, mapplets, session parameters, mapping parameters, local and
global variables, and reusable transformations.
What are session parameters? What is the difference between pre- and post-session parameters?
Pre-session parameters: e.g. executing a stored procedure or a SQL query, or dropping
indexes to improve performance.
Post-session parameters: e.g. recreating the indexes, sending email.
Explain the ETL architecture
Data staging area: a temporary storage layer, sometimes also called the operational data store,
one level below the data mart.
Difference between PL/SQL and Informatica. Why do you need Informatica? Justify.
PL/SQL is free with Oracle.
Informatica is expensive.
You can easily load the data in a more sophisticated way using Informatica.
You can monitor the load process, schedule sessions etc. in Informatica.

What are the differences between Informatica 4.7, 5.1 and 6.0?
o Router transformation is available from 5.0 onwards
o Debugging in Designer
o Partitioning of sessions in session manager
o In 6.0, complete heterogeneous targets: one mapping can load multiple
targets, e.g. one in Oracle and one in DB2
o Data partitioning runs as multiple sessions in Informatica 6.0
o Repository Server (New Component)
o Workflow Manager (New Component)
o Workflow Monitor (New Component)
Explain the process of moving your data to production
o 1. Through XML export/import
o 2. Through scripts
o System testing environment: you will already have the repository in the test environment,
moved there by XML or through scripts
o Prepare the scripts for tables, views with constraints, and indexes.
Change the session connections
What are the tools used in your environment for data mining and data modeling?
o Data modeling: ERwin from Platinum, Visio from Microsoft
o Data mining: a component of a data warehousing solution, used to discover
patterns and relationships in your data in order to help you make better business
decisions. Also called knowledge discovery in databases (KDD). Oracle's solution to
this business problem is known as Oracle Darwin, or Oracle's Data
Mining Suite. It is used to help you build predictive models. Data mining is a discovery
process that allows users to understand the substance of and the relationships
between their data. Ex: What is likely to happen to the Boston sales next month
and why? Characteristics of data mining are prospective, proactive
information delivery.
o To access remote UNIX servers: TinyTERM
o How did you connect to Oracle?
o Using TinyTERM
Difference between a cached and an uncached lookup
o You can disable the lookup cache option when you create a Lookup transformation
o Performance issue: if the lookup table is very large, the lookup cache also takes a lot of space
What is the difference between data warehousing and enterprise data warehousing?
An enterprise data warehouse is used widely by the whole enterprise.
The total GE warehouse is called the enterprise data warehouse.
A data warehouse built for all businesses in an enterprise is an enterprise data warehouse,
one built for a single business is a data warehouse, and a module in a data warehouse is a data mart.
Ex: GE Capital, GE Appliances and GE Electrical each have their own data
warehouse.
What was the most challenging task you faced when working with Informatica?

What is SQL override?
Overriding the default SQL in a Source Qualifier or Lookup to add additional logic.
How do you fine-tune a session?
A mapping is running very slowly; how do you handle it? While creating a mapping in the
development process, I followed all the tuning rules, such as placing the Joiner transformation
at the beginning and reducing the number of expressions.
At the beginning, I check how much time it takes to retrieve the data.
At mapping level: source SQL override (explain plan, statistics).
For the target, we write to a flat file and compare the time against writing to the RDBMS; then
you know whether the bottleneck is in the database or in the mapping.
Then we check the network by calling the network admin.
Then we increase the data cache and index cache.
o How did you implement update strategy?
Complex session scheduling in your experience

Have you used a Normalizer transformation? Where?
No
Explain the difference between Oracle normalization and data warehouse
denormalization. Why do we need to denormalize in data warehousing?
Denormalization (the reverse of normalization) is for performance, i.e. removing additional
joins and decreasing the load on the OLTP system.
How did you handle error handling?
ETL row errors.
I always used to have problems when retrieving data from flat files because the data was not
formatted or arranged properly. Using SQL*Loader, I pulled the flat file data
into temporary tables, then used these temporary tables as the source, applied
transformations, loaded into the target, and then deleted the temporary tables.

Why did you choose this method?
Informatica was taking a lot of time to load the data; using SQL*Loader the job can be
done in minutes.

What are the different kinds of reports you have created in BO in the recent project?
Tabular, cross tab, master detail, crosstab master detail, table multi master detail.
Can we have multiple conditions in a Lookup?
yes
Can we have multiple conditions in a Filter?
YES
What are the flags called in Update Strategy?
They are typically set inside an IIF expression:
0 - DD_INSERT , 1- DD_UPDATE , 2- DD_DELETE , 3- DD_REJECT
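
A small sketch shows how such a flagging decision works. The numeric values are the ones listed above; the `flag_row` function, the `id` column and the rule "reject rows without a key" are assumptions for illustration, written in Python rather than in the transformation language.

```python
# Numeric values of the update strategy constants, as listed in the text.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_row(source_row, existing_keys):
    """Mimic IIF(key exists in target, DD_UPDATE, DD_INSERT), rejecting rows without a key."""
    if source_row["id"] is None:
        return DD_REJECT
    return DD_UPDATE if source_row["id"] in existing_keys else DD_INSERT


existing_keys = {1, 2}
flags = [flag_row({"id": key}, existing_keys) for key in (1, 3, None)]
print(flags)
```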
Is it possible to run a session from outside the Server Manager? If so, how?
Yes, using pmcmd.
What is the use of Power Plug?
For 3rd party connectors to SAP, mainframe, PeopleSoft
1. What are all the versions of Power Mart/Power Center 1.7/4.7/5.0/5.1 -??
2.

Business Objects:

1. When U are installing BO with what profile you will enter?

What is Short Cut join
What is load balancing ? Fail Over?
What is clustering ?
Difference between a cached and an uncached lookup
The lookup cache option indicates whether the Lookup transformation caches lookup values during the
session. When lookup caching is enabled, the Informatica Server queries the lookup table once,
caches the values, and looks up values in the cache during the session. This can improve session
performance. When you disable caching, each time a row passes into the transformation, the
Informatica Server issues a select statement to the lookup table for lookup values.
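
The difference in database round trips can be demonstrated with an in-memory SQLite table standing in for the lookup table. This is a sketch, not Informatica code: the `dept` table, the helper names and the query counter are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, dept_name TEXT)")
conn.executemany("INSERT INTO dept VALUES (?, ?)", [(10, "HR"), (20, "Sales")])

counter = {"queries": 0}


def run_query(sql, params=()):
    """Run a query against the lookup table and count the round trip."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


def lookup_uncached(dept_ids):
    # Caching disabled: one SELECT per incoming row.
    names = []
    for dept_id in dept_ids:
        rows = run_query("SELECT dept_name FROM dept WHERE dept_id = ?", (dept_id,))
        names.append(rows[0][0] if rows else None)
    return names


def lookup_cached(dept_ids):
    # Caching enabled: one SELECT for the whole table, then in-memory lookups.
    cache = dict(run_query("SELECT dept_id, dept_name FROM dept"))
    return [cache.get(dept_id) for dept_id in dept_ids]


incoming = [10, 20, 10, 30, 20]

counter["queries"] = 0
uncached_names = lookup_uncached(incoming)
uncached_queries = counter["queries"]

counter["queries"] = 0
cached_names = lookup_cached(incoming)
cached_queries = counter["queries"]

print(uncached_queries, cached_queries)
```

Both variants return the same values; the cached one trades memory for fewer queries, which is why a very large lookup table can make the cache itself a problem.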

What is the difference between an inner and an outer join?
An inner join returns only the rows that satisfy the join condition. An outer join returns those
rows plus the rows from one or both tables that do not satisfy the condition.
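
A short runnable comparison makes the difference visible. SQLite is used here only as a convenient stand-in database, and the `emp` and `dept` tables and their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp (emp_name TEXT, dept_id INTEGER);
CREATE TABLE dept (dept_id INTEGER, dept_name TEXT);
INSERT INTO emp VALUES ('Ann', 10), ('Bob', 20), ('Cid', NULL);
INSERT INTO dept VALUES (10, 'HR'), (20, 'Sales');
""")

# Inner join: only employees with a matching department.
inner_rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM emp e INNER JOIN dept d ON e.dept_id = d.dept_id
    ORDER BY e.emp_name
""").fetchall()

# Left outer join: every employee, with NULL where no department matches.
outer_rows = conn.execute("""
    SELECT e.emp_name, d.dept_name
    FROM emp e LEFT OUTER JOIN dept d ON e.dept_id = d.dept_id
    ORDER BY e.emp_name
""").fetchall()

print(inner_rows)
print(outer_rows)
```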

What is the syntax difference for using outer join in SQL Server / Oracle?
In Oracle you use the (+) symbol and in SQL Server you use *= or =* (legacy syntax).
How do you do requirement analysis?
This is a big question, we start with learning the existing system and interact with the end users
for their requirements and then decide the steps that need to go in modeling, ETL and reporting
What is Exception?

Same as exception handler in Oracle
What is slowly changing Dimension?
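A dimension whose attribute values change slowly over time. Type 1 overwrites the old value,
Type 2 adds a new row to keep full history, and Type 3 adds a column to keep the previous value.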
What is Slowly Growing Dimension?
A dimension that does not change much in the warehouse, e.g. a Region dimension
rarely changes; it may add a row in a year. Informatica has a wizard for this; just go
through it.
The Target mapping filters source rows based on user-defined comparisons, and then inserts only
those found to be new to the target. Use the Target mapping to determine which source rows are
new and to load them to an existing target table. In the Target mapping, all rows are current. Use
the Target mapping to load a fact or dimension table, one in which existing data does not require
updates.
For example, you have a site code dimension table that contains only a store name and a
corresponding site code that you update only after your company opens a new store. Although
listed stores might close; you want to keep the store code and name in the dimension for historical
analysis. With the Target mapping, you can load new source rows to the site code dimension table
without deleting historical sites.
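
The insert-only behaviour described above can be sketched in Python. This is a minimal illustration, not what the wizard generates; the function name, the `site_code` key and the sample rows are assumptions.

```python
def load_new_rows(source_rows, target, key="site_code"):
    """Insert only source rows whose key is not already in the target.

    Existing target rows are never updated or deleted.
    """
    existing = {row[key] for row in target}
    inserted = []
    for row in source_rows:
        if row[key] not in existing:
            target.append(row)
            existing.add(row[key])
            inserted.append(row)
    return inserted

target = [{"site_code": "S1", "store_name": "Downtown"}]
source = [
    {"site_code": "S1", "store_name": "Downtown"},   # already loaded
    {"site_code": "S2", "store_name": "Airport"},    # new store
]
new_rows = load_new_rows(source, target)
```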

What kind of test plan do you use? What kind of validation do you do?
In Informatica we write test SQL to compare the number of records, and validation scripts to check
that the data in the warehouse is loaded according to the logic incorporated.
What is the usage of unconnected/connected lookup?
We use a lookup to connect to a table in the source or a target. A lookup can be configured in
two ways: connected or unconnected.

You can configure a connected Lookup transformation to receive input directly from the mapping
pipeline, or you can configure an unconnected Lookup transformation to receive input from the
result of an expression in another transformation.


Connected Lookup vs. Unconnected Lookup

Connected: Receives input values directly from the pipeline.
Unconnected: Receives input values from the result of a :LKP expression in another
transformation.

Connected: You can use a dynamic or static cache.
Unconnected: You can use a static cache.

Connected: Cache includes all lookup columns used in the mapping (that is, lookup table columns
included in the lookup condition and lookup table columns linked as output ports to other
transformations).
Unconnected: Cache includes all lookup/output ports in the lookup condition and the
lookup/return port.

Connected: Can return multiple columns from the same row or insert into the dynamic lookup cache.
Unconnected: Designate one return port (R). Returns one column from each row.

Connected: If there is no match for the lookup condition, the Informatica Server returns the
default value for all output ports. If you configure dynamic caching, the Informatica Server
inserts rows into the cache.
Unconnected: If there is no match for the lookup condition, the Informatica Server returns NULL.

Connected: Pass multiple output values to another transformation. Link lookup/output ports to
another transformation.
Unconnected: Pass one output value to another transformation. The lookup/output/return port
passes the value to the transformation calling the :LKP expression.

Connected: Supports user-defined default values.
Unconnected: Does not support user-defined default values.

What is a complex mapping? Explain.
A mapping involving many lookups, joiners and complex calculations may be called a complex
mapping.
If you can't find what you are looking for in the lookup table, how do you handle it?

If you have data coming from different sources, what transformation will you use in the Designer?
How many transformations are there?
15 types of transformations.
Can you stop a session in a concurrent batch?
No.
What happens when you use the delete, update, reject or insert flag in your update
strategy?
How many sessions can you run?
As many sessions as are configured in the server. The maximum is 50-100 (not sure).
What are parameters and variables, and where do you use them?
Mapping Parameters
A mapping parameter represents a constant value that you can define before running a session. A
mapping parameter retains the same value throughout the entire session.
When you use a mapping parameter, you declare and use the parameter in a mapping or mapplet.
Then define the value of the parameter in a parameter file for the session. During the session, the
Informatica Server evaluates all references to the parameter to that value.
Mapping Variables
Unlike a mapping parameter, a mapping variable represents a value that can change through the
session. The Informatica Server saves the value of a mapping variable to the repository at the end
of each successful session run and uses that value the next time you run the session.

Local variables: You can use local variables that you create within a mapping in any transformation
expression. For example, if you use a complex tax calculation throughout a mapping, you might
want to write the expression once and designate it as a variable. You thereby increase performance,
since the Informatica Server performs the calculation only once. Local variables are especially
useful when used with stored procedure expressions to capture multiple return values.

System variables: For example, $$$SessStartTime. These are constant throughout the mapping and
cannot be changed.

What are counters?
The performance details provide counters that help you understand the session and mapping
efficiency. Each Source Qualifier, target definition, and individual transformation appears in the
performance details, along with counters that display performance information about each
transformation.

Understanding Performance Counters
All transformations have some basic counters that indicate the number of input rows, output rows,
and error rows. Source Qualifiers, Normalizers, and targets have additional counters that indicate
the efficiency of data moving into and out of buffers. You can use these counters to locate
performance bottlenecks. Some transformations have counters specific to their functionality. For
example, each Lookup transformation has a counter that indicates the number of rows stored in the
lookup cache. When you read performance details, the first column displays the transformation name
as it appears in the mapping, the second column contains the counter name, and the third column
holds the resulting number or efficiency percentage. When you partition a source, the Informatica
Server generates one set of counters for each partition. The following performance counters
illustrate two partitions for an Expression transformation:

Transformation    Counter                   Value
EXPTRANS [1]      Expression_input rows     8
                  Expression_output rows    8
EXPTRANS [2]      Expression_input rows     16
                  Expression_output rows    16

Note: When you partition a session, the number of aggregate or rank input rows may be different
from the number of output rows from the previous transformation.

How do you set the size of the block buffer?
How do you pass parameters to an unconnected stored procedure?
Configure the expression to send any input parameters and capture any output parameters or
return value. You must know whether the parameters shown in the Expression Editor are input or
output parameters. You insert variables or port names between the parentheses in the exact order
that they appear in the stored procedure itself. The datatypes of the ports and variables must
match those of the parameters passed to the stored procedure.
For example, when you click the stored procedure, something similar to the following appears:
:SP.GET_NAME_FROM_ID()

This particular stored procedure requires an integer value as an input parameter and returns a
string value as an output parameter. How the output parameter or return value is captured depends
on the number of output parameters and whether the return value needs to be captured.
If the stored procedure returns a single output parameter or a return value (but not both), you
should use the reserved variable PROC_RESULT as the output variable. In the previous example,
the expression would appear as:
:SP.GET_NAME_FROM_ID(inID, PROC_RESULT)
inID can be either an input port for the transformation or a variable in the transformation. The
value of PROC_RESULT is applied to the output port for the expression.
If the stored procedure returns multiple output parameters, you must create variables for each
output parameter. For example, if you created a port called varOUTPUT2 for the stored
procedure expression, and a variable called varOUTPUT1, the expression would appear as:
:SP.GET_NAME_FROM_ID(inID, varOUTPUT1, PROC_RESULT)
The value of the second output port is applied to the output port for the expression, and the value
of the first output port is applied to varOUTPUT1. The output parameters are returned in the
order they are declared in the stored procedure itself.
With all these expressions, the datatypes for the ports and variables must match the datatypes for
the input/output variables and return value.

How do you implement an unconnected stored procedure in a mapping?
Can you access a repository created in a previous version of Informatica?
What happens if the Informatica Server doesn't find the session parameter in the parameter file?

How do you run a session from the command line?
Using the pmcmd command.
What are the different things you can do using pmcmd?
Start, stop and abort the session.

1. What are different ports in Informatica?
Ans: Input, Output, Variable, Return/Rank, Lookup and Master.
2. What is a Variable port? Why is it used?
Ans: Variable port is used to store intermediate results.
Variable ports can reference input ports and variable ports, but not output ports.
3. What are the different types of connectivity Informatica uses to connect to sources, targets
and the repository?
Ans: PowerMart and PowerCenter use
1. Network Protocol
2. Native Drivers
3. ODBC
The Server Manager and the Informatica Server use TCP/IP or IPX/SPX to communicate to
each other.
4. Difference between Data cleansing and Data scrubbing?
Ans: Data cleansing is a process of removing errors and resolving inconsistencies in
source data before loading data into targets.
Data scrubbing is a process of filtering, merging, decoding and translating the source data
to create the validation data for data warehouse.
5. What are reusable transformations? Can we roll back this option?

Ans: Reusable transformations can be used in multiple mappings. A reusable transformation
contains only one transformation. You can roll back this option.
Difference between active and passive transformations?
A transformation can be active or passive. An active transformation can change the number of
records passed through it; a passive transformation never changes the record count.
Active transformations, which might change the record count: Advanced External Procedure,
Aggregator, Filter, Joiner, Normalizer, Rank, Update Strategy and Source Qualifier. If you use
PowerConnect to access ERP sources, the ERP Source Qualifier is also an active transformation.
Passive transformations: Lookup, Expression, External Procedure, Sequence Generator and Stored
Procedure.
You can connect only one active transformation to the same transformation or target, but you can
connect any number of passive transformations.
6. What is a mapplet?
Ans: A mapplet is a reusable object that represents a set of transformations. It allows you
to reuse transformation logic and can contain as many transformations as you need.
7. What are mapping parameters and variables?
Ans: A mapping parameter represents a constant value that you can define before running a
session. A mapping parameter retains the same value throughout the entire session.
Unlike a mapping parameter, a mapping variable represents a value that can change through
the session.
8. How many transformations have you used?
9. What is an Aggregator transformation?
Ans: An aggregator transformation allows you to perform aggregate calculations, such as
average and sums. The Aggregator transformation is unlike the Expression transformation, in
that you can use the Aggregator transformation to perform calculations on groups.
10. What is Router Transformation? How is it different from Filter transformation?
Ans: A Router transformation is similar to a Filter transformation because both
transformations allow you to use a condition to test data. A Filter transformation tests data for
one condition and drops the rows of data that do not meet the condition. However, a router
transformation tests data for one or more conditions and gives you the option to route rows of
data that do not meet any of the conditions to default output group.
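
The contrast between the two transformations can be sketched in Python. This is a rough illustration, not Informatica code: the function names, group names and sample rows are hypothetical, and the sketch assumes a row that satisfies several group conditions is sent to each matching group.

```python
def filter_rows(rows, condition):
    """Filter: one condition; rows that fail it are dropped."""
    return [r for r in rows if condition(r)]

def route_rows(rows, groups):
    """Router: test each row against every group condition.

    Rows that match no condition go to the DEFAULT group instead of
    being dropped.
    """
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for r in rows:
        matched = False
        for name, condition in groups.items():
            if condition(r):
                routed[name].append(r)
                matched = True
        if not matched:
            routed["DEFAULT"].append(r)
    return routed

rows = [{"region": "EAST", "amt": 50},
        {"region": "WEST", "amt": 500},
        {"region": "NORTH", "amt": 5}]
kept = filter_rows(rows, lambda r: r["amt"] >= 50)
routed = route_rows(rows, {
    "EAST": lambda r: r["region"] == "EAST",
    "BIG": lambda r: r["amt"] >= 100,
})
```

The filter silently loses the NORTH row, while the router keeps it available in the default group.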
11. What are connected and unconnected transformations?
Ans: Connected transformations are transformations that are in the data flow, whereas
unconnected transformations are not in the data flow.
This distinction applies to Lookup and Stored Procedure transformations.
12. What is Normalizer transformation?
Ans: Normalizer transformation normalizes records from COBOL and relational sources
allowing you to organize the data according to your needs. A normalizer transformation
can appear anywhere in a data flow when you normalize a relational source.
13. How to use a sequence created in Oracle in Informatica?
Ans: By using Stored procedure transformation.
14. What are source qualifier transformations?
Ans: The source qualifier represents the records that the Informatica Server reads when it
runs a session.
15. What is a dimension?
Ans: A set of level properties that describe a specific aspect of a business, used for analyzing
the factual measures of one or more cubes, which use that dimension. Ex: geography, time,
customer and product.

16. What is a XML source qualifier?
Ans: The XML source qualifier represents the data elements that the Informatica server reads
when it runs a session with XML sources.
17. What is a cube?
Ans: A set of related factual measures, aggregates, and dimensions for a specific dimensional
analysis problem. Ex: regional product sales.
18. What is Load Manager process?
Ans: Load manager is the primary Informatica server process. It performs the following
tasks:
a. Manages sessions and batch scheduling.
b. Locks the sessions and reads properties.
c. Reads parameter files.
d. Expands the server and session variables and parameters.
e. Verifies permissions and privileges.
f. Validates sources and targets code pages.
g. Creates session log files.
h. Creates Data Transformation Manager (DTM) process, which executes the
session.
19. Define a session?
Ans: A session is a set of instructions that describes how and when to move data from source
to targets
20. What are pmcmd commands?
Ans: pmcmd is a command line program to communicate with the Informatica server. This
does not replace the server manager, since there are many tasks that you can perform only
with server Manager.
21. What are cache and their types in Informatica?
Ans: The Informatica server creates index and data cache for aggregator, Rank, joiner
and Lookup transformations in a mapping. The Informatica server stores key values in
the index cache and output values in the data cache.
22. What are minimum and maximum values for index and data cache?
Ans: Index cache: Min: 12MB Max: 24 MB
Data cache: Min: 12MB Max: 24MB
23. What is default block buffer size?
Ans: 64K
24. What is default LM shared memory size?
Ans: 2MB
25. What is an incremental aggregation?
Ans: In Incremental aggregation, you apply captured changes in the source to aggregate
calculations in a session. If the source changes only incrementally and you can capture
changes, you can configure the session to process only those changes. This allows the
Informatica server to update your target incrementally, rather than forcing it to process the
entire source and recalculate the same calculation each time you run the session.
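
The idea can be sketched in Python as folding only the captured changes into totals saved from the previous run. This is a simplified illustration for a SUM aggregate; the function name and the sample data are assumptions, and it does not show how the Informatica Server stores its aggregate data.

```python
def apply_increment(saved_totals, changed_rows):
    """Fold only newly captured rows into previously saved totals."""
    for key, amount in changed_rows:
        saved_totals[key] = saved_totals.get(key, 0) + amount
    return saved_totals

# First run: the whole source is processed.
totals = apply_increment({}, [("EAST", 100), ("WEST", 40)])
# Later run: only the rows captured since the last run are processed.
totals = apply_increment(totals, [("EAST", 25), ("NORTH", 10)])
```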
26. What is Reject loading?
Ans: During a session, the Informatica server creates a reject file for each target instance in
the mapping. If the writer or the target rejects data, the Informatica server writes the rejected
row into reject file.

The reject file and session log contain information that helps you determine the cause of
the reject. You can correct reject files and load them to relational targets using the
Informatica reject load utility. The reject loader also creates another reject file for the
data that the writer or target reject during the reject loading.
27. How many server instances can you register to a repository?
Ans:
28. What was the size of your warehouse?
29. What is a star schema?
30. What is the difference between inner and outer join?
31. How big was your fact table?
32. How did you handle performance issues? If you have data coming in from multiple
sources, walk through the process of loading it into the target.
33. How will you convert rows into columns or columns into rows?
34. Do you have any experience in OOP?
35. What is the syntax difference for using outer join in SQL Server / Oracle?
36. What is a repository?
The place where you store the metadata is called a repository.

1. What is the difference between PowerCenter and PowerMart?

PowerCenter and PowerMart both run the instructions that define where to get data, how to change
it, and where to write the information (for example, a data mart).

PowerCenter vs. PowerMart

PowerCenter: Can create a local repository, which can be upgraded to a global repository.
PowerMart: Feature is not available.

PowerCenter: Can connect to enterprise data (SAP R/3, Siebel, Vantive, IBM MQ Series).
PowerMart: Feature is not available.

PowerCenter: Ability to register multiple servers, share metadata across repositories, and
partition data.
PowerMart: All features except distributed metadata, multiple registered servers, and data
partitioning.

2. What are the max number of sessions that you could configure in Informatica Server ?
You can set three parameters in the Informatica Server configuration that control how the Load
Manager allocates memory to sessions:
MaxSessions. The maximum sessions parameter indicates the maximum number of
session slots available to the Load Manager at one time for running or repeating sessions.
For example, if you select the default MaxSessions of 10, the Load Manager allocates 10
session slots. This parameter helps you control the number of sessions the Informatica
Server can run simultaneously.
LMSharedMemory. The Load Manager shared memory parameter is set in conjunction
with the maximum sessions parameter to ensure that the Load Manager has enough
memory for each session. The Load Manager requires approximately 200,000 bytes of
shared memory for each session slot. The default setting is 2,000,000 bytes. For each
increase of 10 sessions in the MaxSessions setting, you need to increase
LMSharedMemory by 2,000,000 bytes.

KeepRepeatSessionInShm. The Keep Repeating Sessions in Shared Memory option
determines how the Load Manager behaves when there are no available slots for sessions
to run. By default, this option is disabled, and the Load Manager swaps repeating
sessions out of session slots to run new session requests. If all session slots are filled with
running sessions, the Load Manager places new session requests in a waiting state until it
can open a session slot. If you enable this option, the Load Manager retains repeating
sessions in shared memory, and fails to run new session requests.
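
The sizing rule for LMSharedMemory is simple arithmetic, shown here as a short Python sketch. The function name is hypothetical; the figure of 200,000 bytes per session slot is the approximate value quoted above.

```python
BYTES_PER_SESSION_SLOT = 200_000  # approximate figure quoted in the text

def lm_shared_memory_bytes(max_sessions):
    """Estimate the LMSharedMemory setting needed for a MaxSessions value."""
    return max_sessions * BYTES_PER_SESSION_SLOT

default_setting = lm_shared_memory_bytes(10)   # default MaxSessions of 10
for_30_sessions = lm_shared_memory_bytes(30)   # 20 more sessions than default
```

With MaxSessions raised from 10 to 30, the estimate grows by 2,000,000 bytes for each additional 10 sessions, matching the rule stated above.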
3. What is an unconnected transformation? What is the advantage of using an unconnected
transformation?
An unconnected transformation is not part of the data flow.
For an unconnected Lookup, the Informatica Server queries the lookup table based on the logic
of the expression calling the lookup.
An unconnected Lookup transformation can be used to check whether a record exists in the
target database or not.
4. What are the steps involved in the migration from older version to newer version of
Informatica Server?
5. Have you worked in conjunction with Oracle DBA(s) and UNIX administrators?
6. Have you worked with Oracle front-end tools?
7. What are the main features of Oracle 8i with context to datawarehouse?
8. Are you comfortable with Business Objects?
9. What kind of analysis you did in the existing Data warehouse environment?
10. What problem you faced and what were your recommendations (based on your last
project)?
11. How many dimension tables and fact tables were there in your datawarehouse?
If you can't find what you are looking for in the lookup table, how do you handle it? What is a
Star schema / Snowflake schema?

1.What are Target Types on the Server?
Ans: Target Types are File, Relational and ERP
2.What are Target Options on the Servers?
Ans: Target Options for File Target type are FTP File, Loader and MQ
There are no target options for ERP target type
Target Options for Relational are Insert, Update (as Update), Update (as Insert), Update (else
Insert), Delete,and TruncateTable.
3.How do you identify existing rows of data in the target table using lookup transformation?
Ans: Existing rows of data can be identified using an unconnected Lookup transformation.

4.What is an Aggregator transformation?
Ans: Aggregator transformation allows you to perform aggregate calculations, such as averages
and sums.
5.What are various types of Aggregation?
Ans: Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST,
MEDIAN, PERCENTILE, STDDEV, and VARIANCE.
6.What are Dimensions and various types of Dimensions?
Ans: A set of level properties that describe a specific aspect of a business, used for analyzing
the factual measures of one or more cubes which use that dimension. E.g. geography, time,
customer and product.
7.What are 2 modes of data movement in Informatica Server?
Ans: The data movement mode depends on whether Informatica Server should process single
byte or multi-byte character data. This mode selection can affect the enforcement of code page
relationships and code page validation in the Informatica Client and Server.
a) Unicode - The IS allows 2 bytes for each character and uses an additional byte for each
non-ASCII character (such as Japanese characters).
b) ASCII - The IS holds all data in a single byte.
The IS data movement mode can be changed in the Informatica Server configuration
parameters. This comes into effect once you restart the Informatica Server.
8.What is Code Page Compatibility?
Ans: Compatibility between code pages is used for accurate data movement when the Informatica
Server runs in the Unicode data movement mode. If the code pages are identical, then there will
not be any data loss. One code page can be a subset or superset of another. For accurate data
movement, the target code page must be a superset of the source code page.
Superset - A code page is a superset of another code page when it contains all the characters
encoded in the other code page. It also contains additional characters not contained in the other
code page.
Subset - A code page is a subset of another code page when all characters in the code page are
encoded in the other code page.
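
The subset/superset relationship can be checked with a few lines of Python. This sketch uses Python's built-in codecs as stand-ins for code pages and works only when the first code page is single-byte; the function name is hypothetical, and it is not how the Informatica Server validates code pages.

```python
def is_subset(code_page, other):
    """True if every character of a single-byte code_page can be encoded in other."""
    for byte in range(256):
        try:
            char = bytes([byte]).decode(code_page)
        except UnicodeDecodeError:
            continue  # this byte value is not assigned in code_page
        try:
            char.encode(other)
        except UnicodeEncodeError:
            return False  # found a character that other cannot represent
    return True

ascii_in_latin1 = is_subset("ascii", "latin-1")   # ASCII is a subset of Latin1
latin1_in_ascii = is_subset("latin-1", "ascii")   # Latin1 has extra characters
latin1_in_utf8 = is_subset("latin-1", "utf-8")    # UTF-8 is a superset of Latin1
```

Moving data from an ASCII source to a Latin1 target is therefore safe, while the reverse direction can lose characters.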
9.What is Code Page used for?
Ans: A code page is used to identify characters that might be in different languages. If
you are importing Japanese data into a mapping, then you must select the Japanese code page
for the source data.
10.What is Router transformation?
Ans: Router transformation allows you to use a condition to test data. It is similar to filter
transformation. It allows the testing to be done on one or more conditions.
11. What is Load Manager ?
Ans. The Load Manager is the primary Informatica Server process. It performs the following tasks:
Manages session and batch scheduling.
Locks the session and reads session properties.
Reads the parameter file.
Expands the server and session variables and parameters.
Verifies permissions and privileges.
Validates source and target code pages.
Creates the session log file.
Creates the Data Transformation Manager (DTM) process, which executes the session.
12. What is Data Transformation Manager?
Ans. After the Load Manager performs validations for the session, it creates the DTM process. The
DTM process is the second process associated with the session run. The primary purpose of the
DTM process is to create and manage threads that carry out the session tasks.

The DTM allocates process memory for the session and divides it into buffers. This is also known
as buffer memory. It creates the main thread, which is called the master thread. The master thread
creates and manages all other threads.

If we partition a session, the DTM creates a set of threads for each partition to allow concurrent
processing. When the Informatica Server writes messages to the session log, it includes the thread
type and thread ID. Following are the types of threads that the DTM creates:

MASTER THREAD - Main thread of the DTM process. Creates and manages all other threads.
MAPPING THREAD - One thread for each session. Fetches session and mapping information.
PRE- AND POST-SESSION THREAD - One thread each to perform pre- and post-session operations.
READER THREAD - One thread for each partition for each source pipeline.
WRITER THREAD - One thread for each partition, if a target exists in the source pipeline, to
write to the target.
TRANSFORMATION THREAD - One or more transformation threads for each partition.

13. What are sessions and batches?
Ans. SESSION - A session is a set of instructions that tells the Informatica Server how and
when to move data from sources to targets. After creating the session, we can use either the
Server Manager or the command line program pmcmd to start or stop the session.
BATCHES - A batch provides a way to group sessions for either serial or parallel execution by
the Informatica Server. There are two types of batches:
1. SEQUENTIAL - Runs sessions one after the other.
2. CONCURRENT - Runs sessions at the same time.
14. What is a source qualifier?
Ans: It represents all data queried from the source.
15. Why we use lookup transformations?
Ans: Lookup Transformations can access data from relational tables that are not sources in
mapping. With Lookup transformation, we can accomplish the following tasks:
a) Get a related value - Get the Employee Name from the Employee table based on the
Employee ID
b) Perform Calculation
c) Update slowly changing dimension tables - We can use unconnected lookup
transformation to determine whether the records already exist in the target or not.
Informatica :
1. Difference between Power Center and Mart?
2. What are the new features of Power Center 5.0?
3. How to run a session, which contains mapplet?
4. Differentiate between Load Manager and DTM?
Business Objects:
1. What is Universe ?
2. Define Loops and how to resolve them ?
3. What were the errors which you encountered most while running BO reports ?
4. How to schedule BO Reports ?
5. What are the methods available in Document class of the Web-I SDK ?
6. What is the objective behind Partitioning in Oracle ?

CODE PAGE OVERVIEW


A code page contains the encoding to specify characters in a set of one or more languages. An
encoding is the assignment of a number to a character in the character set. You use code pages to
identify data that might be in different languages. For example, if you are importing Japanese
data into a mapping, you must select a Japanese code page for the source data.

To change the language to English and require the system to use the Latin1 code page, in UNIX,
execute the following command:
setenv LANG en_US.iso88591
If You Are Using PowerCenter
With PowerCenter, you receive all product functionality, including the ability to register multiple
servers, share metadata across repositories, and partition data.
A PowerCenter license lets you create a repository that you can configure as a global
repository, the core component of a data warehouse.
When this guide mentions a PowerCenter Server, it is referring to an Informatica Server with a
PowerCenter license.
If You Are Using PowerMart
This version of PowerMart includes all features except distributed metadata, multiple registered
servers, and data partitioning. Also, the various options available with PowerCenter (such as
PowerCenter Integration Server for BW, PowerConnect for IBM DB2, PowerConnect for IBM
MQSeries, PowerConnect for SAP R/3, PowerConnect for Siebel, and PowerConnect for
PeopleSoft) are not available with PowerMart.
When this guide mentions a PowerMart Server, it is referring to an Informatica Server with a
PowerMart license.

Normalizer Transformation:

Normalization protects the data and makes the database more flexible by eliminating redundancy
and inconsistent dependencies.

Used mainly with COBOL sources.
With relational sources, used to create multiple rows from a single row of data.

Use a single Normalizer transformation to handle multiple levels of denormalization in the same
record. For example, a single record might contain two different detail record sets. Rather than
using two Normalizer transformations to handle the two different detail record sets, you handle
both normalizations in the same transformation.

Pivoting can be done: changing columns to rows and vice versa.

A Normalizer column ID is created for each OCCURS clause; when pivoting, it is used to identify
the columns.

When we import a COBOL source, the Normalizer transformation is created automatically and we can't
modify the field definitions in the Normalizer transformation. We have to modify them in the
Source Analyzer.
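
Creating multiple rows from a single row can be sketched in Python. This is an illustration of the pivoting idea only: the function name, the field names and the `column_id` key are hypothetical, with `column_id` playing the role of the generated column ID that records which repeating column each value came from.

```python
def normalize(record, key_field, occurs_fields):
    """Turn one record with repeating columns into one row per occurrence."""
    rows = []
    for column_id, field in enumerate(occurs_fields, start=1):
        rows.append({
            key_field: record[key_field],
            "column_id": column_id,   # which repeating column the value came from
            "value": record[field],
        })
    return rows

record = {"store": "S1", "q1_sales": 10, "q2_sales": 20,
          "q3_sales": 30, "q4_sales": 40}
rows = normalize(record, "store",
                 ["q1_sales", "q2_sales", "q3_sales", "q4_sales"])
```

One input record with four quarterly columns becomes four output rows, each carrying the store key and a column ID from 1 to 4.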

Setting         Description
Reset           If selected, the Informatica Server resets the generated key value after the
                session finishes to its original value.
Restart         If selected, the Informatica Server restarts the generated key values from 1
                every time the session runs.
Tracing level   Determines the amount of information about this transformation that the
                Informatica Server writes to the session log when it runs the session. You can
                override this tracing level when you configure a session.

Can I use my existing PowerMart 4.0/PowerCenter 1.0 mappings that contain COBOL
sources?
In PowerMart 4.0 and PowerCenter 1.0, the Designer did not support REDEFINE statements in
COBOL sources or copybooks. If your COBOL sources included REDEFINE statements in older
mappings and you implemented workarounds, you do not need to edit your existing mappings or
sessions. However, if you want to improve session performance, you can either create a new
mapping or edit the existing mapping by reimporting the COBOL sources and modifying any
filter transformations or target as needed. You can then create one session to load the data from
the COBOL source.
I cannot edit the ports in my Normalizer transformation when using a relational source.
When you create ports manually, you must do so on the Normalizer tab in the transformation, not
the Ports tab.

Source Qualifier Transformation:

When you add a relational or a flat file source definition to a mapping, you need to connect it to a
Source Qualifier transformation. The Source Qualifier represents the records that the
Informatica Server reads when it runs a session.

We can use the Source Qualifier to perform the following tasks:
To join data originating from the same DB.
To filter records in the source itself.
To specify an outer join instead of a default inner join.
To specify sorted ports.
To select distinct values from the source.
To create a custom query to issue a special select statement for the Informatica Server to
read source data. For example, we might use a custom query to perform aggregate
calculations or execute a stored procedure.

Don't alter the datatypes in the Source Qualifier.

If we have connected multiple SQ to multiple targets, we can designate the order in which the
targets are loaded.
If we connect one SQ to multiple targets, we can enable constraint-based loading in a session to
have the IS load data based on target table primary and foreign key relationships.

The IS reads only the columns in SQ that are connected to other transformations.


Informatica Server

Informatica server moves data from source to target based on mapping and session metadata
stored in a repository. A mapping is a set of source and target definitions linked by
transformation objects that define the rules for data transformation. A session is a set of
instructions that describes how and when to move data from source to targets.

Session Process

The Informatica server uses both process memory and system shared memory to perform these
tasks. It runs as a daemon on unix and as a service on Windows NT/2000.

The Load Manager process. Starts the session, creates the DTM process, and sends post-session
email when the session completes.
The DTM process (Data Transformation Manager). Creates threads to initialize the
session, read, write, and transform data, and handle pre- and post-session operations.

Partitioning Data

The Informatica Server can achieve high performance by partitioning source data and performing
the extract, transformation, and load for each partition in parallel. To accomplish this:

Configure the session to partition source data.
Install the Informatica Server on a machine with multiple CPUs.

When you partition a session, configure partitioning based on source qualifiers in a mapping.

For relational source, the Informatica server creates multiple database connections to a single
source and extracts a separate range of data for each connection. For XML or file sources, the
Informatica server reads multiple files concurrently.

When the Informatica Server loads relational data, it creates multiple database connections to
the target and loads partitions of data concurrently. When the Informatica Server loads data to
file targets, it creates a separate file for each partition.
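
The partitioned extract, transform and load described above can be sketched in Python with a thread pool. This is a conceptual illustration only: the function names, the key ranges and the trivial transformation are assumptions, and an in-memory list stands in for the source.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_range(source, start, stop):
    """Stand-in for one connection reading its own key range."""
    return [row for row in source if start <= row < stop]

def transform_and_load(rows):
    """Stand-in for the transform and load of one partition."""
    return [row * 10 for row in rows]

def run_partition(source, bounds):
    """Extract, transform and load a single partition."""
    return transform_and_load(extract_range(source, *bounds))

source = list(range(1, 9))
partitions = [(1, 5), (5, 9)]     # two non-overlapping key ranges
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    # Each partition is processed by its own worker thread.
    results = list(pool.map(lambda b: run_partition(source, b), partitions))
```

Each entry in `results` corresponds to one partition, much as a file target would receive one output file per partition.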

Update strategy

When you design the data warehouse, you need to decide what type of information to store in
targets. As part of your target table design, you need to determine whether to maintain all the
historic data or just the most recent changes.

In PowerMart and Power Center, you set your update strategy at two different levels:
Within a session. When you configure a session, you can instruct the server to either
treat all records in the same way (for example, treat all records as inserts), or use
instructions coded into the session mapping to flag records for different database
operations.

Within a mapping. Within a mapping, you use the Update Strategy transformation to
flag records for insert, delete, update, or reject.

Follow these steps to define an update strategy:
1. To control how records are flagged for insert, update, delete, or reject within a mapping,
add an Update Strategy transformation to the mapping. Update Strategy transformations
are essential if you want to flag records destined for the same target for different database
operations, or if you want to reject records.
2. Define how to flag records when you configure a session. You can flag all records for
insert, delete, or update, or select the Data Driven option, where the Informatica Server
follows instructions coded into Update Strategy transformations within the session
mapping.
3. Define insert, update, and delete options for each target when you configure a session. On
a target-by-target basis, you can allow or disallow inserts and deletes, and you can choose
three different ways to handle updates.

When you configure a session, you have several options for handling specific database
operations, including updates.

Specifying an operation for all rows

Setting       Description
Insert        Treat all records as inserts. If inserting the record violates a primary or
              foreign key constraint in the database, the server rejects the record.
Delete        Treat all records as deletes. For each record, if the server finds a
              corresponding record in the target table, the server deletes it.
Update        Treat all records as updates. For each existing record, the server updates the
              record.
Data Driven   The Informatica server follows instructions coded into Update Strategy
              transformations within the session mapping to determine how to flag
              records for insert, delete, update, or reject.

Specifying operations for individual Target Tables.

Insert. Select this option to insert a row into a target table.
Delete. Select this option to delete a record from a table.
Update. You have three different options.
1. Update as update. Update each record flagged for update if it exists in the target
table.
2. Update as insert. Insert each record flagged for update.
3. Update else insert. Update the record if it exists. Otherwise, insert it.
Truncate Table. Select this option to truncate the target table before loading data.
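
The three update options can be pictured with a small Python sketch. This is only an illustration of the semantics described above, not Informatica code: the target is modelled as a list of [key, value] rows, and the function name `apply_update` and the mode strings are hypothetical.

```python
def apply_update(target, key, value, mode):
    """Apply one record flagged for update to target, a list of [key, value] rows."""
    matches = [row for row in target if row[0] == key]
    if mode == "update_as_insert" or (mode == "update_else_insert" and not matches):
        target.append([key, value])        # insert the flagged record
    elif mode in ("update_as_update", "update_else_insert"):
        for row in matches:                # update only rows that already exist
            row[1] = value
    return target


as_update = apply_update([[1, "old"]], 2, "new", "update_as_update")
as_insert = apply_update([[1, "old"]], 1, "new", "update_as_insert")
else_insert_hit = apply_update([[1, "old"]], 1, "new", "update_else_insert")
else_insert_miss = apply_update([[1, "old"]], 2, "new", "update_else_insert")
print(as_update, as_insert, else_insert_hit, else_insert_miss)
```

In this sketch "update as update" leaves the table untouched when the key is missing, while "update else insert" falls back to an insert.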

Flagging Records within a Mapping.

The following constants flag records for each database operation:

Operation   Constant    Numeric Value
Insert      DD_INSERT   0
Update      DD_UPDATE   1
Delete      DD_DELETE   2
Reject      DD_REJECT   3

The server treats any other value as an insert.

Update strategy transformation is frequently the first transformation in a mapping, before data
reaches a target table. You can use the Update strategy transformation to determine how to flag
that record. Later, when you configure a session based on this transformation, you can determine
what to do with records flagged for insert, delete, or update.

Forwarding Rejected Rows

You can configure the Update strategy transformation to either pass rejected rows to the next
transformation or drop them. By default, the informatica server forwards rejected rows to the next
transformation.

Update Strategy Expression.

Frequently, the update strategy expression uses the IIF or DECODE function from the
transformation language to test each record to see if it meets a particular condition. Based on the
condition, you can assign a numeric code to flag it for a particular database operation.
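
As a rough illustration of the idea (this is Python, not Informatica transformation-language code), the sketch below mimics an IIF-style update strategy expression. The constants use the numeric values listed in the table above; the record fields, the rules, and the function name `flag_record` are hypothetical.

```python
# Numeric codes for each database operation, as listed in the table above.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3


def flag_record(record, existing_keys):
    """Return the operation code for one record (hypothetical rule set)."""
    if record["amount"] is None:
        return DD_REJECT          # incomplete rows are rejected
    if record["id"] in existing_keys:
        return DD_UPDATE          # key already in the target: update
    return DD_INSERT              # otherwise insert


existing = {101, 102}
rows = [
    {"id": 101, "amount": 50},
    {"id": 103, "amount": 75},
    {"id": 104, "amount": None},
]
flags = [flag_record(r, existing) for r in rows]
print(flags)
```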

Aggregator and update strategy transformation

Position the Aggregator before the Update Strategy Transformation. In this case,
you perform the aggregate calculation, and then use the update strategy transformation to
flag records that contain the results of this calculation for insert, delete, or update.
Position the aggregator after the update strategy transformation. Here, you flag
records for insert, delete, update, or reject before you perform the aggregate calculation. How
you flag a particular record determines how the aggregator transformation treats any values in
that record used in the calculation.

This behavior has changed since PowerMart 3.5. In version 3.5, the informatica server performed
all aggregate calculations before it flagged records for insert, update, delete, or reject through a
data driven expression.

1.What are Target Types on the Server?
Ans: Target Types are File, Relational and ERP

2.What are Target Options on the Servers?

Ans: Target Options for File Target type are FTP File, Loader and MQ
There are no target options for ERP target type
Target Options for Relational are Insert, Update (as Update), Update (as Insert), Update (else
Insert), Delete, and Truncate Table.

3.How do you identify existing rows of data in the target table using lookup transformation?
Ans: You can identify existing rows of data using an unconnected Lookup transformation.


4.What is an Aggregator transformation?
Ans: Aggregator transformation allows you to perform aggregate calculations, such as averages
and sums.

5.What are various types of Aggregation?

Ans: Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST,
MEDIAN, PERCENTILE, STDDEV, and VARIANCE.

6.What are Dimensions and various types of Dimensions?
A set of level properties that describe a specific aspect of a business, used for analyzing the
factual measures of one or more cubes which use that dimension. Egs. Geography, time,

7.What are 2 modes of data movement in Informatica Server?

Ans: The data movement mode depends on whether Informatica Server should process single
byte or multi-byte character data. This mode selection can affect the enforcement of code page
relationships and code page validation in the Informatica Client and Server.

a) Unicode - The IS allows 2 bytes for each character and uses an additional byte for each non-ASCII
character (such as Japanese characters).
b) ASCII - The IS holds all data in a single byte.

The IS data movement mode can be changed in the Informatica Server configuration
parameters. This comes into effect once you restart the Informatica Server.

8.What is Code Page Compatibility?

Ans: Compatibility between code pages is used for accurate data movement when the Informatica
Server runs in the Unicode data movement mode. If the code pages are identical, then there will
not be any data loss. One code page can be a subset or superset of another. For accurate data
movement, the target code page must be a superset of the source code page.

Superset - A code page is a superset of another code page when it contains every character
encoded in the other code page. It also contains additional characters not contained in the other
code page.

Subset - A code page is a subset of another code page when all characters in the code page are
encoded in the other code page.
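
The superset/subset relationship can be sketched in Python using the standard codecs as stand-ins for code pages. This is only an approximation under stated assumptions: it compares which characters each codec can encode over a sampled range of code points, and the helper names `encodable` and `is_superset` are hypothetical. It does not reproduce Informatica's own code page validation.

```python
def encodable(codec, code_points):
    """Return the set of characters from code_points that codec can encode."""
    chars = set()
    for cp in code_points:
        ch = chr(cp)
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            continue
        chars.add(ch)
    return chars


def is_superset(target_codec, source_codec, code_points=range(0x250)):
    """True if every sampled character of the source is also in the target."""
    return encodable(source_codec, code_points) <= encodable(target_codec, code_points)


ascii_to_latin1 = is_superset("latin-1", "ascii")   # target covers the source
latin1_to_ascii = is_superset("ascii", "latin-1")   # target would lose characters
print(ascii_to_latin1, latin1_to_ascii)
```

Moving data from the ASCII source to the Latin-1 target is safe in this sketch, while the reverse direction could lose characters.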

9.What is Code Page used for?

Ans: Code Page is used to identify characters that might be in different languages. If
you are importing Japanese data into a mapping, then you must select the Japanese code page
for the source data.


10.What is Router transformation?

Ans: Router transformation allows you to use a condition to test data. It is similar to filter
transformation. It allows the testing to be done on one or more conditions.

11. What is Load Manager ?
Ans. The Load Manager is the primary Informatica Server process. It performs the following tasks:
Manages session and batch scheduling.
Locks the session and reads session properties.
Reads the parameter file.
Expands the server and session variables and parameters.
Verifies permissions and privileges.
Validates source and target code pages.
Creates the session log file.
Creates the Data Transformation Manager, which executes the session.

12. What is Data Transformation Manager?
Ans. After the load manager performs validations for the session, it creates the DTM process. The
DTM process is the second process associated with the session run. The primary purpose of the
DTM process is to create and manage threads that carry out the session tasks.

The DTM allocates process memory for the session and divides it into buffers. This is also known
as buffer memory. It creates the main thread, which is called the master thread. The master thread
creates and manages all other threads.

If we partition a session, the DTM creates a set of threads for each partition to allow concurrent
processing. When the Informatica server writes messages to the session log, it includes the thread type
and thread ID. Following are the types of threads that the DTM creates:

MASTER THREAD - Main thread of the DTM process. Creates and manages all other
threads.
MAPPING THREAD - One thread for each session. Fetches session and mapping
information.
PRE- AND POST-SESSION THREAD - One thread each to perform pre- and post-session
operations.
READER THREAD - One thread for each partition for each source pipeline.
WRITER THREAD - One thread for each partition, if a target exists in the source pipeline,
to write to the target.
TRANSFORMATION THREAD - One or more transformation threads for each partition.
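
The thread layout can be pictured with a small Python sketch: a master routine starts one reader, one transformation, and one writer thread per partition, connected by queues. This is a loose analogy only, not how the DTM is implemented; all names and the "multiply by 10" transformation are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of a partition's data


def reader(rows, out_q):
    for row in rows:
        out_q.put(row)
    out_q.put(DONE)


def transformer(in_q, out_q):
    while (row := in_q.get()) is not DONE:
        out_q.put(row * 10)       # stand-in transformation
    out_q.put(DONE)


def writer(in_q, target, lock):
    while (row := in_q.get()) is not DONE:
        with lock:
            target.append(row)


def run_session(partitions):
    """Play the master thread: start one thread set per partition, then wait."""
    target, lock, threads = [], threading.Lock(), []
    for rows in partitions:
        q1, q2 = queue.Queue(), queue.Queue()
        threads += [
            threading.Thread(target=reader, args=(rows, q1)),
            threading.Thread(target=transformer, args=(q1, q2)),
            threading.Thread(target=writer, args=(q2, target, lock)),
        ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(target)


loaded = run_session([[1, 2], [3, 4]])
print(loaded)
```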

13. What are sessions and batches?
Ans. SESSION - A session is a set of instructions that tells the Informatica Server how and
when to move data from sources to targets. After creating the session, we can use either the
Server Manager or the command line program pmcmd to start or stop the session.


BATCHES - A batch provides a way to group sessions for either serial or parallel execution by
the Informatica Server. There are two types of batches:
1. SEQUENTIAL - Runs sessions one after the other.
2. CONCURRENT - Runs sessions at the same time.
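
The difference between the two batch types can be sketched in Python. The session names and the `run_session` stand-in are hypothetical; a real batch is configured in the Server Manager, not written as code.

```python
from concurrent.futures import ThreadPoolExecutor


def run_session(name):
    """Stand-in for starting one session; returns a completion message."""
    return f"{name} done"


def run_sequential(sessions):
    return [run_session(s) for s in sessions]         # one after the other


def run_concurrent(sessions):
    with ThreadPoolExecutor(max_workers=len(sessions)) as pool:
        return list(pool.map(run_session, sessions))  # submitted together


names = ["s_customers", "s_orders"]
seq = run_sequential(names)
con = run_concurrent(names)
print(seq, con)
```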

14. What is a source qualifier?
Ans: It represents all data queried from the source.

15. Why do we use lookup transformations?
Ans: Lookup transformations can access data from relational tables that are not sources in the
mapping. With a Lookup transformation, we can accomplish the following tasks:
a) Get a related value - Get the Employee Name from the Employee table based on the
Employee ID.
b) Perform a calculation.
c) Update slowly changing dimension tables - We can use an unconnected Lookup
transformation to determine whether the records already exist in the target or not.

Contrasting a Data Warehouse with an OLTP System
Figure 11 illustrates some of the key differences between a data warehouse model and an
OLTP system.

Figure 11 Contrasting OLTP and Data Warehousing Environments

One major difference between the types of system is that data warehouses are not usually in third-
normal form.
Data warehouses and OLTP systems have vastly different requirements. Here are some examples
of the notable differences between typical data warehouses and OLTP systems:
Workload
Data warehouses are designed to accommodate ad hoc queries. The workload of a data
warehouse may not be completely understood in advance, and the data warehouse is
optimized to perform well for a wide variety of possible query operations.

OLTP systems support only predefined operations. The application may be specifically tuned
or designed to support only these operations.
Data Modifications
The data in a data warehouse is updated on a regular basis by the ETT process (often, every
night or every week) using bulk data-modification techniques. The end users of a data
warehouse do not directly update the data warehouse. In an OLTP system, end users routinely
issue individual data-modification statements in the database. The OLTP database is always
up-to-date, and reflects the current state of each business transaction.
Schema Design
Data warehouses often use denormalized or partially denormalized schemas (such as a star
schema) to optimize query performance. OLTP systems often use fully normalized schemas
to optimize update/insert/delete performance, and guarantee data consistency.
Typical Operations
A typical data warehouse query may scan thousands or millions of rows. For example, "Find
the total sales for all customers last month." A typical OLTP operation may access only a
handful of records. For example, "Retrieve the current order for a given customer."
Historical Data
Data warehouses usually store many months or years of historical data. This is to support
historical analysis of business data. OLTP systems usually store only a few weeks' or months'
worth of data. The OLTP system only stores as much historical data as is necessary to
successfully meet the current transactional requirements.
Typical Data Warehouse Architectures
As you might expect, data warehouses and their architectures can vary depending upon the
specifics of each organization's situation. Figure 12 shows the most basic architecture for a data
warehouse. In it, a data warehouse is fed from one or more source systems, and end users directly
access the data warehouse.

Figure 12 Typical Architecture for a Data Warehouse


Figure 13 illustrates a more complex data warehouse environment. In addition to a central
database, there is a staging system used to cleanse and integrate data, as well as multiple data
marts, which are systems designed for a particular line of business.



What are conformed dimensions?
Conformed dimensions are dimensions that are linked to (shared by) multiple fact tables.

D1 D2 D1 D2 D5

FT1 FT2 FT3

D3 D4 D3

Data modeling tool :- Erwin by computer associates

Reverse engineering :- deriving the data model from existing tables in a database
Forward engineering :- creating tables from the model (you generate the SQL scripts and execute them)

Explain the difference between OLAP, MOLAP, ROLAP, HOLAP
OLTP (online transaction processing) systems store day-to-day transactions; a DW stores historical
data. OLTP data is in normalized form spread over multiple tables, so a query needs complex joins
that take a lot of time.
A DW is denormalized: one or more fact tables, each connected to multiple dimension tables (Star
Schema).
Snowflake Schema :- When you have multiple lookup tables, you can go for a snowflake schema
(partially normalized), for example when you have a large volume of data in dimension tables.
Ex:- When a country table has to be linked to both customers and resorts, you can create a lookup
country table, so that a change to the country data does not have to be made in multiple dimension
tables.

ROLAP :- Star, Snowflake Schema
MOLAP :- Multidimensional data is stored in a hierarchical way (Customer -> Region -> Country ->
Continent), in the form of cubes. MOLAP is very expensive and complex to design.
It is used when you need results very quickly; query response is faster in MOLAP.
HOLAP :- Also called Hybrid OLAP; a combination of both ROLAP and MOLAP.

What is ZABO (Zero Administration Business Objects)?

What is AIS (APPlication Integration Software)

Name Some Cleansing Tools
Informatica can be used as a cleansing tool, Avinisha ,Data stage

What is Slicing and dicing

Explain linking universes

To link universes, you must have exported the kernel universe to the repository at least once; otherwise
Designer does not allow the link.


Data Warehousing

What is Data warehousing
A DW is a DB used for query, analysis and reporting. By definition, a DW is subject oriented,
integrated, non volatile and time variant.
Subject Oriented :- Represents a subject area like Sales, Marketing
Integrated :- Data collected from multiple source systems is integrated into a single, user readable
format
Ex:- gender coded as male/female, 0/1, M/F or T/F in different sources
Non Volatile :- DW stores historical data
Time Variant :- Stores data timewise, like weekly, monthly, quarterly, yearly

Oracle
Query Optimization (Performance tuning SQL)
Oracle currently (as of Oracle9i Release 2) provides 2 different optimization approaches. These
are Cost Based Optimization (CBO) and Rule Based Optimization (RBO). The CBO bases
optimization choices on pre-gathered table and index statistics, while the RBO makes its
decisions based on a set of 'best practice' rules and does not rely on any statistical information.
The CBO's reliance on statistics makes it vastly more flexible than the RBO since, as long as up to
date statistics are maintained, it will accurately reflect real data volumes. The RBO is Oracle's
legacy optimizer and is to be desupported in Oracle 10g.

What is dynamic caching
One important feature of Informatica 6.2 is the dynamic lookup cache.
For example, when you use a Lookup transformation with a lookup cache for a particular table,
the cache reads the table once and keeps it in memory. This works under the assumption that the
lookup table does not change during that time.

If the lookup table values do change during that time, the lookup cache will not hold the
most recent values, so in that case you use a dynamic cache instead of the standard lookup cache.

While creating an Aggregator transformation, have you enabled the Sorted Input option?
What exactly does it do?
Explain the process.
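
The difference can be sketched in Python. This is an illustration under stated assumptions, not Informatica's implementation: the lookup table is also the load target (modelled as a dict), and the function names are hypothetical. With a static cache the second copy of key 2 is not recognised as already loaded; with a dynamic cache it is.

```python
def load_with_static_cache(target, incoming):
    cache = dict(target)              # snapshot taken once, never refreshed
    inserted = []
    for key, value in incoming:
        if key not in cache:          # the cache never learns about new rows
            target[key] = value
            inserted.append(key)
    return inserted


def load_with_dynamic_cache(target, incoming):
    cache = dict(target)
    inserted = []
    for key, value in incoming:
        if key not in cache:
            target[key] = value
            cache[key] = value        # keep the cache in step with the target
            inserted.append(key)
    return inserted


incoming = [(1, "a"), (2, "b"), (2, "b")]
static_inserts = load_with_static_cache({1: "a"}, incoming)
dynamic_inserts = load_with_dynamic_cache({1: "a"}, incoming)
print(static_inserts, dynamic_inserts)
```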

What is the size of your repository?
10 GB
How many source tables do you have?
30 to 40 (Target database 60 GB)
How many mappings do you have?
20
How many sessions?
Batches :- 4-5 nested
Sessions :- 20-25
Individual :- dependencies I put in sequential (ex:- first one session, next session, next a batch)

How do you schedule sessions?
On Demand (Manual)
Time based
Event based
How do you handle session failure?
I run the session in recovery mode using pmcmd (if an error occurs the session stops; on recovery it
continues from wherever it stopped)

Dashboards:- GE From one prj how much revenue generated , how well u did a process
(Graphical representation) Bo Reporting (Use graphs)

What does the Sorted Input option in the Aggregator transformation do?

It improves session performance, but the data passed into the Aggregator transformation must
already be sorted (if you use the Sorted Input option and the data is not sorted, the session fails).

For example, when you pass presorted data with 4 records for key 101, the server performs the
calculation for 101 as soon as it finds the new key 102.

When Sorted Input is not selected, the Informatica server performs aggregate calculations as it
reads, but stores data for each group until it reads the entire source, to ensure all
aggregate calculations are accurate.
When Sorted Input is selected, the Informatica server assumes all data is sorted by group. As it reads
rows for a group, it performs the aggregate calculations.
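
The two modes can be compared with a minimal Python sketch. It is an analogy rather than Informatica code: `aggregate_sorted` emits a group total as soon as the key changes, while `aggregate_unsorted` has to hold every group until the input ends. The function names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(rows):
    """Sum per key, emitting each group as soon as the key changes."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


def aggregate_unsorted(rows):
    """Sum per key, holding every group in memory until the input ends."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return sorted(totals.items())


sorted_rows = [(101, 5), (101, 7), (101, 1), (101, 2), (102, 4)]
sorted_result = list(aggregate_sorted(sorted_rows))
unsorted_result = aggregate_unsorted([(102, 4), (101, 5), (101, 10)])
print(sorted_result, unsorted_result)
```

One difference from the behaviour described above: `groupby` would silently split a group if the input were not sorted, whereas the session is said to fail in that case.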

How did you do incremental load?
What is Click Stream in Data Warehousing?

Click stream is basically web based data warehousing analysis (e-web intelligence).

When to index a particular column, and on what percentage?
Generally, create an ordinary b-tree index if you know that the column will be
mostly referenced in select statements.
If your select statement retrieves more than 30% of the rows, it is better not to use the index.
The index is worthwhile when you are retrieving a smaller number of rows;
if you are retrieving a larger number of rows, you can read the table directly rather than using the
index (full table scan).

Star transformation is generally used to improve query performance in star
schema models.
The star transformation is a cost-based query transformation aimed at executing
star queries efficiently. Whereas the star optimization works well for schemas with a
small number of dimensions and dense fact tables, the star transformation may be
considered as an alternative if any of the following holds true:
In order to get the best possible performance for star queries, it is important to
follow some basic guidelines:
- A bitmap index should be built on each of the foreign-key columns of the fact
table(s).
- The initialization parameter STAR_TRANSFORMATION_ENABLED should be
set to TRUE. This enables an important optimizer feature for star queries; it is
set to FALSE by default for backwards compatibility.
- The cost-based optimizer should be used. [This does not apply solely to star
schemas: all data warehouses should always use the cost-based optimizer].

If you have data coming from different sources, what transformation will you use in your Designer?

What is a reusable transformation? What is a mapplet? Explain the difference between them.

Reusable transformation :- Use it when you want to create a transformation that performs a common
task, such as calculating the average salary in a department
Mapplet :- A reusable object that represents a set of transformations

How many transformations are there?
Standard and reusable
Standard :- Aggregator, expression, external procedure, advanced external procedure,
filter, joiner, lookup, normalizer, sequence, source qualifier, stored procedure, update
strategy
Can you stop a session in a concurrent batch?
If the session you want to stop is part of a batch, you must stop the batch to stop the session. If the
session is part of a batch that is nested in a series of batches, you must stop the outermost batch
to stop the session.
What happens when you use the delete, update, reject or insert option in your update
strategy?
Inserts :- Treats all records as inserts. While inserting, if the record violates a primary key or
foreign key constraint in the database, it rejects the record.
What are variables and where do you use them?
Use variables to simplify complex expressions, temporarily store data, store values from prior
rows, compare values, capture multiple return values from a stored procedure, and store the
results of an unconnected lookup.
What are connected and unconnected lookups?
A connected Lookup transformation is part of the mapping data flow. With a connected lookup
you can have multiple return values, and it supports default values.
An unconnected Lookup transformation exists separately from the data flow in the mapping.
You write an expression using the :LKP reference qualifier to call the lookup within another
transformation.
Uses :- Testing the results of a lookup in a transformation
Filtering records based on the lookup results
Calling the same lookup multiple times in one mapping
Updating slowly changing dimensions
An unconnected lookup does not support default values.
Can you access a repository created in a previous version of Informatica?

What are the diff.ports available?
Input Port :- Which receives data (Target)
Output Port:- Which provides data (Source)
Input/Output Port:- Which pass data through them (Mapplets)
Variable Port :- Used to store components of expressions

What happens if the Informatica Server doesn't find the session parameter in the parameter
file?

How do you run a session from the command line?
pmcmd
What are the different things you can do using pmcmd?
You can stop or abort a session
What is pmrep command

Have you created parallel sessions? How do you create parallel sessions?

You can improve performance by creating a concurrent batch to run several sessions in parallel
on one Informatica server. If you have several independent sessions using separate sources and
separate mappings to populate different targets, you can place them in a concurrent batch and run
them at the same time. If you have a complex mapping with multiple sources, you can separate the
mapping into several simpler mappings with separate sources. Similarly, if you have a session
performing a minimal number of transformations on large amounts of data, like moving flat files to a
staging area, you can separate the session into multiple sessions and run them concurrently in a
batch, cutting the total run time dramatically.

What is dynamic insert

What are the different types of cache? What is the difference between persistent and static cache?

Static cache :- Read only; deleted after use
Persistent cache :- Saved on the server and reused

What are session parameters? How do you set them?

When do you use mapping parameters? (In which transformations)

What is a parameter? When and where do you use them? When is the value created:
design time or run time? If you don't create the parameter, what will happen?

How do you do error handling in Informatica ?
Error handling is very primitive.
Log files can be generated which contain error details and code.
The error code can be checked in the troubleshooting guide and corrective action
taken. The detail in the log file can be increased by setting an appropriate tracing level in the
session properties. We can also configure a session to stop after 1, 2 or n errors.
How do you implement configuration management in Informatica?
There are several methods to do this. Some of them are:
Take a backup of the repository as a binary file and treat it as a configurable item.
Implement the Folder Versioning utility in Informatica.
Scenario :-. A mapping contains Source Table S_Time ( Start_Year, End_Year )
Target Table Time_Dim ( Date, Day, Month, Year, Quarter )

Stored procedure transformation : A procedure has two input parameters I_Start_Year,
I_End_Year and output parameter as O_Date, Day , Month, Year, Quarter. If this session is
running, how many rows will be available in the target and why ?.
Only one row the last date of the End_Year.
All the subsequent rows are overriding the previous rows.

What is the difference between lookup cache and lookup index?
The lookup cache contains the index cache and the data cache.
Index cache: Contains the columns used in the condition.
Data cache: Contains the output columns other than the condition columns.
Discuss two approaches for updating a target table in Informatica and how they are
different.
Update Strategy transformation: We can write our own code. It is flexible.
Normal insert / update / delete (with proper variation of the update option):
It can be configured in the session properties.
Any change in the row will cause an update. Inflexible.
How do you handle performance issues in Informatica.Where can you monitor the
performance
There are several aspects to the performance handling .Some of them are :-
Source tuning
Target tuning
Repository tuning
Session performance tuning
Incremental Change identification in source side.
Software , hardware(Use multiple servers) and network tuning.
Bulk Loading
Use the appropriate transformation.
To monitor this
Set performance detail criteria
Enable performance monitoring
Monitor session at runtime &/ or Check the performance monitor file .
What is a suggested method for validating fields / marking them with errors?
One of the successful methods is to create an expression object which contains
variables:
1. Use one variable per port that is to be checked.
2. Set the error flag for that field, then at the bottom of the expression trap each of the error
fields.
3. From this port you can choose to set flags based on each individual error which occurred, or
feed them out as a combination of concatenated field names to be inserted into the database as
an error row in an error tracking table.
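
The approach can be pictured with a short Python sketch: one check per field, with the names of the failing fields concatenated into a single value that could be written to an error tracking table. The field names and rules are hypothetical, and this is an analogy to an expression object rather than transformation-language code.

```python
def validate_row(row):
    """Return the names of the fields that failed their checks, joined by commas."""
    checks = {
        "customer_id": lambda v: isinstance(v, int) and v > 0,
        "email": lambda v: isinstance(v, str) and "@" in v,
    }
    # One flag per checked field; collect the names of the fields in error.
    failed = [name for name, ok in checks.items() if not ok(row.get(name))]
    return ",".join(failed)


good = validate_row({"customer_id": 7, "email": "a@example.com"})
bad = validate_row({"customer_id": -1, "email": "none"})
print(repr(good), repr(bad))
```

An empty string means the row passed every check; a non-empty string lists the fields to record in the error row.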
Where is the cache (lookup, index) created and how can you see it?
The cache is created on the server. Some default memory is allocated for it.
Once that memory is exceeded, the cache files can be seen in the cache directory on the Server,
not before that.
When do you use SQL override in Look up Transformation.
Use SQl override when
you have more than one look up table
To use where condition to reduce records in cache.
Explain how "constraint based load ordering" works.
Constraint based load ordering in PowerMart / PowerCenter works like this: it controls
the order in which the target tables are committed to a relational database. It is of no use
when sending information to a flat file. To construct the proper constraint order, links
between the TARGET tables in Informatica need to be constructed. Simply turning on
"constraint based load ordering" has no effect on the operation itself. Informatica does
NOT read constraints from the database when this switch is turned on. Again, to take
advantage of this switch, you must construct primary / foreign key relationships in the
TARGET TABLES in the Designer of Informatica. Creating primary / foreign key
relationships is difficult - you are only allowed to link a single port (field) to a single
table as a primary / foreign key.
What is the difference between PowerMart and PowerCenter?
PowerCenter - has all the functionality.
Distributed metadata (repository).
Global repository, and can register multiple Informatica servers. One can share metadata
across repositories.
Can connect to varied sources like PeopleSoft, SAP etc.
Has bridges which can transport metadata from other tools (like Erwin).
Costs around 200K US $.

PowerMart - subset of PowerCenter.
One repository, and can register only one Informatica server.
Cannot connect to varied sources like PeopleSoft, SAP etc.
Costs around 50K US $.
What is the difference between Oracle Sequence and Informatica Sequence and which is
better.
Oracle sequence can be used in a Pl/Sql stored procedure, which in turn can be used with
stored procedure transformation of Informatica.
Informatica sequence is generated through sequence generator transformation of Informatica.
It depends upon the user needs but Oracle sequence provides greater control.

How do you execute a set of SQL commands before running a session and after completion
of a session in Informatica? Explain.
SQL commands can be put in stored procedures.
Two unconnected Stored Procedure transformations are created, pointing to the respective
procedures: one pre-session, the other post-session.
When the session is run, these two procedures are executed before the session and after the
session.
How can you utilize COM components in Informatica?
By writing C++, VB or VC++ code in an External Procedure transformation.

What is an indicator file and how can it be used?
An indicator file is used for event based scheduling when you don't know when the source data will
be available. A shell command, script or batch file creates and sends this indicator file to a
directory local to the Informatica Server. The Server waits for the indicator file to appear before
running the session.
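
The waiting behaviour can be sketched in Python as a simple polling loop. This is only an analogy for event based scheduling, not how the Informatica Server implements it; the file name, polling interval, and timeout are hypothetical.

```python
import os
import tempfile
import time


def wait_for_indicator(path, poll_seconds=0.1, timeout=5.0):
    """Poll until the indicator file exists; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_seconds)
    return False


with tempfile.TemporaryDirectory() as folder:
    indicator = os.path.join(folder, "source_ready.ind")  # hypothetical name
    missing = wait_for_indicator(indicator, timeout=0.3)  # nothing there yet
    open(indicator, "w").close()       # the upstream job signals readiness
    found = wait_for_indicator(indicator, timeout=0.3)
print(missing, found)
```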


What is a persistent cache? When should it be used?
When the lookup cache is saved in a Lookup transformation, it is called a persistent cache.
The first time the session runs, the cache is saved on disk and then reused in subsequent runs of the
session.
It is used when the lookup table is static, i.e. doesn't change frequently.
What is Incremental Aggregation and how should it be used?
If the source changes only incrementally and you can capture changes, you can configure the
session to process only those changes. This allows the Informatica Server to update your target
incrementally, rather than forcing it to process the entire source and recalculate the same
calculations each time you run the session. Therefore, only use incremental aggregation if:
Your mapping includes an aggregate function.
The source changes only incrementally.
You can capture incremental changes. You might do this by filtering source data by
timestamp.
Before implementing incremental aggregation, consider the following issues:
Whether it is appropriate for the session
What to do before enabling incremental aggregation
When to reinitialize the aggregate caches
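
The idea of incremental aggregation can be sketched in Python: totals saved from the previous run are kept, and only source rows newer than the last run are folded in. This is a simplified illustration under stated assumptions (a timestamp column identifies new rows; the function name, keys, and values are hypothetical), not the Informatica aggregate cache mechanism.

```python
def incremental_aggregate(saved_totals, new_rows, last_run_ts):
    """Fold only rows newer than last_run_ts into the saved totals."""
    totals = dict(saved_totals)
    for key, amount, ts in new_rows:
        if ts > last_run_ts:              # capture changes by timestamp
            totals[key] = totals.get(key, 0) + amount
    return totals


saved = {"east": 100, "west": 40}          # totals from the previous run
source = [("east", 100, 1), ("west", 40, 1), ("east", 25, 2), ("north", 10, 2)]
updated = incremental_aggregate(saved, source, last_run_ts=1)
print(updated)
```

Rows with timestamp 1 were already aggregated in the earlier run, so only the two rows with timestamp 2 change the totals.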

Scenario :-Informatica Server and Client are in different machines. You run a session from the
server manager by specifying the source and target databases. It displays an error. You are
confident that everything is correct. Then why it is displaying the error?

The connect strings for the source and target databases are not configured on the workstation
containing the server, though they may be on the client machine.
Informatica

Duration : 1 Hr. Max. Marks : 100

1. Where exactly the sources and target information stored ? (2)
2. Informatica Repository Tables
3. What is the difference between power mart and power centre. Elaborate. (2)
4. Power center is a global repository
5. What are variable ports and list two situations when they can be used? (2)

6. What are the parts of Informatica Server? (2)
7. How does the server recognise the source and target databases.
Elaborate on this. (2)
8. List the transformation used for the following: (10)
a) Heterogeneous Sources
b) Homogeneous Sources
c) Find the 5 highest paid employees within a dept.
d) Create a Summary table
e) Generate surrogate keys
9. What is the difference between sequential batch and concurrent batch and which is
recommended and why ? (2)
10. Designer is used for ____________________ (1)
11. Repository Manager is used for __________________ (1)
12. Server Manager is used for ______________________ (1)
13. Server is used for _____________________________ (1)

14. A session S_MAP1 is in Repository A. While running the session, the error message
"server hot-ws270 is connect to Repository B" is displayed. What does it mean? (2)
15. How do you do error handling in Informatica ? (2)
16. How do you implement scheduling in Informatica ? (2)
17. What is the meaning of upgradation of repository ? (2)
18. How can you run a session without using server manager ? (2)
19. What is indicator file and where it is used ? (2)
20. What are pre and post session stored procedures ? Write a suitable example. (2)
21. Consider two cases :
1. Power Center Server and Client on the same machine
2. Power Center Server and Client on different machines
What is the basic difference in these two setups and which is recommended? (2)
22. Informatica Server and Client are on different machines. You run a session from the Server
Manager by specifying the source and target databases. It displays an error. You are confident
that everything is correct. Why, then, is it displaying the error? (2)
23. When you connect to repository for the first time it asks you for user name & password of
repository and database both.
But subsequent times it asks only repository password. Why ? (2)
24. What is the difference between normal and bulk loading.
Which one is recommended? (2)
25. What is a test load ? (2)
26. What is an incremental aggregation and when it should be implemented? (2)
27. How can you use an Oracle sequences in Informatica? You have an Informatica sequence
generator transformation also. Which one is better to use? (2)
28. What is the difference between a shortcut of an object and copy of an object?
Compare them. (2)
29. What is mapplet and a reusable transformation? (2)
30. How do you implement configuration management in Informatica? (3)
31. What are Business Components in Informatica? (2)
32. Dimension Object created in Oracle can be imported in Designer ( T/ F) (1)
33. Cubes contain measures ( T / F ) (1)
34. COM components can be used in Informatica ( T / F ) (1)
35. Lookup is an Active Transformation (T/F) (1)
36. What is the advantage of persistent cache? When it should be used. (1)
37. When will you use SQL override in a lookup transformation? (2)
38. Two different admin users created for repository are ______ and_______ (1)
39. Two Default User groups created in the repository are ____ and ______ (1)
40. A mapping contains
Source Table S_Time ( Start_Year, End_Year )
Target Table Tim_Dim ( Date, Day, Month, Year, Quarter )
Stored procedure transformation : A procedure has two input parameters I_Start_Year,
I_End_Year and output parameter as O_Date, Day , Month, Year, Quarter. If this session is
running, how many rows will be available in the target and why ? (5)
39. Two Sources S1, S2 containing measures M1,M2,M3, 4 Dimensions D1,D2,D3,D4,
1 Fact F1 containing measures M1, M2,M3 and Dimension Surrogate keys K1,K2,K3,K4
(a) Write a SQL statement to populate Fact table F1
(b) Design a mapping in Informatica for loading of Fact table F1. (5)
40. What is the difference between connected lookup and unconnected lookup. (2)
41. What is the difference between data cache and index cache? (2)
42. When should one create a lookup transformation? (2)

43. How do you handle performance issues in Informatica? Where can you monitor the
performance? (3)
44. List and Discuss two approaches for updation of target table in informatica and
how they are different. (3)
45. You have created a lookup transformation for a certain condition which, if true,
returns multiple rows. When you go to the target, you see that only one row has come through
and not all. Why is that, and how can it be corrected? (2)
46. Where are the log files generally stored. Can you change the path of the file.
What can the path be? (2)
47. Where is the cache (lookup, index) created and how can you see it. (2)

How do I ?
Q: How do I connect job streams/sessions or batches across folders? (30 October 2000)
For quite a while there's been a deceptive problem with sessions in the Informatica
repository. For management and maintenance reasons, we've always wanted to separate
mappings, sources, and targets into subject areas or functional areas of the business. This
makes sense until we try to run the entire Informatica job stream, because only the folder
in which the map has been defined can house the session. This makes it difficult to run
jobs / sessions across folders - particularly when there are necessary job dependencies
which must be defined. The purpose of this article is to introduce an alternative solution
to this problem. It requires the use of shortcuts.
The basics are like this: Keep the map creations, sources, and targets subject
oriented. This allows maintenance to be easier (by subject area). Then once the
maps are done, change the folders to allow shortcuts (done from the repository
manager). Create a folder called "MY_JOBS" or something like that. Go in to
designer, open "MY_JOBS", expand the source folders, and create shortcuts to
the mappings in the source folders.
Go to the session manager, and create sessions for each of the short-cut mappings
in MY_JOBS. Then batch them as you see fit. This will allow a single folder for
running jobs and sessions housed anywhere in any folder across your repository.
Q: How do I get maximum speed out of my database connection? (12 September 2000)
In Sybase or MS-SQL Server, go to the Database Connection in the Server Manager.
Increase the packet size. Recommended sizing depends on distance traveled from
PMServer to Database - 20k is usually acceptable on the same subnet. Also, have the
DBA increase the "maximum allowed" packet size setting on the Database itself.
Following this change, the DBA will need to restart the DBMS. Changing the packet
size doesn't mean all connections will connect at this size, it just means that anyone
specifying a larger packet size for their connection may be able to use it. It should
increase speed, and decrease network traffic. Default IP Packets are between 1200 bytes
and 1500 bytes.
In Oracle: there are two methods. For connection to a local database, set up the
protocol as IPC (between PMServer and a DBMS Server that are hosted on the
same machine). IPC is not a protocol that can be utilized across networks
(apparently). IPC stands for Inter Process Communication, and utilizes memory
piping (RAM) instead of client context, through the IP listener. For remote
connections there is a better way: LISTENER.ORA and TNSNAMES.ORA need to be
modified to include SDU and TDU settings. SDU = Session Data Unit,
and TDU = Transport Data Unit. Both specify packet sizing in
Oracle connections over IP. Default for Oracle is 1500 bytes. Also note: these
settings can be used in IPC connections as well, to control the IPC Buffer sizes
passed between two local programs (PMServer and Oracle Server).
Both the Server and the Client need to be modified. The server will allow
packets up to the max size set - but unless the client specifies a larger packet size,
the server will default to the smallest setting (1500 bytes). Both SDU and TDU
should be set the same. See the example below:
TNSNAMES.ORA
LOC=(DESCRIPTION= (SDU = 20480) (TDU=20480)
LISTENER.ORA
LISTENER=....(SID_DESC= (SDU = 20480) (TDU=20480) (SID_NAME =
beqlocal) ....
Q: How do I get a Sequence Generator to "pick up" where another "left off"? (8 June 2000)
To perform this mighty trick, one can use an unconnected lookup on the Sequence ID of
the target table. Set the properties to "LAST VALUE"; the input port is an ID, and the
condition is: SEQ_ID >= input_ID. Then in an expression set up a variable port, and connect
a NEW self-resetting sequence generator to a new input port in the expression. The variable
port's expression should read: IIF( v_seq = 0 OR ISNULL(v_seq) = true,
:LKP.lkp_sequence(1), v_seq). Then, set up an output port. Change the output port's
expression to read: v_seq + input_seq (from the resetting sequence generator). Thus you
have just completed an "append" without a break in sequence numbers.
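To make the row-by-row behaviour concrete, here is a small simulation of the variable port and output port described above. It is a sketch only: max() over a list stands in for the unconnected lookup returning the last SEQ_ID in the target, range() stands in for the self-resetting sequence generator, and the function name is hypothetical.

```python
def append_sequence(existing_ids, new_row_count):
    """Simulate v_seq (looked up once) plus a sequence that restarts at 1."""
    v_seq = None  # the variable port is unset before the first row
    out = []
    for input_seq in range(1, new_row_count + 1):  # resetting generator: 1, 2, 3...
        if v_seq is None or v_seq == 0:
            # stands in for :LKP.lkp_sequence(1), the last ID already in the target
            v_seq = max(existing_ids, default=0)
        out.append(v_seq + input_seq)  # the output port: v_seq + input_seq
    return out

print(append_sequence([1, 2, 3, 7], 3))
```

With a non-empty target the lookup runs only on the first row, because the variable port keeps its value afterwards; the new IDs continue from the highest existing one.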
Q: How do I query the repository to see which sessions are set in TEST MODE? (8 June 2000)
Run the following select:
select * from opb_load_session where bit_option = 13;
It's actually BIT # 2 in this bit_option setting, so if you have a mask, or a bit-level
function you can then AND it with a mask of 2, if this is greater than zero, it's been set
for test load.
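A bit-level test of this kind can be sketched as below. Note that the text above gives both the literal value 13 and a mask of 2, and 13 AND 2 is 0 while 13 AND 4 is non-zero, so the mask is left as a parameter here; which mask is right should be confirmed against your own repository. The function name is hypothetical.

```python
def bit_is_set(bit_option, mask):
    """Return True when any bit in mask is set in bit_option."""
    return (bit_option & mask) > 0

# Compare the two candidate masks against the literal value quoted above.
for mask in (2, 4):
    print(mask, bit_is_set(13, mask))
```

Testing the bit rather than comparing against one literal value also finds sessions that have other options set at the same time.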



Q: How do I "validate" all my mappings at once? (31 March 2000)
Issue the following command WITH CARE.
UPDATE OPB_MAPPING SET IS_VALID = 1;
Then disconnect from the database and re-connect, in session manager and in
designer as well.
Q: How do I validate my entire repository? (12 September 2000)
To add the menu option, change this registry entry on your client.
HKEY_CURRENT_USER/Software/Informatica/PowerMart Client
Tools/4.7/Repository Manager Options
add the following string Name: EnableCheckReposit Data: 1
Validate Repository forces Informatica to run through the repository, and check
the repo for errors
Q: How do I work around a bug in 4.7? I can't change the execution order of my stored
procedures that I've imported? (31 March 2000)
Issue the following statements WITH CARE:
select widget_id from OPB_WIDGET where WIDGET_NAME = <widget name>
(write down the WIDGET ID)
select * from OPB_WIDGET_ATTR where WIDGET_ID = <widget_id>
update OPB_WIDGET_ATTR set attr_value = <execution order> where
WIDGET_ID = <widget_id> and attr_id = 5
COMMIT;
The <execution order> is the number of the order in which you want the stored
proc to execute. Again, disconnect from both designer and session manager
repositories, and re-connect to "re-read" the local cache.
Q: How do I keep the session manager from "Quitting" when I try to open a session? (23
March 2000)
Informatica Tech Support has said: if you are using a flat file as a source, and your "file
name" in the "Source Options" dialog is longer than 80 characters, it will "kill" the
Session Manager tool when you try to re-open it. You can fix the session by logging in
to the repository via SQLPLUS or ISQL and finding the table called
OPB_LOAD_SESSION. Find the Session ID associated with the session name and write it
down. Then select FNAME from OPB_LOAD_FILES where Session_ID =
<session_id>. Update OPB_LOAD_FILES, setting the FNAME column to a <new file name>
shorter than 80 characters, and commit the changes. Now
the session has been repaired. Try to keep the directory to that source file in the
DIRECTORY entry box above the file name box. Try to keep all the source files
together in the same source directory if possible.
Q: How do I repair a "damaged" repository? (16 March 2000)
There really isn't a good repair tool, nor is there a "great" method for repairing the
repository. However, I have some suggestions which might help. If you're running in to
a session which causes the session manager to "quit" on you when you try to open it, or
you have a map that appears to have "bad sources", there may be something you can do.
There are varying degrees of damage to the repository - mostly caused because the
sequence generator that PM/PC relies on is buried in a table in the repository - and they
generate their own sequence numbers. If this table becomes "corrupted" or generates the
wrong sequences, you can get repository errors all over the place. It can spread quickly.
Try the following steps to repair a repository: (USE AT YOUR OWN RISK) The
recommended path is to backup the repository, send it to Technical Support - and tell
them it's damaged.
1. Delete the session, disconnect, re-connect, then re-create the session, then attempt to edit
the new session again. If the new session won't open up (srvr mgr quits), then there are
more problems - PM/PC is not successfully attaching sources and targets to the session
(SEE: OPB_LOAD_SESSION table (SRC_ID, TARGET_ID) columns - they will be
zero, when they should contain an ID).
2. Delete the session, then open the map. Delete the source and targets from the MAP.
Save the map and invalidate it - forcing an update to the repository and its links. Drag
the sources and targets back in to the map and re-connect them. Validate and Save. Then
try re-building the session (back to step one). If there is still a failure, then there are more
problems.
3. Delete the session and the map entirely. Save the repository changes - thus requesting a
delete in the repository. While the "delete" may occur - some of the tables in the
repository may not be "cleansed". There may still be some sources, targets, and
transformation objects (reusable) left in the repository. Rebuild the map from scratch -
then save it again... This will create a new MAP ID in the OPB_MAPPING table, and
force PM/PC to create new ID links to existing Source and Target objects (as well as all
the other objects in the map).
4. If that didn't work - you may have to delete the sources, reusable objects, and targets, as
well as the session and the map. Then save the repository - again, trying to "remove" the
objects from the repository itself. Then re-create them. This forces PM/PC to assign new
ID's to ALL the objects in the map, the map, and the session - hopefully creating a
"good" picture of all that was rebuilt.
Or try this method:
1. Create a NEW repository -> call it REPO_A (for reference only).
2. Using designer, copy any of the MAPPINGS that don't have "problems" opening in their
respective sessions from the old repository (REPO_B) to the new repository (REPO_A).
This will create NEW ID's for all the mappings. CAUTION: You will lose your sessions.

3. DELETE the old repository (REPO_B).
4. Create a new repository in the OLD Repository Space (REPO_B).
5. Copy the maps back in to the original repository (Recreated Repository) From REPO_A
to REPO_B.
6. Rebuild the sessions, then re-create all of the objects you originally had trouble with.
You can apply this to FOLDER level and Repository Manager Copying, but you need to
make sure that none of the objects within a folder have any problems.
What this does: creates new ID's, resets the sequence generator, re-establishes all the
links to the objects in the tables, and drops out (by process of elimination) any objects
you've got problems with.
Bottom line: PM/PC client tools have trouble when the links between ID's get broken.
It's fairly rare that this occurs, but when it does - it can cause heartburn.
Q: How do I clear the locks that are left in the repository? (3 March 2000)
Clearing locks is typically a task for the repository manager. Generally it's done from
within the Repository Manager: Edit Menu -> Show Locks. Select the locks, then press
"remove". Typically locks are left on objects when a client is rebooted without properly
exiting Informatica. These locks can keep others from editing the objects. They can also
keep scheduled executions from occurring. It's not uncommon to want to clear the locks
automatically - on a prescheduled time table, or at a specified time. This can be done
safely only if no-one has an object out for editing at the time of deletion of the lock. The
suggested method is to log in to the database from an automated script, and issue a
"delete from OPB_OBJECT_LOCKS" statement.
Q: How do I turn on the option for Check Repository? (3 March 2000)
According to Technical Support, it's only available by adjusting the registry entries on the
client. PM/PC need to be told it's in Admin mode to work. Below are the steps to turn on
the Administration Mode on the client. Be aware - this may be a security risk, anyone
using that terminal will have access to these features.

1) start repository manager
2) repository menu go to check repository
3) if the option is not there you need to edit your registry using regedit
go to: HKEY_CURRENT_USER>>SOFTWARE>>INFORMATICA>>PowerMart
Client Tools>>Repository Manager Options
go to your specific version 4.5 or 4.6 and then go to Repository Manager. In
there add two strings:
1) EnableAdminMode 1
2) EnableCheckReposit 1
Both should be spelled as shown; the value for both is 1.
Q: How do I generate an Audit Trail for my repository (ORACLE / Sybase) ?
Download one of two *USE AT YOUR OWN RISK* zip files. The first is available
now for PowerMart 4.6.x and PowerCenter 1.6x. It's a 7k zip file: Informatica Audit Trail
v0.1a. The other file (for 4.5.x) is coming. Please note: this is FREE software that
plugs in to ORACLE 7x, ORACLE 8x, and Oracle 8i. It has NOT been built for
Sybase, Informix, or DB2. If someone would care to adapt it, and send it back to me, I'll
be happy to post these also. It has limited support - it has not been fully tested in a
multi-user environment, and any feedback would be appreciated. NOTE: SYBASE
VERSION IS ON ITS WAY.
Q: How do I "tune" a repository? My repository is slowing down after a lot of use, how can I
make it faster?
In Oracle: Schedule a nightly job to ANALYZE TABLE for ALL INDEXES, creating
histograms for the tables - keep the cost based optimizer up to date with the statistics. In
SYBASE: schedule a nightly job to UPDATE STATISTICS against the tables and
indexes. In Informix, DB2, and RDB, see your owners manuals about maintaining SQL
query optimizer statistics.
Q: How do I achieve "best performance" from the Informatica tool set?
By balancing what Informatica is good at with what the databases are built for. There are
reasons for placing some code at the database level - particularly views, and staging
tables for data. Informatica is extremely good at reading/writing and manipulating data at
very high rates of throughput. However - to achieve optimum performance (in the
Gigabyte to Terabyte range) there needs to be a balance of Tuning in Oracle, utilizing
staging tables, views for joining source to target data, and throughput of manipulation in
Informatica. For instance: Informatica will never achieve the speeds of "append" or
straight inserts that Oracle SQL*Loader, or Sybase BCP achieve. This is because these
two tools are written internally - specifically for the purposes of loading data (direct to
tables / disk structures). The API that Oracle / Sybase provide Informatica with is not
nearly as equipped to allow this kind of direct access (to eliminate breakage when
Oracle/Sybase upgrade internally). The basics of Informatica are: 1) Keep maps as
simple as possible 2) break complexity up in to multiple maps if possible 3) rule of
thumb: one MAP per TARGET table 4) Use staging tables for LARGE sets of data 5)
utilize SQL for its power of sorts, aggregations, parallel queries, temp spaces, etc...
(setup views in the database, tune indexes on staging tables) 6) Tune the database -
partition tables, move them to physical disk areas, etc... separate the logic.
Q: How do I get an Oracle Sequence Generator to operate faster?
The first item is: use a function to call it, not a stored procedure. Then, make sure the
sequence generator and the function are local to the SOURCE or TARGET database, DO
NOT use synonyms to place either the sequence or function in a remote instance
(synonyms to a separate schema/database on the same instance may be only a slight
performance hit). This should help - possibly double the throughput of generating
sequences in your map. The other item is: see slide presentations on performance tuning
for your sessions / maps for a "best" way to utilize an Oracle sequence generator. Believe
it or not - the write throughput shown in the session manager per target table is directly
affected by calling an external function/procedure which is generating sequence
numbers. It does NOT appear to affect the read throughput numbers. This is a difficult
problem to solve when you have low "write throughput" on any or all of your targets.
Start with the sequence number generator (if you can), and try to optimize the map for
this.

Q: I have a mapping that runs for hours, but it's not doing that much. It takes 5 input tables,
uses 3 joiner transformations, a few lookups, a couple expressions and a filter before writing to
the target. We're running PowerMart 4.6 on an NT 4 box. What tuning options do I have?
Without knowing the complete environment, it's difficult to say what the problem is, but
here's a few solutions with which you can experiment. If the NT box is not dedicated to
PowerMart (PM) during its operation, identify what it contends with and try rescheduling
things such that PM runs alone. PM needs all the resources it can get. If it's a dedicated
box, it's a well known fact that PM consumes resources at a rapid clip, so if you have
room for more memory, get it, particularly since you mentioned use of the joiner
transformation. Also toy with the caching parameters, but remember that each joiner
grabs the full complement of memory that you allocate. So if you give it 50 MB, the 3
joiners will really want 150 MB. You can also try breaking up the session into parallel
sessions and put them into a batch, but again, you'll have to manage memory carefully
because of the joiners. Parallel sessions are a good option if you have a multiple-processor
machine, so if you have vacant CPU slots, consider adding more CPUs. If a lookup table is
relatively big (more than a few thousand rows), try turning the cache flag off in the
session and see what happens. So if you're trying to look up a "transaction ID" or
something similar out of a few million rows, don't load the table into memory. Just look it
up, but be sure the table has appropriate indexes. And last, if the sources live on a pretty
powerful box, consider creating a view on the source system that essentially does the
same thing as the joiner transformations and possibly some of the lookups. Take
advantage of the source system's hardware to do a lot of the work before handing down
the result to the resource constrained NT box.
Q: Is there a "best way" to load tables?
Yes - If all that is occurring is inserts (to a single target table) - then the BEST method of
loading that target is to configure and utilize the bulk loading tools. For Sybase it's BCP,
for Oracle it's SQL*Loader. With multiple targets, break the maps apart (see slides), one
for INSERTS only, and remove the update strategies from the insert only maps (along
with unnecessary lookups) - then watch the throughput fly. We've achieved 400+ rows
per second per table in to 5 target Oracle tables (Sun Sparc E4500, 4 CPU's, Raid 5, 2
GIG RAM, Oracle 8.1.5) without using SQL*Loader. On an NT 366 mhz P3, 128 MB
RAM, single disk, single target table, using SQL*Loader we've loaded 1 million rows
(150 MB) in 9 minutes total - all the map had was one expression to left and right trim
the ports (12 ports, each row was 150 bytes in length). 3 minutes for SQL*Loader to
load the flat file - DIRECT, Non-Recoverable.
Q: How do I gauge that the performance of my map is acceptable?
Take as a baseline a small file (under 6 MB) with pmserver on a Sun Sparc 4000,
Solaris 5.6, 2 CPUs, 2 GB RAM (if your configuration is similar you'll be OK), or, for NT,
a 450 MHz PII with 128 MB RAM (under 3 MB file size). In that case there is nothing to
worry about unless your write throughput is sitting at 1 to 5 rows per second. If you are
in this range, then your map is too complex, or your tables have not been optimized. On
a baseline machine (as stated above), expected read throughput will vary
depending on the source; write throughput for relational tables (tables in the database)
should be upwards of 150 to 450+ rows per second. To calculate the total write
throughput, add all of the rows per second for each target together, run the map several
times, and average the throughput. If your map is running "slow" by these standards,
then see the slide presentations to implement a different methodology for tuning. The
suggestion here is: break the map up - 1 map per target table, and place common logic in
mapplets.
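The throughput calculation described above is simple arithmetic and can be sketched as below. The per-target figures are made-up sample numbers, not measurements, and the function name is hypothetical.

```python
def average_write_throughput(runs):
    """runs: one list per run, holding rows-per-second for each target table."""
    totals = [sum(per_target) for per_target in runs]  # total write rate of each run
    return sum(totals) / len(totals)                   # averaged over the runs

# Three runs of a map that writes to three targets (illustrative values only).
runs = [[120, 90, 60], [130, 85, 70], [110, 95, 65]]
print(average_write_throughput(runs))
```

The result is the figure to compare against the 150 to 450+ rows per second range quoted for a baseline machine.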
Q: How do I create a state variable?
Create a variable port in an expression (v_MYVAR), set the data type to Integer (for this
example), set the expression to: IIF( ( ISNULL(v_MYVAR) = true or v_MYVAR = 0 ) [
and <your condition> ], 1, v_MYVAR). What happens here is that upon initialization
Informatica may set v_MYVAR to NULL, or zero. The first time this code is
executed it is set to 1. Of course you can set the variable to any value you wish
and carry that through the transformations. Also you can add your own AND
condition (shown in square brackets), and only set the variable when a specific condition has
been met. The variable port will hold its value for the rest of the transformations.
This is a good technique to use for lookup values when a single lookup value is necessary
based on a condition being met (such as a key for an unknown value). You can
change the data type to character and use the same examination; simply remove the "or
v_MYVAR = 0" from the expression, since character values will first be set to NULL.
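The behaviour of such a variable port across rows can be simulated as below. This is a sketch under the assumption that the port starts as NULL; the function name, the sample rows and the condition are hypothetical.

```python
def run_state_variable(rows, condition):
    """Simulate IIF((ISNULL(v) or v = 0) and <condition>, 1, v) row by row."""
    v_myvar = None  # may be NULL (or zero) on initialization
    seen = []
    for row in rows:
        if (v_myvar is None or v_myvar == 0) and condition(row):
            v_myvar = 1  # set once, when the condition is first met
        # otherwise the port keeps whatever value it already holds
        seen.append(v_myvar)
    return seen

print(run_state_variable(["a", "UNKNOWN", "b"], lambda r: r == "UNKNOWN"))
```

The value stays unset until the condition is first met and is then carried through every later row, which is what makes the port usable as a state variable.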
Q: How do I pass a variable in to a session?
There is no direct method of passing variables in to maps or sessions. In order to get a
map/session to respond to data driven (variables) a data source must be provided. If
working with flat files it can be another flat file; if working with relational data sources
it can be another relational table. Typically a relational table works best, because
SQL joins can then be employed to filter the data sets, and additional maps and source
qualifiers can utilize the data to modify or alter the parameters during run-time.
Q: How can I create one map, one session, and utilize multiple source files of the same
format?
In UNIX it's very easy: create a link to the source file desired, place the link in the
SrcFiles directory, run the session. Once the session has completed successfully,
change the link in the SrcFiles directory to point to the next available source file.
Caution: the only downfall is that you cannot run multiple source files (of the same
structure) in to the database simultaneously. In other words it forces the same session
to be run serially, but if the maintenance benefit outweighs that and speed is not a major issue,
feel free to implement it this way. On NT you would have to physically move the files
in and out of the SrcFiles directory. Note: the difference between creating a link to an
individual file, and changing SrcFiles directory to link to a specific directory is this:
changing a link to an individual file allows multiple sessions to link to all different types
of sources, while changing SrcFiles to be a link itself is restrictive and also creates Unix Sys
Admin pressures for directory rights to PowerCenter (one level up).
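The link rotation can be scripted; a minimal sketch for UNIX follows. A temporary directory stands in for SrcFiles, the file names are invented for the example, and where the comment says the session would run you would start it by whatever means you normally use.

```python
import os
import tempfile

def point_link_at(link_path, target_path):
    """Repoint link_path at target_path, replacing any existing link."""
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(target_path, link_path)

with tempfile.TemporaryDirectory() as src_files:  # stands in for SrcFiles
    for name in ("orders_01.dat", "orders_02.dat"):
        with open(os.path.join(src_files, name), "w") as f:
            f.write(name + "\n")
    link = os.path.join(src_files, "orders.dat")  # the one name the session reads
    contents = []
    for name in ("orders_01.dat", "orders_02.dat"):
        point_link_at(link, os.path.join(src_files, name))
        # the session would run here, serially, once per source file
        with open(link) as f:
            contents.append(f.read().strip())
print(contents)
```

The session definition never changes; only the target of the link does, which is why the files have to be processed one after another.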
Q: How can I move my Informatica Logs / BadFiles directories to other disks without
changing anything in my sessions?
Use the UNIX link command: ask the SA to create the link and grant read/write
permissions, and have the real directory placed on any other disk you wish to have it on.

Q: How do I handle duplicate rows coming in from a flat file?
If you don't care about "reporting" duplicates, use an aggregator. Set the Group By Ports
to group by the primary key in the parent target table. Keep in mind that using an
aggregator causes the following: The last duplicate row in the file is pushed through as
the one and only row, loss of ability to detect which rows are duplicates, caching of the
data before processing in the map continues. If you wish to report duplicates, then follow
the suggestions in the presentation slides (available on this web site) to institute a staging
table. See the pros and cons of staging tables, and what they can do for you.
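The side effects of the aggregator approach are easy to see in a small simulation. This is a sketch of the behaviour described above, not of the aggregator itself: a dictionary keyed on the primary key stands in for the aggregator cache, and the function name and sample rows are hypothetical.

```python
def dedupe_last_wins(rows, key):
    """Group by key; the last row seen for each key is the one passed on."""
    cache = {}
    for row in rows:  # the whole input is cached before anything is emitted
        cache[row[key]] = row
    return list(cache.values())

rows = [
    {"id": 1, "name": "old"},
    {"id": 2, "name": "b"},
    {"id": 1, "name": "new"},  # duplicate of id 1, arrives later in the file
]
print(dedupe_last_wins(rows, "id"))
```

Only the later row for id 1 survives, and nothing in the output records that a duplicate existed, which is why a staging table is suggested when duplicates must be reported.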

Where can I find ?
Q: Where can I find a history / metrics of the load sessions that have occurred in Informatica?
(8 June 2000)
The tables which house this information are OPB_LOAD_SESSION,
OPB_SESSION_LOG, and OPB_SESS_TARG_LOG. OPB_LOAD_SESSION contains
the single session entries, OPB_SESSION_LOG contains a historical log of all session
runs that have taken place. OPB_SESS_TARG_LOG keeps track of the errors, and the
target tables which have been loaded. Keep in mind these tables are tied together by
Session_ID. If a session is deleted from OPB_LOAD_SESSION, its history is not
necessarily deleted from OPB_SESSION_LOG, nor from OPB_SESS_TARG_LOG.
Unfortunately - this leaves un-identified session ID's in these tables. However, when you
can join them together, you can get the start and complete times from each session. I
would suggest using a view to get the data out (beyond the MX views) - and record it in
another metrics table for historical reasons. It could even be done by putting a TRIGGER
on these tables (possibly the best solution)...
Q: Where can I find more information on what the Informatica Repository Tables are?
On this web-site. We have published an unsupported view of what we believe to be
housed in specific tables in the Informatica Repository. Check it out - we'll be adding to
this section as we go. Right now it's just a belief of what we see in the tables. Repository
Table Meta-Data Definitions
Q: Where can I find / change the settings regarding fonts, colors, and layouts for the
designer?
You can find all the fonts, colors, layouts, and controls in the registry of the individual
client. All this information is kept at:
HKEY_CURRENT_USER\Software\Informatica\PowerMart Client Tools\<ver>. Below
here, you'll find the different folders which allow changes to be made. Be careful,
deleting items in the registry could hamper the software from working properly.
Q: Where can I find tuning help above and beyond the manuals?

Right here. There are slide presentations, either available now, or soon which will cover
tuning of Informatica maps and sessions - it does mean that the architectural solution
proposed here be put in place.

Q: Where can I find the maps used in generating performance statistics?
A windows ZIP file will soon be posted, which houses a repository backup, as well as a
simple PERL program that generates the source file, and a SQL script which creates the
tables in Oracle. You'll be able to download this, and utilize this for your own benefit.
Why doesn't ?
Q: Why doesn't constraint based load order work with a mapplet? (08 May 2000)
If your mapplet has a sequence generator (reusable) that's mapped with data straight to an
"OUTPUT" designation, and then the map splits the output to two tables (parent/child),
and your session is marked with "Constraint Based Load Ordering", you may have
experienced a load problem where the constraints do not appear to be met. The
problem is in the perception of what an "OUTPUT" designation is. The OUTPUT
component is NOT an "object" that collects a "row" as a row, before pushing it
downstream. An OUTPUT component is merely a pass-through structural object - as
indicated, there are no data types on the INPUT or OUTPUT components of a mapplet -
thus indicating merely structure. To make the constraint based load order work properly,
move all the ports through a single expression, then through the OUTPUT component -
this will force a single row to be "put together" and passed along to the receiving mapplet.
Otherwise - the sequence generator generates 1 new sequence ID for each split target on
the other side of the OUTPUT component.
Q: Why doesn't 4.7 allow me to set the Stored Procedure connection information in the Session
Manager -> Transformations Tab? (31 March 2000)
This functionality used to exist in an older version of PowerMart/PowerCenter. It was a
good feature - as we could control when the procedure was executed (ie: source pre-
load), but execute it in a target database connection. It appears to be a removed piece of
functionality. We are asking Informatica to put it back in.
Q: Why doesn't it work when I wrap a sequence generator in a view, with a lookup object?
First - to wrap a sequence generator in a view, you must create an Oracle stored function,
then call the function in the select statement of the view. Second, Oracle disallows an
order by clause on a column returned from a user function (it will cut your connection
and report an Oracle error). I think this is a bug that needs to be reported to Oracle. An
Informatica lookup object automatically places an "order by" clause on the return ports /
output ports in the order they appear in the object. This includes any "function" return.
The minute it executes a non-cached SQL lookup statement with an order by clause on
the function return (sequence number), Oracle cuts the connection. This keeps the
solution from working (it would otherwise be slightly faster than binding an external
procedure/function).
Q: Why doesn't a running session QUIT when Oracle or Sybase return fatal errors?
The session will only QUIT when its error threshold is set to "Stop on 1 errors". Otherwise the
session will continue to run.
Q: Why doesn't a running session return a non-successful error code to the command line
when Oracle or Sybase return any error?
If the session is not bounded by its threshold (set "Stop on 1 errors"), the session will run
to completion, and the server will consider the session to have completed successfully -
even if Oracle runs out of Rollback or Temp Log space, or Sybase has a similar
error. To correct this, set the session to stop on 1 error. The command line tool pmcmd
will then return a non-zero (it failed) type of error code, and the session manager will
also see that the session failed.
Q: Why doesn't the session work when I pass a text date field in to the to_date function?
In order to make to_date(xxxx,<format>) work properly, we suggest surrounding your
expression with the following: IIF( is_date(<date>,<format>) = true,
to_date(<date>,<format>), NULL) This will prevent session errors with "transformation
error" in the port. If you pass a non-date to a to_date function it will cause the session to
bomb out. By testing it first, you ensure 1) that you have a real date, and 2) your format
matches the date input. The format should match the expected date input directly -
spaces, no spaces, and everything in between. For example, if your date is:
1999103022:31:23 then you want a format to be: YYYYMMDDHH24:MI:SS with no
spaces.
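The test-before-convert pattern above can be illustrated outside the tool. The following Python sketch is only an analogy: is_date and safe_to_date are hypothetical helpers that mirror IIF(is_date(...), to_date(...), NULL), and the strptime format string is an assumed equivalent of YYYYMMDDHH24:MI:SS.

```python
from datetime import datetime
from typing import Optional

# Assumed Python equivalent of the Informatica format YYYYMMDDHH24:MI:SS
FMT = "%Y%m%d%H:%M:%S"


def is_date(text, fmt: str = FMT) -> bool:
    """Return True only if text parses cleanly with the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except (ValueError, TypeError):
        return False


def safe_to_date(text, fmt: str = FMT) -> Optional[datetime]:
    """Convert only after the check passes; otherwise return None (NULL)."""
    return datetime.strptime(text, fmt) if is_date(text, fmt) else None


good = safe_to_date("1999103022:31:23")   # matches the format exactly
bad = safe_to_date("1999-10-30")          # wrong layout, yields None
```

A value that does not match the format exactly becomes None instead of raising an error, which is the same effect the IIF wrapper has on the session.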
Q: Why doesn't the session control an update to a table (I have no update strategy in the map
for this target)?
In order to process ANY update to any target table, you must put an update strategy in the
map, process a DD_UPDATE command, and change the session to "data driven". There is a
second method: without utilizing an update strategy, set the SESSION properties to
"UPDATE" instead of "DATA DRIVEN", but be warned that ALL targets will be updated in
place, with failure if the rows don't exist. Then you can set the update flags in the
mapping's sessions to control updates to the target. Simply setting the "update flags" in a
session is not enough to force the update to complete: even though the log may show an
update SQL statement, the log will also show "cannot insert (duplicate key)" errors.

Who is ?

Q: Who is the Informatica Sales Team in the Denver Region?
Christine Connor (Sales), and Alan Schwab (Technical Engineer).
Q: Who is the contact for Informatica consulting across the country?
CORE Integration

What is ?
Q: What happens when I don't connect input ports to a mapplet? (14 June 2000)
Potentially hazardous values are generated in the mapplet itself, particularly for
numerics. If you didn't connect ALL the ports to an input on a mapplet, chances are you'll
see sporadic values inside the mapplet, and thus sporadic results, such as ZERO in certain
decimal cases where NULL is desired. This is because both the INPUT and OUTPUT
objects of a mapplet are nothing more than an interface which defines the structure of a
data row - they are NOT like an expression that actually "receives" or "puts together" a
row image. This can cause a misunderstanding of how the mapplet works - if you're not
careful, you'll end up with unexpected results.
Q: What is the Local Object Cache? (3 March 2000)
The local object cache is a cache of the Informatica objects which are retrieved from the
repository when a connection is established to a repository. The cache is not readily
accessed because it's housed within the PM/PC client tool. When the client is shut-down,
the cache is released. Apparently the refresh cycle of this local cache requires a full
disconnect/reconnect to the repository which has been updated. This cache will house
two different images of the same object. For instance: a shared object, or a shortcut to
another folder. If the actual source object is updated (source shared, source shortcut),
updates can only be seen in the current open folder if a disconnect/reconnect is performed
against that repository. There is no apparent command to refresh the cache from the
repository. This may cause some confusion when updating objects then switching back
to the mapping where you'd expect to see the newly updated object appear.
Q: What is the best way to "version control"?
It seems the general developer community agrees on this one: the Informatica versioning
leaves a lot to be desired. We suggest not utilizing the versioning provided, for two
reasons: one, it's extremely unwieldy (you lose all your sessions), and two, the repository
grows exponentially because Informatica copies objects to increase the version number.
We suggest two different approaches. 1) Utilize a backup of the repository:
synchronize Informatica repository backups (as opposed to DBMS repo backups) with all
the developers. Make your backup consistently and frequently. Then, if you need to
back out a piece, restore the whole repository. 2) Build on this with a second "scratch"
repository: save and restore to the "scratch" repository ONE version of the folders. Drag
and drop the folders to and from the "scratch" development repository. Then, if you
need to VIEW a much older version, restore that backup to the scratch area and view the
folders. In this manner you can check in the whole repository backup binary to an
outside version control system like PVCS, CCS, SCM, etc. Then restore the whole
backup in to acceptance, using the backup as a "VERSION" or snapshot of everything in
the repository. This way items don't get lost, and disconnected versions do not get
migrated up in to production.
Q: What is the best way to handle multiple developer environments?
The jury is still out on this one. As with anything, there are many ways to
handle this. One idea is presented here (which seems to work well, and is comfortable for
those who have already worked in shared source code environments). The idea is this: all
developers use shared folders, shared objects, and global repositories. In development
it's all about communication between team members, so that the items being modified
are assigned to individuals for work. With this methodology, all maps can use common
mapplets, shared sources, targets, and other items. The one problem with this is that the
developers MUST communicate about what they are working on. This is a common and
familiar method of working on shared source code - most development teams feel
comfortable with this, as do managers. The problem with another commonly utilized
method (one folder per developer) is that you end up with run-away development
environments. Code re-use and shared object use nearly always drop to zero percent
(caveat: unless you are following SEI / CMM / KPA Level 5 and you have a dedicated
CM (Change Management) person in the works). Communication is still of utmost
importance, however now you have the added problem of "checking in" what look like
different source tables from different developers, but the objects are named the same,
among other problems that arise.
Q: What is the web address to submit new enhancement requests?
Informatica's enhancement request web address is:
mailto:featurerequest@informatica.com
Q: What is the execution order of the ports in an expression?
All ports are executed TOP TO BOTTOM in a serial fashion, but they are done in the
following groups: All input ports are pushed values first. Then all variables are executed
(top to bottom physical ordering in the expression). Last - all output expressions are
executed to push values to output ports - again, top to bottom in physical ordering. You
can utilize this to your advantage, by placing lookups in to variables, then using the
variables "later" in the execution cycle.
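The three-phase ordering described above can be mimicked with a small Python sketch. This is not how the server is implemented. It only models the stated order (inputs, then variables top to bottom, then outputs top to bottom). The port names, the RATES table standing in for a lookup, and evaluate_row are all hypothetical.

```python
# Hypothetical table standing in for a lookup transformation.
RATES = {"USD": 1, "EUR": 2}


def evaluate_row(inputs, variable_ports, output_ports):
    """Evaluate one row in the order: inputs, variables, outputs."""
    ports = dict(inputs)                         # 1) input ports receive values
    for name, expression in variable_ports:      # 2) variables, top to bottom
        ports[name] = expression(ports)
    # 3) output expressions, top to bottom, may use any variable computed above
    return {name: expression(ports) for name, expression in output_ports}


variable_ports = [
    ("v_rate", lambda p: RATES[p["currency"]]),        # lookup placed in a variable
    ("v_total", lambda p: p["amount"] * p["v_rate"]),  # uses the variable above it
]
output_ports = [
    ("o_total", lambda p: p["v_total"]),
    ("o_label", lambda p: p["currency"] + ":" + str(p["v_total"])),
]

result = evaluate_row({"currency": "EUR", "amount": 200}, variable_ports, output_ports)
```

Because v_rate sits above v_total, the lookup result is available "later" in the same row's execution cycle. Swapping the two variable entries would fail, which is the reason physical ordering matters.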
Q: What is a suggested method for validating fields / marking them with errors?
One of the successful methods is to create an expression object which contains
variables, one variable per port that is to be checked. Set the error flag for that
field, then at the bottom of the expression trap each of the error fields. From this port
you can choose to set flags based on each individual error which occurred, or feed them
out as a combination of concatenated field names to be inserted in to the database as an
error row in an error tracking table.
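As a rough illustration of this pattern (one check per field, then a single trap at the end that concatenates the failing field names), here is a Python sketch. The field names and the rules in CHECKS are assumptions made up for the example, not part of any real mapping.

```python
def validate_row(row, checks):
    """Return (error_flag, error_fields) for one row.

    checks is an ordered list of (field_name, predicate), one per field to test,
    much like one variable port per checked port.
    """
    failed = [field for field, is_valid in checks if not is_valid(row.get(field))]
    # The final "trap": a single flag plus the concatenated failing field names.
    return (1 if failed else 0, ",".join(failed))


# Hypothetical validation rules.
CHECKS = [
    ("cust_id", lambda v: isinstance(v, int) and v > 0),
    ("name", lambda v: bool(v and v.strip())),
    ("amount", lambda v: v is not None and v >= 0),
]

bad_row = validate_row({"cust_id": 0, "name": " ", "amount": 10}, CHECKS)
good_row = validate_row({"cust_id": 7, "name": "Ann", "amount": 0}, CHECKS)
```

The second element of the result is what would be written to the error tracking table for a failing row.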
Q: What does the error "Broken Pipe" mean in the PMSERVER.ERR log on Unix?
One of the known causes for this error message is when someone in the client user
interface queries the server, then presses the cancel button that appears briefly in the
lower left corner. It is harmless and poses no threat.
Q: What is the best way to create a readable DEBUG log?
Create a table in a relational database which resembles your flat file source (assuming
you have a flat file source). Load the data in to the relational table. Then create
your map from top to bottom and turn on VERBOSE DATA logging at the session level.
Go back to the map, over-ride the SQL in the Source Qualifier to pull only one to three
rows through the map, then run the session. In this manner, the DEBUG log will be
readable, errors will be much easier to identify, and once the logic is fixed, the whole
data set can be run through the map with NORMAL logging. Otherwise you may end
up with a huge (megabyte-sized) log. The other two ways to create debugging logs are: 1)
switch the session to TEST LOAD, set it to 3 rows, and run. The problem with this is
that the reader will read ALL of the source data. 2) Change the output to a flat file.
The problem with this is that your log ends up huge (depending on the number of source
rows you have).
Q: What is the best methodology for utilizing Informatica's strengths?
It depends on the purpose. However, there is a basic definition of how well the tool will
perform with throughput and data handling; if it is followed in general principle, you will
have a winning situation. 1) Break all complex maps down in to small manageable
chunks. Break up any logic you can in to steps. Informatica does much better with
smaller, more maintainable maps. 2) Break up complex logic within an expression in to
several different expressions. Be wary though: the more expressions, the slower the
throughput - only break up the logic if it's too difficult to maintain. 3) Follow the
guides for table structures and data warehouse structures which are available on this web
site. For reference: load flat files to staging tables, load staging tables in to operational
data stores / reference stores / data warehousing sources, load data warehousing sources
in to star schemas or snowflakes, load star schemas or snowflakes in to highly
de-normalized reporting tables. By breaking apart the logic you will see the fastest
throughput.

When is ?
Q: When is it right to use SQL*Loader / BCP as a piped session versus a tail process?
SQL*Loader / BCP as a piped session should be used when no intermediate file is
necessary, or the source data is too large to stage to an intermediate file, or there is not
enough disk or time to place all the source data in to an intermediate file. The downfalls
currently are these: as a piped process (for PowerCenter 1.5.2 and 1.6 / PowerMart v4.52
and 4.6), the core does NOT stop when either BCP or SQL*Loader quits or
terminates. The core will only stop after reading all of the source data in to the data
reader thread. This is dangerous if you have a huge file you wish to process and it's
scheduled as a monitored process. Which means: a 5 hour load (in which SQL*Loader /
BCP stopped within the first 5 minutes) will only stop and signal a page after 5 hours of
reading source data.

What happens when ?
Q: What happens when Informatica causes DR Watson's on NT? (30 October 2000)
This is just my theory for now, but here's the best explanation I can come up with.
Typically this occurs when there is not enough physical RAM available to perform the
operation. Usually this only happens when SQLServer is installed on the same machine
as the PMServer - however if this is not your case, some of this may still apply.
PMServer starts up child threads just like Unix. The threads share the global shared
memory area - and rely on NT's Thread capabilities. The DR Watson seems to appear
when a thread attempts to deallocate, or allocate real memory. There's none left (mostly
because of SQLServer). The memory manager appears to return an error, or asks the
thread to wait while it reorganizes virtual RAM to make way for the physical request.
Unfortunately the thread code doesn't pay attention to this request, resulting in a memory
violation. The other theory is the thread attempts to free memory that's been swapped to
virtual, or has been "garbage collected" and cleared already - thus resulting again in a
protected memory mode access violation - thus a DR Watson. Typically the DR Watson
can cause the session to "freeze up". The only way to clear this is to stop and restart the
PMSERVER service - in some cases it requires a full machine reboot. The only other
possibility is when PMServer is attempting to free or shut down a thread - maybe there's
an error in the code which causes the DR Watson. In any case, the only real fix is to
increase the physical RAM on the machine, or to decrease the number of concurrent
sessions running at any given point, or to decrease the amount of RAM that each
concurrent session is using.
Q: What happens when Informatica CORE DUMPS on Unix? (12 April 2000)
Many things can cause a core dump, but the question is: how do you go about "finding
out" what caused it, how do you work to solve it, and is there a simple fix? This case was
found to be frequent (according to tech support) among setups of new Unix hardware,
causing unnecessary core dumps. The IPC semaphore settings were set too low,
causing X number of concurrent sessions to "die" with "writer process died" and "reader
process died" etc. We are on a Unix machine (Sun Solaris 5.7); anyone with this
configuration might want to check the settings if they experience "Core Dumps" as well.
1. Run "sysdef" and examine the IPC Semaphores section at the bottom of the output.
2. The following settings should be "increased":
   SEMMNI - (semaphore identifiers) = (7 x # of concurrent sessions to run in Informatica) +
   10 for growth + DBMS setting (DBMS setting: Oracle = 2 per user, Sybase = 40 (avg))
   SEMMNU - (undo structures in system) = 0.80 x SEMMNI value
   SEMUME - (max undo entries per process) = SEMMNU
   SHMMNI - (shared memory identifiers) = SEMMNI + 10
These settings must be changed by ROOT in the /etc/system file.
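The arithmetic in the list above can be worked through with a short Python sketch. The function name ipc_settings and the choice to round 0.80 x SEMMNI up to a whole number are assumptions made for the example. The formulas themselves are the ones listed above.

```python
def ipc_settings(concurrent_sessions: int, dbms_setting: int) -> dict:
    """Compute suggested IPC values from the formulas in the list above.

    dbms_setting is the DBMS allowance: 2 per user for Oracle, about 40 for Sybase.
    """
    semmni = 7 * concurrent_sessions + 10 + dbms_setting
    # 0.80 x SEMMNI, rounded up to a whole number (rounding is an assumption);
    # integer arithmetic avoids floating point surprises.
    semmnu = (semmni * 4 + 4) // 5
    return {
        "SEMMNI": semmni,
        "SEMMNU": semmnu,
        "SEMUME": semmnu,        # SEMUME = SEMMNU
        "SHMMNI": semmni + 10,   # SHMMNI = SEMMNI + 10
    }


sybase_example = ipc_settings(concurrent_sessions=10, dbms_setting=40)
oracle_example = ipc_settings(concurrent_sessions=10, dbms_setting=2 * 15)  # 15 users
```

For 10 concurrent sessions against Sybase this gives SEMMNI = 120, SEMMNU = SEMUME = 96 and SHMMNI = 130. Treat the numbers as a starting point to compare against the current "sysdef" output, not as tuned values.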
About the CORE DUMP: To help Informatica figure out what's going wrong you can
run a unix utility: "truss" in the following manner:
1. Shut down PMSERVER
2. login as "powermart" owner of pmserver - cd to the pmserver home directory.
3. Open Session Manager on another client - log in, and be ready to press "start" for the
sessions/batches causing problems.
4. type: truss -f -o truss.out pmserver <hit return>
5. On the client, press "start" for the sessions/batches having trouble.
6. When all the batches have completed or failed, press "stop server" from the Server
Manager
Your "truss.out" file will have been created, giving you a log of all the forked
processes and memory management / system calls that will help decipher what's
happening. You can examine the "truss.out" file - look for "killed" in the log.
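Searching a large truss.out by eye is tedious, so a small helper can list the matching lines. The Python sketch below assumes nothing about the truss output layout. It only reports any line containing the word "killed" (matched case-insensitively, which is an assumption). find_killed is a hypothetical helper name and the sample lines are placeholders, not real truss output.

```python
def find_killed(lines):
    """Return (line_number, text) for every line that mentions 'killed'."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if "killed" in line.lower()
    ]


# Placeholder lines standing in for the contents of truss.out.
sample = [
    "placeholder line one\n",
    "placeholder line with process killed marker\n",
    "placeholder line three\n",
]
hits = find_killed(sample)
```

To scan the real file, pass an open file object, for example find_killed(open("truss.out")).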
DON'T FORGET: following a CORE DUMP it's always a good idea to shut down the
unix server and bounce the box (restart the whole server).
Q: What happens when Oracle or Sybase goes down in the middle of a transformation?
It's up to the database to recover up to the last commit point. If you're asking this
question, you should be thinking about re-runnability of your processes. Designing
re-runnability in to the processing/maps up front is the best preventative measure you can
have. Utilizing the recovery facility of PowerMart / PowerCenter appears to be sketchy
at best, particularly in this area of recovery. The transformation itself will eventually
error out, stating that the database is no longer available (or something to that effect).
Q: What happens when Oracle (or Sybase) is taken down for routine backup, but nothing is
running in PMServer at the time?
PMServer reports that the database is unavailable in the PMSERVER.err log. When
Oracle/Sybase comes back on line, PMServer will attempt to re-connect (if the repository
is on the Oracle/Sybase instance that went down), and eventually it will succeed (when
Oracle/Sybase becomes available again). However, it is recommended that PMServer
be scheduled to shut down before Oracle/Sybase is taken off-line and scheduled to re-start
after Oracle/Sybase is put back on-line.
Q: What happens in a database when a cached LOOKUP object is created (during a session)?
The session generates a select statement with an Order By clause. Any time this is
issued, databases like Oracle and Sybase will select (read) all the data from the table
in to the temporary database/space. Then the data will be sorted and read in chunks back
to the Informatica server. This means that hot-spot contention for a cached lookup will NOT
be the table it just read from. It will be the TEMP area in the database, particularly if the
TEMP area is being utilized for other things. Also, once the cache is created, it is not
re-read until the next running session re-creates it.

Generic Questions
Q: Can you explain how "constraint based load ordering" works? (27 Jan 2000)
Constraint based load ordering in PowerMart / PowerCenter works like this: it controls
the order in which the target tables are committed to a relational database. It is of no use
when sending information to a flat file. To construct the proper constraint order: links
between the TARGET tables in Informatica need to be constructed. Simply turning on
"constraint based load ordering" has no effect on the operation itself. Informatica does
NOT read constraints from the database when this switch is turned on. Again, to take
advantage of this switch, you must construct primary / foreign key relationships in the
TARGET TABLES in the designer of Informatica. Creating primary / foreign key
relationships is difficult - you are only allowed to link a single port (field) to a single
table as a primary / foreign key.
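The effect of the key links can be pictured as "parents before children". The Python sketch below is not Informatica's algorithm. It only shows how an order can be derived from primary / foreign key relationships between targets, using the standard library graphlib module (Python 3.9+). The table names are hypothetical.

```python
from graphlib import TopologicalSorter


def load_order(foreign_keys):
    """Return targets ordered so that every parent comes before its children.

    foreign_keys maps each target table to the set of parent tables it references.
    """
    return list(TopologicalSorter(foreign_keys).static_order())


# Hypothetical targets: ORDER_ITEMS references ORDERS, ORDERS references CUSTOMERS.
targets = {
    "ORDER_ITEMS": {"ORDERS"},
    "ORDERS": {"CUSTOMERS"},
    "CUSTOMERS": set(),
}
order = load_order(targets)
```

Without the key links in the target definitions there is nothing to sort on, which matches the point above that turning the switch on by itself has no effect.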
Q: It appears as if "constraint based load ordering" makes my session "hang" (it never
completes). How do I fix this? (27 Jan 2000)
We have a suggested method. The best known method for fixing this "hang" bug is to 1)
open the map, 2) delete the target tables (parent / child pairs), 3) save the map, 4) drag in
the targets again, parents FIRST, 5) relink the ports, 6) save the map, 7) refresh the
session and re-run it. What it does: Informatica places the "target load order" as the order
in which the targets are created (in the map). It does this because the repository is
Sequence ID based and the session derives its "commit" order from the Sequence ID
(unless constraint based load ordering is ON, in which case it tries to re-arrange the commit
order based on the constraints in the Target Table definitions in PowerMart/PowerCenter).
Once done, this will solve the commit ordering problems, and the "constraint based" load
ordering can even be turned off in the session. Informatica claims not to support this
feature in a session that is not INSERT ONLY. However, we've gotten it to work
successfully in DATA DRIVEN environments. The only known cause of the hang (according to
Technical Support) is this: the writer is going to commit a child table (as defined by the
key links in the targets). It checks to see if that particular parent row has been committed
yet, but it finds nothing (because the reader filled up the memory cache with new rows).
The memory that was holding the "committed" rows has been "dumped" and no longer
exists. So the writer waits, and waits, and waits - it never sees a "commit" for the
parents, so it never "commits" the child rows. This only appears to happen with files
larger than a certain number of rows (depending on your memory settings for the
session). The only fix is this: set "ThrottleReader=20" in the PMSERVER.CFG file. It
apparently limits the Reader thread to a maximum of "20" blocks for each session, thus
leaving the writer more room to cache the commit blocks. However, this too hangs
in certain situations. To fix this, Tech Support recommends moving to the PowerMart 4.6.2
release (the internal core apparently needs a fix). 4.6.2 appears to be "better" behaved but
not perfect. The only other way to fix this is to turn off constraint based load ordering,
choose a different architecture for your maps (see my presentations), and control one
map/session per target table and their order of execution.
Q: Is there a way to copy a session with a map, when copying a map from repository to
repository? Say, copying from Development to Acceptance?
Not that anyone is aware of. There is no direct straight forward method for copying a
session. This is the one downside to attempting to version control by folder. You MUST
re-create the session in Acceptance (UNLESS) you backup the Development repository,
and RESTORE it in to acceptance. This is the only way to take all contents (and
sessions) from one repository to another. In this fashion, you are versioning all of the
repository at once. With the repository BINARY you can then check this whole binary in
to PVCS or some other outside version control system. However, to recreate the session,
the best method is to: bring up Development folder/repo, side by side with Acceptance
folder/repo - then modify the settings in Acceptance as necessary.
Q: Can I set Informatica up for Target flat file, and target relational database?
Up through PowerMart 4.6.2, PowerCenter 1.6.2 this cannot be done in a single map.
The best method for this is to stay relational with your first map, add a table to your
database that looks exactly like the flat file (1 for 1 with the flat file), target the two
relational tables. Then, construct another map which simply reads this "staging" table
and dumps it to flat file. You can batch the maps together as sequential.
Q: How can you optimize use of an Oracle Sequence Generator?
In order to optimize the use of an Oracle Sequence Generator you must break up your
map. The generic method for calling a sequence generator is to encapsulate it in a stored
procedure. This is typically slow and kills the performance. Your version of
Informatica's tool should contain mapplets to make this easier. Break the map up in to
inserts only and updates only. The suggested method is as follows:
1. Create a staging table and bring the data straight from the flat file in to the staging table.
2. Create a mapplet with the current logic in it.
3. Create one INSERT map and one UPDATE map (separate inserts from updates).
4. Create a SOURCE called DUAL, containing the fields: DUMMY char(1),
NEXTVAL NUMBER(15,0), CURRVAL NUMBER(15,0).
5. Copy the source in to your INSERT map.
6. Delete the Source Qualifier for "dummy".
7. Copy the "nextval" port in to the original source qualifier (the one that pulls data from the
staging table).
8. Over-ride the SQL in the original source qualifier (generate it, then
change DUAL.NEXTVAL to the sequence name: SQ_TEST.NEXTVAL).
9. Feed the "nextval" port through the mapplet.
10. Change the where clause on the SQL over-ride to select only the data from the staging
table that doesn't exist in the parent target (to be inserted).
This is extremely fast, and will allow your inserts-only map to operate at
incredibly high throughput while using an Oracle Sequence Generator. Be sure to tune
your indexes on the Oracle tables so that there is a high read throughput.
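The where clause that selects only the staging rows missing from the parent target can be sketched as follows. The example uses Python's built-in sqlite3 module purely as a stand-in database, so it shows the NOT EXISTS filter only. SQLite has no Oracle-style sequences, so the SQ_TEST.NEXTVAL part is not reproduced here. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_customer (cust_code TEXT, name TEXT);   -- staging table
CREATE TABLE dim_customer (cust_code TEXT, name TEXT);   -- parent target
INSERT INTO stg_customer VALUES ('A1', 'Ann'), ('B2', 'Bob'), ('C3', 'Cy');
INSERT INTO dim_customer VALUES ('A1', 'Ann');
""")

# Select only the staging rows that do not yet exist in the target (inserts only).
new_rows = conn.execute("""
SELECT s.cust_code, s.name
FROM stg_customer s
WHERE NOT EXISTS (
    SELECT 1 FROM dim_customer d WHERE d.cust_code = s.cust_code
)
ORDER BY s.cust_code
""").fetchall()
```

In the Oracle over-ride the same filter would sit in the source qualifier SQL next to the sequence reference, so every selected row is a genuine insert.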
Q: Why can't I over-ride the SQL in a lookup, and make the lookup non-cached?

Apparently Informatica hasn't made this feature available yet in their tool. It's a shame -
it would simplify the method for pulling Oracle Sequence numbers from the database.
For now - it's simply not implemented.
Q: Does it make a difference if I push all my ports (fields) through an expression, or push only
the ports which are used in the expression?
From the work that has been done - it doesn't make much of an impact on the overall
speed of the map. If the paradigm is to push all ports through the expressions for
readability then do so, however if it's easier to push the ports around the expression (not
through it), then do so.
Q: What is the effect of having multiple expression objects vs one expression object with all the
expressions?
Fewer overall objects in the map make the map/session run faster. Consolidating
expressions in to a single expression object is most helpful to throughput, but can
increase the complexity (maintenance). Read the question/answer about execution cycles
above for hints on how to set up a large expression like this.

Data warehousing concepts

1.What is the difference between a view and a materialized view?
A view stores only a query; whenever the view is executed, it reads from the base tables.
A materialized view stores the query result: the data is loaded or replicated when the
materialized view is created and again at each refresh, which gives better query
performance.

A materialized view can be refreshed 1. on commit or 2. on demand
(refresh methods: complete, never, fast, force)
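
The difference can be seen in a small experiment. The sketch below uses Python's built-in sqlite3 module, which has views but no materialized views, so the materialized view is imitated with a plain table filled from the same query (an assumption made for the example). The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (region TEXT, amount INTEGER);
INSERT INTO sales VALUES ('EAST', 100), ('WEST', 50);

-- A view stores only the query.
CREATE VIEW v_sales AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

-- Stand-in for a materialized view: the query result is stored once.
CREATE TABLE mv_sales AS
    SELECT region, SUM(amount) AS total FROM sales GROUP BY region;
""")

conn.execute("INSERT INTO sales VALUES ('EAST', 25)")   # base table changes

view_rows = conn.execute("SELECT region, total FROM v_sales ORDER BY region").fetchall()
mv_stale = conn.execute("SELECT region, total FROM mv_sales ORDER BY region").fetchall()

# Imitation of an on-demand complete refresh: rebuild the stored result.
conn.executescript("""
DELETE FROM mv_sales;
INSERT INTO mv_sales SELECT region, SUM(amount) FROM sales GROUP BY region;
""")
mv_fresh = conn.execute("SELECT region, total FROM mv_sales ORDER BY region").fetchall()
```

The view reflects the new row immediately because it re-reads the base table, while the stored copy stays at the old total until it is refreshed.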

2.What is a bitmap index and why is it used for DWH?
In a bitmap index, a bitmap for each key value replaces a list of rowids. Bitmap indexes are
efficient for data warehousing because the indexed columns usually have low cardinality and
few updates, and bitmaps are very efficient for WHERE clause filtering.

3.What is star schema? And what is snowflake schema?
The center of the star consists of a large fact table and the points of the star are the dimension
tables.

A snowflake schema normalizes the dimension tables to eliminate redundancy. That is, the
dimension data is grouped into multiple tables instead of one large table.

A star schema contains denormalized dimension tables and a fact table; each primary key value
in a dimension table is associated with a foreign key in the fact table.
Here the fact table contains all business measures (normally numeric data) and foreign key
values, and the dimension tables hold details about the subject area.

A snowflake schema is basically a star schema with normalized dimension tables, which reduces
redundancy in the dimension tables.
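
A minimal star schema can be sketched with Python's built-in sqlite3 module. The table names, columns and figures below are made up for illustration. The point is only the shape: one fact table holding measures and foreign keys, joined to denormalized dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: details about the subject areas.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_time (time_key INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: numeric measures plus foreign keys to the dimensions.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    time_key INTEGER REFERENCES dim_time(time_key),
    amount INTEGER
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery'),
                               (3, 'Lamp', 'Furniture');
INSERT INTO dim_time VALUES (1, 1999), (2, 2000);
INSERT INTO fact_sales VALUES (1, 1, 10), (2, 1, 5), (3, 2, 40), (1, 2, 7);
""")

# A typical star query: join the fact to its dimensions and aggregate a measure.
totals = conn.execute("""
SELECT p.category, t.year, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_time t ON t.time_key = f.time_key
GROUP BY p.category, t.year
ORDER BY p.category, t.year
""").fetchall()
```

In a snowflake version of the same model, category would move out of dim_product into its own table referenced by a key.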


4.Why is a staging area database needed for DWH?
A staging area is needed to clean operational data before loading it into the data warehouse.
Cleaning here includes merging data which comes from different sources.

5.What are the steps to create a database manually?
Create the OS service, create the init file, start the database in the NOMOUNT stage, then
issue the CREATE DATABASE command.

6.Difference between OLTP and DWH?
An OLTP system is basically application oriented (e.g. a purchase order is a function of an
application),
whereas a DWH is subject oriented (subject in the sense of customer, product, item,
time).

OLTP                                      DWH
Application oriented                      Subject oriented
Used to run business                      Used to analyze business
Detailed data                             Summarized and refined
Current, up to date                       Snapshot data
Isolated data                             Integrated data
Repetitive access                         Ad-hoc access
Clerical user                             Knowledge user
Performance sensitive                     Performance relaxed
Few records accessed at a time (tens)     Large volumes accessed at a time (millions)
Read/update access                        Mostly read (batch update)
No data redundancy                        Redundancy present
Database size 100 MB - 100 GB             Database size 100 GB - few terabytes

7.Why do we need a data warehouse?

A data warehouse is a single, complete and consistent store of data obtained from a variety
of different sources, made available to end users in a way they can understand and use in a
business context.

It is a process of transforming data into information and making it available to users in a
timely enough manner to make a difference.

It is a technique for assembling and managing data from various sources for the purpose of
answering business questions, thus enabling decisions that were not previously possible.



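8.What is the difference between a data mart and a data warehouse?

A data mart is designed for a particular line of business, such as sales, marketing, or finance.

Whereas a data warehouse is enterprise-wide/organizational.

The direction of data flow between the data warehouse and the data marts depends on the
approach chosen.

9.What is the significance of a surrogate key?
A surrogate key is a system-generated key (for example, from a sequence) that stands in for
the source primary key in the dimension table. It is used in slowly changing dimension tables
to track old and new values, since each version of a source record can get its own key.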

10.What is a slowly changing dimension? What kind of SCD is used in your project?
Dimension attribute values may change slowly over time. (Say for example the customer
dimension has customer_id, name, and address.) The customer address may change over time.
How will you handle this situation?
There are 3 types: the first is to overwrite the existing record, the second is to create an
additional new record at the time of change with the new attribute values,
and the third is to create a new field to keep the new values in the original dimension table.
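
The three types can be compared side by side in a short Python sketch that applies an address change to an in-memory customer dimension. The customer_key, current and previous_address columns, and the sample values, are assumptions added for the example.

```python
def new_dim():
    """A one-row customer dimension; customer_key is a surrogate key."""
    return [{"customer_key": 1, "customer_id": 100, "name": "Ashok",
             "address": "Chennai", "current": True}]


def scd_type1(dim, customer_id, new_address):
    """Type 1: overwrite the existing record (no history kept)."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["address"] = new_address


def scd_type2(dim, customer_id, new_address):
    """Type 2: add a new record with a new surrogate key; keep the old record."""
    current = next((r for r in dim
                    if r["customer_id"] == customer_id and r["current"]), None)
    if current is None:
        return
    current["current"] = False
    dim.append({"customer_key": max(r["customer_key"] for r in dim) + 1,
                "customer_id": customer_id, "name": current["name"],
                "address": new_address, "current": True})


def scd_type3(dim, customer_id, new_address):
    """Type 3: keep the old and new values in separate fields of the same record."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row["previous_address"] = row["address"]
            row["address"] = new_address


d1, d2, d3 = new_dim(), new_dim(), new_dim()
scd_type1(d1, 100, "Pune")
scd_type2(d2, 100, "Pune")
scd_type3(d3, 100, "Pune")
```

Type 1 ends with one row and no history, Type 2 with two rows (full history), and Type 3 with one row that remembers only the previous value.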

11.What is the difference between primary key and unique key constraints?
A primary key enforces uniqueness and does not allow null values,
whereas a unique constraint enforces unique values but allows null values.

12.What are the types of index? And which type of index is used in your project?
Bitmap index, B-tree index, function-based index, reverse key index and composite index.
We used bitmap indexes in our project for better performance.

13.How is your DWH data modeling(Details about star schema)?

14.A table has 3 partitions but I want to update only the 3rd partition. How will you do it?

Specify the partition name in the update statement. Say for example:
UPDATE employee PARTITION (partition_name) a SET a.empno = 10 WHERE a.ename = 'Ashok';
15.When you give an update statement, how does the memory flow happen and how does Oracle
allocate memory for it?

Oracle first checks the shared SQL area to see whether the same SQL statement is already
available; if it is, Oracle reuses it. Otherwise it allocates memory in the shared SQL area and
then creates run-time memory in the private SQL area to build the parse tree and execution
plan. Once complete, these are stored in the previously allocated memory in the shared SQL
area.

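16.Write a query to find out the 5th max salary? In Oracle, DB2, SQL Server

In Oracle, sort the distinct salaries in descending order, keep the top five and take the
lowest of them:

Select min(salary) from (select distinct salary from employee order by salary desc)
Where rownum<=5

(This assumes at least five distinct salaries exist. ROWNUM is Oracle-specific, so DB2 and
SQL Server need a different form of the query.)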
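
A form of the query that does not depend on ROWNUM uses a correlated subquery: the 5th highest salary is the one with exactly four distinct salaries above it. The sketch below runs that query with Python's built-in sqlite3 module as a stand-in database. The sample data and the helper name nth_max_salary are made up for the example.

```python
import sqlite3

# Salary S is the Nth highest when exactly N-1 distinct salaries are greater than S.
NTH_MAX_SQL = """
SELECT DISTINCT e1.salary
FROM employee e1
WHERE (SELECT COUNT(DISTINCT e2.salary)
       FROM employee e2
       WHERE e2.salary > e1.salary) = ?
"""


def nth_max_salary(conn, n):
    """Return the nth highest distinct salary, or None if there are fewer than n."""
    row = conn.execute(NTH_MAX_SQL, (n - 1,)).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ename TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("a", 900), ("b", 800), ("c", 800), ("d", 700), ("e", 600), ("f", 500), ("g", 400)],
)

fifth = nth_max_salary(conn, 5)
missing = nth_max_salary(conn, 10)
```

Duplicate salaries (two employees on 800 here) count once, so the distinct salaries in descending order are 900, 800, 700, 600, 500, 400 and the fifth is 500.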

17.When you give an update statement, how does the undo/rollback segment work and what are
the steps?


Oracle keeps the old values in the undo segment and the new values in redo entries. When you
roll back, it restores the old values from the undo segment. When you commit, the undo
segment values are released and the new values are made permanent.

Informatica Administration

18.What is DTM? How will you configure it?
The DTM (Data Transformation Manager) transforms the data received from the reader buffer,
moving it from transformation to transformation on a row by row basis, and it uses
transformation caches when necessary.

19.You transfer 100000 rows to the target but some rows get discarded. How will you trace
them, and where do they get loaded?

Rejected records are loaded into bad files. Each record has a record indicator and column
indicators.

The record indicator is one of (0-insert, 1-update, 2-delete, 3-reject) and the column
indicator is one of (D-valid, O-overflow, N-null, T-truncated).

Data may get rejected for different reasons, normally due to transformation logic.
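
The indicator codes can be decoded mechanically. The Python sketch below assumes a simplified, comma-separated layout (record indicator first, then each column value followed by its column indicator). That layout is an assumption for illustration and should be checked against a real bad file. Only the code-to-meaning mappings come from the answer above.

```python
RECORD_INDICATORS = {"0": "insert", "1": "update", "2": "delete", "3": "reject"}
COLUMN_INDICATORS = {"D": "valid", "O": "overflow", "N": "null", "T": "truncated"}


def decode_reject_line(line):
    """Decode one line of an (assumed) reject-file layout.

    Assumed layout: record_indicator,value1,indicator1,value2,indicator2,...
    """
    parts = line.rstrip("\n").split(",")
    record = RECORD_INDICATORS[parts[0]]
    columns = [
        (parts[i], COLUMN_INDICATORS[parts[i + 1]])
        for i in range(1, len(parts) - 1, 2)
    ]
    return record, columns


# Hypothetical rejected row: a valid id, a null name and a truncated description.
decoded = decode_reject_line("3,1001,D,,N,ABCDEFGHIJ,T")
```

Scanning the decoded columns for anything other than "valid" points to the port that caused the row to be rejected.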

20.What are the different uses of a repository manager?

The Repository Manager is used to create the repository, which contains the metadata that
Informatica uses to transform data from source to target. It is also used to create Informatica
users and folders and to copy, back up and restore the repository.

21.How do you take care of security using a repository manager?

Using repository privileges, folder permission and locking.

Repository privileges(Session operator, Use designer, Browse repository, Create session and
batches, Administer repository, administer server, super user)

Folder permission(owner, groups, users)

Locking(Read, Write, Execute, Fetch, Save)

22.What is a folder?

A folder contains repository objects such as sources, targets, mappings and transformations,
and helps to organize the data warehouse logically.

23.Can you create a folder within designer?

Not possible

24.What are shortcuts? Where can they be used? What are the advantages?

There are 2 kinds of shortcut, local and global. Local shortcuts are used in a local
repository and global shortcuts in a global repository. The advantage is that an object can be
reused without creating multiple copies of it. For example, if a source definition is needed
in 10 mappings in 10 different folders, you create 10 shortcuts instead of 10 copies of the
source.

25.How do you increase the performance of mappings?

Use single-pass reading (one Source Qualifier instead of multiple SQs for the same table)
Minimize data type conversions (for example Integer to Decimal and back to Integer)
Optimize transformations (when you use Lookup, Aggregator, Filter, Rank and Joiner)
Use caches for lookups
For the Aggregator, use presorted ports, increase the cache size and keep the number of
input/output ports as low as possible
Use a Filter wherever possible to avoid unnecessary data flow

26.Explain Informatica Architecture?
Informatica consists of client and server components. The client tools are the Repository
Manager, Designer and Server Manager. The repository database contains the metadata, which
the Informatica server reads and uses to read data from the source, transform it and load it
into the target.

27.How will you do sessions partitions?
It is not available in PowerMart 4.7.

Transformation

28.What are the constants used in update strategy?

DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT

29.What is the difference between connected and unconnected lookup transformations?

A connected lookup can return multiple values to other transformations,
whereas an unconnected lookup returns one value.
If the lookup condition does not match, a connected lookup returns the user-defined default
values, whereas an unconnected lookup returns null.
A connected lookup supports dynamic caches, whereas an unconnected lookup supports only
static caches.

30.What you will do in session level for update strategy transformation?

In session property sheet set Treat rows as Data Driven

31.What are the port available for update strategy , sequence generator, Lookup, stored
procedure transformation?

Transformations Port

Update strategy Input, Output
Sequence Generator Output only
Lookup Input, Output, Lookup, Return
Stored Procedure Input, Output

32.Why did you used connected stored procedure why dont use unconnected stored
procedure?

33.What are active and passive transformations?

An active transformation changes the number of records passed on towards the target (for
example Filter), whereas a passive transformation does not change the number of records (for
example Expression).

34.What are the tracing levels?
Normal - Session initialization details and transformation details, including the number of
records rejected and applied
Terse - Only initialization details
Verbose Initialization - Normal setting information plus detailed information about the
transformations
Verbose Data - Verbose Initialization settings plus all information about the session

35.How will you make records in groups?

Using group by port in aggregator

36.Need to store value like 145 into target when you use aggregator, how will you do
that?

Use Round() function

37.How will you move mappings from the development to the production database?

Copy all the mappings from the development repository and paste them into the production
repository. While pasting, it will prompt whether you want to replace or rename. If you choose
replace, Informatica replaces all the source tables with those in the repository database.

38.What is the difference between aggregator and expression?

Aggregator is an active transformation and Expression is a passive transformation.
The Aggregator transformation performs aggregate calculations on a group of records,
whereas the Expression transformation performs calculations on a single record.

39.Can you use mapping without source qualifier?

Not possible. If the source is an RDBMS/DBMS/flat file use an SQ, or use a Normalizer if the
source is a COBOL feed.

40.When do you use a normalizer?

A Normalizer is used in place of the Source Qualifier for COBOL sources. It can also be used
with relational sources to normalize data.


41.What are stored procedure transformations? What is the purpose of the SP transformation?
How did you use it in your project?
There are connected and unconnected stored procedures.
An unconnected stored procedure is used for database-level activities such as pre- and
post-load tasks.

A connected stored procedure is used at the Informatica level, for example passing one
parameter as input and capturing the return value from the stored procedure.

Normal - row wise check
Pre-Load Source - (Capture source incremental data for incremental aggregation)
Post-Load Source - (Delete Temporary tables)
Pre-Load Target - (Check disk space available)
Post-Load Target - (Drop and recreate index)

42.What is a lookup, and what is the difference between the types of lookup? What exactly
happens when a lookup is cached? How does a dynamic lookup cache work?
The Lookup transformation is used to check values in the source and target tables (primary
key values).
There are 2 types, connected and unconnected.
A connected lookup returns multiple values if the condition is true,
whereas an unconnected lookup returns a single value through the return port.
A connected lookup returns the user-defined default value if the condition does not match,
whereas an unconnected lookup returns null.
When a lookup is cached, the server reads the source/target table and stores it in the lookup
cache.

43.What is a joiner transformation?

Used to join heterogeneous sources (for example a relational source and a flat file).

Types of join:

Assume 2 tables hold the values Master - 1, 2, 3 and Detail - 1, 3, 4.

Normal (Only the records that match the condition in both the master and detail tables are
returned. Result set 1, 3)
Master Outer (All rows from the detail table and the matching rows from the master table.
Result set 1, 3, 4)
Detail Outer (All rows from the master source and the matching rows from the detail table.
Result set 1, 2, 3)
Full Outer (All rows from both tables. Result set 1, 2, 3, 4)
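
The four join types can be checked with a few lines of Python. This is only an illustrative sketch on the Master 1, 2, 3 and Detail 1, 3, 4 example, not how the Informatica server implements the Joiner; the function name joiner and the join type strings are made up here. Each output row is a (master value, detail value) pair, with None where the other side has no match.

```python
JOIN_TYPES = ("normal", "master_outer", "detail_outer", "full_outer")


def joiner(master, detail, join_type):
    """Join two lists of key values and return (master, detail) pairs."""
    if join_type not in JOIN_TYPES:
        raise ValueError(f"unknown join type: {join_type}")
    rows = []
    for d in detail:
        matches = [m for m in master if m == d]
        if matches:
            rows.extend((m, d) for m in matches)
        elif join_type in ("master_outer", "full_outer"):
            rows.append((None, d))  # keep the unmatched detail row
    if join_type in ("detail_outer", "full_outer"):
        # keep the unmatched master rows
        rows.extend((m, None) for m in master if m not in detail)
    return rows


master = [1, 2, 3]
detail = [1, 3, 4]
for join_type in JOIN_TYPES:
    print(join_type, joiner(master, detail, join_type))
```

Note the naming: a master outer join keeps every detail row, and a detail outer join keeps every master row.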

44.What is aggregator transformation how will you use in your project?

Used perform aggregate calculation on group of records and we can use conditional
clause to filter data

45.Can you use one mapping to populate two tables in different schemas?

Yes we can use

46.Explain lookup cache, various caches?


Lookup transformation used for check values in the source and target tables(primary key
values).

Various Caches:

Persistent cache (We can save the lookup cache files and reuse them the next time the
lookup transformation is processed)

Re-cache from database (If the persistent cache is not synchronized with the lookup table,
you can configure the lookup transformation to rebuild the lookup cache)

Static cache (When the lookup condition is true, the Informatica server returns a value from
the lookup cache and does not update the cache while it processes the lookup transformation)

Dynamic cache (The Informatica server dynamically inserts new rows or updates existing
rows in the cache and the target. If we want to look up a target table, we can use a
dynamic cache)

Shared cache (We can share the lookup cache between multiple transformations in
a mapping. 2 lookups in a mapping can share a single lookup cache)

47.Which path will the cache be created?

User specified directory. If we say c:\ all the cache files created in this directory.

48.Where do you specify all the parameters for lookup caches?

Lookup property sheet/tab.

49.How do you remove the cache files after the transformation?

After the session completes, the DTM releases the cache memory and deletes the cache files.
If a persistent cache or incremental aggregation is used, the cache files are saved.

50.What is the use of aggregator transformation?

To perform aggregate calculations

Use a conditional clause to filter data in the expression: SUM(commission, commission > 2000)

Use a non-aggregate function around an aggregate: IIF(MAX(quantity) > 0, MAX(quantity), 0)
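
A conditional aggregate such as a sum over only the commissions above 2000 can be pictured as below. This is a plain Python sketch of the idea; the rows, the dept grouping column and the function name are invented for illustration.

```python
from collections import defaultdict


def conditional_sum_by_group(rows, group_key, value_key, condition):
    """Sum value_key per group, counting only rows for which condition is true."""
    totals = defaultdict(int)
    for row in rows:
        if condition(row):
            totals[row[group_key]] += row[value_key]
    return dict(totals)


# Hypothetical input rows.
rows = [
    {"dept": "sales", "commission": 2500},
    {"dept": "sales", "commission": 1500},
    {"dept": "sales", "commission": 3000},
    {"dept": "ops", "commission": 2100},
]

totals = conditional_sum_by_group(
    rows, "dept", "commission", lambda r: r["commission"] > 2000
)
print(totals)
```

In this sketch a group with no qualifying rows simply does not appear in the result.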

51.What are the contents of index and cache files?

Index caches files hold unique group values as determined by group by port in the
transformation.

Data caches files hold row data until it performs necessary calculation.


52.How do you call a stored procedure within a transformation?

In the Expression transformation, create a new output port and in its expression write
:SP.stored_procedure_name(arguments)

53.Is there any performance issue in connected & unconnected lookup? If yes, How?

Yes

An unconnected lookup can be much faster than a connected lookup, because it is not
connected to any other transformation. It is called from another transformation, which
minimizes the values held in the lookup cache.

A connected lookup, by contrast, is connected to other transformations, so it keeps values in
the lookup cache.

54.What is dynamic lookup?

When we look up a target table, the Informatica server dynamically inserts new values into
the cache, or updates them if they already exist, and passes them to the target table.

55.How Informatica read data if source have one relational and flat file?

Use joiner transformation after source qualifier before other transformation.

56.How will you load unique records into a target flat file when the source flat files have
duplicate data?

There are 2 ways to do this: use a Rank transformation or an Oracle external table.
In the Rank transformation, use the group by port to group the records and set the number
of ranks to 1. The Rank transformation then returns one row from each group, which will be
unique.

57.Can you use flat file for repository?

No, We cant

58.Can you use flat file for lookup table?

No, We cant

59.Without Source Qualifier and joiner how will you join tables?

In session level we have option user defined join. Where we can write join condition.

60.Update strategy set DD_Update but in session level have insert. What will happens?

Insert take place. Because this option override the mapping level option

Sessions and batches


61.What are the commit intervals?

Source based commit (Based on the number of rows read by the active source (Source
Qualifier). Suppose the commit interval is set to 10000 rows and the Source Qualifier reads
10000 rows, but 3000 rows are rejected by the transformation logic. The commit fires when the
remaining 7000 rows reach the target, so the rows held in the writer buffer do not affect the
commit point.)

Target based commit (Based on the rows in the writer buffer and the commit interval. Suppose
the commit interval is set to 10000 and the writer buffer fills every 7500 rows. The first
commit fires when the buffer fills for the second time, at 15000 rows, the next at 22500,
and so on.)

62.When do we use a router transformation?

When we want to apply multiple conditions to filter data, we go for a Router. (For example,
of 50 source records, 10 match the first filter condition and the remaining 40 are filtered
out, but we still want to apply a few more filter conditions to those 40 records.)

63.How did you schedule sessions in your project?

Run once (set 2 parameter date and time when session should start)

Run Every (Informatica server run session at regular interval as we configured, parameter
Days, hour, minutes, end on, end after, forever)

Customized repeat (Repeat every 2 days, daily frequency hr, min, every week, every month)

Run only on demand(Manually run) this not session scheduling.

64.How do you use the pre-session and post-session options in the session wizard, and what
are they used for?

Post-session is used for the email option, to send email when the session succeeds or fails.
For that we should configure:
Step1. Have an Informatica startup account and create an Outlook profile for that user
Step2. Configure Microsoft Exchange Server in the Mail applet (Control Panel)
Step3. In the Informatica server configuration, the Miscellaneous tab has an option called
MS Exchange profile, where we specify the Outlook profile name.

Pre-session is used for event-based scheduling. (For example, we do not know whether the
source file is available in a particular directory. For that we write a DOS command to move
the file to the destination directory and set the event-based scheduling option "Indicator
file to wait for" in the session property sheet.)

65.What are different types of batches. What are the advantages and dis-advantages of
a concurrent batch?

Sequential(Run the sessions one by one)

Concurrent (Run the sessions simultaneously)

Advantage of concurrent batch:

It uses the Informatica server resources in parallel and reduces the time it would take to
run the sessions separately.
Use this feature when multiple sources process large amounts of data in one session: split
the session into several sessions and put them into one concurrent batch to complete quickly.

Disadvantage:

Requires more shared memory; otherwise sessions may fail

66.How do you handle a session if some of the records fail? How do you stop the session in
case of errors? Can it be achieved at mapping level or session level?

It can be achieved at session level only. In the session property sheet, the log files tab
has an error handling option, Stop on ------ errors. Based on the error count we set, the
Informatica server stops the session.

67.How do you improve the performance of a session?

If we use an Aggregator transformation, use sorted ports, increase the aggregate cache size,
and use a filter before aggregation so that it minimizes unnecessary aggregation.

For the Lookup transformation, use lookup caches

Increase DTM shared memory allocation

Eliminate transformation errors and use a lower tracing level (For example, a mapping has
50 transformations; whenever a transformation error occurs the Informatica server has to
write to the session log file, which affects session performance)

68.Explain incremental aggregation. Will that increase the performance? How?

Incremental aggregation captures the changes made in the source and uses only those for the
aggregate calculation in a session, rather than processing the entire source and repeating
the same calculation each time the session runs. Therefore it improves session performance.

Only use incremental aggregation in the following situations:

The mapping has an aggregate calculation
The source table changes incrementally

The incremental source data can be filtered by time stamp

Before aggregation, do the following:

Use a filter transformation to remove pre-existing records

Reinitialize the aggregate cache when the source table changes completely. For example,
incremental changes happen daily and a complete change happens once a month. When the
source table changes completely we have to reinitialize the aggregate cache, truncate the
target table and use the new source table. Choose Reinitialize cache in the aggregation
behavior in the transformation tab
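
The idea behind incremental aggregation can be sketched in a few lines: keep the running totals from earlier runs in a cache and apply only the new rows to them. This is a simplified Python illustration, not the server's cache format; the region keys, amounts and function name are made up.

```python
def incremental_aggregate(cache, new_rows):
    """Apply only the new (group, amount) rows to the running totals in cache."""
    for group, amount in new_rows:
        cache[group] = cache.get(group, 0) + amount
    return cache


# Run 1: the cache is empty, so the whole first load is aggregated.
cache = {}
incremental_aggregate(cache, [("east", 100), ("west", 50)])

# Run 2: only the rows added since run 1 are read; old rows are not reprocessed.
incremental_aggregate(cache, [("east", 25), ("north", 10)])
print(cache)

# When the source changes completely, start again from an empty cache
# (the equivalent of reinitializing the aggregate cache).
reinitialized = incremental_aggregate({}, [("east", 7)])
print(reinitialized)
```

The saving comes from run 2 touching three rows instead of the full history.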

69.Concurrent batches have 3 sessions and set each session run if previous complete but 2nd
fail then what will happen the batch?

Batch will fail

General Project

70.How many mapping, dimension tables, Fact tables and any complex mapping you did?
And what is your database size, how frequently loading to DWH?

I did 22 mappings, 4 dimension tables and one fact table. One complex mapping I did was for
a slowly changing dimension table. Database size is 9GB. Data is loaded every day.

71.What are the different transformations used in your project?

Aggregator, Expression, Filter, Sequence generator, Update Strategy, Lookup, Stored
Procedure, Joiner, Rank, Source Qualifier.

72.How did you populate the dimensions tables?

73.What are the sources you worked on?
Oracle


74.How many mappings have you developed on your whole dwh project?

45 mappings

75.What is OS used your project?

Windows NT

76.Explain your project (Fact table, dimensions, and database size)

Fact table contains all business measures(numeric values) and foreign key values, Dimension
table contains details about subject area like customer, product

77.What is difference between Informatica power mart and power center?

Using power center we can create global repository
Power mart used to create local repository
Global repository configure multiple server to balance session load
Local repository configure only single server

78.Have you done any complex mapping?

Developed one mapping to handle slowly changing dimension table.

79.Explain details about DTM?

Once a session starts, the load manager starts the DTM, which allocates the session shared
memory and contains the reader and writer. The reader reads source data through the source
qualifier using SQL statements and moves the data to the DTM. The DTM then moves the data
from transformation to transformation on a row-by-row basis and finally passes it to the
writer, which writes the data into the target using SQL statements.

I-Flex Interview (14th May 2003)

80.What keys did you use other than primary key and foreign key?

Used a surrogate key to maintain uniqueness and to overcome duplicate values in the primary
key.

81.Data flow of your Data warehouse(Architecture)

The DWH follows a basic architecture: data flows from OLTP to the data warehouse, and from
the DWH to OLAP for analysis and report building.

82.Difference between Power mart and power center?

Using power center we can create global repository
Power mart used to create local repository
Global repository configure multiple server to balance session load
Local repository configure only single server

83.What are the batches and its details?

Sequential(Run the sessions one by one)

Concurrent (Run the sessions simultaneously)

Advantage of concurrent batch:

It uses the Informatica server resources in parallel and reduces the time it would take to
run the sessions separately.
Use this feature when multiple sources process large amounts of data in one session: split
the session into several sessions and put them into one concurrent batch to complete quickly.

Disadvantage:

Requires more shared memory; otherwise sessions may fail

84.What is an external table in Oracle? How does Oracle read the flat file?

It is used to read a flat file. Oracle internally writes a SQL*Loader script with a control
file.

85.What indexes did you use? Bitmap join index?

A bitmap index is used in a data warehouse environment to improve query response time. DWH
columns often have low cardinality and few updates, and a bitmap index is very efficient for
WHERE clauses on them.

A bitmap join index is used to join dimension and fact tables instead of reading 2 different
indexes.

86.What are the partitions in 8i/9i? Where will you use hash partition?

In Oracle8i there are 3 partition types (Range, Hash, Composite)
In Oracle9i List partition is an additional one

Range (Used for date values, for example in a DWH where the date values fall into Quarter 1,
Quarter 2, Quarter 3, Quarter 4)

Hash (Used for unpredictable values. For example, when we cannot predict which value should
go to which partition, we go for hash partition. If we set 5 partitions for a column, Oracle
distributes the values across the 5 partitions accordingly).

List (Used for literal values. For example, if a country has 24 states, create 24
partitions, one for each state)

Composite (Combination of range and hash)
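
The way each partitioning method picks a partition for a row can be sketched as follows. This is a toy Python illustration of the routing idea only: the partition names are invented, and the CRC32 hash is a stand-in, not the hash function Oracle uses.

```python
import zlib
from datetime import date


def range_partition(d):
    """Range: route a date to its calendar quarter partition (Q1 to Q4)."""
    return f"Q{(d.month - 1) // 3 + 1}"


def hash_partition(value, partitions=5):
    """Hash: spread unpredictable values over a fixed number of partitions."""
    return zlib.crc32(str(value).encode("utf-8")) % partitions


def list_partition(state, known_states):
    """List: one partition per literal value, here one per state."""
    if state not in known_states:
        raise ValueError(f"no partition defined for {state!r}")
    return f"P_{state}"


def composite_partition(d, value, subpartitions=5):
    """Composite: range partition first, then hash within that range."""
    return range_partition(d), hash_partition(value, subpartitions)


print(range_partition(date(2003, 5, 14)))
print(hash_partition("customer-42"))
print(list_partition("KA", {"KA", "TN"}))
print(composite_partition(date(2003, 11, 2), "customer-42"))
```

The same value always hashes to the same partition, which is what lets a lookup by key go straight to one partition.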

91.What is the main difference between mapplets and mappings?

A mapplet lets you reuse a set of transformations in several mappings, whereas a mapping
cannot be reused in that way.

Any change made to a mapplet is automatically inherited by all instances of that mapplet.

92. What is the difference between the source qualifier filter and the filter transformation?
The source qualifier filter can be used only for relational sources, whereas the Filter
transformation can be used with any kind of source.

The source qualifier filters data while reading it, whereas the Filter transformation
filters data before it is loaded into the target.

93. What is the maximum no. of return value when we use unconnected
transformation?

Only one.

94. What are the environments in which informatica server can run on?

Informatica client runs on Windows 95 / 98 / NT, Unix Solaris, Unix AIX(IBM)

Informatica Server runs on Windows NT / Unix

Minimum Hardware requirements

Informatica Client Hard disk 40MB, RAM 64MB

Informatica Server Hard Disk 60MB, RAM 64MB

95. Can unconnected lookup do everything a connected lookup transformation can do?

No. We cannot call a connected lookup from another transformation. The rest is possible.

96. In 5.x can we copy part of mapping and paste it in other mapping?

I think its possible


97. What option do you select for a sessions in batch, so that the sessions run one
after the other?

We have to select an option called Run if previous completed

98. How do you really know that paging to disk is happening while you are using a
lookup transformation? Assume you have access to server?

We have to collect performance data first and then check the counter lookup_readtodisk; if
it is greater than 0, the lookup is reading from disk.

Step1. Choose the option Collect Performance data in the general tab of the session
property sheet.
Step2. Monitor the server, then click server-request session performance details
Step3. Locate the performance details file named session_name.perf in the session
log file directory
Step4. Find the counter lookup_readtodisk; if it is greater than 0 then Informatica reads
lookup table values from the disk. To find out how many rows are in the cache, see
Lookup_rowsincache

99. List three option available in informatica to tune aggregator transformation?

Use Sorted Input to sort data before aggregation
Use Filter transformation before aggregator

Increase Aggregator cache size

100.Assume there is text file as source having a binary field to, to source qualifier What
native data type informatica will convert this binary field to in source qualifier?

Binary data type for relational source for flat file ?
101.Variable v1 has values set as 5 in designer(default), 10 in parameter file, 15 in
repository. While running session which value informatica will read?

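Informatica reads the value 10 from the parameter file. A value in the parameter file takes
precedence over the value saved in the repository, which in turn takes precedence over the
initial value set in the Designer.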
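
A minimal sketch of how the start value of a mapping variable could be resolved, assuming the precedence order parameter file first, then the value saved in the repository, then the initial value from the Designer, then the datatype default. The function and argument names are hypothetical.

```python
def resolve_start_value(parameter_file=None, repository=None, initial=None,
                        datatype_default=0):
    """Return the first value that is set, in the assumed order of precedence."""
    for candidate in (parameter_file, repository, initial):
        if candidate is not None:
            return candidate
    return datatype_default


# v1 is 5 in the Designer, 10 in the parameter file and 15 in the repository.
print(resolve_start_value(parameter_file=10, repository=15, initial=5))

# Without a parameter file entry, the value saved in the repository is used.
print(resolve_start_value(repository=15, initial=5))
```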

102. Joiner transformation is joining two tables s1 and s2. s1 has 10,000 rows and s2 has
1000 rows . Which table you will set master for better performance of joiner
transformation? Why?

Set table s2 as the master, because the Informatica server keeps the master table in the
cache; caching 1000 rows gives better performance than caching 10000 rows.
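
The effect of the master choice can be seen in a small hash join sketch: the master side is loaded into a cache and the detail side is streamed past it. This is an illustration in Python with made-up rows, not the server's actual cache implementation.

```python
def cached_join(master_rows, detail_rows, key):
    """Cache the master rows by key, stream the detail rows, and report cache size."""
    cache = {}
    for row in master_rows:
        cache.setdefault(row[key], []).append(row)
    joined = [(m, d) for d in detail_rows for m in cache.get(d[key], [])]
    cached_count = sum(len(rows) for rows in cache.values())
    return joined, cached_count


# Hypothetical tables: s1 has 10,000 rows and s2 has 1,000 rows.
s1 = [{"id": i} for i in range(10_000)]
s2 = [{"id": i} for i in range(1_000)]

joined_a, cached_a = cached_join(master_rows=s2, detail_rows=s1, key="id")
joined_b, cached_b = cached_join(master_rows=s1, detail_rows=s2, key="id")
print(len(joined_a), cached_a)
print(len(joined_b), cached_b)
```

Both choices produce the same number of joined rows; only the number of rows held in the cache differs.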


103. Source table has 5 rows. Rank in rank transformation is set to 10. How many rows
the rank transformation will output?

5 rows. The source has only 5 rows, so the Rank transformation cannot output more than that.
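
The same behaviour shows up in any top-N selection: asking for the top 10 of 5 rows returns all 5, ordered by rank. A short Python sketch with made-up rows:

```python
import heapq


def rank_top(rows, rank_count, key):
    """Return up to rank_count rows with the largest key, best first."""
    return heapq.nlargest(rank_count, rows, key=key)


# Hypothetical source with 5 rows of (name, salary).
source = [("A", 300), ("B", 500), ("C", 100), ("D", 400), ("E", 200)]
ranked = rank_top(source, 10, key=lambda row: row[1])
print(len(ranked), ranked[0])
```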

104. How to capture performance statistics of individual transformation in the mapping
and explain some important statistics that can be captured?

Use tracing level Verbose data

105. Give a way in which you can implement a real time scenario where data in a table
is changing and you need to look up data from it. How will you configure the lookup
transformation for this purpose?

In slowly changing dimension table use type 2 and model 1

106. What is DTM process? How many threads it creates to process data, explain each
thread in brief?

The DTM receives data from the reader and moves it from transformation to transformation on
a row-by-row basis. It creates 2 threads: one is the reader and the other is the writer.

107. Suppose session is configured with commit interval of 10,000 rows and source has
50,000 rows explain the commit points for source based commit & target based commit.
Assume appropriate value wherever required?

Target Based commit (First time Buffer size full 7500 next time 15000)

Commit Every 15000, 22500, 30000, 40000, 50000

Source Based commit(Does not affect rows held in buffer)

Commit Every 10000, 20000, 30000, 40000, 50000

108.What does first column of bad file (rejected rows) indicates?

First Column - Row indicator (0, 1, 2, 3)

Second Column Column Indicator (D, O, N, T)

109. What is the formula for calculation rank data caches? And also Aggregator, data,
index caches?

Index cache size = Total no. of rows * size of the column in the lookup condition (50 * 4)

Aggregator/Rank transformation Data Cache size = (Total no. of rows * size of the column in
the lookup condition) + (Total no. of rows * size of the connected output ports)


110. Can unconnected lookup return more than 1 value?

No

1. Explain about your projects
- Architecture
- Dimension and Fact tables
- Sources and Targets
- Transformations used
- Frequency of populating data
- Database size

2. What is dimension modeling?
Unlike the ER model, the dimensional model is very asymmetric,
with one large central table called the fact table connected to multiple
dimension tables. It is also called a star schema.

3. What are mapplets?
Mapplets are reusable objects that represents collection of transformations
Transformations not to be included in mapplets are
Cobol source definitions
Joiner transformations
Normalizer Transformations
Non-reusable sequence generator transformations
Pre or post session procedures
Target definitions
XML Source definitions
IBM MQ source definitions
Power mart 3.5 style Lookup functions

4. What are the transformations that use cache for performance?
Aggregator, Lookups, Joiner and Ranker

5. What the active and passive transformations?
An active transformation changes the number of rows that pass through the
mapping.
1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Ranker
5. Update strategy
6. Aggregator
7. Advanced External procedure

8. Normalizer
9. Joiner

Passive transformations do not change the number of rows that pass through
the mapping.
1. Expressions
2. Lookup
3. Stored procedure
4. External procedure
5. Sequence generator
6. XML Source qualifier

6. What is a lookup transformation?
Used to look up data in a relational table, views, or synonym, The
informatica server queries the lookup table based on the lookup ports in the
transformation. It compares lookup transformation port values to lookup
table column values based on the lookup condition. The result is passed to
other transformations and the target.

Used to :
Get related value
Perform a calculation
Update slowly changing dimension tables.
Diff between connected and unconnected lookups. Which is better?
Connected :
Received input values directly from the pipeline
Can use Dynamic or static cache.
Cache includes all lookup columns used in the mapping
Can return multiple columns from the same row
If there is no match , can return default values
Default values can be specified.

Un connected :
Receive input values from the result of a LKP expression in another
transformation.
Only static cache can be used.
Cache includes all lookup/output ports in the lookup condition and lookup or
return port.
Can return only one column from each row.
If there is no match it returns null.
Default values cannot be specified.

Explain various caches :
Static:
Caches the lookup table before executing the transformation.
Rows are not added dynamically.
Dynamic:
Caches the rows as and when it is passed.
Unshared:
Within the mapping if the lookup table is used in more than
one transformation then the cache built for the first lookup can be used for
the others. It cannot be used across mappings.
Shared:
If the lookup table is used in more than one
transformation/mapping then the cache built for the first lookup can be used
for the others. It can be used across mappings.
Persistent :
If the cache generated for a Lookup needs to be preserved
for subsequent use then persistent cache is used. It will not delete the
index and data files. It is useful only if the lookup table remains
constant.

What are the uses of index and data caches?
The conditions are stored in index cache and records from
the lookup are stored in data cache

7. Explain aggregate transformation?
The aggregate transformation allows you to perform aggregate calculations,
such as averages, sum, max, min etc. The aggregate transformation is unlike
the Expression transformation, in that you can use the aggregator
transformation to perform calculations in groups. The expression
transformation permits you to perform calculations on a row-by-row basis
only.
Performance issues ?
The Informatica server performs calculations as it reads and stores
necessary data group and row data in an aggregate cache.
Create Sorted input ports and pass the input records to aggregator in
sorted forms by groups then by port

Incremental aggregation?
In the Session property tag there is an option for
performing incremental aggregation. When the Informatica server performs
incremental aggregation , it passes new source data through the mapping and
uses historical cache (index and data cache) data to perform new aggregation
calculations incrementally.

What are the uses of index and data cache?
The group data is stored in index files and Row data stored
in data files.

8. Explain update strategy?
Update strategy flags source rows for insert, update,
delete, or reject at the targets.
What are update strategy constants?
DD_INSERT = 0, DD_UPDATE = 1, DD_DELETE = 2,
DD_REJECT = 3

If DD_UPDATE is defined in update strategy and Treat source
rows as INSERT in Session . What happens?
Hints: If anything other than DATA DRIVEN is selected in
the session, the Update strategy in the mapping is ignored.
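
The hint can be expressed as a small rule: the flag set in the mapping is honoured only when the session treats source rows as data driven. The sketch below is a Python illustration of that rule; the function name and the lower-case setting strings are assumptions, while the constant values are the ones listed in the answer.

```python
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

# Session-level "Treat source rows as" settings other than data driven.
SESSION_SETTINGS = {"insert": DD_INSERT, "update": DD_UPDATE, "delete": DD_DELETE}


def effective_flag(mapping_flag, treat_source_rows_as):
    """Return the row flag that actually applies to a row in the session."""
    if treat_source_rows_as == "data driven":
        return mapping_flag  # the update strategy in the mapping decides
    return SESSION_SETTINGS[treat_source_rows_as]  # the mapping flag is ignored


print(effective_flag(DD_UPDATE, "insert"))
print(effective_flag(DD_UPDATE, "data driven"))
```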


What are the three areas where the rows can be flagged for
particular treatment?
In mapping, In Session treat Source Rows and In Session
Target Options.

What is the use of Forward/Reject rows in Mapping?

9. Explain the expression transformation ?
Expression transformation is used to calculate values in a single row before
writing to the target.
What are the default values for variables?
Hints: String = Null, Number = 0, Date = 1/1/1753

10. Difference between Router and filter transformation?
In filter transformation the records are filtered based on the condition and
rejected rows are discarded. In Router the multiple conditions are placed
and the rejected rows can be assigned to a port.
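
The difference can be sketched in Python: a filter keeps the rows that pass one condition and drops the rest, while a router tests several conditions and sends rows that match none of them to a default group. The group names, conditions and sample rows are made up for illustration.

```python
def filter_rows(rows, condition):
    """Filter: rows that fail the single condition are discarded."""
    return [row for row in rows if condition(row)]


def route_rows(rows, groups):
    """Router: a row goes to every group whose condition it meets;
    rows that meet no condition go to the DEFAULT group."""
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []
    for row in rows:
        matched = False
        for name, condition in groups.items():
            if condition(row):
                routed[name].append(row)
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed


salaries = [500, 1500, 2500, 3500]
kept = filter_rows(salaries, lambda s: s > 2000)
routed = route_rows(salaries, {"HIGH": lambda s: s > 2000,
                               "MID": lambda s: 1000 < s <= 2000})
print(kept)
print(routed)
```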

How many ways you can filter the records?
1. Source Qualifier
2. Filter transformation
3. Router transformation
4. Ranker
5. Update strategy
11. How do you call stored procedure and external procedure
transformation ?
External Procedure can be called in the Pre-session and post session tag in
the Session property sheet.
Store procedures are to be called in the mapping designer by three methods
1. Select the icon and add a Stored procedure transformation
2. Select transformation - Import Stored Procedure
3. Select Transformation - Create and then select stored procedure.

12. Explain Joiner transformation and where it is used?
While a Source qualifier transformation can join data originating from a
common source database, the joiner transformation joins two related
heterogeneous sources residing in different locations or file systems.
Two relational tables existing in separate databases
Two flat files in different file systems.
Two different ODBC sources
In one transformation how many sources can be coupled?
Two sources can be coupled. If more than two are to be coupled, add another
Joiner in the hierarchy.
What are join options?
Normal (Default)
Master Outer
Detail Outer
Full Outer



13. Explain Normalizer transformation?
The normaliser transformation normalises records from COBOL and relational
sources, allowing you to organise the data according to your own needs. A
Normaliser transformation can appear anywhere in a data flow when you
normalize a relational source. Use a Normaliser transformation instead of
the Source Qualifier transformation when you normalize COBOL source. When
you drag a COBOL source into the Mapping Designer Workspace, the Normaliser
transformation appears, creating input and output ports for every columns in
the source.

14. What is Source qualifier transformation?
When you add relational or flat file source definition to a mapping , you
need to connect to a source Qualifier transformation. The source qualifier
represents the records that the informatica server reads when it runs a
session.
Join Data originating from the same source database.
Filter records when the Informatica server reads the source data.
Specify an outer join rather than the default inner join.
Specify sorted ports
Select only distinct values from the source
Create a custom query to issue a special SELECT statement for the
Informatica server to read the source data.

15. What is Ranker transformation?
Filters the required number of records from the top or from the bottom.

16. What is target load option?
It defines the order in which the Informatica server loads data into the
targets.
This is used to avoid integrity constraint violations.

17. How do you identify the bottlenecks in Mappings?
Bottlenecks can occur in
1. Targets
The most common performance bottleneck occurs when the Informatica server
writes to a target database. You can identify a target bottleneck by
configuring the session to write to a flat file target. If the session
performance increases significantly when you write to a flat file, you have
a target bottleneck.
Solutions:
Drop or disable indexes or constraints
Perform bulk load (bypasses the database log)
Increase commit interval (recovery is compromised)
Tune the database for RBS, dynamic extension etc.

2. Sources

Set a Filter transformation after each SQ with its condition set to false, so
that no records pass through. If the time taken is the same, then there is a
source bottleneck.
You can also identify a source problem by:
Read test session - copy the mapping, keep only the sources and SQs, remove
all other transformations and connect them to a flat file target. If the
performance is the same, then there is a source bottleneck.
Using a database query - copy the read query directly from the session log
and execute it against the source database with a query tool. If the time it
takes to execute the query and the time to fetch the first row are
significantly different, then the query can be modified using optimizer
hints.
Solutions:
Optimize queries using hints.
Use indexes wherever possible.

3. Mapping
If both source and target are OK, then the problem could be in the mapping.
Add a Filter transformation before each target with its condition set to
false; if the time taken is the same, then there is a mapping bottleneck.
(OR) Look at the performance monitor in the session property sheet and view
the counters. High error rows and rows in lookup cache counters indicate a
mapping bottleneck.
Solutions:
Optimize Single Pass Reading:
Optimize Lookup transformation:
1. Caching the lookup table:
When caching is enabled, the Informatica server caches the lookup table and
queries the cache during the session. When this option is not enabled, the
server queries the lookup table on a row-by-row basis.
Cache types: Static, Dynamic, Shared, Un-shared and Persistent cache
2. Optimizing the lookup condition:
Whenever multiple conditions are placed, the condition with the equality
sign should take precedence.
3. Indexing the lookup table:
For a cached lookup, the lookup table should be indexed on the ORDER BY
columns. The session log contains the ORDER BY statement.
For an un-cached lookup, since the server issues a SELECT statement for each
row passing into the Lookup transformation, it is better to index the lookup
table on the columns in the condition.
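
The difference between a cached and an un-cached lookup can be sketched with Python and an in-memory SQLite table. This is an illustration only, not how the Informatica server is implemented; the customers table and all function names are hypothetical.

```python
import sqlite3


def build_lookup_db():
    """Create a tiny in-memory lookup table (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Ann"), (2, "Bob"), (3, "Cy")],
    )
    return conn


def uncached_lookup(conn, cust_ids):
    """One SELECT per incoming row, as in an un-cached lookup."""
    names, queries = [], 0
    for cid in cust_ids:
        row = conn.execute(
            "SELECT name FROM customers WHERE cust_id = ?", (cid,)
        ).fetchone()
        queries += 1
        names.append(row[0] if row else None)
    return names, queries


def cached_lookup(conn, cust_ids):
    """One SELECT to build the cache, then in-memory probes for every row."""
    cache = dict(conn.execute("SELECT cust_id, name FROM customers ORDER BY cust_id"))
    return [cache.get(cid) for cid in cust_ids], 1


conn = build_lookup_db()
print(uncached_lookup(conn, [2, 1, 2, 9, 3]))
print(cached_lookup(conn, [2, 1, 2, 9, 3]))
```

The un-cached version benefits from an index on the condition column (here the primary key), because that column is probed once per row.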


Optimize Filter transformation:
You can improve efficiency by filtering early in the data flow. Instead of
using a Filter transformation halfway through the mapping to remove a
sizable amount of data, use a Source Qualifier filter to remove those same
rows at the source. If it is not possible to move the filter into the SQ,
move the Filter transformation as close to the Source Qualifier as possible
to remove unnecessary data early in the data flow.
Optimize Aggregator transformation:
1. Group by simpler columns, preferably numeric columns.
2. Use sorted input. Sorted input decreases the use of aggregate caches. The
server assumes all input data is sorted and performs aggregate calculations
as it reads.
3. Use incremental aggregation in the session property sheet.
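
Why sorted input reduces cache use can be sketched in Python. This is only an illustration of the general idea; the function names and sample rows are hypothetical. With unsorted input every group must be held until the last row is read, while with sorted input each group can be emitted as soon as its key changes.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_unsorted(rows):
    """Keep a running total for every group until all rows are read."""
    totals = {}
    for key, amount in rows:
        totals[key] = totals.get(key, 0) + amount
    return totals


def aggregate_sorted(rows):
    """Assume rows arrive sorted by key; emit each group as soon as it ends."""
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(amount for _, amount in group)


sorted_rows = [("A", 1), ("A", 2), ("B", 5)]
print(aggregate_unsorted(sorted_rows))
print(list(aggregate_sorted(sorted_rows)))
```

Note that aggregate_sorted gives wrong totals if the input is not actually sorted by the group key, which mirrors the server assuming that sorted input really is sorted.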
Optimize Sequence Generator transformation:
1. Try creating a reusable Sequence Generator transformation and use it in
multiple mappings.
2. The Number of Cached Values property determines the number of values the
Informatica server caches at one time.
Optimize Expression transformation:
1. Factor out common logic.
2. Minimize aggregate function calls.
3. Replace common sub-expressions with local variables.
4. Use operators instead of functions.

4. Sessions
If you do not have a source, target, or mapping bottleneck, you may have a
session bottleneck. You can identify a session bottleneck by using the
performance details. The Informatica server creates performance details when
you enable Collect Performance Data on the General tab of the session
properties.
Performance details display information about each Source Qualifier, target
definition, and individual transformation. All transformations have some
basic counters that indicate the number of input rows, output rows, and
error rows.
Any value other than zero in the readfromdisk and writetodisk counters for
Aggregator, Joiner, or Rank transformations indicates a session bottleneck.
Low BufferInput_efficiency and BufferOutput_efficiency counters also
indicate a session bottleneck.
Small cache size, low buffer memory, and small commit intervals can cause
session bottlenecks.
5. System (Networks)
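
The counter checks described under Sessions can be sketched as a small Python helper. This is a hypothetical illustration: the counter name prefixes, the function name and the 50 percent threshold for "low" efficiency are assumptions, not values taken from Informatica.

```python
def session_bottleneck_signs(counters, low_efficiency=50.0):
    """Return the names of counters that hint at a session bottleneck."""
    flagged = []
    for name, value in counters.items():
        lname = name.lower()
        if lname.endswith(("readfromdisk", "writetodisk")):
            # Any non-zero value means a cache paged to disk.
            if value != 0:
                flagged.append(name)
        elif lname.endswith(("bufferinput_efficiency", "bufferoutput_efficiency")):
            # "Low" is judged against an assumed threshold.
            if value < low_efficiency:
                flagged.append(name)
    return flagged


sample = {
    "AGG_readfromdisk": 12,
    "AGG_writetodisk": 0,
    "SQ_BufferInput_efficiency": 30.0,
    "TGT_BufferOutput_efficiency": 95.0,
}
print(session_bottleneck_signs(sample))
```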

18. How to improve the Session performance?
1. Run concurrent sessions
2. Partition session (PowerCenter)
3. Tune parameters - DTM buffer pool, buffer block size, index cache size,
data cache size, commit interval, tracing level (Normal, Terse, Verbose
Init, Verbose Data)
The session has memory to hold 83 sources and targets. If there are more, the
DTM buffer pool can be increased.
The Informatica server uses the index and data caches for Aggregator, Rank,
Lookup and Joiner transformations. The server stores the transformed data
from the above transformations in the data cache before returning it to the
data flow. It stores group information for those transformations in the
index cache.
If the allocated data or index cache is not large enough to store the data,
the server stores the data in a temporary disk file as it processes the
session data. Each time the server pages to the disk, performance slows.
This can be seen from the counters.
Since the data cache is generally larger than the index cache, it should be
allocated more memory than the index cache.
4. Remove staging area
5. Turn off session recovery
6. Reduce error tracing

19. What are tracing levels?
Normal (default)
Logs initialization and status information, errors encountered, and skipped
rows due to transformation errors; summarizes session results, but not at
the row level.
Terse
Logs initialization information, error messages, and notification of
rejected data.
Verbose Init
In addition to Normal tracing, it logs additional initialization
information, names of index and data files used, and detailed transformation
statistics.
Verbose Data
In addition to Verbose Init tracing, it records row-level logs.

20. What are slowly changing dimensions?
Slowly changing dimensions are dimension tables that have slowly increasing
data as well as updates to existing data.

21. What are mapping parameters and variables?
A mapping parameter is a user-definable constant that takes a value before
the session runs. It can be used in SQ expressions, Expression
transformations etc.
Steps:
Define the parameter in the Mapping Designer - Parameters & Variables.
Use the parameter in the expressions.
Define the value for the parameter in the parameter file.

A mapping variable is defined in a similar way to a parameter, except that
the value of the variable can change.
It picks up its value in the following order:
1. From the session parameter file
2. As stored in the repository object in the previous run
3. As defined in the initial values in the Designer
4. Default values
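
The lookup order above can be sketched as a small Python function. This is an illustration only; the function name, the dictionaries and the variable name $$LastRunDate are hypothetical stand-ins for the parameter file, the repository and the Designer settings.

```python
def resolve_mapping_variable(name, param_file, repository, initial_values, default):
    """Return the first value found, checking the sources in precedence order."""
    for source in (param_file, repository, initial_values):
        if name in source:
            return source[name]
    # Nothing defined anywhere: fall back to the default value.
    return default


param_file = {}                                  # no override for this run
repository = {"$$LastRunDate": "2004-01-15"}     # saved by the previous run
initial_values = {"$$LastRunDate": "2000-01-01"}
print(resolve_mapping_variable(
    "$$LastRunDate", param_file, repository, initial_values, "1753-01-01"))
```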
