
Parsing Microsoft Excel with the Custom Data Transformation

© 2010 Informatica
Abstract
You can parse data from a Microsoft Excel spreadsheet with a Custom Data transformation in Informatica Developer. The
Custom Data transformation returns row data to relational tables. This article describes how to configure the Custom Data
transformation.

Supported Versions
• Informatica Developer 9.0.1

Table of Contents
Overview
Data Transformation
Logical Data Object Model Overview
Custom Data Transformation
Configure Relational Hierarchy Ports
Export the XML Schema
Create and Deploy the Data Transformation Project
Create the Data Transformation Project
Deploy the Project
Configure the Service Name
Preview the Data
Deploy the Application and the Service

Overview
Your organization maintains employee records in Microsoft Excel sheets. Before you purchase a Human Resources
application, you want to expose the Excel data as a virtual database that you can query.

This article explains how to design a data object mapping in the Developer tool. The data object mapping includes a Custom
Data transformation. The Custom Data transformation calls a Data Transformation service to parse the data from the Excel
spreadsheets. The Custom Data transformation returns row data.

To migrate the data from Microsoft Excel spreadsheets, complete the following tasks:

• Create a logical data model in the Developer tool.
• Design a flat file data object to pass Excel file names to the Custom Data transformation.
• Create the Custom Data transformation.
• Define the output ports group structure.
• Export the structure as an XML schema from the Developer tool.
• Create a Data Transformation Parser project in the Data Transformation Developer Studio.
• Import the XML schema to the Data Transformation project.
• Import a sample Excel file to the project.
• Design the project using the sample data.
• Deploy the project as a Data Transformation service in the Data Transformation repository.
• Add the Data Transformation service name to the Custom Data transformation in the Developer tool.
• Run the data object to view the data.
• Deploy the logical data model as an application.
• Deploy the Data Transformation service to the same machine as the Data Integration Service that runs the application.

Data Transformation
Data Transformation is an application that transforms file formats such as Excel spreadsheets or PDF documents. You can
transform data in formats such as HL7, EDI-X12, EDIFACT, SWIFT, NACHA, FIX, BAI2, and DTCC.

Develop Data Transformation projects in the Data Transformation Studio visual editor. Deploy the projects from the Data
Transformation Studio to the Data Transformation repository. Informatica accesses the services in the Data Transformation
repository when you create Custom Data transformation mappings and when you run them.

The Data Transformation Engine is the process that runs a Data Transformation service from the repository.

Logical Data Object Model Overview


Create a logical data model to parse the data from Microsoft Excel sheets and pass the data to a logical data object.

The following figure shows the mapping in the Developer tool:

The mapping contains the following objects:

ExelSrc
Physical data object that determines which Microsoft Excel files to process. The input type is command. The
command lists all the Excel files in a directory. The Data Integration Service passes the name of each
Microsoft Excel file in the directory to the Custom Data transformation.

The following image shows the runtime properties for the physical data object:

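The list.bat command lists all .xls files in the NewComp directory:
@echo off
rem Print the full path of each .xls file in C:\NewComp, one per line.
rem "delims=" keeps file names that contain spaces intact.
for /f "delims=" %%a IN ('dir /b C:\NewComp\*.xls') do echo C:\NewComp\%%a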
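
For readers who prefer a portable script, a similar listing can be sketched in Python. This is only an illustration and not part of the Informatica configuration. The function name list_excel_files is hypothetical, and the sketch assumes the command only needs to produce one full path per line, as list.bat does.

```python
from pathlib import Path


def list_excel_files(directory):
    """Return the full paths of the .xls files in directory, sorted by name."""
    folder = Path(directory)
    if not folder.is_dir():
        # A missing directory yields no file names instead of an error.
        return []
    # Compare the lowercased suffix so that EMP.XLS matches as well as emp.xls.
    return sorted(
        str(path)
        for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".xls"
    )
```

Printing each element of list_excel_files(r"C:\NewComp") on its own line would give the Data Integration Service the same kind of input that list.bat provides.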

CDT_Employees
The Custom Data transformation receives the Microsoft Excel file name in the InputFileName port. The Custom
Data transformation passes the Data Transformation Engine the name of a Data Transformation service to run and
the Excel file name to process. The Data Transformation Engine opens the Microsoft Excel file, parses the data,
and returns XML to the Custom Data transformation. The Custom Data transformation passes rows of data to the
EMPL logical data object.

EMPL
The EMPL logical data object receives rows of employee data from the Custom Data transformation.

Excel Files
The employee data is in multiple Microsoft Excel files. Each Excel file contains a heading row and a row for each
employee.

The following table shows the heading row and the first rows of employee data in a spreadsheet:

EMPNO  ENAME   JOB       MGR   SAL   COMM  DEPTNO
7369   SMITH   CLERK     7902  800         20
7499   ALLEN   SALESMAN  7698  1600  300   30
7521   WARD    SALESMAN  7698  1250  500   30
7566   JONES   MANAGER   7839  2975        20
7654   MARTIN  SALESMAN  7698  1250  1400  30

Custom Data Transformation


Create the Custom Data transformation in the Developer tool before designing the Data Transformation project.

The following figure shows the Custom Data transformation configuration:

The following table shows the configuration:

Attribute       Description
Name            The transformation name is CDT_Employees.
Location        The folder and the project location in the Model repository.
Create Options  Create As Empty. Do not generate ports from a service. Manually define the
                output ports in the Custom Data transformation Structure view.
Input Type      File. The input is a file that contains the path to EMP.xls.
Output Type     Buffer. The output is transformed data.

Configure Relational Hierarchy Ports


Define the output ports in the Custom Data transformation Structure view.

The CDT_Employees transformation returns data to one group of ports.

The following figure shows the ports in the EMPL group:

The output data is optional in the ports unless Not Null is enabled.

Export the XML Schema


Export an XML schema from the Structure view. The schema describes the structure of the output ports.

The schema name is Employee_Tables_Schema.xsd:


<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!-- ===== AUTO-GENERATED FILE - DO NOT EDIT ===== -->
<!-- ===== This file has been generated by Informatica Developer ===== -->
<xs:schema xmlns="www.informatica.com/CDET/XSD/mappingName_Unstructured_Data"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           attributeFormDefault="unqualified"
           elementFormDefault="qualified"
           targetNamespace="www.informatica.com/CDET/XSD/mappingName_Unstructured_Data">
  <xs:element name="PC_XSD_ROOT" type="PC_XSD_ROOTT"/>
  <xs:complexType name="PC_XSD_ROOTT">
    <xs:sequence>
      <xs:element maxOccurs="unbounded" minOccurs="0" ref="EMPL"/>
    </xs:sequence>
  </xs:complexType>
  <xs:element name="EMPL" type="EMPLT"/>
  <xs:complexType name="EMPLT">
    <xs:sequence>
      <xs:element minOccurs="0" name="EMPNO" type="xs:double"/>
      <xs:element minOccurs="0" name="ENAME" type="xs:string"/>
      <xs:element minOccurs="0" name="JOB" type="xs:string"/>
      <xs:element minOccurs="0" name="MGR" type="xs:double"/>
      <xs:element minOccurs="0" name="SAL" type="xs:double"/>
      <xs:element minOccurs="0" name="COMM" type="xs:double"/>
      <xs:element minOccurs="0" name="DEPTNO" type="xs:double"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>

Columns that can contain null values have a minOccurs="0" attribute, which makes the element optional in the XML.
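
To make the relationship between the schema and the row data concrete, the following Python sketch reads an XML document shaped like the schema and turns each EMPL element into a row. It is an illustration only, not how the Custom Data transformation is implemented. The names SAMPLE_XML and xml_to_rows are hypothetical, and the sample document is an assumption built from two rows of the employee data shown earlier.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the targetNamespace of the exported schema.
NS = "www.informatica.com/CDET/XSD/mappingName_Unstructured_Data"
COLUMNS = ("EMPNO", "ENAME", "JOB", "MGR", "SAL", "COMM", "DEPTNO")
# Columns declared as xs:double in the schema; the rest are xs:string.
NUMBER_COLUMNS = ("EMPNO", "MGR", "SAL", "COMM", "DEPTNO")

# Hypothetical instance document (an assumption, not captured output).
SAMPLE_XML = """<PC_XSD_ROOT xmlns="www.informatica.com/CDET/XSD/mappingName_Unstructured_Data">
  <EMPL>
    <EMPNO>7369</EMPNO><ENAME>SMITH</ENAME><JOB>CLERK</JOB>
    <MGR>7902</MGR><SAL>800</SAL><DEPTNO>20</DEPTNO>
  </EMPL>
  <EMPL>
    <EMPNO>7499</EMPNO><ENAME>ALLEN</ENAME><JOB>SALESMAN</JOB>
    <MGR>7698</MGR><SAL>1600</SAL><COMM>300</COMM><DEPTNO>30</DEPTNO>
  </EMPL>
</PC_XSD_ROOT>"""


def xml_to_rows(xml_text):
    """Return one dict per EMPL element; absent elements become None."""
    root = ET.fromstring(xml_text)
    rows = []
    for empl in root.findall(f"{{{NS}}}EMPL"):
        row = {}
        for column in COLUMNS:
            element = empl.find(f"{{{NS}}}{column}")
            if element is None or element.text is None:
                # minOccurs="0": a missing element maps to a null value.
                row[column] = None
            elif column in NUMBER_COLUMNS:
                row[column] = float(element.text)
            else:
                row[column] = element.text
        rows.append(row)
    return rows
```

In this sketch, the first sample row has no COMM element, so xml_to_rows(SAMPLE_XML) gives that row a COMM value of None.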

Create and Deploy the Data Transformation Project


Data Transformation provides a visual editor to create projects and deploy them as services.

Create the Data Transformation Project


Create a Parser project in the Data Transformation Developer Studio. A Parser project extracts data and returns XML. The
Parser project is named Emp_Excel_2.

The Developer Studio prompts for a schema to define the output data structure. Browse for the
Employee_Tables_Schema.xsd file that you exported from the Developer tool.

The Developer Studio prompts for sample data for the project:

Browse for one of the Excel spreadsheets of employee data and import it into the project. The Developer Studio shows the
sample data.

After you configure the project, you can run it in the Developer Studio. View the output.xml file to verify that the data
the service returns to the Custom Data transformation is correct.
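
As a rough picture of what the output.xml file should contain, the following Python sketch builds a document in the shape of the exported schema from spreadsheet rows. It does not reproduce how the Data Transformation Parser works internally. The function name rows_to_xml is hypothetical, and the sketch assumes the rows have already been read from the spreadsheet in heading order, with None for an empty cell.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the targetNamespace of the exported schema.
NS = "www.informatica.com/CDET/XSD/mappingName_Unstructured_Data"
COLUMNS = ("EMPNO", "ENAME", "JOB", "MGR", "SAL", "COMM", "DEPTNO")

# Serialize NS as the default namespace (this setting is global to ElementTree).
ET.register_namespace("", NS)


def rows_to_xml(rows):
    """Build a PC_XSD_ROOT document with one EMPL element per row."""
    root = ET.Element(f"{{{NS}}}PC_XSD_ROOT")
    for row in rows:
        empl = ET.SubElement(root, f"{{{NS}}}EMPL")
        for column, value in zip(COLUMNS, row):
            if value is None or value == "":
                # minOccurs="0" lets an empty cell be omitted entirely.
                continue
            ET.SubElement(empl, f"{{{NS}}}{column}").text = str(value)
    return ET.tostring(root, encoding="unicode")
```

For example, rows_to_xml([(7369, "SMITH", "CLERK", 7902, 800, None, 20)]) produces one EMPL element with six child elements and no COMM element.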

The following figure shows the output.xml results in the Data Transformation Studio:

Deploy the Project
Deploy the Data Transformation project to the Data Transformation repository that is on the same machine as the Developer
tool. The project becomes a runnable service.

Configure the Service Name


Open the Custom Data transformation Service view in the Developer tool. Add the Data Transformation service name to the
Custom Data transformation. The service name must be in the Data Transformation repository.

The Developer tool retrieves available service names from the Data Transformation repository. The service name for the
Excel Parser project is Emp_Excel_2:

Preview the Data


You can preview the output data in the Developer tool Data Viewer.

Run the Data Viewer for the Custom Data transformation. The Data Transformation Engine runs the Emp_Excel_2 service in
the local Data Transformation repository. The service parses each Excel spreadsheet and returns XML to the Data
Integration Service. The Custom Data transformation returns row data in the Data Viewer.

The following figure shows the Data Viewer:

Deploy the Application and the Service
Create a data service with the EMPL logical data object. Define the names for the virtual table and schema. Create an
application for the data service and deploy the application to a Data Integration Service. You must also deploy the Data
Transformation service to the Data Transformation repository on the same machine that runs the Data Integration Service.

Author
Ellen Chandler
Principal Technical Writer

Acknowledgements
The author would like to acknowledge Thiagu Sundaramurthy for his help with this article.
