You are on page 1of 9

Import Excel unicode data with SQL Server Integration Services

Problem One task that most people are faced with at some point in time is the need to import data into SQL Server from an Excel spreadsheet. We have talked about different approaches to doing this in previous tips using OPENROWSET, OPENQUERY, Link Servers, etc... These options are great, but they may not necessarily give you as much control as you may need during the import process. Another approach to doing this is using SQL Server Integration Services (SSIS). With SSIS you can import different types of data as well as apply other logic during the importing process. One problem though that I have faced with importing data from Excel into a SQL Server table is the issue of having to convert data types from Unicode to non-Unicode. SSIS treats data in an Excel file as Unicode, but my database tables are defined as non-Unicode, because I don't have the need to store other code sets and therefore I don't want to waste additional storage space. Is there any simple way to do this in SSIS? Solution If you have used SSIS to import Excel data into SQL Server you may have run into the issue of having to convert data from Unicode to non-Unicode. By default Excel data is treated as Unicode and also by default when you create new tables SQL Server will make your character type columns Unicode as well (nchar, nvarchar,etc...) If you don't have the need to store Unicode data, you probably always use non-Unicode datatypes such as char and varchar when creating your tables, so what is the easiest way to import my Excel data into non-Unicode columns? The following shows two different examples of importing data from Excel into SQL Server. The first example uses Unicode datatypes and the second does not. Here is what the data in Excel looks like.

Example 1 - Unicode data types in SQL Server Our table 'unicode" is defined as follows: CREATE TABLE [dbo].[unicode]( [firstName] [nvarchar](50) NULL, [lastName] [nvarchar](50) NULL ) ON [PRIMARY]

If we create a simple Data Flow Task and an Excel Source and an OLE DB Destination mapping firstname to firstname and lastname to lastname the import works great as shown below.

Example 2- non-Unicode data types in SQL Server Our table 'non_unicode" is defined as follows: CREATE TABLE [dbo].[non_unicode]( [firstName] [varchar](50) NULL, [lastName] [varchar](50) NULL ) ON [PRIMARY] If we map the columns firstname to firstname and lastname to lastname we automatically get the following error in the OLE DB Destination. Columns "firstname" and "firstname" cannot convert between unicode and non-unicode data types...

If we execute the task we get the following error dialog box which gives us additional information.

Solving the Problem So based on the error we need to convert the data types so they are the same types. If you right click on the OLE Destination and select "Show Advanced Editor" you have the option of changing the DataType from string [DT_STR] to Unicode string [DT_WSTR]. But once you click on OK it looks like the changed was saved, but if you open the editor again the change is gone and back to the original value. This makes sense since you can not change the data type in the actual table.

If you right click on the Excel Source and select "Show Advanced Editor" you have the option of changing the DataType from Unicode string [DT_WSTR] to string [DT_STR] and the change is saved.

If you click OK the change is saved, but now you get the error in the Excel Source that you can not convert between unicode and non-unicode as shown below. So this did not solve the problem either.

Using the Data Conversion Task So to get around this problem we have to also use a Data Conversion task. This will allow us to convert data types so we can get the import completed. The following picture shows the "Data Conversion" task in between the Excel Source and the OLE DB Destination.

If you right click on "Data Conversion" and select properties you will get a dialog box such as the following. In here we created an Output Alias for each column. Our firstname column becomes firstname_nu (this could be any name you want) and we are making the output be a non-unicode string. In addition we do the same thing for the lastname column.

If we save this and change the mapping as shown to use our new output columns and then execute the task we can see that the import was successful.

As you can see this is pretty simple to do once you know that you need to use the Data Conversion task to convert the data types. Next Steps Next time you are importing data into SQL Server, don't forget about using the Data Conversion task if you are importing unicode data types into non-unicode columns If you encounter this error Columns "xx" and "xx" cannot convert between unicode and non-unicode data types...remember this tip Take a look at these other SSIS tips

SQL Server Integration Services 2008: Importing Excel Data Using Derived Column Transformation

In our most recent articles, we have explained how SQL Server 2008 Integration Services Import and Export Wizard facilitate the transfer of data between an Excel spreadsheet and a table residing in a SQL Server database. The complexity involved in performing this seemingly straightforward task results from different and sometimes incompatible data types that are used by these two data stores. As we have demonstrated, the wizard mitigates potential issues introduced by such incompatibilities by taking advantage of Data Conversion Transformation (which is automatically added to the Data Flow task of the resulting SSIS package). However, another approach produces an equivalent outcome by employing Derived Column Transformation instead. This article describes its characteristics and demonstrates its usage. Derived Column Transformation is one of the more popular components of Data Flow Task. Despite its simplicity (as far as the ease of use is concerned), it is quite powerful and flexible, primarily due to the SSIS expression language, which is exposed through its interface. In order to become familiar with its basic functionality, let's step through an example that will mirror the scenario covered in our previous article (which we handled at that time by leveraging Data Conversion Transformation). To start, create a sample spreadsheet that will serve as a source of data in our import operation. As before, we will limit it to two columns with strings shorter than 255 characters each (represented as Unicode String DT_WSTR data type when viewed in Business Intelligence Development Studio) containing the following entries: Fear and Loathing in Las Vegas Terry Gilliam The Saragossa Manuscript Wojciech Hass Our destination will consist of a target table in an arbitrarily chosen user database to which you have dbo privileges. Its first column, Title, will have varchar data type, while the second one, named Director, will be defined as nvarchar. The length of both will be limited to 100 characters (respectively, non-Unicode and Unicode). This can be accomplished by executing the following T-SQL statement: CREATE TABLE [dbo].[TwoThumbsUp]( [Title][varchar](100) NOT NULL, [Director][nvarchar](100) NOT NULL ) This time, instead of relying on the wizard to guide us through the automatic generation of a package, we will use a manual process that involves creating an Integration Services project within Business Intelligence Development Studio. Once you have launched it (via its shortcut in the Start -> All Programs), select New -> Project... from the File menu. In the resulting dialog box, choose the Integration Services Project entry in the Templates section, as well as specify an arbitrary Name and Location. This will yield a new single project solution containing an SSIS package Package.dtsx, which you can easily identify by examining the content of the Solution Explorer window. We will modify its content by using the Package Designer and Connection Managers windows. Right-click on the latter to add connections to our two data stores. In the context-sensitive menu, select the following: New Connection... item, which will trigger the display of the Add SSIS Connection Manager dialog box. Choose the EXCEL entry corresponding to the Connection manager for Excel files.

Click on the Add... command button. Point to the location of our sample spreadsheet. Select the appropriate Excel version, Uncheck the First row has column names checkbox. To facilitate connectivity to the TwoThumbsUp table we created earlier, select the New OLE DB Connection... entry from the same context-sensitive menu of the Connection Managers window. Provide the target Server name, corresponding authentication method, and name of the database where the table resides (use Test Connection to verify accuracy of the information you provided).

At this point, we are ready to populate our package with components that will carry out data extraction, transformation, and loading. We already know (based on our earlier analysis of the package generated by Import and Export Wizard) that these actions can be implemented using a single Data Flow Task, consisting of a source and a destination component, as well as a transformation that will handle data conversion. To create such a task, simply click on the link appearing in the center of the Data Flow tab of the designer window. Next, drag the Excel Source icon from the Data Flow Sources section of the Toolbox onto its empty surface. Use the Edit option in the context-sensitive menu of the newly added component to display its editor window. Confirm that the OLE DB Connection Manager listbox contains the Excel Connection Manager entry, the Data access mode is set to Table or view, and the Name of the Excel sheet is configured as Sheet1$ (use the Preview command button to verify that you see content of both sample rows). Now it is time to add our package Derived Column Transformation. Drag it from the Data Flow Transformations section of the Toolbox onto the designer's surface directly under Excel Source. Connect the two by extending the green arrow originating from the bottom edge of the top one (representing its standard output) until it reaches the upper edge of the one underneath it. Activate Derived Column Transformation Editor by selecting the Edit... option from its context sensitive menu. Note that the resulting window is divided into three sections. The upper left one gives you access to package variables (which we have not yet introduced) and input columns (consisting, in our case, of F1 and F2 and representing content of our sample spreadsheet). To the right, you will find operations that can be applied in order to generate the desired output, including following: Mathematical Functions (ABS, CEILING, EXP, FLOOR, LN, LOG, POWER, ROUND, SIGN, SQUARE, and SQRT) String Functions (CODEPOINT, FINDSTRING, HEX, LEN, LOWER, LTRIM, REPLACE, REPLICATE, REVERSE, RIGHT, RTRIM, SUBSTRING, TRIM, and UPPER) Date and Time Functions (DATEADD, DATEDIFF, DATEPART, DAY, GETDATE(), GETUTCDATE(), MONTH, and YEAR) NULL Functions (ISNULL and NULL for each of the SSIS data type) Mathematical and logical Operators Type Casts. That last category is of particular importance to us, since it will allow us to implement data type mapping in a manner that facilitates SSIS data conversion rules.

To illustrate the need for such conversion, let's pipe output columns from the newly added transformation directly to the target table (without any modifications). Drag OLE DB Destination from the Data Flow Destinations section of the Toolbox and place it below the Derived Column component on the designer's surface. Connect the output of the latter (represented by a green arrow originating from its bottom edge) to the input of the former.

Launch its editor by selecting the Edit... item from its context-sensitive menu. Ensure that its entries point to the previously created OLE DB connection manager and Data access mode is listed as Table or view - fast load, as well as setting Name of the table or the view entry to [dbo].[TwoThumbsUp]. Switch to the Mappings section in the editor interface and connect the input columns (F1 and F2) to their respective destination columns (Title and Director). Once you click on the OK command button to confirm your selection, you will notice a red cross appearing on the OLE DB Destination, with a tooltip informing you that Columns "F1" and "Title" cannot convert between unicode and non-unicode string data types. To remediate this issue, open the Derived Column Transformation Editor. In the lower portion of its window, select the <add as a new column> entry in the Derived Column listbox. Provide a descriptive name (e.g. Derived Column F1) in the Derived Column Name column and type in the typecasting expression that will convert the unicode string ([DT_WSTR]) of 255 characters in length into non-unicode ([DT_STR]) 100 characters long (using 1252 West European Latin code page). This takes the form of (DT_STR, 100, 1252)(F1), which you can also put together by dragging relevant functions and column names from their respective panes into the Expression column in the lower part of the window. To verify that the resulting output has the desired characteristics, examine the Data Type and Length columns (which should contain at this point string [DT_STR] and 100, respectively). Confirm your choices by clicking on the OK command button and launch OLE DB Destination Editor. Switch to its Mappings section, delete the link between the F1 and Title columns and create one between Derived Column F1 and Title instead. Once you are back in the designer interface, you will notice that the red cross has been replaced by a yellow exclamation mark, which means that our original error condition has been resolved. As the tooltip indicates, at this point we need to be aware that Truncation may occur due to inserting data from data flow column 'F2' with a length of 255 to database column 'Director' with a length of 100 (you can also view this warning by opening the Error List window from the View menu). To remediate this, return to the Derived Column Transformation Editor, create another entry in the Derived Column Name (we will call it Derived Column F2) with the <add as a new column> option in the Derived Column listbox and (DT_WSTR, 100)(F2) as the expression. Once completed, launch the OLE DB Destination Editor again, switch to its Mappngs section, and replace the mapping between F2 and Director with another one between Derived Column F2 and Director. The resulting package demonstrates our intended goal, by utilizing Derived Column Transformation to facilitate conversion between incompatible data types. In our next article, we will review other types of SSIS components that assist with resolving the most common ETL challenges. Additional Resources MSDN Derived Column Transformation MSDN How to: Derive Column Values Using the Derived Column Transformation

You might also like