Thursday, September 13, 2007

Expressions and the Data Flow Pipeline

File this under the "more than one way to skin a cat" topic. I am creating a demo for one of my upcoming presentations, and I noticed a technique for cleaning data from extracts that escaped my earlier reading/training/lab time.

In particular, I am creating some SSIS packages to load Comma Separated Variable (csv) files. Only these files, from an extract provider over which I have little or no control, contain embedded quotes in them. For example a data record would read:

"column1 value", "column2 value", "column3 value"

This did not strike me as odd until the flat file connection manager for this extract had NO desire to see the quotes as part of the field delimiter. Unfortunately, my first pass at loading this data created a table with the following values.

Column 1
Column 2
Column 3

Obviously, NOT what I wanted, even in a staging table, with which to begin my Transform and Load processing.

One way or another, I would have to strip the double quotes out of the data, and for some reason I latched onto the idea of using an expression within an SSIS Derived Column transformation.

Soon enough, I had the Derived Column transform, and was feeding it a steady diet of DT_STR columns from the Flat File Data Source. It seemed as though surely the REPLACE function would suit my needs, but how to express the target expression is "? To little surprise REPLACE(column1, ""","") is NOT what the SSIS Execution Engine is looking for. Thankfully, in the books online Microsoft has furnished a reference for common escape sequences. In my case, " was just what the doctor ordered. REPLACE(column1,"\"","") effectively stripped all of the quote characters from the data, leaving my staging table looking a little more appealing.

Somewhere on the Web, I found a custom SSIS component to strip quotes from an extract. That still may be worthwhile, as my candidate inbound set only had 50,000 or so records.

If anyone out there reading this has ideas as to how to download files (using http), and without a Web Service, drop me a reply via the blog. Perhaps that will be the subject of my next post.

No comments: