Friday, August 24, 2007

SSIS Lookup Transforms and Varchar Natural Keys

A recent project required the use of inferred dimension members due to early arriving facts. My initial thought was to do something like the figure below. I would use the client natural key for an in-bound fact row, and first attempt to find it in the dimension using a Lookup transform and a cached reference set. If the dimension key was found, it is pipelined to the Union All and eventually inserted into the fact table. If the dimensional key was not located, the row was dispatched to the Add Inferred Client Member task.
Essentially, this task would execute a T-SQL stored procedure to find the dimension key in the table and insert an inferred member if it is not found. The primary benefit to this is being able to avoid the case sensitivity of the SSIS engine and instead have the matching logic be executed in SQL Server. The row is then dispatched to the 2nd Pass Lookup.

The 2nd Pass Lookup is a Lookup transform identical to the initial attempt to find a matching dimension member, except with reference set caching disabled. This allows newly added rows, inserted by the Add Inferred Client Member task, to be part of the dimension lookup. If for some reason the candidate row is not found at this stage (perhaps an error getting the inferred member inserted was not caught), the row has an ErrorReason column added via a Derived Column transform, and is sent into an error flow. All inbound rows that find a dimension surrogate key in the 2nd Pass Lookup are moved into the destination (ie Production) fact table.

Caveat SSIS Developer

In this case the natural key for this dimension was a varchar datatype, and my initial impression was the use of a Lookup transformation would make the lookup of dimensional keys during fact table load straightforward. Unfortunately, there is a difference in how SSIS and SQL server handle string comparisons. SSIS executes a case-sensitive lookup. A fact table candidate natural key of 'A1234' would not match a dimension key of 'a1234'. Some lookup 'failures' in the initial lookup were actually case sensitivity differences. The first-cut of this process actually was two lookups, with the Execute SQL task between them only to insert an inferred dimension member. As duplicate natural keys for inferred members were inserted, the case sensitivity issue became evident.

No comments: