Thursday, September 06, 2007

The Dummy Script Task, or Conditional Execution 100

So I am working on building a data warehouse ETL system with SSIS. Like so many times before, there may be certain steps in the process that I wish to execute, based on some condition. Maybe it is just one of Larry Wall's virtues, laziness, but I want a process to drive flawlessly whenever executed, 3pm or 3am, manually or automatically. An example of this is truncation of destination tables in a data mart / data warehouse project in either a Continuous Integration environment or in an execution of a priming load. To that end, I have employed the following trick I call the Dummy Script Task. SSIS provides a nice mechanism to conditionally execute one task, based upon evaluation of an expression, constraint or both.

Truncating the destination table is easy enough, using an Execute SQL Task. However, creating the conditional execution is a less straightforward. In order to employ conditional execution, the package must have two tasks in between which sits the conditional execution flow. The Truncate Destination Table (Execute SQL Task) is just sitting in a Sequence Container (labeled "Reload_YN" for clarity), with no flow, conditional or otherwise onto which to create the condition.

Enter the dummy script task.

The dummy script task is just that. A Script Task from the toolbar, with NO CODE BEHIND IT! The task merely serves as the initial end-point for a flow that will conditionally execute. It could not be more simple, just drag a Script Task from the toolbar and optionally rename it.

So as you will note from the figure at left, there is a blue data flow connecting the Script Task to the Truncate Destination Tables task. The flow is further noted as conditional by the fx label attached to it. Right mouse clicking a flow, and choosing the Edit selection presents you with a dialog similar to the one below.


This dialog allows us to set the Truncate task to execute ONLY if one or more conditions are true. I have created a package variable, called ReloadYN, of type boolean, that determines if tasks related only to a reload are to be executed. Perhaps the subject of another post, the package variable can be set many ways, via a dtexec command line option, script, or in this case via an indirect package configuration. By having this little gem in my standard Dimension and Fact table package templates, I know the entire database will be truncated (or not) on a consistent basis. One more thing to not have to worry about.















Putting it all together

Based on the value of ReloadYN at run-time, either the Truncate Destination Tables task executes (wiping the slate clean for a new load/test) or not. You can verify this by executing the package, or sequence container, from within Business Intelligence Developer Studio. The 'skipped' task will turn neither red, yellow or green. Execution continues at the first step following the Sequence Container.

No comments: