Monday, November 12, 2007

SQL Saturday Presentation Available


My presentation on Data Mining with SQL 2005 will soon be available for download here.

Thanks again to everybody for a wonderful experience. This was beyond a doubt the best-produced, best-attended Code Camp event I have participated in as a speaker. The facilities at Seminole Community College were just first-class, as pictured here in the main atrium.

Friday, November 09, 2007

On My Way to SQL Saturday!

Flying into Orlando today for my presentation on data mining with SQL 2005, I have a few traveling observations.

1. Southwest Airlines' new, business-friendly boarding process rocks!
2. The Orlando International Airport is my favorite among major-city terminals.
3. The Hertz Neverlost GPS system blows. A batting average of .333 is great if you are a major league ballplayer. However, most consumers of these devices can achieve that level of proficiency without a paper map. I guess I will patiently await the general availability of the promising Dash. In the meantime, truth in advertising would suggest Hertz rename the product to EVERLOST.

Hardly BI related, but it is my blog.

Thursday, November 08, 2007

PerformancePoint Server Evaluation Editions

Microsoft announced yesterday the availability of PerformancePoint Server 2007 evaluation editions. You can get the 32-bit version here.

This is surely one of the more exciting products released in the last few years that will enable organizations to drive strategy down to the individual worker, via a "BI for the Masses" execution.

High Tech Barn Raising!

While I have never participated in a physical barn-raising myself, the good folks at Tech Birmingham have managed to stage an awe-inspiring event this weekend. The plan is to start tomorrow, November 9, and finish Sunday night, November 11, having completed a beta release of an Internet startup.

If you are interested in the concept, or participating, you can get more information at Startup Birmingham.

Unfortunately, I will not be participating this year, as I have previously committed to speaking at SQL Saturday. See you Saturday if you are in the Orlando area.

Tuesday, October 30, 2007

Wanted: Talented BI Professionals

Wondering why you have not seen a fresh blog post lately? It is because the BI practice at ComFrame is growing at a record pace. If you are an exceptionally talented, experienced BI professional and are interested in joining a premier consulting organization and overall great place to work, please email your resume to work@comframe.com. Please place sqlbi.blogspot.com in the subject header so your resume will get immediate attention. Third party submissions will not be considered. If you want to learn more about ComFrame, check out www.comframe.com.

Tuesday, October 09, 2007

Presentation Available

My presentation from Alabama .Net Code Camp V is now available here. I am already looking forward to the next Code Camp event.

Monday, October 08, 2007

Alabama .Net Code Camp V

Well, the Fifth Alabama .Net Code Camp is in the books. I would like to express my humble gratitude to the attendees who drove from the ends of the state to attend the event. If you had half as much fun as I did, then it was worth the time and effort. As promised, my presentation "Everything I Ever Wanted to Know About Beginning Business Intelligence with SQL 2005" will be available via ComFrame's website soon. As soon as I know the files are available, I will post an update here so the news can easily get out to RSS readers.

If you are interested in the methodology discussed to 'bootstrap' db connections and store configurations in the database, please read my post on SSIS configurations.

I have added some content to the SQLBI blog, notably a list of web resources I referenced in Saturday's discussion, and a few links to Amazon.com for the book titles I mentioned as well. Finally, I would ask that if you enjoyed the discussion on Saturday, or anything that I am writing here, please make use of the comment links available on my posts to send feedback.

I am also working on a post, to go up by the middle of this week, that will walk through the steps (and screen shots) I used to create the examples and demos in Saturday's discussion. Stay tuned here for updates.

Friday, September 21, 2007

File Download via HTTP

SSIS has some interesting components with out-of-the-box support for obtaining data via FTP and even Web Services. However, while creating a demonstration project for a class I am teaching, I found the need to download files from a web page via HTTP. I found a cool little script on sqljunkies that did just the trick.

To further describe the problem, I am interested in downloading the weekly performance statistics for Division 1 College Football, graciously made available by the NCAA here. There are three files: one each for the schedule, offensive, and defensive statistics. I really want the package to fire on a weekly basis and load the new stats into my data mart.

As a brute-force, first-pass effort, I created package variables to represent the remote file URI, the local file name, and the local path. In the case of the offensive production statistics, these variables were named RemoteUriOffense, LocalFileNameOffense, and Extract_Home. Unfortunately, one of the downsides of the Script Task is that every package (or container) variable the script needs must be explicitly declared as available for read access by the script, via the task's ReadOnlyVariables property.

Inside the script itself, those variable names are then referenced through the Dts.Variables collection, along the lines of the sketch below.
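What follows is a simplified stand-in rather than the sqljunkies script itself, written for the VB.NET Script Task in SQL 2005; it assumes the three variables above have been listed in the task's ReadOnlyVariables property.

' Sketch of the Script Task body (SSIS 2005, VB.NET).
' Assumes RemoteUriOffense, LocalFileNameOffense, and Extract_Home appear in
' the task's ReadOnlyVariables property.
Imports System
Imports System.IO
Imports System.Net
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()
        ' Pull the package variables declared as readable by the script.
        Dim remoteUri As String = CStr(Dts.Variables("RemoteUriOffense").Value)
        Dim localFile As String = CStr(Dts.Variables("LocalFileNameOffense").Value)
        Dim localPath As String = CStr(Dts.Variables("Extract_Home").Value)

        Try
            ' Download the file over HTTP into the local extract folder.
            Dim client As New WebClient()
            client.DownloadFile(remoteUri, Path.Combine(localPath, localFile))
            Dts.TaskResult = Dts.Results.Success
        Catch ex As Exception
            Dts.Events.FireError(0, "Download Offense Stats", ex.Message, String.Empty, 0)
            Dts.TaskResult = Dts.Results.Failure
        End Try
    End Sub

End Class

Feeding a single instance of a script like this from a Foreach Loop, resetting the URI and file name variables on each iteration, is the consolidation I mention below.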

While the script task from sqljunkies does get the job done, there are a few disadvantages. First and foremost is the creation of a separate set of variables for each of the files. I could have manipulated a single set of variables, iterating through all three files and using a single script task, firing the task once for each file. Additionally, given the variable infrastructure needed for the script task, and the probability that this is not the last time I will ever pull data via HTTP, this may be a good candidate for a custom component.

Among the advantages of the custom component, all the logic for file download would be centralized. What happens in my three script tasks when a change occurs to the download logic (for instance)? Yep, I am replicating the change for each extract. True enough, this particular problem is also solved by using a single script task and iterating over a list of files to download.

However, this may just be the perfect task on which to base a first custom component. Stay tuned.

Thursday, September 13, 2007

Expressions and the Data Flow Pipeline

File this under the "more than one way to skin a cat" topic. I am creating a demo for one of my upcoming presentations, and I noticed a technique for cleaning data from extracts that escaped my earlier reading/training/lab time.

In particular, I am creating some SSIS packages to load Comma Separated Values (CSV) files. Only these files, from an extract provider over which I have little or no control, contain embedded quotes. For example, a data record would read:

"column1 value", "column2 value", "column3 value"

This did not strike me as odd until I found the flat file connection manager for this extract had NO desire to treat the quotes as part of the field delimiter. Unfortunately, my first pass at loading this data created a table with the following values.

Column 1     Column 2     Column 3
"Value"      "Value"      "Value"

Obviously NOT what I wanted to begin my Transform and Load processing with, even in a staging table.

One way or another, I would have to strip the double quotes out of the data, and for some reason I latched onto the idea of using an expression within an SSIS Derived Column transformation.

Soon enough, I had the Derived Column transform in place and was feeding it a steady diet of DT_STR columns from the Flat File Source. It seemed as though surely the REPLACE function would suit my needs, but how do you express the target character, a double quote? To little surprise, REPLACE(column1, """, "") is NOT what the SSIS execution engine is looking for. Thankfully, Microsoft has furnished a reference for common escape sequences in Books Online. In my case, \" was just what the doctor ordered. REPLACE(column1, "\"", "") effectively stripped all of the quote characters from the data, leaving my staging table looking a little more appealing.

Somewhere on the Web, I found a custom SSIS component to strip quotes from an extract. That may still be worthwhile for larger loads; my candidate inbound set only had 50,000 or so records, so the Derived Column approach served just fine.
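Short of a full custom component, a Script Component used as a transformation can also centralize this cleanup in code. Here is a minimal sketch, assuming the input columns are named Column1 through Column3 and are marked ReadWrite on the component's Input Columns page:

' Script Component (transformation) sketch, SSIS 2005 VB.NET.
' Assumes Column1-Column3 are string columns selected as ReadWrite inputs.
Imports System
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Strip embedded double quotes from each column as rows flow through,
        ' skipping NULLs to avoid a runtime error on the buffer accessors.
        If Not Row.Column1_IsNull Then Row.Column1 = Row.Column1.Replace("""", "")
        If Not Row.Column2_IsNull Then Row.Column2 = Row.Column2.Replace("""", "")
        If Not Row.Column3_IsNull Then Row.Column3 = Row.Column3.Replace("""", "")
    End Sub

End Class

For a one-off package, though, the REPLACE expression above remains the lighter-weight option.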

If anyone out there reading this has ideas on how to download files over HTTP without a Web Service, drop me a reply via the blog. Perhaps that will be the subject of my next post.


Thursday, September 06, 2007

The Dummy Script Task, or Conditional Execution 100

So I am working on building a data warehouse ETL system with SSIS. Like so many times before, there are certain steps in the process that I wish to execute only when some condition holds. Maybe it is just one of Larry Wall's virtues, laziness, but I want a process to run flawlessly whenever executed, 3 pm or 3 am, manually or automatically. An example of this is truncation of destination tables in a data mart / data warehouse project, either in a Continuous Integration environment or during a priming load. To that end, I have employed the following trick I call the Dummy Script Task. SSIS provides a nice mechanism to conditionally execute a task, based upon evaluation of an expression, a constraint, or both.

Truncating the destination table is easy enough, using an Execute SQL Task. However, creating the conditional execution is a little less straightforward. In order to employ conditional execution, the package must have two tasks, with the conditional execution flow sitting between them. The Truncate Destination Tables (Execute SQL) task is just sitting in a Sequence Container (labeled "Reload_YN" for clarity), with no flow, conditional or otherwise, onto which to place the condition.

Enter the dummy script task.

The dummy script task is just that: a Script Task from the toolbox, with NO CODE BEHIND IT! The task merely serves as the initial end-point for a flow that will conditionally execute. It could not be simpler; just drag a Script Task from the toolbox and optionally rename it.

So, as you will note from the figure at left, there is a blue flow connecting the Script Task to the Truncate Destination Tables task. The flow is further noted as conditional by the fx label attached to it. Right-clicking a flow and choosing Edit presents you with a dialog similar to the one below.


This dialog allows us to set the Truncate task to execute ONLY if one or more conditions are true. I have created a package variable called ReloadYN, of type Boolean, that determines whether tasks related only to a reload are to be executed. Perhaps the subject of another post, the package variable can be set many ways: via a dtexec command-line option, a script, or, in this case, an indirect package configuration. By having this little gem in my standard Dimension and Fact table package templates, I know the entire database will be truncated (or not) on a consistent basis. One more thing to not have to worry about.
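For reference, here is a minimal sketch of the two pieces, using the task and variable names above: the dummy Script Task is just the stock SQL 2005 VB.NET script template doing no work at all, and the flow leaving it carries the expression.

' The dummy Script Task: the stock SSIS 2005 script template, with no real work in Main().
' Its only job is to give the conditional flow a starting point.
Imports System
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()
        ' Intentionally nothing to do here.
        Dts.TaskResult = Dts.Results.Success
    End Sub

End Class

' On the flow from the dummy task to Truncate Destination Tables:
'   Evaluation operation: Expression
'   Expression:           @[User::ReloadYN] == TRUE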

Putting it all together

Based on the value of ReloadYN at run time, the Truncate Destination Tables task either executes (wiping the slate clean for a new load/test) or does not. You can verify this by executing the package, or just the sequence container, from within Business Intelligence Development Studio. The 'skipped' task will turn neither red, yellow, nor green, and execution continues at the first step following the Sequence Container.

Thursday, August 30, 2007

Coming Soon to A Screen Near You

Ahh, the smell of bourbon and pigskin. Football has returned to the South. It is going to be a busy Fall season between gridiron action and the following teaching/speaking events on my calendar.

September 25 - Microsoft Business Intelligence Boot Camp
I will be conducting the program on SQL Server Integration Services for this Microsoft Partner event.

October 6 - Alabama .Net Code Camp V
A couple of abstracts have been submitted; stay tuned for details should I get an invitation. Even if I am NOT speaking, I will be there. Well worth it if you can get to Birmingham for a code-filled Saturday.

November 10 - SQL Saturday
I will be presenting a further refined edition of my Data Mining with SQL Server 2005 presentation, with an eye to new features in SQL Server 2008.

Monday, August 27, 2007

SQL Saturday!

I have been invited to present on SQL Server Data Mining at the SQL Saturday event, in Orlando, Florida on November 10, 2007. The event is sponsored by the Orlando Chapter of PASS and further details can be found at www.sqlsaturday.com.

Friday, August 24, 2007

SSIS Lookup Transforms and Varchar Natural Keys

A recent project required the use of inferred dimension members due to early-arriving facts. My initial thought was to do something like the figure below. I would take the client natural key for an inbound fact row and first attempt to find it in the dimension using a Lookup transform and a cached reference set. If the dimension key is found, it is pipelined to the Union All and eventually inserted into the fact table. If the dimension key is not located, the row is dispatched to the Add Inferred Client Member task.

Essentially, this task executes a T-SQL stored procedure to find the dimension key in the table and insert an inferred member if it is not found. The primary benefit of this is avoiding the case sensitivity of the SSIS engine and instead having the matching logic executed in SQL Server. The row is then dispatched to the 2nd Pass Lookup.

The 2nd Pass Lookup is a Lookup transform identical to the initial attempt to find a matching dimension member, except with reference-set caching disabled. This allows newly added rows, inserted by the Add Inferred Client Member task, to be part of the dimension lookup. If for some reason the candidate row is not found at this stage (perhaps an error inserting the inferred member was not caught), the row has an ErrorReason column added via a Derived Column transform and is sent into an error flow. All inbound rows that find a dimension surrogate key in the 2nd Pass Lookup are moved into the destination (i.e., production) fact table.

Caveat SSIS Developer

In this case the natural key for this dimension was a varchar data type, and my initial impression was that the use of a Lookup transformation would make the lookup of dimension keys during the fact table load straightforward. Unfortunately, there is a difference in how SSIS and SQL Server handle string comparisons: SSIS executes a case-sensitive lookup, while a SQL Server database with a typical case-insensitive collation does not. A fact table candidate natural key of 'A1234' would not match a dimension key of 'a1234'. Some 'failures' in the initial lookup were actually case-sensitivity differences. The first cut of this process was actually two lookups, with the Execute SQL task between them only to insert an inferred dimension member. As duplicate natural keys for inferred members were inserted, the case sensitivity issue became evident.
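One way to sidestep the case sensitivity on the SSIS side, if pushing the match down to SQL Server is not an option, is to normalize the key's case before the Lookup, either with an UPPER() Derived Column expression or a trivial Script Component. A sketch of the latter, assuming a natural key column named ClientNaturalKey and a Lookup reference query that applies UPPER() to the dimension key as well:

' Script Component (transformation) sketch, SSIS 2005 VB.NET.
' Assumes ClientNaturalKey is a string column selected as a ReadWrite input.
Imports System
Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Imports Microsoft.SqlServer.Dts.Runtime.Wrapper

Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Force the natural key to one case so 'A1234' and 'a1234' match the
        ' same row in the Lookup's cached reference set.
        If Not Row.ClientNaturalKey_IsNull Then
            Row.ClientNaturalKey = Row.ClientNaturalKey.ToUpper()
        End If
    End Sub

End Class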

Wednesday, August 01, 2007

Speaking In "LA"

It was my pleasure to be the featured speaker at the Lower Alabama .Net User Group (LANUG), in Mobile, AL on July 31, 2007. The presentation was on the data mining features in SQL Server 2005 and can be downloaded as a PDF. I would like to add that Doug Greene and Matthew Hughes are doing a great job with a user group that just recently celebrated the beginning of their second year.

As the post-presentation discussion swerved into dimensional data warehousing, there was some talk of doing a dimensional data modeling session at the next Alabama .Net Code Camp. If you are interested, please post a reply here or send me an email at bralston@comframe.com.

Monday, April 16, 2007

Alabama Code Camp IV

It was my distinct pleasure to be chosen as a speaker at the fourth Alabama .Net Code Camp this past Saturday. A few of the attendees of my presentation on Agile Database Techniques for .Net requested that I post a link to the Database Publication Wizard, a cool tool for moving data and structures from SQL Server databases via text files and SQL.

The PowerPoint from the presentation (and supporting files for examples) will follow shortly.