Monday, November 12, 2007
SQL Saturday Presentation Available
My presentation on Data Mining with SQL 2005 will soon be available for download here.
Thanks again to everybody for a wonderful experience. This was beyond a doubt the best-produced, best-attended Code Camp event I have participated in as a speaker. The facilities at Seminole Community College were just first-class, as pictured here in the main atrium.
Friday, November 09, 2007
On My Way to SQL Saturday!
1. Southwest Airlines' new, business-friendly boarding process rocks!
2. The Orlando International Airport is my favorite among major-city terminals.
3. The Hertz NeverLost GPS system blows. A batting average of .333 is great if you are a major league ballplayer. However, most consumers of these devices can achieve that level of proficiency without a paper map. I guess I will patiently await the general availability of the promising Dash. In the meantime, truth in advertising would suggest Hertz rename the product EVERLOST.
Hardly BI related, but it is my blog.
Thursday, November 08, 2007
PerformancePoint Server Evaluation Editions
This is surely one of the more exciting products released in the last few years; it will enable organizations to drive strategy down to the individual worker via a "BI for the Masses" execution.
High Tech Barn Raising!
If you are interested in the concept, or participating, you can get more information at Startup Birmingham.
Unfortunately, I will not be participating this year, as I have previously committed to speaking at SQL Saturday. See you Saturday if you are in the Orlando area.
Tuesday, October 30, 2007
Wanted: Talented BI Professionals
Tuesday, October 09, 2007
Presentation Available
Monday, October 08, 2007
Alabama .Net Code Camp V
Well the Fifth Alabama .Net Code Camp is in the books. I would like to express my humble gratitude to attendees who drove from the ends of the state to attend the event. If you had half as much fun as I did, then it was worth the time and effort. As promised, my presentation "Everything I Ever Wanted to Know About Beginning Business Intelligence with SQL 2005" will be available via ComFrame's website soon. As soon as I know the files are available, I will post an update here so the news can easily get out to RSS readers.
If you are interested in the methodology discussed to 'bootstrap' db connections and store configurations in the database, please read my post on SSIS configurations.
I have added some content to the SQLBI blog, notably a list of web resources I referenced in Saturday's discussion, and a few links to Amazon.com for the book titles I mentioned as well. Finally, I would ask that if you enjoyed the discussion on Saturday, or anything that I am writing here, please make use of the comment links available on my posts to send feedback.
I am also working on a post, to go up by the middle of this week, that will walk through the steps (and screen shots) I used to create the examples and demos in Saturday's discussion. Stay tuned here for updates.
Friday, September 21, 2007
File Download via HTTP
To further describe the problem, I am interested in downloading the weekly performance statistics for Division I College Football, graciously made available by the NCAA here. There are three files, one each for the schedule, offensive statistics, and defensive statistics. I really want the package to fire on a weekly basis and pull the new stats into my data mart.
As a brute-force, first-pass effort, I created package variables to represent the remote file URI, the local file name, and the local path. In the case of the offensive production statistics, these variables were named RemoteUriOffense, LocalFileNameOffense, and Extract_Home. Unfortunately, one of the downsides of the Script Task is having to define these variables at the package (or container) level and explicitly declare each one as being available to the script for read access, via the task's ReadOnlyVariables property.
The variable names declared as readable are then referenced in the script code, as illustrated below.
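A minimal sketch of the shape of that script follows (VB.NET, the language of SSIS 2005 Script Tasks). This is not the exact sqljunkies code, just the general idea, and it assumes RemoteUriOffense, LocalFileNameOffense, and Extract_Home appear in the task's ReadOnlyVariables list.

Imports System
Imports System.IO
Imports System.Net
Imports Microsoft.SqlServer.Dts.Runtime

Public Class ScriptMain

    Public Sub Main()
        ' These three variables must be listed in the task's ReadOnlyVariables property.
        Dim remoteUri As String = CStr(Dts.Variables("RemoteUriOffense").Value)
        Dim localFile As String = CStr(Dts.Variables("LocalFileNameOffense").Value)
        Dim extractHome As String = CStr(Dts.Variables("Extract_Home").Value)

        Try
            ' Pull the remote file down into the local extract folder.
            Dim client As New WebClient()
            client.DownloadFile(remoteUri, Path.Combine(extractHome, localFile))
            client.Dispose()
            Dts.TaskResult = Dts.Results.Success
        Catch ex As Exception
            Dts.Events.FireError(0, "Download Offense File", ex.Message, String.Empty, 0)
            Dts.TaskResult = Dts.Results.Failure
        End Try
    End Sub

End Class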
While the script task from sqljunkies does get the job done, there are a few disadvantages. First and foremost is the creation of a separate set of variables for each of the files. I could have manipulated a single set of variables, iterating through all three files and using a single script task that fires once for each file. Additionally, given the variable infrastructure needed for the script task, and the probability that this is not the last time I will ever get data via HTTP, this may be a good candidate for a custom component.
Among the advantages of a custom component, all the logic for file download would be centralized. What happens in my three script components when a change to the download logic occurs, for instance? Yep, I am replicating the change for each extract. True enough, this problem is solved by using a single script task and iterating over a list of files to download.
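For completeness, here is a hedged sketch of that single-task variant. It assumes a hypothetical package variable named FileList holding semicolon-separated "uri|localname" pairs; that variable is my own invention for illustration, not part of the package described above.

Public Sub Main()
    ' Extract_Home comes from the package; FileList is a hypothetical variable
    ' shaped like "uri1|name1;uri2|name2;uri3|name3". Both must be readable by the script.
    Dim extractHome As String = CStr(Dts.Variables("Extract_Home").Value)
    Dim client As New System.Net.WebClient()

    For Each entry As String In CStr(Dts.Variables("FileList").Value).Split(";"c)
        Dim parts As String() = entry.Split("|"c)
        client.DownloadFile(parts(0), System.IO.Path.Combine(extractHome, parts(1)))
    Next

    client.Dispose()
    Dts.TaskResult = Dts.Results.Success
End Sub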
However, this may just be the perfect task on which to base a first custom component. Stay tuned.
Thursday, September 13, 2007
Expressions and the Data Flow Pipeline
In particular, I am creating some SSIS packages to load Comma Separated Value (CSV) files. The catch: these files, from an extract provider over which I have little or no control, contain embedded quotes. For example, a data record would read:
"column1 value", "column2 value", "column3 value"
This did not strike me as odd until it became clear the Flat File Connection Manager for this extract had NO desire to treat the quotes as part of the field delimiter. Unfortunately, my first pass at loading this data created a table with the following values.
Column 1 | Column 2 | Column 3 |
"Value" | "Value" | "Value" |
Obviously, NOT what I wanted, even in a staging table, with which to begin my Transform and Load processing.
One way or another, I would have to strip the double quotes out of the data, and for some reason I latched onto the idea of using an expression within an SSIS Derived Column transformation.
Soon enough, I had the Derived Column transform in place and was feeding it a steady diet of DT_STR columns from the Flat File Source. It seemed as though surely the REPLACE function would suit my needs, but how do you express the target string, a double quote? To little surprise, REPLACE(column1, """, "") is NOT what the SSIS expression evaluator is looking for. Thankfully, Books Online furnishes a reference for common escape sequences. In my case, \" was just what the doctor ordered. REPLACE(column1, "\"", "") effectively stripped all of the quote characters from the data, leaving my staging table looking a little more appealing.
Somewhere on the Web, I found a custom SSIS component to strip quotes from an extract. That may still be worthwhile, although my candidate inbound set only had 50,000 or so records.
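For the curious, the same quote-stripping could also be done in a Script Component transformation. The sketch below is only an illustration of that idea, not the component I found, and it assumes three string input columns named Column1 through Column3, marked ReadWrite in the component's input.

' Synchronous Script Component (transformation), SSIS 2005, VB.NET.
Public Class ScriptMain
    Inherits UserComponent

    Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)
        ' Strip every embedded double quote from each column.
        Row.Column1 = Row.Column1.Replace("""", "")
        Row.Column2 = Row.Column2.Replace("""", "")
        Row.Column3 = Row.Column3.Replace("""", "")
    End Sub

End Class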
If anyone out there reading this has ideas on how to download files via HTTP, without resorting to a Web Service, drop me a reply via the blog. Perhaps that will be the subject of my next post.
Thursday, September 06, 2007
The Dummy Script Task, or Conditional Execution 100
Truncating the destination table is easy enough, using an Execute SQL Task. However, creating the conditional execution is less straightforward. In order to employ conditional execution, the package must have two tasks between which the conditional flow sits. The Truncate Destination Table (Execute SQL Task) is just sitting in a Sequence Container (labeled "Reload_YN" for clarity), with no flow, conditional or otherwise, onto which to attach the condition.
Enter the dummy script task.
The dummy script task is just that: a Script Task from the Toolbox, with NO CODE BEHIND IT! The task merely serves as the initial end-point for a flow that will conditionally execute. It could not be simpler; just drag a Script Task from the Toolbox and optionally rename it.
So as you will note from the figure at left, there is a blue precedence constraint connecting the Script Task to the Truncate Destination Tables task. The constraint is further noted as conditional by the fx label attached to it. Right-clicking the constraint and choosing Edit presents you with a dialog similar to the one below.
This dialog allows us to set the Truncate task to execute ONLY if one or more conditions are true. I have created a package variable, called ReloadYN, of type Boolean, that determines whether tasks related only to a reload are to be executed. Perhaps the subject of another post, the package variable can be set many ways: via a dtexec command-line option, a script, or, as in this case, an indirect package configuration. By having this little gem in my standard dimension and fact table package templates, I know the entire database will be truncated (or not) on a consistent basis. One less thing to worry about.
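As an aside, the expression on that constraint can be as simple as @[User::ReloadYN] == TRUE, and the "script" route mentioned above is about as small as Script Tasks get. A sketch, assuming ReloadYN is listed in the task's ReadWriteVariables:

Public Sub Main()
    ' ReloadYN must appear in the Script Task's ReadWriteVariables property.
    Dts.Variables("ReloadYN").Value = True
    Dts.TaskResult = Dts.Results.Success
End Sub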
Putting it all together
Based on the value of ReloadYN at run time, the Truncate Destination Tables task either executes (wiping the slate clean for a new load/test) or does not. You can verify this by executing the package, or the sequence container, from within Business Intelligence Development Studio. The 'skipped' task will turn neither red, yellow, nor green; execution continues at the first step following the Sequence Container.
Thursday, August 30, 2007
Coming Soon to A Screen Near You
September 25 - Microsoft Business Intelligence Boot Camp
I will be conducting the program on SQL Server Integration Services for this Microsoft Partner event.
October 6 - Alabama .Net Code Camp V
A couple of abstracts have been submitted; stay tuned for details should I get an invitation. Even if I am NOT speaking, I will be there. Well worth it if you can get to Birmingham for a code-filled Saturday.
November 10 - SQL Saturday
I will be doing a further refined edition of my Data Mining with SQL Server 2005 presentation, with an eye to new features in SQL Server 2008.
Monday, August 27, 2007
SQL Saturday!
Friday, August 24, 2007
SSIS Lookup Transforms and Varchar Natural Keys
The 2nd Pass Lookup is a Lookup transform identical to the initial attempt to find a matching dimension member, except with reference-set caching disabled. This allows newly added rows, inserted by the Add Inferred Client Member task, to be part of the dimension lookup. If for some reason the candidate row is not found at this stage (perhaps an error inserting the inferred member was not caught), the row has an ErrorReason column added via a Derived Column transform and is sent into an error flow. All inbound rows that find a dimension surrogate key in the 2nd Pass Lookup are moved into the destination (i.e., production) fact table.
Caveat SSIS Developer
In this case the natural key for this dimension was a varchar datatype, and my initial impression was that using a Lookup transformation would make the lookup of dimension keys during the fact table load straightforward. Unfortunately, there is a difference in how SSIS and SQL Server handle string comparisons: SSIS executes a case-sensitive lookup. A fact table candidate natural key of 'A1234' would not match a dimension key of 'a1234'. Some lookup 'failures' in the initial lookup were actually case-sensitivity differences. The first cut of this process was actually two lookups, with the Execute SQL Task between them only to insert an inferred dimension member. As duplicate natural keys for inferred members were inserted, the case-sensitivity issue became evident.
Wednesday, August 01, 2007
Speaking In "LA"
As the post-presentation discussion swerved into dimensional data warehousing, there was some discussion on doing a dimensional data modeling session at the next Alabama .Net Code Camp. If you are interested, please post a reply here or send me an email at bralston@comframe.com.
Monday, April 16, 2007
Alabama Code Camp IV
The PowerPoint from the presentation (and supporting files for examples) will follow shortly.