Friday, May 18, 2012

Sample Service Level Agreement Terms For A Warehouse


To be filled out by the key business stakeholder: <name here>
Please assess your requirements for the following service levels in the application area of: Warehouse
Change each X to the value you feel is required.

Minimum uptime requirement:
Some Database, used to drive the downstream datamarts for business intelligence and corporate reporting, must have no more than X minutes of outage per month during peak work hours (from X:XX to X:XX GMT); this does not include month end.
...dev provides list of databases and supported business functions...
Some Database, used to drive the downstream datamarts for business intelligence and corporate reporting, must have no more than X seconds of outage per month during peak work hours (from X:XX to X:XX GMT) during month end processing.
...dev provides list of databases and supported business functions...
Some Database, used to drive the downstream datamarts for business intelligence and corporate reporting, must have no more than X hours of planned downtime per week.
...dev provides list of databases and supported business functions...
Some Database, used to drive the downstream datamarts for business intelligence and corporate reporting, must have data refreshed from the Pulse system of record by start of day (X:XX GMT), Mon - Fri.
...dev provides list of databases and supported business functions...
Some datamart, used to drive business intelligence model calculations, must be able to process X concurrent business queries with an average run time of no more than X minutes.
...dev provides list of databases and supported business functions...
Some datamart, used to drive business intelligence model calculations, must be able to import X petabytes of data per month.
...
Some analytics model, used to drive business intelligence decision support reports, must be able to calculate the default rate with 90% accuracy.

Thursday, May 17, 2012

Execute A Simple Query on All Databases

The following executes a simple search for a table against each database on a given database server. I am limited by the fact that sp_Msforeachdb has a 128 character limit on the SQL it can execute.


/*
The following executes a simple search for a table against each database on a server.
Note that sp_Msforeachdb has a 128 character limit on what it can execute.
Complex query requirements will have to be run using dynamic SQL.
*/

USE [tempdb];
GO
--create a results table in tempdb that is accessible in the context of sp_Msforeachdb
--without using up any of the 128 character limit in the sproc call


CREATE TABLE [dbo].[t1](
TABLE_CATALOG nVarchar(max), TABLE_SCHEMA nVarchar(max), TABLE_NAME nVarchar(max), COLUMN_NAME nVarchar(max),
ORDINAL_POSITION nVarchar(max), COLUMN_DEFAULT nVarchar(max), IS_NULLABLE nVarchar(max), DATA_TYPE nVarchar(max),
CHARACTER_MAXIMUM_LENGTH nVarchar(max), CHARACTER_OCTET_LENGTH nVarchar(max), NUMERIC_PRECISION nVarchar(max),
NUMERIC_PRECISION_RADIX nVarchar(max), NUMERIC_SCALE nVarchar(max), DATETIME_PRECISION nVarchar(max),
CHARACTER_SET_CATALOG nVarchar(max), CHARACTER_SET_SCHEMA nVarchar(max), CHARACTER_SET_NAME nVarchar(max),
COLLATION_CATALOG nVarchar(max), COLLATION_SCHEMA nVarchar(max), COLLATION_NAME nVarchar(max),
DOMAIN_CATALOG nVarchar(max), DOMAIN_SCHEMA nVarchar(max), DOMAIN_NAME nVarchar(max)
) ON [PRIMARY];
GO

--run your query looking for a table or column or schema
--you only get 128 characters here


EXEC sp_Msforeachdb
"
USE [?] ;
INSERT INTO tempdb.dbo.t1
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'borrower'
"

GO

--filter out empty records from the results table
SELECT * FROM [dbo].[t1] WHERE table_name IS NOT NULL

--clean up temp table
USE tempdb;
DROP TABLE [dbo].[t1];

GO
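
If the search will not fit in those 128 characters, one alternative (the dynamic SQL route mentioned in the comment block above) is to build the statement yourself. The following is only a minimal sketch of that idea, assuming the same tempdb.dbo.t1 results table from above is still in place; it concatenates one INSERT per online database from sys.databases and then runs the whole string at once. The doubled quotes around 'borrower' are there because it sits inside a single-quoted dynamic string.

USE [tempdb];
GO

--build one INSERT ... SELECT per online database, then execute the full string
DECLARE @sql nvarchar(max) = N'';

SELECT @sql = @sql
    + N'INSERT INTO tempdb.dbo.t1 '
    + N'SELECT * FROM ' + QUOTENAME(name) + N'.INFORMATION_SCHEMA.COLUMNS '
    + N'WHERE TABLE_NAME = ''borrower'';'
FROM sys.databases
WHERE state_desc = 'ONLINE';

EXEC sp_executesql @sql;
GO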

Monday, May 14, 2012

Informatica Glossary

Informatica is a tool used for the creation of ETL logic, job definition, and source control. It has a lot of proprietary terms that have similar definitions outside the context of the application. This is the start of a glossary for referencing these terms.

Mapping represents the flow and transformation of data from source to target.
Mapplet is a reusable set of transformations that can be called from a mapping.
Session is a set of instructions that tells the Informatica Server how and when to move data from sources to targets. It is associated with a mapping to define the connections and other configurations for that mapping.
Workflow controls the execution of tasks such as commands, emails, and sessions.
Worklet is a reusable set of sessions that can be called within a workflow.

Although these are not Informatica specific, I am placing them here for want of a warehousing glossary post to add them to.

Factless Facts: Facts without any measures. This could be a table used to track batch processing status (e.g. job ran on date, where batch status is not normalized).
Additive Facts: A fact that leads to a meaningful measure when added (e.g. units sold, where the metric would be units sold per week per sales representative; a short SQL sketch follows this list).
Non-Additive Facts: Facts that do not lend themselves to a meaningful metric when added (e.g. unit price, where unit price is a trend over time or an average across like products, but not added over time).
Semi-Additive Facts: When some but not all columns in a row can be used additively.
Accumulating Fact: Describes the grain of a fact. An accumulating fact stores a row for each stage in the lifetime of an event (if that event reverts to a previous stage you can either update the previous stage record or add a new one with a rework flag).
Periodic Facts: Describes the grain of a fact. A periodic fact stores one row per event, so one row across multiple status changes for that event.
Derived Facts: Some BI data is not found in the source. We create these facts by calculating them from other facts.
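
To make the additive versus non-additive distinction concrete, here is a small sketch in T-SQL; the fact table and its columns are hypothetical and exist only for this example.

--hypothetical sales fact for illustration: units_sold is additive, unit_price is not
CREATE TABLE dbo.fact_sales (
    sales_rep_key int,
    week_key int,
    product_key int,
    units_sold int,             --additive measure
    unit_price decimal(10, 2)   --non-additive measure
);

--additive: summing units_sold per representative per week is a meaningful metric
SELECT sales_rep_key, week_key, SUM(units_sold) AS units_sold_per_week
FROM dbo.fact_sales
GROUP BY sales_rep_key, week_key;

--non-additive: summing unit_price is meaningless; average it (or trend it over time) instead
SELECT product_key, AVG(unit_price) AS avg_unit_price
FROM dbo.fact_sales
GROUP BY product_key;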

Friday, May 4, 2012

Replace CASE with IIF in Database Unit Tests

So I write a number of integration tests. Currently I am working to provide a way to compare all tables involved in an ETL (source to target). I have done this in the past with a single test for each table pair (or sets of tables, if more than one table on the transactional side is flattened into a single target table). I always push people on my team to write the comparison so that it asserts the test conditions. Too often people lazily write the test against each table, compare manually, and throw the SQL away, so it has no long term value; they start from scratch with every test run. If you write your integration test like a unit test, it has immediate reusability, as well as the ability to be included in a continuous integration build.

I currently have a simple ETL to test, which spans 150 tables. I have three types of ETL tests. The first two are the extract and load tests: row counts and metadata date checks. The third is the column by column transformation test. I do not have time to write the 150 tests needed to cover all of the tables, so I am exploring using dynamic SQL to iterate through a list of server/database/table pairs to compare the raw row counts. Once I get that working, I can try to extend it to column comparisons (for those columns where there is no data transformation). Lastly I will add transformation logic to test transformed columns. This is the same asserting test pattern I referred to above, with an expected value, an actual value, and evaluation logic. That logic looks like this:

CASE WHEN (@EXPECTED = @ACTUAL) THEN 'PASS' ELSE 'FAIL' END AS 'TestName'

This post is actually about how to replace that CASE statement with the simplified IIF built in function from SQL Server 2012. (I will post the solution to the dynamic SQL test once I have it figured out).

In SQL Server 2012 there is a new built in function that allows the CASE logic above to be slightly simplified. It reduces the keywords required for the comparison from five to one, which means fewer typos and thus fewer iterations to arrive at a passing (or failing) test.


IIF(@EXPECTED = @ACTUAL, 'PASS', 'FAIL') AS 'TestName'
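
To show the new function doing real work, here is a rough sketch of the row count comparison described above, driven by a cursor over a hypothetical dbo.EtlTablePairs table holding fully qualified source and target table names. This is not the finished dynamic SQL solution I mentioned, only one possible shape for it.

--dbo.EtlTablePairs is a hypothetical driver table of fully qualified table names
DECLARE @SourceTable nvarchar(512), @TargetTable nvarchar(512), @sql nvarchar(max);
DECLARE @EXPECTED bigint, @ACTUAL bigint;

DECLARE pair_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT SourceTable, TargetTable FROM dbo.EtlTablePairs;

OPEN pair_cursor;
FETCH NEXT FROM pair_cursor INTO @SourceTable, @TargetTable;

WHILE @@FETCH_STATUS = 0
BEGIN
    --get the raw row counts from the source and target tables
    SET @sql = N'SELECT @src = COUNT_BIG(*) FROM ' + @SourceTable + N'; '
             + N'SELECT @tgt = COUNT_BIG(*) FROM ' + @TargetTable + N';';

    EXEC sp_executesql @sql,
        N'@src bigint OUTPUT, @tgt bigint OUTPUT',
        @src = @EXPECTED OUTPUT,
        @tgt = @ACTUAL OUTPUT;

    --assert with IIF instead of CASE
    SELECT @SourceTable AS SourceTable,
           @TargetTable AS TargetTable,
           IIF(@EXPECTED = @ACTUAL, 'PASS', 'FAIL') AS 'RowCountTest';

    FETCH NEXT FROM pair_cursor INTO @SourceTable, @TargetTable;
END;

CLOSE pair_cursor;
DEALLOCATE pair_cursor;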

The best thing about this for me is that I can always remember where to put the AS 'TestName'. Half the time I write AS before concluding the CASE with an END statement. I am glad to be rid of that recurring fat fingering.

Wednesday, May 2, 2012

Symbolic Link Gotcha in Windows 7

I needed to make a (soft) symbolic link from my local machine to a remote drive. It turns out you need to make the link and then enable the directional evaluation settings so the link can be traversed. This was done from a plain command prompt (run as administrator) on Windows 7, with nothing fancy added to the path. The first command creates a link so that documents added to C: show up on F:; the second enables local to local, remote to local, and local to remote symbolic link evaluation. Replacing a 1 with a 0 would turn the corresponding setting off. The rest is just a test of the bidirectional behavior.

>mklink /D F:\Documents C:\Documents
>fsutil behavior set SymlinkEvaluation l2l:1 r2l:1 l2r:1
>copy nul F:\Documents\POOKIE.TXT
>dir C:\Documents\POOKIE.TXT|findstr -i file
>dir F:\Documents\POOKIE.TXT|findstr -i file
>del C:\Documents\POOKIE.TXT
>dir F:\Documents\POOKIE.TXT|findstr -i file