Thursday, April 7, 2016

T-SQL Windowing Functions

I had thought I was done with SQL Server posts, but a SQL Server based reporting project landed on my desk. As such, I noticed I did not have a concise sample of windowing functions in PSQL, PL/SQL, or T-SQL. Many tabular stores allow for windowing, which is a key means by which to provide buckets or slices in tabular data. For me it is a key product feature when choosing a database engine. Examples of windowing would be time slices in incremental chronological data or regional buckets in geographically bound data.

Below the grain of the time data does match the hour grain requested in the report. I can cast/convert the times (potentially rounding into the wrong time bucket) or use a range on a time dimension. Since I had no permissions to create objects nor write to the source database, I created an ad hoc time dimension in a CTE. I then used a range query (BETWEEN) to create data in hourly buckets. THe real example of windowing came with page hit counts. I needed to only report on the top N most used pages. For that I used a RANK OVER PARTITION BY. Note that the PARTITION BY gives the window and the ORDER BY is providing the actual element to rank on.

USE EventsDb;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DECLARE
@EventId INT = 222222,
@EventDate DATETIME;
SET @EventDate = '2016-01-18 08:00:00.000';

WITH DimTime (DimHour) AS (
SELECT CAST(CAST(@EventDate AS DATE) AS datetime)
UNION ALL
SELECT DATEADD(MINUTE, 60, DimHour)
FROM DimTime
WHERE DimHour < CAST(@EventDate AS DATETIME)),
PageHits AS (
SELECT
DimHour, COUNT(Url) PageHits, COUNT(DISTINCT User_Id) PageUserCount,
AVG(CAST(RequestTimeMs AS BIGINT)) AvgRequestTimeMs, MAX(RequestTimeMs) MaxRequestTimeMs,
AVG(CAST(RequestLength AS BIGINT)) AvgRequestSizeB, URL
FROM DimTime DT
JOIN FactActivity FA ON FA.RequestTimestamp between DT.DimHour AND DATEADD(MI,59,DT.DimHour)
WHERE FA.RequestTimestamp Between DATEADD(MI,-59,@EventDate) AND DATEADD(MI,59,@EventDate)
GROUP BY DT.DimHour, FA.URL),
PageRank AS (
SELECT
PageHits.*,
RANK() OVER (PARTITION BY DimHour ORDER BY PageHits DESC) PageRank
FROM PageHits)
SELECT
@EventId IsmId, DimHour, PageHits,
PageUserCount, ISNULL(AvgRequestTimeMs, -1) AvgRequestTimeMs, ISNULL(MaxRequestTimeMs, -1) MaxRequestTimeMs, ISNULL(AvgRequestSizeB, -1) AvgRequestSizeB, URL VmsPage
FROM PageRank
WHERE PageRank < =10
ORDER BY DimHour, PageHits

REFS:
https://msdn.microsoft.com/en-us/library/ms189461.aspx
http://www.postgresql.org/docs/9.1/static/tutorial-window.html