Wednesday, December 14, 2016

Notes on Slick 3

Most examples use Await, and that is blocking. Why use Slick if you are going to block? Look at onComplete, onSuccess, onFailure, and Promise instead.
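A minimal sketch of the non-blocking style, reusing the db and customerTable names from these notes as stand-ins for any Slick database handle and table query:

import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}

// db.run returns a Future immediately; no thread blocks waiting on the result
val idsFuture = db.run(customerTable.map(_.id).result)

// react when the result arrives instead of Await-ing it
idsFuture.onComplete {
  case Success(ids) => println(s"loaded ${ids.size} ids")
  case Failure(ex)  => println(s"query failed: $ex")
}

// or transform the value and hand the new Future back to the caller
val idCount = idsFuture.map(_.size)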

** Chaining transactions looks like this in one case
 
db.run((for {
    customer <- customerTable.save()
    address  <- addressTable.save()
  } yield ()).transactionally)

 
** Returning a list of ids from a select looks like this
 (but you can also use the returning keyword on inserts; see the sketch after the insert examples below)
 
val db = Database.forConfig("myactors.database")

val session = db.createSession()
session.conn.setCatalog("mydb")
session.conn.setSchema("myschema")

val query = Tables.rawStringTest.to[List].map(_.id)
val resultSet = db.run(query.result)
 
**this prints the list
resultSet.onComplete(res =>   println(s"string returns $res") ) 
**this registers a callback on the future and pattern matches on Success/Failure
resultSet.onComplete{
  case Success(res) => res.foreach(x => db.run(Tables.rawStringTest.filter(_.id === x).delete))
  case Failure(ex) =>  println(s"error $ex")
}
**Testing is a challenge for async operations. Using an eventually retry block or a continuation both work. This example uses eventually.
it("ids are generated when None is passed in table calls"){
  val pk = None
  val validation = Tables.rawStringMessages.map(_.message_data)

  sut.insertStringMsg(None, "this is a fancy message", mockSuccessFunction(waiter), mockFailureFunction(waiter))
  sut.insertStringMsg(None, "this is another fancy message", mockSuccessFunction(waiter), mockFailureFunction(waiter))

  eventually(timeout(duration)) {
  val resultSet = Await.result(db.run(validation.result), duration)
resultSet.onComplete{
 case Success(res) => resultSet.head.equals(99) shouldBe true 
 case Failure(ex) => fail(s"this failed $ex) 
} 
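The continuation alternative mentioned above can look roughly like this with ScalaTest's ScalaFutures whenReady; the table name mirrors the setup used in the eventually example:

import org.scalatest.concurrent.ScalaFutures._

it("reads the inserted messages via a future continuation") {
  val read = db.run(Tables.rawStringMessages.map(_.message_data).result)

  // the assertion runs when the future completes; no retry loop is needed
  // (patience/timeout can be tuned via an implicit PatienceConfig)
  whenReady(read) { rows =>
    rows should not be empty
  }
}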

 
**Some ways to insert
table returning table forceInsert JsonMessage(receiptId, messageData, None)
table returning table insertOrUpdate JsonMessage(receiptId, messageData, None)
table returning table += JsonMessageInsert(receiptId, messageData)
 
**And any of these can return nothing
table += JsonMessageInsert(receiptId, messageData)
table insertOrUpdate JsonMessage(receiptId, messageData, None)
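The returning keyword mentioned earlier can also hand back just the generated id instead of the whole row. A sketch using the same table, case class, and Success/Failure handling as the surrounding examples:

// return just the generated id column rather than the full row
val insertAndGetId =
  (table returning table.map(_.id)) += JsonMessageInsert(receiptId, messageData)

db.run(insertAndGetId).onComplete {
  case Success(newId) => println(s"inserted row with id $newId")
  case Failure(ex)    => println(s"insert failed: $ex")
}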
 
 
/** `Await` is what is used to ensure proper handling of blocking for `Awaitable` instances.
  *
  * While occasionally useful, e.g. for testing, it is recommended that you avoid Await
  * when possible in favor of callbacks and combinators like onComplete and use in
  * for comprehensions. Await will block the thread on which it runs, and could cause
  * performance and deadlock issues. */
Clean up is a chore as well for integration tests. In this case I did the cleanup in the afterAll block to avoid deleting data while concurrent test runs were still in flight.

override def afterAll() {
  db = Database.forConfig("rtisystem.database")

  session = db createSession;  session.conn.setCatalog("rti")
  session.conn.setSchema("rti")

  val query = Tables.rawStringMessages.to[List].map(_.id)
  val resultSet = db.run(query.result)

  resultSet.onComplete{
    case Success(res) => res.foreach(x => db.run(Tables.rawStringMessages.filter(_.id === x).delete))
    case Failure(ex) =>  println(s"error $ex")
  }

  val jquery = Tables.jsonMessages.to[List].map(_.id)
  val jresultSet = db.run(jquery.result)

  jresultSet.onComplete{
    case Success(res) => res.foreach(x => db.run(Tables.jsonMessages.filter(_.id === x).delete))
    case Failure(ex) =>  println(s"error $ex")
  }
}
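Because afterAll can return before those onComplete callbacks fire, a simpler variant is to issue bulk deletes and block on them once before shutdown. This is only a sketch, reusing the duration value from the tests above:

override def afterAll() {
  // a single transactional action that clears both tables
  val cleanup = DBIO.seq(
    Tables.rawStringMessages.delete,
    Tables.jsonMessages.delete)

  // block once so the deletes finish before the test JVM shuts down
  Await.result(db.run(cleanup.transactionally), duration)
  db.close()
}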


This is a test example where I insert two records through a DAO extension method and compare them in a select query.

it("generates ids when None is passed instead of a long in the table call"){
  val pk = None
  val msgList: List[String] = List("this is a fancy message", "this is a fancy message also")
  val read = Tables.rawStringMessages.filter(row => row.message_data inSet msgList).map(_.id)

  sut.insertStringMsg(None, msgList.head,      mockSuccessFunction(waiter), mockFailureFunction(waiter))
  sut.insertStringMsg(None, msgList.tail.head, mockSuccessFunction(waiter), mockFailureFunction(waiter))

  eventually(timeout(duration)) {
    // resolve the future inside the retry block so a failed assertion actually triggers a retry
    val resultSet = Await.result(db.run(read.result), duration)
    assert(resultSet.head != resultSet.tail.head && resultSet.head > 100000 && resultSet.tail.head > 100000)
  }
} 




** Bind parameters and inSetBind in a query (older Slick lifted-embedding syntax)
val visitorId: Int = // whatever
val locationCodes = List("loc1","loc2","loc3"...)
// your query, with bind params.
val q = for {
    v <- Visits 
    if v.visitor is visitorId.bind
    if v.location_code inSetBind locationCodes
  } yield v
// have a look at the generated query.
println(q.selectStatement)
// run the query
q.list
 
 
*** To use a db default ***
O.AutoInc
column_name.?
column[Option[Timestamp]]
 
table += Class(id, data, Some(timestamp))

Or a detailed version

import slick.driver.PostgresDriver.api._
import slick.lifted._
import java.sql.{Date, Timestamp}

/** A representation of the message decorated for Slick persistence.
  * created_date should always be null on insert operations;
  * it is set at the database level to ensure time synchronicity.
  * Id is the Twitter snowflake id. All columns are NotNull unless declared as Option.
  * */
class RawMessages(tag: Tag) extends Table[(String, Option[String], Timestamp)](tag, Some("rti"), "RawMessages") {
  def id = column[String]("id", O.PrimaryKey)
  def MessageString = column[Option[String]]("MessageString")
  def CreatedDate = column[Timestamp]("CreatedDate", O.SqlType("timestamp default now()"))
  def * = (id, MessageString, CreatedDate)
}
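To let the database populate CreatedDate, one option (a sketch; the TableQuery val and literal values are illustrative) is to insert through a projection that omits that column:

val rawMessages = TableQuery[RawMessages]

// insert only id and MessageString so CreatedDate falls back to its
// "timestamp default now()" SQL default at the database level
val insertWithoutDate =
  rawMessages.map(r => (r.id, r.MessageString)) += ("some-snowflake-id", Some("a raw message"))

db.run(insertWithoutDate)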

Monday, November 21, 2016

Notes on Log4Net

1. Add log4net.config to project
2. Add keys to app.config pointing to log4net config file and setting watch attribute.
Watch dictates whether log4net polls for changes in the log4net config
    <add key="log4net.Config" value="log4net.config"/>
    <add key="log4net.Config.Watch" value="True"/>
3. Add the configSections section to the app.config
  <configSections>
    <section name="log4net" type="log4net.Config.Log4NetConfigurationSectionHandler, log4net"/>
  </configSections>
4. Add any appenders. Set log rollover and locking model for file appenders. Each appender must have a unique name. Include a layout

      <layout type="log4net.Layout.PatternLayout">
        <conversionPattern value="%date [%thread] %-5level %logger [%property{NDC}] - %message%newline - %type cow" />
      </layout>

5. Add at least one appender name to the root logger. More than one appender can also be added. Note that you can filter log message types using <level value="All"/>
    <root>
      <level value="All"/>
      <appender-ref ref="ManagedColoredConsoleAppender"/>
    </root>


If logging to Mongo, consider using a capped collection (collections are not capped by default). A capped collection behaves like a FIFO buffer, giving self-regulating log storage.

If logging to Rabbit (requires a custom appender, e.g. https://github.com/haf/log4net.RabbitMQ/blob/master/src/log4net.RabbitMQ/ExchangeBinding.cs), consider separate credentials and whether the vhost should be the same as the app's.

Smashing Cassandra 4.6 Into Pentaho BA 6.1

I am working on a POC with the Cassandra wide column store and the Pentaho reporting suite. At the time of this POC Pentaho support is behind the DataStax Cassandra release cycle, which is going to 5.0 in a few weeks. The thing I like about Pentaho is that it is designed to allow a huge amount of flexibility. So much it can destroy your mind with the possibilities.

First off I had to change my ETL in Pentaho DI to use a generic database connector (the URL was the connection string URI to my Cassandra cluster) and a generic table output step. I changed my report in Pentaho BA to use the same generic database connection. I had no idea if this would work and at first it failed utterly. But the error indicated a missing JDBC driver. So I started plugging in different JDBC drivers (BigSql, DBSchema, and DataStax). I could get each to connect to my version of Cassandra but they all threw cryptic errors like:
    Codec not found for requested operation: [int <-> java.lang.Long Dbschema]
    Codec not found for requested operation: [timestamp <-> com.datastax.driver.core.LocalDate]

A converter along these lines can bridge the timestamp <-> LocalDate mismatch:

    public Timestamp convertToDatabaseColumn(LocalDate ld) {
        return new Timestamp(ld.getMillisSinceEpoch());
    }

    @Override
    public String format(LocalDate value) {
        if (value == null) {
            return "NULL";
        } else {
            return convertToDatabaseColumn(value).toString();
        }
    }

If you want to make a generic database connection with a JDBC connector you would use:
Build Command Required: & C:\tools\apache-maven-3.3.3\bin\mvn clean package site install
URL Structure: jdbc:cassandra://host1[:port1][,host2[:port2],...[,hostN[:portN]]][/[keyspace][?options]]
URL Example: jdbc:cassandra://CassandraContactNode1,CassandraContactNode2,CassandraContactNode3/rti?consistency=LOCAL_QUORUM 
Java Driver Class: com.datastax.driver.core.Connection

To find the connection class you need, open folders on the class path until you find a connection class that lets you pass in multiple nodes for your cluster and a consistency level. For a recent Apache update to Cassandra I downloaded the tarball and navigated to

D:\downloads\apache-cassandra-3.0.7-src.tar.gz\apache-cassandra-3.0.7-src.tar\apache-cassandra-3.0.7-src\src\java\org\apache\cassandra\transport\

In there are a SimpleClient class that accepts a channel and version, a Client class that accepts a host, port, version, and SSL settings, and a Connection class that accepts a more complex netty channel and the CQL version.
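For reference, the same connection settings the JDBC URL expresses (multiple contact points plus LOCAL_QUORUM consistency) look roughly like this when using the DataStax Java driver directly; the hosts, keyspace, and query are placeholders:

import com.datastax.driver.core.{Cluster, ConsistencyLevel, QueryOptions}

// contact points, keyspace, and query below are placeholders
val cluster = Cluster.builder()
  .addContactPoints("CassandraContactNode1", "CassandraContactNode2", "CassandraContactNode3")
  .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM))
  .build()

val session = cluster.connect("rti")
val rs = session.execute("SELECT release_version FROM system.local")
println(rs.one().getString("release_version"))

session.close()
cluster.close()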

REFS:
http://opensourceconnections.com/blog/2015/12/22/exploring-custom-typecodecs-in-the-cassandra-java-driver/


PostgreSQL Create and Modified Triggers


/* Use BEFORE UPDATE triggers for this, as AFTER UPDATE creates an infinite loop with the NEW comparison */
CREATE OR REPLACE FUNCTION update_modified_column()
RETURNS TRIGGER AS $$
BEGIN
   IF row(NEW.*) IS DISTINCT FROM row(OLD.*) THEN
      NEW.moded = now();
      RETURN NEW;
   ELSE
      RETURN OLD;
   END IF;
END;
$$ language 'plpgsql';
 
DROP TABLE IF EXISTS temper;
CREATE TEMP TABLE IF NOT EXISTS temper(
  id INT,
  word VARCHAR,
  creatd TIMESTAMP WITH TIME ZONE DEFAULT NOW()::TIMESTAMP WITH TIME ZONE,
  moded TIMESTAMP WITH TIME ZONE);

DROP TRIGGER IF EXISTS temper_update_date ON temper;

CREATE TRIGGER temper_update_date BEFORE UPDATE ON temper FOR EACH ROW EXECUTE PROCEDURE update_modified_column();

INSERT INTO temper (id, word) VALUES  (1, 'one'),
  (2, 'two');

SELECT * FROM temper;

UPDATE temper SET word = 'tttoooo' WHERE id = 1;
 

SELECT * FROM temper;
 

 

Thursday, November 10, 2016

Notes on Akka Configuration

For a durable, elastically scaling system, Akka is hard to beat. However, end-to-end examples of some of the advanced configuration are not easy to find.

Things change with versions. This is relevant to
Akka 2.4
SBT 0.13.12
Scala 2.10.6

One decision is what to configure in code and what to configure in the application.conf file.

Do you have a multi-actor-system project? If so you need each actor system to have its own namespace in the config.

logging-system {akka{}...}
ai-system{akka{}...}

/** An instance of the entire Config object loaded with the application.conf data
  * @see src/main/resources/application.conf */
val config = ConfigFactory.load()

/** It is important to get a specific system config namespace if you have a
  * multi akka system project; for a single system setup we could have akka
  * as the root node in the config */
val akkaConfig = config.getConfig("ai-system").withFallback(config)

/** An akka actor system setup for the ai system application */
val actorSystem = ActorSystem(name = "ai-system", config = akkaConfig)

If you have a single system there is no need for the root namespace
akka{}...

And Akka will find the akka{} section
/** An instance of the entire Config object loaded with the application.conf data
  * @see src/main/resources/application.conf */
val config = ConfigFactory.load()

/** An akka actor system setup for the ai system application */
val actorSystem = ActorSystem(name = "ai-system", config = config)

Sample App Object
package com.company.aisystem

import akka.actor.SupervisorStrategy.{Decider, Escalate, Restart, Resume, Stop}
import akka.actor.{ActorInitializationException, ActorKilledException, ActorSystem, Inbox, OneForOneStrategy, Props, SupervisorStrategy}
import akka.routing._
import com.company.aisystem.actors.{OrchestratorActor, PersistorActor, ValidationActor}
import com.typesafe.config.ConfigFactory

import scala.concurrent.duration._

/** Acts as the "main" class for the application to start.
  * It extends the scala "App" trait. */
object aiSystemApp extends App {
  /** An instance of the entire Config object loaded with the application.conf data
    * @see src/main/resources/application.conf */
  val config = ConfigFactory.load()

  /** It is important to get a specific system config namespace if you have a
    * multi akka system project; for a single system setup we could have akka
    * as the root node in the config */
  val aisystemAkkaConfig = config.getConfig("aisystem").withFallback(config)

  /** An akka actor system setup for the ai system application */
  val actorSystem = ActorSystem(name = "aisystem", config = aisystemAkkaConfig)

  /** A one for one supervisor strategy to be used in the router */
  val superviseOneForOne =
    OneForOneStrategy(maxNrOfRetries = 10, withinTimeRange = 60.seconds){
      case _: ArithmeticException      => Resume
      case _: NullPointerException     => Restart
      case _: IllegalArgumentException => Stop
      case _: Exception                => Escalate
    }

  /** A default akka inbox for the ai system actor system. Not sure we need to declare this. */
  val inbox = Inbox.create(actorSystem)

  /** An akka actor ref of the persistor actor */
  val persistorActor = actorSystem.actorOf(
    Props[PersistorActor].withDispatcher("aisystem.akka.fork-join-dispatcher"),
    config.getString("aisystem.persistor-actor-name"))

  /** An akka actor ref of the validation actor */
  //TODO why are we passing in another actor ref as the whole props?
  val validationActor = actorSystem.actorOf(
    ValidationActor.props(persistorActor)
      .withDispatcher("aisystem.akka.ai-thread-pool-dispatcher"),
    config.getString("aisystem.validation-actor-name"))

  final val defaultStrategy: SupervisorStrategy = {
    def defaultDecider: Decider = {
      case _: ActorInitializationException ⇒ Stop
      case _: ActorKilledException         ⇒ Stop
      case _: Exception                    ⇒ Restart
    }
    OneForOneStrategy()(defaultDecider)
  }

  /** An akka actor ref of the orchestrator actor */
  //todo: setup resizer = Option[DefaultOptimalSizeExploringResizer]
  val orchestratorActor = actorSystem.actorOf(
    OrchestratorActor.props(validationActor)
      .withRouter(SmallestMailboxPool(
        nrOfInstances = 5,
        supervisorStrategy = superviseOneForOne,
        routerDispatcher = "aisystem.akka.ai-thread-pool-dispatcher",
        usePoolDispatcher = false)),
    config.getString("aisystem.orchestrator-actor-name"))

  ...other app init code
Sample Config
aisystem {

  queueBroker{}

  database{}

  akka {
    # expose actor system level debug logging for local
    loglevel = DEBUG //options include DEBUG and off
    # stdout-loglevel is only in effect during system startup and shutdown
    stdout-loglevel = DEBUG //options include DEBUG and off
    log-config-on-start = on
    actor {
      debug {
        # enable DEBUG logging of all AutoReceiveMessages (Kill, PoisonPill etc.)
        autoreceive = on
        # enable DEBUG logging of subscription changes on the eventStream
        event-stream = on
        # enable DEBUG logging of all LoggingFSMs for events, transitions and timers
        fsm = on
        # enable DEBUG logging of actor lifecycle changes
        lifecycle = on
        # enable DEBUG logging of message receive events
        receive = on
        # enable DEBUG logging of unhandled messages
        unhandled = on
      }
      # Timeout for actors
      timeout = 30s
      # Timeout for ActorSystem.actorOf
      creation-timeout = 30s
      serialize-messages = off # generally only enable for testing with serializers
    }
    fork-join-dispatcher {
      # Dispatcher is the name of the event-based dispatcher
      type = Dispatcher
      # What kind of ExecutionService to use
      executor = "fork-join-executor"
      # Configuration for the fork join pool
      fork-join-executor {
        # Min number of threads to cap factor-based parallelism number to
        parallelism-min = 2
        # Parallelism (threads) ... ceil(available processors * factor)
        parallelism-factor = 2.0
        # Max number of threads to cap factor-based parallelism number to
        parallelism-max = 10
      }
      # Throughput defines the maximum number of messages to be
      # processed per actor before the thread jumps to the next actor.
      # Set to 1 for as fair as possible.
      throughput = 100
      throughput-deadline-time = 10ms
      supervisorStrategy = "Restart"
    }
    ai-thread-pool-dispatcher {
      # Dispatcher is the name of the event-based dispatcher
      type = Dispatcher
      # What kind of ExecutionService to use
      executor = "thread-pool-executor"
      # Configuration for the thread pool
      thread-pool-executor {
        # minimum number of threads to cap factor-based core number to
        core-pool-size-min = 2
        # No of core threads ... ceil(available processors * factor)
        core-pool-size-factor = 2.0
        # maximum number of threads to cap factor-based number to
        core-pool-size-max = 10
      }
      # Throughput defines the maximum number of messages to be
      # processed per actor before the thread jumps to the next actor.
      # Set to 1 for as fair as possible.
      throughput = 1
      supervisorStrategy = "Restart"
    }
  }
  akka.actor.deployment {
    //TODO move this from App object to deployment
    Orchestrator {
      router = smallest-mailbox-pool
      nr-of-instances = 5
      dispatcher = aisystem.akka.aii-dispatcher
    }
    Persistor {
      router = smallest-mailbox-pool
      nr-of-instances = 5
      dispatcher = aisystem.akka.ai-dispatcher
    }
    Validator {
      router = smallest-mailbox-pool
      nr-of-instances = 5
    }
  }
}

*REFS*
http://doc.akka.io/docs/akka/2.0.2/intro/getting-started-first-scala.html
Supervisor Strategy
Fault Tolerance Primer
http://danielwestheide.com/blog/2013/03/20/the-neophytes-guide-to-scala-part-15-dealing-with-failure-in-actor-systems.html
routers-in-akka-with-parameterized-actors
Router Choices

Wednesday, November 2, 2016

Docker on Deb Cheatsheet

getting help
$ docker --help

live resource usage for the containers running right now. related to docker run
$ docker stats

what containers are running. related to docker start
$ docker ps

Clean up exited containers
$ docker rm -v $(docker ps -a -q -f status=exited)
All
$ docker stop $(docker ps -a -q);docker rm $(docker ps -a -q)

make sure docker service starts (two ways to do it, or use a daemon)
$ sudo service docker start
$ sudo chkconfig docker on

what images can I start right now
$ docker images

Repo of <none> is an intermediate image that can optionally be discarded with a runtime flag.

is the last run container still up
docker inspect --format '{{.State.Running}}' $(docker ps -lq)

Clean up images
$ docker rmi  651505724f29

Start a stopped container
docker start <container>

start an instance of a container by repo and tag instead of image id
this will immediately terminate if there is not a foreground process running
(see the -t and tail -f /dev/null examples below for keeping it running)

$ docker run <REPO>:<TAG>

start an instance of a container as a daemon and keep a running fg process
the tail of dev null keeps docker alive
docker run -d 09c51b079466 tail -f /dev/null

start an instance of a container with a term running for commands
docker run -t 09c51b079466 bash

using tail -f /dev/null with the daemon option is a workaround for when you do not have a script initializing your application as the container's starting point. The cleaner way to make your container continuously running is to point docker at a shell file that keeps your application running. You can try with a start.sh file:

Eg: docker run -d centos sh /yourlocation/start.sh

This start.sh should point to a never ending application.
If you don't want any application to be running,
you can install monit, which will keep your docker container running.


Access a running container
this uses the cute name. you can also trap the container id at run 
docker exec infallible_newton ls -l|grep u*

this uses the id from variable
docker exec $RABCID ls -l|grep u*


details on a container
$ docker inspect 09c51b079466

find the container's ip
$ CIP=$(docker inspect 09c51b079466 | grep IPAddress| grep 172| head -1)
$ echo $CIP

Capture your container id as a variable
RABCID=$(docker run -d rabbitmq:3-management); docker inspect $RABCID  | grep IPAddress| grep 172| head -1


networking
docker network ls ; ifconfig
docker network inspect rtinet

Below shows two ways to join a network. First adds the network at run
#!/usr/bin/env bash
docker rm rti-rabbit -f
docker network create rtinet
docker run -d --network rtinet --hostname rabbit1 --name rti-rabbit -p 15672:15672 -p 4369:4369 -p 5671:5671 -p 5672:5672 rabbitmq:3-management

Second connects after run
DMDCID=$(docker run -d dynamicmessagediscovery:0.1-SNAPSHOT tail -f /dev/null);
DMDIP=$(docker inspect $DMDCID  | grep IPAddress| grep 172| head -1)
docker network connect rtinet $DMDCID


view logs
$ docker logs f69d53c188af

Is an instance of my container running
$ docker ps -a | grep -i 09c51b079466

Sample Startup Script
docker stop $(docker ps -a -q) 
docker rm rti-rabbit -f
docker rm dmdhost -f
docker rm pghost -f
docker network rm rtinet
docker network create rtinet
RABCID=$(docker run -d --network rtinet --hostname rabbit1 --name rti-rabbit -p 15672:15672 -p 4369:4369 -p 5671:5671 -p 5672:5672 rabbitmq:3-management)
DMDCID=$(docker run -d --network rtinet --hostname dmdhost --name dmdhost -p 15200:15672 -p 15201:4369 -p 15202:5671 -p 15203:5672 dynamicmessagediscovery:0.1-SNAPSHOT tail -f /dev/null);
PGCID=$(docker run -d --network rtinet --hostname pghost --name pghost -p 5432:5432 postgres)

DMDIP=$(docker inspect $DMDCID | grep IPAddress| grep 172| head -1)
RABIP=$(docker inspect $RABCID | grep IPAddress| grep 172| head -1)
PGIP=$(docker inspect  $PGCID  | grep IPAddress| grep 172| head -1)

firefox -new-tab "http://$RABIP:15672/#/queues/"


sbt test stage deploy and start
$ sbt test stage docker:publishLocal; CID=$(docker run -d dynamicmessagediscovery:0.1-SNAPSHOT tail -f /dev/null);docker inspect $CID | grep IPAddress | grep 127|head -1

docker run runs an image
docker start starts a container
docker exec accesses a running container

To make a daemon (bg process)
/usr/local/bin/confd -interval=30 -backend etcd -node $CONFIG_CENTER &






  1. A daemon is a background, non-interactive program. It is detached from the keyboard and display of any interactive user. The word daemon for denoting a background program is from the Unix culture; it is not universal.
  2. A server is a program which responds to requests from other programs over some inter-process communication mechanism (usually over a network). A service is what a server provides. For example, the NFS port mapping service is provided as a separate portmap server, which is implemented as the portmapd daemon.
    A server doesn't have to be a daemon, but usually is. A user application with a GUI could have a server built into it: for instance, a file-sharing application. Another example is the X Window server, which is anything but in the background: it takes over your screen, keyboard and pointing device. It is a server because it responds to requests from applications (to create and manipulate windows, et cetera), which can even be elsewhere on the network. But the X server also responds to your every keystroke and mouse movement.
  3. A process is one or more threads of execution together with their shared set of resources, the most important of which are the address space and open file descriptors. A process creates an environment for these threads of execution which looks like they have an entire machine all to themselves: it is a virtual machine.
    Inside a process, the resources of other processes, and of the kernel, are invisible and not directly accessible (at least not to a thread which is executing user-space code). For example, there is no way to refer to the open files of another process, or their memory space; it is as if those things do not even exist.
    The process, and its relation to the kernel and other processes, perhaps constitutes the most important abstraction in Unix-like operating systems. The resources of the system are compartmentalized into processes, and nearly everything is understood as happening inside one process or another.


https://github.com/wsargent/docker-cheat-sheet

Monday, September 26, 2016

Ubuntu On Windows


Enable Hyper V
Start - Control Panel - Programs - Turn Windows Features On or Off - enable all Hyper-V features - Reboot



Enable Networking
Open up Hyper-V Manager.
Right-click on the name of the Hyper-V host and select Virtual Switch Manager...
Under ‘Virtual Switches’, select New virtual network switch.
Under ‘What type of virtual switch do you want to create?’, select External.
Select the Create Virtual Switch button.


Under ‘Virtual Switch Properties’, give the new switch a name such as VM Virtual Switch.
Under ‘Connection Type’, ensure that External Network has been selected.
Select the physical network card to be paired with the new virtual switch. This is the network card that is physically connected to the network. I use my wireless to prevent disruptions in my work.



Create a New VM
New Virtual Machine



Thursday, April 7, 2016

T-SQL Windowing Functions

I had thought I was done with SQL Server posts, but a SQL Server based reporting project landed on my desk. As such, I noticed I did not have a concise sample of windowing functions in PSQL, PL/SQL, or T-SQL. Many tabular stores allow for windowing, which is a key means by which to provide buckets or slices in tabular data. For me it is a key product feature when choosing a database engine. Examples of windowing would be time slices in incremental chronological data or regional buckets in geographically bound data.

Below, the grain of the time data does not match the hour grain requested in the report. I can cast/convert the times (potentially rounding into the wrong time bucket) or use a range on a time dimension. Since I had no permissions to create objects nor write to the source database, I created an ad hoc time dimension in a CTE. I then used a range query (BETWEEN) to create data in hourly buckets. The real example of windowing came with page hit counts. I needed to report on only the top N most used pages. For that I used a RANK() OVER (PARTITION BY ...). Note that the PARTITION BY gives the window and the ORDER BY provides the element to rank on.

USE EventsDb;

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

DECLARE
@EventId INT = 222222,
@EventDate DATETIME;
SET @EventDate = '2016-01-18 08:00:00.000';

WITH DimTime (DimHour) AS (
SELECT CAST(CAST(@EventDate AS DATE) AS datetime)
UNION ALL
SELECT DATEADD(MINUTE, 60, DimHour)
FROM DimTime
WHERE DimHour < CAST(@EventDate AS DATETIME)),
PageHits AS (
SELECT
DimHour, COUNT(Url) PageHits, COUNT(DISTINCT User_Id) PageUserCount,
AVG(CAST(RequestTimeMs AS BIGINT)) AvgRequestTimeMs, MAX(RequestTimeMs) MaxRequestTimeMs,
AVG(CAST(RequestLength AS BIGINT)) AvgRequestSizeB, URL
FROM DimTime DT
JOIN FactActivity FA ON FA.RequestTimestamp between DT.DimHour AND DATEADD(MI,59,DT.DimHour)
WHERE FA.RequestTimestamp Between DATEADD(MI,-59,@EventDate) AND DATEADD(MI,59,@EventDate)
GROUP BY DT.DimHour, FA.URL),
PageRank AS (
SELECT
PageHits.*,
RANK() OVER (PARTITION BY DimHour ORDER BY PageHits DESC) PageRank
FROM PageHits)
SELECT
@EventId IsmId, DimHour, PageHits,
PageUserCount, ISNULL(AvgRequestTimeMs, -1) AvgRequestTimeMs, ISNULL(MaxRequestTimeMs, -1) MaxRequestTimeMs, ISNULL(AvgRequestSizeB, -1) AvgRequestSizeB, URL VmsPage
FROM PageRank
WHERE PageRank <= 10
ORDER BY DimHour, PageHits

REFS:
https://msdn.microsoft.com/en-us/library/ms189461.aspx
http://www.postgresql.org/docs/9.1/static/tutorial-window.html


Sunday, March 27, 2016

Javascript Foreach Replacements

I am not a fan of the foreach implementation in ES5. Here are some ideas I had to work around it. Some of these are ES6.

Operate on items in an array
Does not update the items in the array
var n = ['cat', 'dog', 'rat'];
var val;
for (val of n){
  val = val + 's';
  console.log(val);
}
console.log(n);
more verbose way of doing the same
var n = ['cat', 'dog', 'rat'];
var it = n.values();
var entry;
while (!(entry = it.next()).done) {
    entry.value = entry.value + 's';
    console.log(entry.value);
}
console.log(n);

Operate on items in an array
Updates the items in the array
var n = ['cat', 'dog', 'rat'];
for (i in n){
    n[i] = n[i] + 's';
    console.log(n[i]);
}
console.log(n);
same as above but unpleasantly verbose
var n = ['cat', 'dog', 'rat'];
for (i = 0; i < n.length ;i++){
    n[i] = n[i] + 's';
    console.log(n[i]);
}
console.log(n);

Performantly operate on items in a sparse array. Updates the defined items in the array; additional checks may be required if a consistent order is needed
see link below
var n = ['cat', 'dog', 'rat'];
n[10] = 'goat';
n[17] = 'chupacabra';
for (i in n){
    if (String(parseInt(i, 10)) === i && n.hasOwnProperty(i)) {
        n[i] = n[i] + 's';
        console.log(n[i]);
    }
}
console.log(n);
Syntax I wanted to avoid
mozilla forEach

Source of inspiration
Arrays

Friday, March 25, 2016

Node Js: Basic Express Route Notes

Routes are super easy in Express. Given the following js you can see a get for root and kittens, and a post for kitten.

var express  = require('express');
var app = express();
app.get('/',function(req,res){
    console.log('Got a get for /');
    res.send("You are getting root")
})

app.get('/kittens',function(req,res){
    console.log('Got a get for /kittens');
    res.send("You are getting kittens")
})

app.post('/kitten',function(req,res){
    console.log('posted a post for /kitten');
    res.send("You are posting kitten")
})

var server = app.listen(8081, function(){
    var host = server.address().address
    var port = server.address().port
    console.log("Example app listening at http://%s:%s", host, port)
})

Using a tool like Postman, if you issue a GET to http://127.0.0.1:8081/kitten you will see  an error
     Cannot GET /kitten
This is because the GET route is defined on /kittens, not /kitten. This is the same as requesting an undefined route such as http://127.0.0.1:8081/kittenzzzz. However, doing a GET to http://127.0.0.1:8081/kittens returns our coded response, as does a POST to http://127.0.0.1:8081/kitten:
    You are getting kittens