Friday, February 22, 2019

Creating A Secure Schema in PostgreSQL



Your SUPERUSER account may be deployed to the database as part of provisioning your instance. This is the case when you use AWS RDS. If so you manage admin rights using a different tooling (Puppet, Terraform, Ansible).
A basic security model in an idempotent parameterized script may look like this 

-- ROLES WE USE FOR SCHEMA ACCESS GRANTS

--Role
DO $$
BEGIN
CREATE ROLE sisense_or_somthing_read_role;    
EXCEPTION   
WHEN duplicate_object THEN
RAISE NOTICE 'Role already exists. Ignoring...';    
END$$;

--Role
DO $$
BEGIN
CREATE ROLE sisense_or_somthing_readwrite_role IN ROLE sisense_or_somthing_read_role;    
EXCEPTION   
WHEN duplicate_object THEN
RAISE NOTICE 'Role already exists. Ignoring...';    
END$$;

--Role
DO $$
BEGIN
CREATE ROLE sisense_or_somthing_exececutor IN ROLE sisense_or_somthing_readwrite_role;
EXCEPTION   
WHEN duplicate_object THEN
RAISE NOTICE 'Role already exists. Ignoring...';    
END$$;

--Role
DO $$
BEGIN
    CREATE USER sisense_or_somthing_reader WITH PASSWORD :var_pwd1 IN ROLE sisense_or_somthing_read_role;
EXCEPTION   
WHEN duplicate_object THEN
RAISE NOTICE 'Role already exists. Ignoring...';    
END$$;

--Role
DO $$
BEGIN
CREATE USER sisense_or_somthing_readerwriter WITH PASSWORD :var_pwd2 IN ROLE sisense_or_somthing_readwrite_role;
EXCEPTION   
WHEN duplicate_object THEN
RAISE NOTICE 'Role already exists. Ignoring...';    
END$$;

--Role
DO $$
BEGIN
CREATE USER sisense_or_somthing_maintenance WITH PASSWORD :var_pwd3 IN ROLE sisense_or_somthing_exececutor;
EXCEPTION   
WHEN duplicate_object THEN
RAISE NOTICE 'Role already exists. Ignoring...';    
END$$;

CREATE SCHEMA IF NOT EXISTS sisense_or_somthing;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO sisense_or_somthing_read_role;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA public TO sisense_or_somthing_exececutor;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA sisense_or_somthing TO sisense_or_somthing_exececutor;
GRANT SELECT ON ALL TABLES IN SCHEMA sisense_or_somthing TO sisense_or_somthing_read_role;
GRANT INSERT ON ALL TABLES IN SCHEMA sisense_or_somthing TO sisense_or_somthing_readwrite_role;
GRANT UPDATE ON ALL TABLES IN SCHEMA sisense_or_somthing TO sisense_or_somthing_readwrite_role;
GRANT DELETE ON ALL TABLES IN SCHEMA sisense_or_somthing TO sisense_or_somthing_readwrite_role;
GRANT TRUNCATE ON ALL TABLES IN SCHEMA sisense_or_somthing TO sisense_or_somthing_readwrite_role;


A validation query or two can help
   SELECT 
   pr.oid role_oid, pr.rolname role_name, pr.rolcanlogin, pr.rolinherit, pr_p.rolname parent_role, 
   m.roleid role_member_oid, m.grantor, m.admin_option,
   pr_g.rolname grantor_name, pr_g.rolcanlogin, pr_g.rolinherit
   FROM 
   pg_roles pr
   left join pg_auth_members m ON m.member = pr.oid
   left join pg_roles pr_g on pr_g.oid = m.grantor
   left join pg_roles pr_p on pr_p.oid = m.roleid
   WHERE pr.rolname in (
'kuras_or_something_reader', 'kuras_or_something_readerwriter', 
'kuras_or_something_writer', 'kuras_or_something_wrtie_role', 'kuras_or_something_exececutor', 
'kuras_or_something_maintenance')

SELECT grantee, string_agg(privilege_type, ', ') AS privileges
FROM information_schema.role_table_grants 
WHERE table_name=‘model_results_4’
GROUP BY grantee;

And the revert migration script would be something like this. 

DROP SCHEMA IF EXISTS sisense_or_somthing;
DROP USER IF EXISTS sisense_or_somthing_admin;
DROP USER IF EXISTS sisense_or_somthing_reader;
DROP USER IF EXISTS sisense_or_somthing_readerwriter;
DROP USER IF EXISTS sisense_or_somthing_maintenance;
DROP ROLE IF EXISTS sisense_or_somthing_read_role;
DROP ROLE IF EXISTS sisense_or_somthing_readwrite_role;
DROP ROLE IF EXISTS sisense_or_somthing_exececutor;

Wednesday, June 6, 2018

Data Architecture - Document Model Design

Relational modeling has drawbacks to design implementation and to access speed. One differentiator is our common language around column based design patterns. Codd's normal forms still provide some guidance for us when it comes to document model design principals. A fundamental design guideline is ease of access is sometimes aided by denormalization, but more often aided by atomicity.

Below assume a document attribute is the same as a column and that a row is synonymous with a document.

NOTE: This is consistent across most engines except PostgreSQL, which has some designs that externalize highly read JSONB document elements into individual columns in the same row as the document for old school indexing purposes pre-GIN indexing.

1NF

Each column (or attribute) has no more than one value, each row (or document) is unique.
Can identify a unique candidate key.

2NF

Each column (or attribute) has no more than one value, each row (or document) is unique.
Can identify a unique candidate key.
All attributes or columns are dependent on the candidate key.

3NF

Each column (or attribute) has no more than one value, each row (or document) is unique.
Can identify a unique candidate key.
All attributes or columns are dependent on the candidate key.
No document or row creates a transitive association
  • A -> B -> C where A !-> C  
  • Driver -> Car -> Wheel Count where Driver has no Wheels. 
In columns this is easy. If one property should have its own unique id in the table, it moves to a new table. This is much more complicated in a document model. Below are different ways to model this.The first three create transitive associations. I can select driver and wheels where id = 1 and it will return the transitive value “Bob has four wheels.” The fourth creates a child document, but embeds it. 
  1. {id: “1”, [“Bob”, “car”, 4]}
  2. {id: “1”, “driver”:”Bob”, “vehicle”: “car”, “wheels”: “4”}
  3. {id: “1”, [“driver”:”Bob”, “vehicle”: “car”, “wheels”: “4”]}
  4. {id: “1”, “driver”:”Bob”, “car”:{id: “1”, "type“: "wheels”, "count": 4}}
  5. {id: “1”, “driver”:”Bob”, “car”:[{id: “1”, "type“: "wheels”, "count": 4}, {id: “2”, "type“: "hubs”, "count": 4},{id: “3”, "type“: "lug nuts”, "count": 16}]}

    Examples four and five are different forms of the same child document embedding. This form of normalization is faster to access and has less data access code to write. Well, it is faster as long as the sub-document relationship is bounded in its many-ness (somewhere between 1:1 and 1:5).

    Areas of Concern in Embedding


    When the set of possible children can balloon out the size of the document beyond reasonable access speed, it no longer represents a well defined single entity. It defines an entity relationship, one that requires multiple entities to allow for atomicity. Large heterogeneous documents are really poorly designed entity relationships hiding in an elephant costume.

    Many to many relationships also require externalization of child documents. 

    Data with different volatility requirements (write, update, delete operations) should be considered for externalized sub documnet storage.

    Sunday, November 12, 2017

    NodeJs Bash Build Script

    #!/bin/bash
    # Each lambda is technically its own project with its own dependencies
    function npmTask() {
      startingDir=$(pwd)
      npmCommand=$1
      for nodeDir in $startingDir/lambdas/*/;
        do
          echo "start node dir: $nodeDir from $startingDir"
          if [ -d $nodeDir  ] && [ $nodeDir != "$startingDir/lambdas/spec/" ];
          then
            echo "start node dir: $nodeDir from $startingDir"
            cd $nodeDir
            npm $npmCommand
            cd $startingDir
          else
            echo "no node modules found"
          fi
        done
    }
    npmTask "install"
    npmTask "build"
    npmTask "test"

    Sunday, October 1, 2017

    Python Map Reduce on Tuple List

    I was recently asked to read some pseudocode for calculating a class average of weighted scores. I wanted to work this out in Python to demonstrate how the design decisions around data structures impact implementation. Two independent lists assume order is maintained properly in two places. However it makes the code easier to explain.

    flatScores = [75,95,85,65]
    weights = [0.2,0.35,0.15,0.3]

    def mult(x,y): return x * y
    # map(mult,flatScores,weights)
    # [15.0, 33.25, 12.75, 19.5]
    reduce((lambda sum,score: (sum + score) ),list(map(mult,flatScores,weights)))/len(flatScores)
    #20.125

    A single list of tuples allows members to include all relevant data in one place. 

    scores = [[75,0.2],[95,0.35],[85,0.15],[65,0.3]]
    def mapper(scoreWeights): return map( lambda scoreWeight: scoreWeight[0] * scoreWeight[1] , scoreWeights )
    # mapper(scores)
    # [15.0, 33.25, 12.75, 19.5]
    reduce(lambda sum, cur: sum + cur, mapper(scores))/len(scores)
    #20.125

    Thursday, May 4, 2017

    Bash Function To Create Desktop Shortcuts

    Shortcuts are located in three places
    /usr/share/applications/intellij.desktop for all users
    a folder in the home dir for a given user
    ~/Desktop to see the shortcut on the desktop

    Format is
     [Desktop Entry]
     Version=13.0
     Type=Application
     Terminal=false
     Icon[en_US]=/home/rob/.intellij-13/bin/idea.png
     Name[en_US]=IntelliJ
     Exec=/home/rob/.intellij-13/bin/idea.sh
     Name=IntelliJ
     Icon=/home/rob/.intellij-13/bin/idea.png

    The function looks like this:

    function add_a_shortcut() {
      APP=$1
      ICON=$2
      EXEFILE=$3
      EXEC="bash -ic \"$EXEFILE &\""
      SHORTCUT="/usr/share/applications/$APP.desktop"
      USERSHORTCUT="/home/vagrant/Desktop/$APP.desktop"
      if [ -e "$EXEFILE" ]; then
        sudo touch $SHORTCUT $USERSHORTCUT
        sudo chmod 777 $SHORTCUT $USERSHORTCUT
        echo "[Desktop Entry]" | tee $SHORTCUT $USERSHORTCUT
        echo "Encoding=UTF-8" | tee -a $SHORTCUT $USERSHORTCUT
        echo "Comment=Launch $1" | tee -a $SHORTCUT $USERSHORTCUT
        echo "Type=Application" | tee -a $SHORTCUT $USERSHORTCUT
        echo "Terminal=false" | tee -a $SHORTCUT $USERSHORTCUT
        echo "Exec=$EXEC" | tee -a $SHORTCUT $USERSHORTCUT
        echo "Name=$APP" | tee -a $SHORTCUT $USERSHORTCUT
        echo "Icon=$ICON" | tee -a $SHORTCUT $USERSHORTCUT
       
        sudo chmod 644 $SHORTCUT
        sudo chown root:root $SHORTCUT
        sudo chown vagrant:vagrant $USERSHORTCUT
        echo "INFO: Created $SHORTCUT $USERSHORTCUT"
      else
        echo "ERROR: Failed to create $SHORTCUT $USERSHORTCUT"
      fi
    }
    EXAMPLE USAGE
    add_a_shortcut aggregation-designer /opt/pentaho/design-tools/aggregation-designer/aggregation-designer.app/Contents/Resources/pad.icns /opt/pentaho/design-tools/aggregation-designer/startaggregationdesigner.sh

    Thursday, March 16, 2017

    Why is it so hard to install VirtualBox Guest Additions on Centos

     yum update
     yum install gcc make kernel-devel bzip2
     mkdir -p /media/cdrom
     mount /dev/sr0 /media/cdrom
     sh /media/cdrom/VBoxLinuxAdditions.run

    Wednesday, December 14, 2016

    Notes on Slick 3

    Most examples are using await and that is blocking. Why use Slick if you are going to block? Look at onComplete, onSuccess, onFailure, and promise.

    ** Chaining transactions looks like this in one case
     
    db.run{ (for {
        customer <- customerTable.save();
        address <- a
    ddressTable.save()
       } yield ()).transactionally

     
    ** Returning a list of ids from a select looks like this
     (but you can also use returning keyword) 
     
    db = Database.forConfig("myactors.database")
    
    session = db createSession;session.conn.setCatalog("mydb")
    session.conn.setSchema("myschema")
    
    val query = Tables.rawStringTest.to[List].map(_.id)
    val resultSet = db.run(query.result)
     
    **this prints the list
    resultSet.onComplete(res =>   println(s"string returns $res") ) 
    **this waits for the future to return and uses the success/failure traits
    
    resultSet.onComplete{
      case Success(res) => res.foreach(x => db.run(Tables.rawStringTest.filter(_.id === x).delete))
      case Failure(ex) =>  println(s"error $ex")
    }
    **Testing is a challenge for aysync operations. Using an eventually retry block or a continuation both work.  This is with eventually.  
    it("ids are generated when None is passed in table calls"){
      val pk = None
      val validation = Tables.rawStringMessages.map(_.message_data)
    
      sut.insertStringMsg(None, "this is a fancy message", mockSuccessFunction(waiter), mockFailureFunction(waiter))
      sut.insertStringMsg(None, "this is another fancy message", mockSuccessFunction(waiter), mockFailureFunction(waiter))
    
      eventually(timeout(duration)) {
      val resultSet = Await.result(db.run(validation.result), duration)
    resultSet.onComplete{
     case Success(res) => resultSet.head.equals(99) shouldBe true 
     case Failure(ex) => fail(s"this failed $ex) 
    } 
    
    
     
    **Some ways to insert
    table returning table forceInsert JsonMessage(receiptId, messageData, None)
    table returning table insertOrUpdate JsonMessage(receiptId, messageData, None)
    table returning table += JsonMessageInsert(receiptId, messageData)
     
    **And any of these can return nothing
    table += JsonMessageInsert(receiptId, messageData)
    table insertOrUpdate JsonMessage(receiptId, messageData, None)
    
     
     
    /** * `Await` is what is used to ensure proper handling of blocking for `Awaitable` instances. * * While occasionally useful, e.g. for testing, it is recommended that you avoid Await * when possible in favor of callbacks and combinators like onComplete and use in * for comprehensions. Await will block the thread on which it runs, and could cause * performance and deadlock issues. */ 
    Clean up is a chore as well for integration tests. In this case I did clean up in the AfterAll block  to avoid cleaning data during concurrent test runs. 
    
    
    override def afterAll() {
      db = Database.forConfig("rtisystem.database")
    
      session = db createSession;  session.conn.setCatalog("rti")
      session.conn.setSchema("rti")
    
      val query = Tables.rawStringMessages.to[List].map(_.id)
      val resultSet = db.run(query.result)
    
      resultSet.onComplete{
        case Success(res) => res.foreach(x => db.run(Tables.rawStringMessages.filter(_.id === x).delete))
        case Failure(ex) =>  println(s"error $ex")
      }
    
      val jquery = Tables.jsonMessages.to[List].map(_.id)
      val jresultSet = db.run(jquery.result)
    
      jresultSet.onComplete{
        case Success(res) => res.foreach(x => db.run(Tables.jsonMessages.filter(_.id === x).delete))
        case Failure(ex) =>  println(s"error $ex")
      }
    
    
    
    
    This is an test example where I insert two records through a DAO extension method and compare them in a select query. 
    
    it("generates ids when None is passed instead of a long in the table call"){
      val pk = None
      val msgList: List[String] = List("this is a fancy message", "this is a fancy message also")
      val read = Tables.rawStringMessages.filter(row => row.message_data inSet msgList).map(_.id)
    
      sut.insertStringMsg(None, msgList.head,      mockSuccessFunction(waiter), mockFailureFunction(waiter))
      sut.insertStringMsg(None, msgList.tail.head, mockSuccessFunction(waiter), mockFailureFunction(waiter))
    
      eventually(timeout(duration)) {
        val resultSet = db.run(read.result)
        resultSet.onComplete {
          case Success(res) => assert(res.head != res.tail.head && res.head > 100000 && res.tail.head > 100000)
          case Failure(ex) => println(s"error $ex")
        }
      }
    } 
    
    
    
    
    
    
    
    
    val visitorId: Int = // whatever
    val locationCodes = List("loc1","loc2","loc3"...)
    // your query, with bind params.
    val q = for {
        v <- Visits 
        if v.visitor is visitorId.bind
        if v.location_code inSetBind locationCodes
      } yield v
    // have a look at the generated query.
    println(q.selectStatement)
    // run the query
    q.list
     
     
    *** To use a db default ***
    O.AutoInc 
    column_name.?
    Column Option[Timestamp]
     
    table += Class(id, data, Some(timestamp))
    
    
    Or a detailed version
    
    
    import slick.driver.PostgresDriver.api._
    import slick.lifted._
    import java.sql.{Date, Timestamp}
    
    /** A representation of the message decorated for Slick persistence
      * created_date should always be null on insert operations.
      * It is set at the database level to ensure time syncronicity
      * Id is the Twitter snowflake id. All columns NotNull unless declared as Option
      * */
    class RawMessages(tag: Tag) extends Table[(String, Option[String], Timestamp)](tag, Some("rti"), "RawMessages") {
      def id = column[String]("id", O.PrimaryKey)
      def MessageString = column[Option[String]]("MessageString")
      def CreatedDate = column[Timestamp]("CreatedDate", O.SqlType("timestamp default now()"))
      def * = (id, MessageString, CreatedDate)
    }