Friday, January 30, 2015

Intro To Data Science - NoSQL Notes

Time to learn a thing or two about the world outside of RDBMS and ACID. One of the hardest things to do is to walk away from something I know intuitively and embrace something foreign.

Mongo -
Document based, with BSON storage. You can link document collections or embed documents. Adding attributes to a document extends the schema. As you extend, you add an index or extend an existing one (if the index needs to be covering rather than just an access path to a record). One presentation stated, "each attribute needs an index, each time you extend you add an index." I am not sure all attributes need an index.
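To make the embed-versus-link choice concrete, here is a small sketch using plain JavaScript objects in place of real collections. The field names (address, addressId, altId) and the stateOf helper are made up for illustration; nothing here talks to MongoDB.

```javascript
// Hypothetical data: names and fields are illustrative only.

// Embedded: the address lives inside the user document.
const userEmbedded = {
  _id: 1,
  name: "Ann",
  address: { city: "Atlanta", state: "GA" }
};

// Linked: the user stores only the address _id. The address is its own
// document in a separate collection (modeled here as a Map keyed by _id).
const addresses = new Map([[100, { _id: 100, city: "Atlanta", state: "GA" }]]);
const userLinked = { _id: 2, name: "Bob", addressId: 100 };

// Resolving a link takes a second lookup; an embedded field does not.
function stateOf(user) {
  if (user.address) return user.address.state;
  const addr = addresses.get(user.addressId);
  return addr ? addr.state : undefined;
}

// "Extending the schema" is just adding an attribute to one document.
userEmbedded.altId = 972121001;
```

The embedded form reads in one step, while the linked form needs a second lookup but keeps the address in a single place.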

KVP - key/value pair association is simple and effective, but lacks the complex relational and retrieval syntax of ANSI SQL-92. A pure key/value store is completely agnostic toward the data stored within it, including data types. That simplicity allows for quick reads and writes. It is not so easy to access data using fuzzy logic or grouping/composite queries.
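A minimal sketch of the trade-off described above, using an in-memory Map as a stand-in for a real key/value store. The KeyValueStore class and its scan method are hypothetical names for illustration, not any product's API.

```javascript
// A minimal in-memory key/value store: values are opaque to the store.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key);
  }
  // Without an index, any query on the contents must visit every value.
  scan(predicate) {
    const hits = [];
    for (const [key, value] of this.data) {
      if (predicate(value)) hits.push(key);
    }
    return hits;
  }
}

const store = new KeyValueStore();
store.put("user:1", { name: "Ann", state: "GA" });
store.put("user:2", { name: "Bob", state: "FL" });
store.put("blob:1", "any string works too"); // the store does not care about types

const ann = store.get("user:1"); // direct lookup by key
const inGeorgia = store.scan(v => typeof v === "object" && v.state === "GA");
```

Lookup by key is a single step, while the "who is in GA" question has to walk every stored value and inspect it in application code.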

Secondary indexes (not to be confused with clustered indexes or primary keys) are a feature that allows tagging of a KVP for alternate access paths. An example, using the Riak Java client, is a record tagged with two secondary indexes:
User.getIndexes().getIndex(StringBinIndex.named("ST")).add("GA");
User.getIndexes().getIndex(LongIntIndex.named("ALTID")).add(972121001L); 
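The tagging idea can be sketched in a few lines of plain JavaScript. This is only a mental model and assumes nothing about how Riak implements 2i internally; IndexedStore and keysByIndex are hypothetical names, and the ST and ALTID tags mirror the snippet above.

```javascript
// Sketch of a key/value store with secondary-index tags (not Riak's implementation).
class IndexedStore {
  constructor() {
    this.data = new Map();    // primary key -> value
    this.indexes = new Map(); // index name -> (index value -> Set of primary keys)
  }
  // Note: this sketch never removes old tags when a key is written again.
  put(key, value, tags = {}) {
    this.data.set(key, value);
    for (const [indexName, indexValue] of Object.entries(tags)) {
      if (!this.indexes.has(indexName)) this.indexes.set(indexName, new Map());
      const byValue = this.indexes.get(indexName);
      if (!byValue.has(indexValue)) byValue.set(indexValue, new Set());
      byValue.get(indexValue).add(key);
    }
  }
  // Alternate access path: find primary keys by an exact index value.
  keysByIndex(indexName, indexValue) {
    const byValue = this.indexes.get(indexName);
    const keys = byValue && byValue.get(indexValue);
    return keys ? [...keys] : [];
  }
}

const users = new IndexedStore();
users.put("user:1", "opaque payload A", { ST: "GA", ALTID: 972121001 });
users.put("user:2", "opaque payload B", { ST: "GA" });
users.put("user:3", "opaque payload C", { ST: "FL" });

const georgiaKeys = users.keysByIndex("ST", "GA");
const byAltId = users.keysByIndex("ALTID", 972121001);
```

The stored values stay opaque; only the tags supplied at write time are searchable, which is why the tags have to be chosen up front.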


Buckets - used to define a virtual keyspace for storing Riak objects. They enable you to define non-default configurations over that keyspace concerning replication properties and other parameters.

Good material on LinkedIn's approach. Simple and superficial, it provides basic context for some of the key considerations I am listing here.
http://www.slideshare.net/amywtang/espresso-20952131
https://gigaom.com/2014/11/26/linkedin-explains-its-complex-gobblin-big-data-framework/

The Riak examples are simple and easy to understand. They hurt my head at first just because the model is so different.
http://docs.basho.com/riak/latest/dev/using/2i/


MongoDB Array Updates

db.test.remove({"item" : "ABC1"});

db.test.insert(
{
    "item" : "ABC1",
    "details" : {
        "model" : "14Q3",
        "manufacturer" : "XYZ Company"
    },
    "stock" : [
        {
            "size" : "S",
            "qty" : 25
        },
        {
            "size" : "M",
            "qty" : 50
        }
    ],
    "category" : "clothing"
});

//Do not do this. An update document with no operators (such as $set) replaces the whole matched document.
//db.test.update(
//{$and:[ {"item" : "ABC1"}, {"stock.size":"M"} ]},
//{"qty":45}
//);

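//Do this instead. $set changes only the named field, and the positional $ operator
//points at the first "stock" element matched by the query ("stock.size":"M").
db.test.update(
{$and:[ {"item" : "ABC1"}, {"stock.size":"M"} ]},
{$set: {"stock.$.qty":45}}
);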

db.test.find({$and:[ {"item" : "ABC1"}, {"stock.size":"M"} ]});
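
To see the difference between the two update forms without a running server, here is a plain JavaScript imitation of what each one does to the document. It is a sketch of the behavior only, not MongoDB code; replaceDocument and setQtyForSize are hypothetical helper names.

```javascript
// Plain-JavaScript imitation of the two update forms (helper names are hypothetical).
const original = {
  _id: 1,
  item: "ABC1",
  stock: [{ size: "S", qty: 25 }, { size: "M", qty: 50 }],
  category: "clothing"
};

// Replacement-style update: everything except _id is swapped for the new body.
function replaceDocument(doc, body) {
  return { _id: doc._id, ...body };
}

// Positional-style update: change qty on the first stock entry with the given size.
function setQtyForSize(doc, size, qty) {
  const position = doc.stock.findIndex(entry => entry.size === size);
  if (position === -1) return doc; // no matching element, nothing to update
  const stock = doc.stock.map((entry, i) => (i === position ? { ...entry, qty } : entry));
  return { ...doc, stock };
}

const replaced = replaceDocument(original, { qty: 45 }); // item, stock, category are gone
const updated = setQtyForSize(original, "M", 45);        // only the "M" entry changes
```

The replaced result keeps only _id and qty, which is the surprise the commented-out update warns about; the positional version leaves every other field and array element alone.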