Relational modeling has drawbacks to design implementation and to access speed. One differentiator is our common language around column based design patterns. Codd's normal forms still provide some guidance for us when it comes to document model design principals. A fundamental design guideline is ease of access is sometimes aided by denormalization, but more often aided by atomicity.
Below assume a document attribute is the same as a column and that a row is synonymous with a document.
NOTE: This is consistent across most engines except PostgreSQL, which has some designs that externalize highly read JSONB document elements into individual columns in the same row as the document for old school indexing purposes pre-GIN indexing.
Below assume a document attribute is the same as a column and that a row is synonymous with a document.
NOTE: This is consistent across most engines except PostgreSQL, which has some designs that externalize highly read JSONB document elements into individual columns in the same row as the document for old school indexing purposes pre-GIN indexing.
1NF
Each column (or attribute) has no more than one value, each row (or document) is unique.
Can identify a unique candidate key.
2NF
Each column (or attribute) has no more than one value, each row (or document) is unique.
Can identify a unique candidate key.
All attributes or columns are dependent on the candidate key.
3NF
Each column (or attribute) has no more than one value, each row (or document) is unique.
Can identify a unique candidate key.
All attributes or columns are dependent on the candidate key.
No document or row creates a transitive association
- A -> B -> C where A !-> C
- Driver -> Car -> Wheel Count where Driver has no Wheels.
- {id: “1”, [“Bob”, “car”, 4]}
- {id: “1”, “driver”:”Bob”, “vehicle”: “car”, “wheels”: “4”}
- {id: “1”, [“driver”:”Bob”, “vehicle”: “car”, “wheels”: “4”]}
- {id: “1”, “driver”:”Bob”, “car”:{id: “1”, "type“: "wheels”, "count": 4}}
- {id: “1”, “driver”:”Bob”, “car”:[{id: “1”, "type“: "wheels”, "count": 4}, {id: “2”, "type“: "hubs”, "count": 4},{id: “3”, "type“: "lug nuts”, "count": 16}]}
Examples four and five are different forms of the same child document embedding. This form of normalization is faster to access and has less data access code to write. Well, it is faster as long as the sub-document relationship is bounded in its many-ness (somewhere between 1:1 and 1:5).
Areas of Concern in Embedding
When the set of possible children can balloon out the size of the document beyond reasonable access speed, it no longer represents a well defined single entity. It defines an entity relationship, one that requires multiple entities to allow for atomicity. Large heterogeneous documents are really poorly designed entity relationships hiding in an elephant costume.
Many to many relationships also require externalization of child documents.
Data with different volatility requirements (write, update, delete operations) should be considered for externalized sub documnet storage.
Many to many relationships also require externalization of child documents.
Data with different volatility requirements (write, update, delete operations) should be considered for externalized sub documnet storage.