Tuesday, December 31, 2013

Basic Fact and Dimension Guidelines

I find it difficult to locate a concise definition of what differentiates a fact and a dimension.

Fact Tables
􀂃 For every business process there should be at least one fact table. make a list of nouns describing the process or interaction.
􀂃 Each item in a fact table should have a default aggregation (or derivation) rule--e.g., sum, min, max, semi-additive, not additive. Any complexities in the aggregation method must be documented. Enter the aggregation/derivation information at the end of the definition. Begin a new paragraph and use the following format:
􀂃 The grain, or granularity, of the fact table should be at the lowest level for which a need has been identified and a requirement approved. Performance and storage constraints must also be considered.
􀂃 The grain of all items in the fact table should be the same. If there is a need for aggregation at more than one level, a separate fact table for each level of aggregation may be needed. Note: Aggregation tables are either (a) transparent to the user, such that all SQL is written to go against the lowest level of granularity, or (b) explicit--i.e., seen and queried by the user. Since the dimensional model is, among other things, a tool for communication with the user, it is preferable to include only those tables that the user will see.
Attributes describing the fact should be put in dimensional tables.
Dimension Tables
􀂃 Each dimension table has one and only one lowest level element, called the dimension grain.
􀂃 Dimension tables that are referenced or are likely to be referenced by multiple fact tables are "conformed dimensions." If conformed dimensions already exist for any of the dimensions in the model, their reuse is expected. If new dimensions with potential for usage across the agency are being developed, the design must support anticipated cross-agency needs.
􀂃 Each non-key element should appear in only one dimension table.
􀂃 Most models should have at least one period or time dimension. There may be more than one period dimension. Date and time may be split into two separate dimensions, especially if time is being captured at the hour or minute level.
􀂃 If a dimension table includes a code, in most cases the code description should be included. For example, if branch locations are identified by a branch code, and each code represents a branch name, both the code and the name should be included. An alternative is to include the description and omit the code--e.g., State = California, Status = Active.
􀂃 Generally, there should be no more than twenty dimension tables per fact table; the designer should provide justifications if more than twenty dimension tables are required.
Keys
􀂃 The primary key of a dimension table should be a surrogate key. A source system production key should not be used as a primary key.
􀂃 The primary keys of the dimension tables should be included in the fact table as foreign keys. Together these (and only these) foreign keys make up the fact table primary key (in the logical view).

No comments:

Post a Comment