The ‘architecture’ section of this weblog serves as an introduction for the Data Integration Framework Github repository (see the collaboration section). This Github repository contains the various levels of documentation that constitutes the Enterprise Data Warehouse (EDW) architecture, pattern documentation and supporting metadata and control frameworks.
In the Presentation Layer of the architecture, the data is converted from ‘data’ to ‘information’ to be further picked up by reporting and analysis tools. This consists of the processe from the Integration Layer to the Information Marts. The Presentation Layer is a collection of derived data from the Data Vault including persistent (but completely rebuildable) aggregate and helper tables which can make life easier for ETL and reporting tooling.
The hierarchies are defined and maintained here as series of primary key’s from the involved hubs in a certain order.
To create a Dimension, the data is pulled from the satellites related to these hubs to create a typical ‘Kimball’ dimension for a defined grain. The Presentation Layer is comprised of two parts (or areas): the supporting area and the presentation area. The datamart layer is not a persistent layer since it can be recreated from the integration layer at any time. Performance considerations however will usually require the information to be stored in the datamart layer permanently for quick access. The datamart layer is the location in the architecture where the all business rules (but the enterprise wide ones) are applied to the data. One very important aspect of the datamart layer is that it inherits its datawarehouse keys from the integration layer, thus enabling backtracking of information. The detailed (raw or cleaned) data from the integration layer is still applicable to the keys in the datamart layer.
The supporting area in an optional phase where calculations can be stored which are shared for every datamart. Using semi-aggregate tables, summaries or any shared calculation can prevent datamarts from executing performance heavy aggregations more than once. The use of a supporting area is very dependant of the way the datamarts are modelled and what their purpose is.
The core area of the datamart layer is the presentation area. Here all the specific subject areas of information are modelled from the integration layer model to a model which suits the requirements of the recipient of the information. The presentation area is the only part of the datawarehouse that is accessible to end-users.
Considerations which affect the presentation layer in general and the datamart area specifically are:
- Front-end analysis and reporting tooling used. Some reporting and analysis software performs better on a specific data model. In this case the datamart that is created for this tool has to conform to these requirements as much as possible.
- Views on history. Not everyone needs a historical view on the data. For most reasons the current view (type 1) might be sufficient whether other recipients might need to monitor how information changes over time.
- Quality of data. All data is stored in the integration layer: its raw form in the integration area and the cleaned form in the cleansing area. By selecting information from the cleansing area, based on the error bitmap, varying levels of data can be selected. Alternatively there is the option of creating a data quality helper summary which uses the error bitmap to enable the selection of data with or without certain errors.
- In the Integration layer every attribute has been modelled as Type 2 history (stacking), but since the datamarts themselves are specific for the end-users purposes the attributes should be individually labeled as type 1,2 or 3 (well, type 3 cannot be labelled individually).
- Every table which is accessed by front-end tooling (SQL clients excluded) should have a 1-on-1 view as per best practice of working with front-end tooling. This will add flexibility when moving changes in the table structure forward without having to impact the front-end layer right away. Of course it is best to do this quick, but at least you have the option for a more gradual change path. These views are completely similar to the tables they represent, in other words they contain no business logic. Due to the nature of these views, they can be easily generated (if you stay consistent in your approach). Handling these views this way provides a clean line between ETL/Datawarehousing and Front-end/Business Intelligence.
- In terms of database and ETL design this layer can consist of (semi) aggregates and datamarts per subject area or customer. It is advisable to combine the (semi)aggregates in a single group, ETL folder or schema to enable multiple datamarts to (re)use them. The datamarts themselves should be logically separated for whatever purpose they are presented.
- Depending on the front-end software or performance demands, dimensions can be modelled as star or as snowflake models.
Using dates and times
Dates and times always need special care in the datawarehouse data model. The following agreement exists about date, times, timestamps and so forth:
If date/time attributes are stored for aggregation then foreign keys can be used. This make roll-up in the front-end much easier. On the other hand, if the only function this attribute in a table has is pure display on a report or minor calculation then the information can be stored as date time.
History in datamarts
The datamart attributes are all type 1 by default. This information is most frequently requested by Business Intelligence users. By request datamart table attributes may be set to type2 (history) or type3 (today’s view of the history).
The supporting area is an optional area where semi-aggregates or any useful tables can be stored. These types of tables are usually added for either performance reasons or the wish to implement the same business logic in as few places as possible. Supporting tables can be modelled in any way as long as they benefit the presentation area. They are not accessible by users or front-end reporting and analysis tools.
By thoughtfully creating aggregate tables which can be shared by the datamarts one could for instance create a fact table on a certain aggregate level and have different datamarts aggregate this table further depending on their needs. The business logic and performance demanding calculations only have to be done once this way. The helper table can have any structure as long as it will support the connected datamarts. Performance and limiting the duplicate implementation of business rules is the mail goal. Usually though the helper area will follow the design of the datamarts which will in most cases lead to a star- or snowflake model.
The Presentation Layer is the final part of the outline architecture. A mart is modelled for a specific purpose, audience and technical requirement. The complete Data Warehouse can contain many different marts with different models and different ‘versions of the truth’ depending on the business needs.
In the process from loading the data from the Integration layer to the Datamart layer most of the business logic is implemented.