Category: Data Vault

2

Virtualising your Data Vault – Hubs and Links

With Data Vault, the Hub ETLs  are usually  the first to be developed – they are very easy to build once your model is complete! And it was the case with creating these virtualised ETL templates as well. Because Hubs and Links are so similar I covered them both in this post. In this virtualisation Proof of Concept I used the automation metadata I normally use for automating SSIS, Data Services and Powercenter ETL development. Using...

 
2

Data Vault 2.0 – how to handle Referential Integrity?

I was working on adding some of the automation code to support Data Vault 2.0 and this got me thinking about Referential Integrity (RI)  related to the modifications that Data Vault 2.0 requires. With Data Vault ‘1.0’ Referential Integrity is always enabled (except for very big systems – let’s leave that one out of the scope for now – see this older post) and in Data Vault 2.0 this hasn’t changed according to the specifications. For Data...

 
0

Data Vault ETL Implementation using SSIS: Step 7 – Link Satellite ETL – part 1 – Regular History

From an ETL development perspective the Data Vault Link Satellite (LSAT) ETL template is the most complex, and also conceptually the most challenging. This is mainly because the behavior of ETL depends on the type of history that is tracked. To define proper LSAT ETL templates for automation I classify this as follows: Regular / Historical Link Satellites that track the changes for attributes of a relationship over time. This is very similar to Satellite ETL...

 
4

Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 2 – SQL selection (wide table exception)

There is more to say on exactly how Satellite selections are made as originally documented in the Satellite ETL post. In many cases not all records from a Staging Area table are selected, but rather a subset. This creates an interesting issue to which I usually refer to as the ‘wide table exception’. It leads to a minor change in the SQL selection for Satellite entities. The outline and solution are described in this post. Image the scenario where...

 
0

Data Vault ETL Implementation using SSIS: Step 3 – Hub ETL – part 2 – SQL selection

The outline ETL for Data Vault Hub style ETL using SSIS has been documented in ‘Hub implementation’ and ‘Hub exceptions’. As a minor addition I would like to dive a bit deeper into the selection logic associated with Hub ETL. Following the experiences ‘from the trenches’ there are basically two ways to create SQL for a Hub selection from the Staging Area; Straight-up selection of the business key Selection including a composite business key (e.g. assembled from...

 
1

Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 3 – End Dating

For various reasons including (but not limited to) ease of maintenance, speed, reusability, parallelism and reducing of complexity in general, End Dating is implemented as a stand-alone generic ETL process to support the Data Vault Satellite ETL. Conceptually, End Dating is a redundant step because the information is derived from already available information: the Effective Date. However, in most situations it makes upstream ETL processing easier to manage and may be worth the storage/performance trade-off. But,...

 
0

Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 1 – overview

Satellite processing is one of the types of ETL at the core of implementing Data Vault; this article focuses on the ‘vanilla’  handling of Satellite ETL. Exceptions such as Multi-Active/Multi-Variant implementations (depending on which flavour of Data Vault you use) or isolating Satellites from wide (denormalised) source tables or files will be documented in a separate post. Satellite processing for Data Vault is in many ways similar to the process that is defined for the...

 
0

Data Vault Role Playing in Links and Link Satellites

Recently I have been involved in some (very lively) discussions related to implementation of ‘role playing’ in Data Vault. In other words: how to handle (model) different types of relationships. Over the years my response has been that from the perspective of Data Vault modelling it doesn’t really matter if you create multiple Link relationships between Hubs / business entities, or if you create a single relationship with multiple Link Satellites each handling a relationship...

 
0

Comments on ‘Modeling the Agile Data Warehouse with Data Vault’

I just finished reading ‘Modeling the Agile Data Warehouse with Data Vault’ by Hans Hultgren. I think it provides a good and clear overview of the Data Warehouse design using Data Vault as technique for the core DWH layer as well and, most importantly, covers some practical aspects that have been implemented but not documented in (practical) detail such as; Key Satellites Identical Business Keys (primarily solved through defining a concatenated key) Points that are up...

 
1

Data Vault ETL Implementation: Potential exceptions in Hub ETL

The previous blog posting regarding Hub ETL processes (Implementation of Hub ETL) covered the standard (cookie cutter) functionality. However, there are some exceptions that may occur, and perhaps some additional explanations are required of some of the operations: Record Source: in most of the scenarios the Record Source will be the same between the various Hub ETL processes as you’re working towards a core enterprise wide business key. This is why the Record Source is an optional component...