Data Vault Role Playing in Links and Link Satellites
Recently I have been involved in some (very lively) discussions related to implementation of ‘role playing’ in Data Vault. In other words: how to handle (model) different types of relationships.
Over the years my response has been that from the perspective of Data Vault modelling it doesn’t really matter if you create multiple Link relationships between Hubs / business entities, or if you create a single relationship with multiple Link Satellites each handling a relationship role.
In principle both approaches do not fundamentally break Data Vault’s principles:
- In both cases it’s easy to add new objects, in this case Link or Link Satellite tables without impacting existing structures so flexibility is preserved
- In both cases no additional attribute (e.g. relationship type) is introduced
- In both cases the name of the table indicates the relationship type
- In both cases you can query the same information
So based on the above, both approaches are practically the same.
Personally I have always been using multiple Links to handle relationship types, but I think I have a better explanation now as to why this ultimately is the better solution.
To illustrate this I have taken the case of multiple potential relationships between a Person and a Vehicle. For instance the relationship (type) can be ‘driver’ or ‘owner’, because the person who drives a car does not necessarily owns it as well.
The example I prepared shows that John owns a Vehicle (with rego ‘762-ABA’) at a point in time, but Peter is driving it. After a while Peter decides to buy the car and thus becomes the owner. The following images show how this plays out in both approaches towards role playing in Data Vault.
As is visible in the above diagram there are two distinct relationships, so two records exist in the Link table. But the corresponding Link Satellites are used to explain the nature of the relationship is. This is the approach of using a Link as singular integration point. After the swap of ownership the result is as follows:
This leads to one relationship being closed-off in the Link Satellite Owner. John no longer owns the vehicle, but Peter does now. He is still the driver of the vehicle (as visible in the Link Satellite Driver table).
Now compare this to the approach of using separate Links to manage relationship types:
This is exactly the same situation, but Link records are split between tables. After the transfer of the vehicle the result is as follows:
At first the practical result seems to be storing (redundant) Link records, because of the physical separation of the Links. The results / information is exactly the same as with the first scenario.
However, there are a few considerations that lead to using separate Links as a better solution:
- The model is easier to read; navigating from Hub to Hub via unique Links makes a bit more sense
- Conceptually using a single Link leads to a form of supertyping which is avoided in Data Vault (mainly applies to Hubs)
- You can see, or query, the relative complement / symmetric difference between sets more easily (e.g. without involving joins to the Link Satellites). This is because each Link only has records that are related to its dedicated Link Satellite thus for a specific relationship (type)
- Although frowned upon, if you want to drop a relationship (type) you avoid the possibility of ending up with orphan records in the (single) Link
In the end it will cost a bit more disk space, but the model will be clearer for it.