When is Data Vault a suitable solution?
Over the years I have had mixed experiences pitching Data Vault in various situations, and I have come to a couple of conclusions about applying (or proposing) it. The question is tied to other discussions, such as whether to use a 2-tiered (layered) or 3-tiered Data Warehouse architecture, which is itself subject to various considerations. Once you have decided that you do need a 3-tiered approach, the discussion typically moves to 3NF versus Data Vault as the modelling technique. That is a fairly big overview/comparison and is best covered in a separate post.
The core question in this context is whether introducing Data Vault as the middle tier (Integration / SOR / Core DWH layer) is worth the additional effort in terms of (ETL) development and storage space.
My views follow below, but bear in mind that they are tied to the underlying principles as well, which in turn relate to the 2-tiered versus 3-tiered solution design. Ultimately, the ‘functionality’ the extra layer provides, such as ‘rerolling the truth’, is weighed against the (perceived) overhead of additional tables and ETL. Aside from the various other Data Vault benefits, the ability to generate / automate a significant portion of the ETL is a strong argument in this discussion. Essentially you’re abstracting ETL processes into more atomic building blocks, which makes them more generic and therefore easier to generate.
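To make the point about atomic, generic building blocks a bit more concrete, here is a minimal sketch of generating Hub load statements from metadata. All table names, column names and the `HubSpec` structure are hypothetical and purely illustrative; a real Hub would normally also carry a surrogate or hash key, which is left out here for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubSpec:
    """Metadata describing one Hub load (all names are illustrative assumptions)."""
    hub_table: str
    staging_table: str
    business_key: str
    record_source: str


def hub_load_sql(spec: HubSpec) -> str:
    """Render an insert-only load that adds business keys not yet present in the Hub."""
    return (
        f"INSERT INTO {spec.hub_table} ({spec.business_key}, load_dts, record_source)\n"
        f"SELECT DISTINCT stg.{spec.business_key}, CURRENT_TIMESTAMP, '{spec.record_source}'\n"
        f"FROM {spec.staging_table} stg\n"
        f"WHERE stg.{spec.business_key} IS NOT NULL\n"
        f"  AND NOT EXISTS (\n"
        f"    SELECT 1 FROM {spec.hub_table} hub\n"
        f"    WHERE hub.{spec.business_key} = stg.{spec.business_key}\n"
        f"  )"
    )


# The same template serves every Hub; only the metadata differs.
specs = [
    HubSpec("hub_customer", "stg_crm_customer", "customer_number", "CRM"),
    HubSpec("hub_product", "stg_erp_product", "product_code", "ERP"),
]
statements = [hub_load_sql(spec) for spec in specs]

for statement in statements:
    print(statement + ";\n")
```

Because every Hub load follows the same pattern, adding a new Hub becomes a matter of adding one line of metadata rather than hand-building another ETL process. The same idea could be extended to the other Data Vault table types, each with its own template.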
Data Vault is a good fit if:
- The outcomes and/or requirements are not clear or are likely to change.
- You are following an ‘agile’ approach for Project Management or have specified very short delivery cycles.
- You want to incrementally expand your data model.
- You expect to need, or want to plan for, additional scalability.
- You want to leverage (ETL) automation / enforce standards through automation.
- You are stuck in a tactical (2-tiered / Dimensional Bus Architecture) solution and want to expand; Data Vault can be used to incrementally ‘backfill’ the solution.
Data Vault is not a good fit if:
- You’re using a 2-tiered architecture / don’t want (or don’t think you need) the extra layer (i.e. not an EDW).
- You’re unfamiliar with the approach. Data Vault does upset some established principles (‘holy houses’) and tends to generate resistance due to unfamiliarity. These concerns are often deeply rooted, and overriding them may not give the best result from a Project Management perspective.
- You have a relatively low maturity regarding Data Modelling. Data Vault requires a relatively senior/firm Modeller, which tends to be a somewhat undervalued profession. Most people are familiar with normalisation, but Data Vault requires additional experience. Data Vault leaves less room for deviations, requires adequate assignment of business keys (which do not map one-to-one to source primary keys) and generally requires firm adherence to the standards.
- There is not enough involvement / drive to pursue the program. Related to the point about familiarity: working with Data Vault requires continuous ‘selling’ of the approach, as to date it is still fairly uncommon, at least compared with how familiar people are with 3NF / CIF and/or Dimensional Modelling.