Tagged: Data Vault

1

Data Vault ETL Implementation: Potential exceptions in Hub ETL

The previous blog posting regarding Hub ETL processes (Implementation of Hub ETL) covered the standard (cookie cutter) functionality. However, there are some exceptions that may occur, and perhaps some additional explanations are required of some of the operations: Record Source: in most of the scenarios the Record Source will be the same between the various Hub ETL processes as you’re working towards a core enterprise wide business key. This is why the Record Source is an optional component...

0

Data Vault ETL Implementation using SSIS: Step 4 – Link ETL

In the Data Vault workflow the Link is the next object to typically be loaded after the Hub has been processed. This can be done in parallel with the standard Satellites but this will be covered in a future posting about the workflow/scheduling of Data Vault ETL. The Link ETL comes down to select the distinct pairs of business keys from the respective Staging Area table, check if they’re there and insert them if they’re...

1

Data Vault ETL Implementation using SSIS: Step 3 – Hub ETL – part 1 – overview

The Staging Area and Historical Staging, both part of the conceptual Staging Layer, are not directly related to Data Vault although at least the Staging Area ETL paves to way with the definition of the Event Date/Time. The Hub however is the first true Data Vault template to be implemented in SSIS. Please note that this is true for DV1.0; for DV2.0 the Staging Area is incorporated into the design a bit more. The Hub...

Data Vault implementation A-Z: Staging data (the conceptual side) 0

Data Vault implementation A-Z: Staging data (the conceptual side)

This is the first of a planned series of implementation designs for implementing Data Vault in an end-to-end Data Warehouse environment. The positioning of the Data Vault concepts and techniques in the greater design of the system (reference architecture) is documented elsewhere in this site, mainly in the ‘papers’ section. Data Vault in itself does not provide a complete solution for most Data Warehouse purposes but provides a great set of modeling techniques to design the...

Data Vault comparisons 1

Data Vault comparisons

I have drafted a comparison between Data Vault and normalised (3NF) and denormalised (Kimball) models for reference. This comparison is applicable for using these models as the core Data Warehouse model as opposed to modelling for reporting purposes (i.e. data marts).

Data Vault versus the persistent Staging Area 3

Data Vault versus the persistent Staging Area

One of the questions I regularly get during presentations is what the benefits of Data Vault are over a persistent Staging Area. In other words: why go through the effort of defining a Data Vault model when you can receive the same ‘regeneration / recreation’ capabilities with a persistent Staging Area which directly feeds a Dimensional Model or similar presentation model. First off; in my reference architecture I use both the Data Vault (in the...

Data Vault in Australia 0

Data Vault in Australia

Anyone interested in Data Vault in Australia, please sign up for the official Data Vault User Groups (and specifically the Australian group). Just started, and Data Vault does not have many reference projects in Australia, but that can only mean there is much opportunity to set up a good foundation. The site can be found on www.dvusergroup.com. EDIT: the user groups have been dismantled in favour of the linkedIN discussion groups!!!

0

Why a separate surrogate key table?

A question I often (and recently) get is why you should create a separate table to manage the surrogate keys. This is compared to star- or snowflaked models where the surrogate key (and distribution) is part of the table. For Data Vault the Hub entity is the place where the keys are managed. There is rationale for splitting this process into separate tables: A surrogate key table makes it easy to identify the records in...