When is a change a ‘change’?

This is a post that touches on what I think is one the essential best-practices for ETL design: the ability to process multiple changes for the same key in a single pass. This is specifically relevant for typical ETL processes that load data to a time-variant target (PSA, Satellite, Dimension etc.). For non-time variant targets (Hubs, Links etc.) the process is a bit easier as this is essentially built-in the patterns already :-). In a given process, there are usually (at least) two rules I maintain:

  1. Making sure there is a safety catch to prevent loading information multiple times (by accident, out of order etc.)
  2. Making sure the correct delta is selected to be merged with the target

The paper I’ve written here (click the link below to open) captures the essence that explains the second topic: how the correct delta is selected.

When is a change a ‘change’

Also, if you happen to be in Melbourne in March and are interested in hearing more on these topics as a classroom training please have a look at the Data Vault implementation course as well.

 
Roelant Vos

Roelant Vos

You may also like...

2 Responses

  1. June 5, 2017

    […] When is a change a change? This post outlines the concepts behind effects of row condensing and why additional checks are needed to avoid record sets with (potentially) many redundant rows. […]

     
  2. December 5, 2017

    […] If you imagine you start capturing transactions early and complete your Data Warehouse design / model some time later, you can imagine you need to find a way to ‘replay’ these transactions to load into the Data Warehouse. This requires the DWH patterns such as Hubs, Links and Satellites need to be able to do this, which is a complication of the pattern (but very worth it). Hubs and Links natively support this due to their definition, but time-variant tables such as Satellites require special attention. For an idea please read this post. […]

     

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.