Data Vault ETL Implementation: Potential exceptions in Hub ETL

The previous blog post on Hub ETL processes (Implementation of Hub ETL) covered the standard (cookie cutter) functionality. However, there are some exceptions that may occur, and some of the operations need additional explanation:

  • Record Source: in most scenarios the Record Source will be the same across the various Hub ETL processes, because you are working towards a core, enterprise-wide business key. This is why the Record Source is an optional component of the Hub table (according to Dan Linstedt and not to Hans Hultgren, I may add). In this technical implementation I have added the Record Source to all Hub tables as a rule, to achieve a one-size-fits-all solution. In the exceptional case where the same Business Key has multiple meanings (it does happen), the Record Source can be used to distinguish between these values, and the ETL can then handle the distribution of a unique Hub Key for each of them automatically. But, as mentioned, in most cases there will only be a single Record Source. By default, however, the Record Source is kept out of the business key wherever possible
  • Key Satellite: the described Hub ETL template does not mention the optional extra step of deriving the correct Hub Key for a Business Key through a Key Satellite, because this step is not always required. In fact, the better information is managed and/or integrated enterprise-wide, the less you need it. A Key Satellite is needed where alternate values for a Business Key are required for lookup; these are typically source system IDs / PKs. To get to the Hub Key you then need a different Hub Key lookup: look up the Hub Key for a particular alternate value of that Business Key as stored in the Key Satellite. A lookup directly against the Key Satellite is sufficient, but purists may want to join it to the Hub or perform an additional sanity check lookup against the Hub table. Key Satellites are flexible entities and can house various alternative keys, in a normal or multi-active format and for various source systems
  • Unique Key constraint: there are situations where even Business Keys are reused (redistributed). In this case the Hub ETL can be extended to end-date the discarded Business Key and issue a new Hub Key for the Business Key that takes on a new meaning at a point in time. This requires a Satellite to record the information, which is typically handled in a Key Satellite. It also requires removing the Unique Key constraint on the Hub table, as you essentially have a duplicate Business Key that means something different (and therefore gets its own Hub Key) at different points in time. Alternatively, this can be handled by using a concatenated key from that point in time onwards
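
To make the Key Satellite lookup and the reuse of a Business Key more concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the ETL template itself: the class and function names (HubWithKeySatellite, lookup_hub_key, issue_hub_key), the high end date for open rows and the sample values are all assumptions made for this example, and it assumes a reused Business Key arrives with the same source system ID as before.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

HIGH_DATE = date(9999, 12, 31)  # assumption: open-ended rows carry a high end date


@dataclass
class HubRow:
    hub_key: int
    business_key: str
    record_source: str


@dataclass
class KeySatRow:
    hub_key: int
    record_source: str
    alternate_key: str  # e.g. a source system ID / PK
    effective_date: date
    expiry_date: date = HIGH_DATE


class HubWithKeySatellite:
    """Toy in-memory Hub plus Key Satellite; all names are hypothetical."""

    def __init__(self) -> None:
        self.hub: List[HubRow] = []
        self.key_sat: List[KeySatRow] = []

    def lookup_hub_key(self, record_source: str, alternate_key: str,
                       as_of: date) -> Optional[int]:
        """Resolve the Hub Key through the Key Satellite as of a given date."""
        for row in self.key_sat:
            if (row.record_source == record_source
                    and row.alternate_key == alternate_key
                    and row.effective_date <= as_of < row.expiry_date):
                return row.hub_key
        return None

    def issue_hub_key(self, business_key: str, record_source: str,
                      alternate_key: str, effective_date: date) -> int:
        """Issue a new Hub Key, end-dating any open row for the same alternate key.

        The Hub list has no unique constraint on the Business Key, so a
        reused key can appear more than once, each time with its own Hub Key.
        """
        for row in self.key_sat:
            if (row.record_source == record_source
                    and row.alternate_key == alternate_key
                    and row.expiry_date == HIGH_DATE):
                row.expiry_date = effective_date  # close the discarded meaning
        hub_key = len(self.hub) + 1
        self.hub.append(HubRow(hub_key, business_key, record_source))
        self.key_sat.append(
            KeySatRow(hub_key, record_source, alternate_key, effective_date))
        return hub_key


vault = HubWithKeySatellite()
vault.issue_hub_key("CUST-100", "CRM", "4711", date(2010, 1, 1))
vault.issue_hub_key("CUST-100", "CRM", "4711", date(2013, 1, 1))  # key is reused

print(vault.lookup_hub_key("CRM", "4711", date(2012, 6, 30)))  # prints 1
print(vault.lookup_hub_key("CRM", "4711", date(2013, 6, 30)))  # prints 2
```

The lookup goes straight to the Key Satellite and includes the Record Source, so the same alternate value from a different source system would not resolve to the same Hub Key. A stricter variant could additionally check that the returned Hub Key exists in the Hub, which is the sanity check mentioned above.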
Roelant Vos
