Category: Data Vault

Embrace your Persistent Staging Area for Eventual Consistency

If you like your PSA so much… A colleague of mine asked me this: ‘if you like the Persistent Staging Area (PSA) concept so much, why not embrace it all the way?’. By this, he meant loading upstream layers such as the Data Vault directly from the PSA instead of from a Staging Area. I was a bit resistant to the idea at first, because this would require incorporation of the PSA as a mandatory...

 
Using a Natural Business Key – the end of hash keys?

Do we still need Hash Keys? Now there is a controversial topic! I have been thinking about the need for hash keys for almost a year now, ever since I went to the Data Vault Day in Hamburg, Germany, at the end of 2016. During this one-day community event, the topic of stepping away from hash keys was raised in one of the discussions after a case study. Both the presentation and the following discussion were in German,...

 
When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables

When changes are just too many

When you construct a Point-In-Time (PIT) table or Dimension from your Data Vault model, do you sometimes find yourself in the situation where there are too many change records present? This is because, in the standard Data Vault design, tiny variations when loading data may result in the creation of very small time slices when the various historised data sets (e.g. Satellites) are combined. There is such a thing as too...

 
Creating Data Vault Point-In-Time and Dimension tables: merging historical data sources

Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a specific business entity (Hub, Link or groups of those). If a given Hub has on average 3 or 4 Satellites, it is useful at the very least to see the full history of changes for that specific Hub across all Satellites. How to...

 
Advanced row condensing for Satellites

When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for).

Record condensing

Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...

 
NoETL – Data Vault Link tables

Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as the support for (optional) degenerate attributes. To make things a bit more interesting I created some metadata that requires different Business Key ‘types’ so this can be shown and tested in the virtualisation program. For the example in this post I created three Link definitions (the metadata), one of which (LNK_CUSTOMER_COSTING) has a three-way relationship with the following...

 
Quick and easy referential integrity validation (for dynamic testing)

This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI) specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...

 
NoETL – Data Vault Satellite tables

The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier: regular, composite and concatenated business keys with hashing; zero record provision; and reuse of the objects for ETL purposes if required. As this is another process going...

 
Zero / ghost records in Data Vault Satellites versus Point In Time (PIT) tables

As posted earlier, the recent evolution of the Data Vault 2.0 conventions aims to remove the creation of zero records (or ‘ghost records’) in Satellites. Zero records have the sole aim of making sure that every business key in a Satellite has a complete timeline (e.g. 1900-01-01 to 9999-12-31) so that records are always returned when you query the state of the world at any given date. For instance, if a certain record is created in...

 
World Wide Data Vault Consortium key takeaways

Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC) as hosted by Dan Linstedt in his home state of Vermont. It was great to experience the uptake in Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year. Especially engaging was the day before the conference, spent on in-depth discussions about various use-cases and technical solutions and improvements...