Roelant Vos An expert view on Agile Data Warehousing

0

Loading too fast for unique date/time stamps – what to do?

 

 Let’s start by clarifying that this concerns the RDBMS world, not the Hadoop world 😉 It’s a good problem to have – loading data too quickly. So quickly that, even at high precision, multiple changes for the same key end up being inserted with the same Load Date/Time Stamp (LDTS). What happens here? A quick recap: in Data Vault the Load Date/Time Stamp (LDTS, LOAD_DTS, or INSERT_DATETIME) is defined as the moment data is recorded...

0

NoETL – Data Vault Link Satellite tables (part 2)

 

 This is the second part of the Link Satellite virtualisation overview (the first post on this topic is here), and it dives deeper into the logic behind Driving Key based Link Satellites. Driving Key implementation is arguably one of the more complex things to implement in Data Vault – and you (still) need to ensure you can cover reloads (deterministic outputs!), zero records / time variance and things such as re-opening closed relationships. In the example...

1

NoETL – Data Vault Link Satellite tables (part 1)

 

 The final of the series of planned posts (for now at least) about Data Warehouse Virtualisation is all about Link Satellites. As with some of the earlier posts there are various similarities to the earlier approaches – most notably the Satellite virtualisation and processing. Concepts such as zero records and ‘virtual’ or computed end-dating are all there again, as are the constructions of using subqueries to do attribute mapping and outer queries to calculate hash...

0

NoETL – Data Vault Link tables

 

 Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as the support for (optional) degenerate attributes. To make things a bit more interesting I created some metadata that requires different Business Key ‘types’ so this can be shown and tested in the virtualisation program. For the example in this post I created three Link definitions (the metadata), one of which (LNK_CUSTOMER_COSTING) has a three-way relationship with the following...

0

Quick and easy referential integrity validation (for dynamic testing)

 

 This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI) specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...

1

NoETL – Data Vault Satellite tables

 

 The recent presentations provides a push to wrap up the development and release of the Data Vault virtualisation initiative, so now everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier: Regular, composite, concatenated business keys with hashing Zero record provision Reuse of the objects for ETL purposes if required As this is another process going...

1

Zero / ghost records in Data Vault Satellites versus Point In Time (PIT) tables

 

 As posted earlier recent evolution of the Data Vault 2.0 conventions aim to remove the creation of zero records (or ‘ghost records’) in Satellites. Zero records have the sole aim of making sure that every business key in a Satellite has a complete timeline (e.g. 1900-01-01 to 9999-12-31) so that records are always returned when you query the state of the world at any given date. For instance if a certain record is created in...

1

World Wide Data Vault Consortium key takeaways

 

 Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC) as hosted by Dan Linstedt in his home state Vermont. It was great to experience the uptake in Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year. Especially engaging was a day prior to the conference of in-depth discussions about various use-cases and technical solutions and improvements...