Category: Data Vault

0

Quick and easy referential integrity validation (for dynamic testing)

This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI) specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...

 
1

NoETL – Data Vault Satellite tables

The recent presentations provides a push to wrap up the development and release of the Data Vault virtualisation initiative, so now everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier: Regular, composite, concatenated business keys with hashing Zero record provision Reuse of the objects for ETL purposes if required As this is another process going...

 
1

Zero / ghost records in Data Vault Satellites versus Point In Time (PIT) tables

As posted earlier recent evolution of the Data Vault 2.0 conventions aim to remove the creation of zero records (or ‘ghost records’) in Satellites. Zero records have the sole aim of making sure that every business key in a Satellite has a complete timeline (e.g. 1900-01-01 to 9999-12-31) so that records are always returned when you query the state of the world at any given date. For instance if a certain record is created in...

 
1

World Wide Data Vault Consortium key takeaways

Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC) as hosted by Dan Linstedt in his home state Vermont. It was great to experience the uptake in Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year. Especially engaging was a day prior to the conference of in-depth discussions about various use-cases and technical solutions and improvements...

 
1

NoETL – Data Vault Hub tables

In the previous posts we have loaded a proper data delta (Staging Area) and archived this in the Persistent Staging Area (PSA). In my designs, the PSA is the foundation for any form of upstream virtualisation – both for the Integration Layer (Data Vault) and subsequently the Presentation Layer (Dimensional Model, or anything fit-for-purpose). The Presentation Layer sits ‘on top off’ the Data Vault the same as it would be in the physical implementation so you...

 
1

NoETL (Not Only ETL) – virtualization revisited

For the last couple of weeks I have been working on a simple tool to support the Data Warehouse virtualisation concepts in practice. This is based on the idea that if you can generate the ETL you need, you can also virtualise these processes if performance requirements and / or relevant constraints allow for it. This is why I was looking for a way to virtualise where it would be possible (performance wise), and instantiate (generate ETL) where...

 
0

Data Vault 2.0 Staging Area learnings & suggestions

With Data Vault 2.0 the Data Vault methodology introduces (amongst other things)  a more formalised solution architecture which includes a Staging Area. In the designs as advocated in this blog this Staging Area is part of a conceptual Staging Layer that also cover the Persistent Staging Area (PSA). While updating the documentation in general I updated various sections in the Staging Layer definition and this prompted me to highlight some experiences specifically with implementing the...

 
4

Driving Keys and relationship history, one or more tables?

Handling Driving Key type mechanisms is one of the more challenging elements of Data Vault modelling. It’s not necessarily difficult to implement this concept, but more how to interpret this and get the right information out of the Data Vault again. In this post I’ll explore two ways of storing relationship history over time: Using a separate table to store Driving Key information and a separate table to store normal relationship history (type 2) and,...

 
2

A brief history of time in Data Vault

To quote Ronald Damhof in yesterday’s twitter conversation: ‘There are no best practices. Just a lot of good practices and even more bad practices’. Sometimes I feel Data Vault lacks a centrally managed, visible, open forum to monitor standards. And, more importantly, the evolution of these standards over time. And, even more importantly, why these standards change over time. It varies (in space and time) where sensible discussions regarding these standards take place, but lately...

 
6

Virtualising your Data Vault – regular and driving key Link Satellites

Virtualising the EDW core integration layer by applying Data Vault concepts turned out to be a very useful and achievable exercise. So achievable even, that it only requires three posts to present an idea on how this all works. The Hubs and Links are already covered in the first post, and the Satellites in the second. It’s now time for the remaining primary entities: the Link Satellites. What’s the driving key? As explained in this post...