Category: Data Vault

Creating Data Vault Point-In-Time and Dimension tables: merging historical data sources

Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a specific business entity (Hub, Link or groups of those). If a given Hub has on average 3 or 4 Satellites, it is useful at the very least to see the full history of changes for that specific Hub across all Satellites. How to...

 
Advanced row condensing for Satellites

When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages, as the earliest templates (also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...
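As a rough illustration of why DISTINCT falls short, the sketch below keeps a row only when its tracked attributes differ from the previous row for the same key. It is not the template from the post: the function name `condense`, the column names and the sample rows are all hypothetical, and it assumes the rows can be ordered by a load timestamp.

```python
from itertools import groupby
from operator import itemgetter


def condense(rows, key, order_by, attributes):
    """Keep only rows whose tracked attributes differ from the
    previous row for the same key (hypothetical helper)."""
    result = []
    ordered = sorted(rows, key=lambda r: (r[key], r[order_by]))
    for _, group in groupby(ordered, key=itemgetter(key)):
        previous = None  # no previous row yet, so the first row is always kept
        for row in group:
            current = tuple(row[a] for a in attributes)
            if current != previous:
                result.append(row)
            previous = current
    return result


# Hypothetical sample data: the status goes A, A, B and back to A.
rows = [
    {"customer": "C1", "load_ts": 1, "status": "A"},
    {"customer": "C1", "load_ts": 2, "status": "A"},  # redundant repeat
    {"customer": "C1", "load_ts": 3, "status": "B"},
    {"customer": "C1", "load_ts": 4, "status": "A"},  # genuine change back to A
]

condensed = condense(rows, "customer", "load_ts", ["status"])
```

In this sample the row at `load_ts` 2 is dropped, while the row at `load_ts` 4 survives because it is a real change relative to its predecessor. A DISTINCT over `customer` and `status` would have collapsed the two "A" states into one and lost the change back.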

 
NoETL – Data Vault Link tables

Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as the support for (optional) degenerate attributes. To make things a bit more interesting I created some metadata that requires different Business Key ‘types’ so this can be shown and tested in the virtualisation program. For the example in this post I created three Link definitions (the metadata), one of which (LNK_CUSTOMER_COSTING) has a three-way relationship with the following...

 
Quick and easy referential integrity validation (for dynamic testing)

This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI) specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...

 
NoETL – Data Vault Satellite tables

The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier: regular, composite and concatenated business keys with hashing; zero record provision; and reuse of the objects for ETL purposes if required. As this is another process going...

 
Zero / ghost records in Data Vault Satellites versus Point In Time (PIT) tables

As posted earlier, the recent evolution of the Data Vault 2.0 conventions aims to remove the creation of zero records (or ‘ghost records’) in Satellites. Zero records have the sole aim of making sure that every business key in a Satellite has a complete timeline (e.g. 1900-01-01 to 9999-12-31) so that records are always returned when you query the state of the world at any given date. For instance if a certain record is created in...
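To make the purpose of a zero record concrete, here is a minimal sketch under stated assumptions. The helpers `with_zero_records` and `state_at`, the column names, the `is_zero_record` flag and the sample dates are all hypothetical and not taken from the post; only the 1900-01-01 start date comes from the text above.

```python
from datetime import date

ZERO_DATE = date(1900, 1, 1)  # start of the timeline, as in the example above


def with_zero_records(rows, key="customer", effective="effective_date"):
    """Add one placeholder row per key at ZERO_DATE when the key's
    earliest row starts later (hypothetical helper)."""
    earliest = {}
    for row in rows:
        k = row[key]
        if k not in earliest or row[effective] < earliest[k]:
            earliest[k] = row[effective]
    zero_rows = [
        {key: k, effective: ZERO_DATE, "is_zero_record": True}
        for k, first in earliest.items()
        if first > ZERO_DATE
    ]
    return sorted(zero_rows + rows, key=lambda r: (r[key], r[effective]))


def state_at(rows, k, as_of, key="customer", effective="effective_date"):
    """Return the row that was in effect for key k on the as_of date, or None."""
    candidates = [r for r in rows if r[key] == k and r[effective] <= as_of]
    return max(candidates, key=lambda r: r[effective]) if candidates else None


# Hypothetical Satellite content: one customer first seen in mid-2015.
satellite = [
    {"customer": "C1", "effective_date": date(2015, 6, 1), "name": "Acme"},
]

before = state_at(satellite, "C1", date(2010, 1, 1))  # nothing in effect yet
padded = with_zero_records(satellite)
after = state_at(padded, "C1", date(2010, 1, 1))      # the placeholder row
```

Without the zero record, a point-in-time query dated before the first real row finds nothing for that key. With it, the same query returns the placeholder, so joins across Satellites can stay simple equi-joins on the timeline.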

 
World Wide Data Vault Consortium key takeaways

Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC), hosted by Dan Linstedt in his home state of Vermont. It was great to experience the uptake in Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year. Especially engaging was the day of in-depth discussions, held prior to the conference, about various use-cases and technical solutions and improvements...

 
NoETL – Data Vault Hub tables

In the previous posts we have loaded a proper data delta (Staging Area) and archived this in the Persistent Staging Area (PSA). In my designs, the PSA is the foundation for any form of upstream virtualisation – both for the Integration Layer (Data Vault) and subsequently the Presentation Layer (Dimensional Model, or anything fit-for-purpose). The Presentation Layer sits ‘on top of’ the Data Vault, the same as it would in the physical implementation, so you...

 
NoETL (Not Only ETL) – virtualization revisited

For the last couple of weeks I have been working on a simple tool to support the Data Warehouse virtualisation concepts in practice. This is based on the idea that if you can generate the ETL you need, you can also virtualise these processes if performance requirements and / or relevant constraints allow for it. This is why I was looking for a way to virtualise where it would be possible (performance wise), and instantiate (generate ETL) where...

 
Data Vault 2.0 Staging Area learnings & suggestions

With Data Vault 2.0 the Data Vault methodology introduces (amongst other things) a more formalised solution architecture which includes a Staging Area. In the designs as advocated in this blog, this Staging Area is part of a conceptual Staging Layer that also covers the Persistent Staging Area (PSA). While updating the documentation in general I updated various sections in the Staging Layer definition, and this prompted me to highlight some experiences specifically with implementing the...