Tagged: Data Vault

The Virtual Data Warehouse

Over the weekend I have written up a brief overview and ‘thought piece’ of what I mean when I talk about a Virtual Data Warehouse and Data Warehouse Virtualisation. Please have a look at the article here: The Virtual Data Warehouse. A special thanks to Bret Victor for sharing a fascinating presentation on ‘inventing in principle’. This is a concept, or I should rather say a principle, I have been working on for some time now....

 
Schools of thought on implementing Multi-Active Satellites

Right or wrong? When it comes to data management there are almost always various alternatives for implementation and none of them are necessarily right or wrong. They represent the various options and consequences to consider, and the right solution usually is the one which is made with full understanding of these consequences, with ‘eyes wide open’. Supporting multi-active, sometimes referred to as ‘multi-variant’ or ‘multi-valued’, behaviour of Satellites is one of these areas where opinions...

 
Using (and moving to) raw data types for hash keys

Making hash keys smaller. A few months ago I posted an article explaining the merits of the ‘natural business key’, which can make sense in certain situations. It also explained, from a more generic perspective, why this is something the Data Warehouse management system (‘the engine’) would be able to figure out automatically and change on the fly when required. This article used the common approach of storing the hash values in character fields (i.e. CHAR(32) for...
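
As a rough illustration of why moving from character fields to raw data types makes hash keys smaller, here is a minimal Python sketch. It assumes an MD5 hash (the CHAR(32) width suggests this, but it is an assumption), and the business key value is made up for the example.

```python
import hashlib

# Hypothetical business key value, for illustration only.
business_key = "CUST-1001"

# Assumption: MD5 is the hashing algorithm (a 128-bit digest).
digest = hashlib.md5(business_key.encode("utf-8"))

hex_key = digest.hexdigest()  # text representation, fits a CHAR(32) column
raw_key = digest.digest()     # raw representation, fits a 16-byte binary column

print(len(hex_key))  # number of characters in the text form
print(len(raw_key))  # number of bytes in the raw form

# Both forms carry the same value; the text form is just its hex encoding.
assert bytes.fromhex(hex_key) == raw_key
```

The raw form needs half the storage of the hexadecimal text form, because every byte is written as two characters in hex.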

 
Is Data Vault becoming obsolete?

What value do we get from having an intermediate hyper-normalised layer? Let me start by stating that a Data Warehouse is a necessary evil at the best of times. In the ideal world, there would be no need for it, as optimal governance and near real-time multidirectional data harmonisation would have created an environment where it is easy to retrieve information without any ambiguity across systems (including its history of changes). Ideally, we would not...

 
Embrace your Persistent Staging Area for Eventual Consistency

If you like your PSA so much… A colleague of mine asked me this: ‘if you like the Persistent Staging Area (PSA) concept so much, why not embrace it all the way?’. By this, he meant loading upstream layers such as the Data Vault directly from the PSA instead of from a Staging Area. I was a bit resistant to the idea at first, because this would require incorporation of the PSA as a mandatory...

 
Using a Natural Business Key – the end of hash keys?

Do we still need Hash Keys? Now there is a controversial topic! I have been thinking about the need for hash keys for almost a year now, ever since I went to the Data Vault Day in Germany (Hamburg) at the end of 2016. During this one-day community event, the topic of stepping away from hash keys was raised in one of the discussions after a case study. Both the presentation and following discussion were in German,...

 
When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables

When changes are just too many. When you construct a Point-In-Time (PIT) table or Dimension from your Data Vault model, do you sometimes find yourself in the situation where there are too many change records present? This is because, in the standard Data Vault design, tiny variations when loading data may result in the creation of very small time slices when the various historised data sets (e.g. Satellites) are combined. There is such a thing as too...
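
To make the 'very small time slices' effect concrete, here is a minimal Python sketch. The two Satellites, their load timestamps and the one-minute threshold are all hypothetical values chosen for the example, and the function is not taken from the article itself.

```python
from datetime import datetime

# Hypothetical load timestamps for one business key in two Satellites.
# The second Satellite is loaded a few seconds after the first.
sat_a = [datetime(2017, 1, 1, 8, 0, 0), datetime(2017, 1, 2, 8, 0, 0)]
sat_b = [datetime(2017, 1, 1, 8, 0, 3), datetime(2017, 1, 2, 8, 0, 5)]


def time_slices(*timelines):
    """Merge all change timestamps and return consecutive (start, end) slices."""
    points = sorted(set().union(*timelines))
    return list(zip(points, points[1:]))


slices = time_slices(sat_a, sat_b)

# Slices shorter than one minute (an arbitrary threshold for this example).
short = [s for s in slices if (s[1] - s[0]).total_seconds() < 60]

for start, end in slices:
    print(start, "->", end)
```

With these sample values, two of the three combined slices last only a few seconds. They exist purely because the two Satellites were loaded at slightly different moments, not because anything meaningful changed in between.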

 
Updated the Data Vault implementation & automation training for 12-14 June in Germany

On the 12th-14th of June I will be delivering the newly styled and updated Data Vault implementation and automation training together with Doerffler & Partner. I am really looking forward to continuing the collaboration after last year’s awesome Data Vault Day (organised by Doerffler as well). I am working really hard to wrap up the next layer of virtualisation to discuss there and I’m really excited about it: imagine having multiple versions of not only the Data...

 
When is a change a ‘change’?

This is a post that touches on what I think is one of the essential best practices for ETL design: the ability to process multiple changes for the same key in a single pass. This is specifically relevant for typical ETL processes that load data to a time-variant target (PSA, Satellite, Dimension etc.). For non-time-variant targets (Hubs, Links etc.) the process is a bit easier as this is essentially built into the patterns already :-). In a given...

 
Creating Data Vault Point-In-Time and Dimension tables: merging historical data sources

Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a specific business entity (Hub, Link or groups of those). If a given Hub has on average 3 or 4 Satellites, it is useful at the very least to see the full history of changes for that specific Hub across all Satellites. How to...