Roelant Vos
Data solution design patterns, implementation, and automation

Data Vault 2.0 Staging Area learnings & suggestions

With Data Vault 2.0 the Data Vault methodology introduces (amongst other things) a more formalised solution architecture, which includes a Staging Area. In the designs advocated in this blog, this Staging Area is part of a conceptual Staging Layer that also covers the Persistent Staging Area (PSA). While updating the documentation in general I updated various sections in the Staging Layer definition, and this prompted me to highlight some experiences specifically with implementing the...
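To make the Staging Layer shape concrete, here is a minimal sketch of what a Data Vault 2.0 style Staging Area table could look like, assuming SQL Server and illustrative object names (none of these are taken from the post):

    -- Minimal sketch of a Staging Area table; all names are illustrative.
    CREATE TABLE stg.customer (
        customer_code     VARCHAR(50)  NOT NULL, -- source business key
        customer_name     VARCHAR(100),          -- source attribute(s)
        load_datetime     DATETIME2    NOT NULL, -- when the row landed in staging
        record_source     VARCHAR(50)  NOT NULL, -- originating system
        cdc_operation     VARCHAR(10),           -- insert / update / delete indicator
        hash_key_customer CHAR(32),              -- hashed business key
        hash_full_record  CHAR(32)               -- row checksum for delta detection
    );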

Do we still want to automate against ETL tools?

In the various Data Warehouse initiatives I have been involved with recently, I tend to use ETL software less and less. Over the last few years I have spent a lot of time figuring out how to automate/generate ETL in various tools – most notably in SSIS, Pentaho, PowerCenter and recently SAS (but various others as well) – to the point that most of the intended solution can be generated from metadata. But as recently outlined in this...

Driving Keys and relationship history, one or more tables?

Handling Driving Key type mechanisms is one of the more challenging elements of Data Vault modelling. It’s not necessarily difficult to implement the concept; the challenge is more in how to interpret it and get the right information out of the Data Vault again. In this post I’ll explore two ways of storing relationship history over time: using a separate table to store Driving Key information alongside a separate table to store normal relationship history (type 2), and...
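As a sketch of the first option (two tables), assuming illustrative names and an ‘employee moves between departments’ relationship where the employee is the Driving Key:

    -- Illustrative only: the Link, a regular (type 2) Link Satellite for
    -- relationship history, and a separate effectivity table for the Driving Key.
    CREATE TABLE dv.link_employee_department (
        employee_department_hsh CHAR(32)    PRIMARY KEY,
        employee_hsh            CHAR(32)    NOT NULL, -- the Driving Key side
        department_hsh          CHAR(32)    NOT NULL,
        load_datetime           DATETIME2   NOT NULL,
        record_source           VARCHAR(50) NOT NULL
    );

    -- Normal relationship history (type 2) on the full relationship key.
    CREATE TABLE dv.lsat_employee_department (
        employee_department_hsh CHAR(32)  NOT NULL,
        load_datetime           DATETIME2 NOT NULL,
        hash_full_record        CHAR(32),
        PRIMARY KEY (employee_department_hsh, load_datetime)
    );

    -- Separate table tracking which relationship is current per Driving Key.
    CREATE TABLE dv.lsat_employee_department_dk (
        employee_department_hsh CHAR(32)  NOT NULL,
        load_datetime           DATETIME2 NOT NULL,
        expiry_datetime         DATETIME2,          -- closed when the employee moves
        PRIMARY KEY (employee_department_hsh, load_datetime)
    );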

Zero records, time-variance and point-in-time selection

While finalising the posts for the overall Data Vault ETL implementation I have also started thinking about how to document the next steps: the loading patterns for the Presentation Layer. From here on I will refer to the Presentation Layer, Data Marts and Information Marts simply as ‘Information Marts’. This reminded me that I haven’t yet properly covered the ‘zero record’ concept. This is a timely consideration: the whole reason that zero records exist is to make the...

Should we record the fact that a piece is missing?
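To illustrate the mechanics (object names are assumptions, not taken from the post), a zero record is simply an extra Satellite row per key with the earliest possible effective date, so a point-in-time join always finds a row even before the first real change arrived:

    -- Sketch: seed a 'zero record' per Hub key; names are illustrative.
    INSERT INTO dv.sat_customer (customer_hsh, load_datetime, record_source, customer_name)
    SELECT h.customer_hsh,
           '1900-01-01',   -- earliest possible effective date
           'Zero record',  -- marks the row as system-generated
           NULL            -- attribute values are unknown at this point
    FROM dv.hub_customer h
    WHERE NOT EXISTS (
        SELECT 1
        FROM dv.sat_customer s
        WHERE s.customer_hsh = h.customer_hsh
          AND s.load_datetime = '1900-01-01');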

Data Vault implementation overview

I’m almost at the end of the basic outlines for ETL implementation in the context of Data Vault. For a (hopefully) tidy overview, I created a page that lists all relevant posts for Data Vault implementation here. I’m now working towards writing up the last few topics, including Zero Records and Link Satellites, before focusing on Data Mart automation from a core Data Vault model. I think this covers the essential elements for implementation, but...

A brief history of time in Data Vault

To quote Ronald Damhof in yesterday’s Twitter conversation: ‘There are no best practices. Just a lot of good practices and even more bad practices’. Sometimes I feel Data Vault lacks a centrally managed, visible, open forum to monitor standards. And, more importantly, the evolution of these standards over time. And, even more importantly, why these standards change over time. It varies (in space and time) where sensible discussions regarding these standards take place, but lately...

Virtualising your Data Vault – regular and driving key Link Satellites

Virtualising the EDW core integration layer by applying Data Vault concepts turned out to be a very useful and achievable exercise. So achievable, in fact, that it only requires three posts to present an idea of how this all works. The Hubs and Links are already covered in the first post, and the Satellites in the second. It’s now time for the remaining primary entities: the Link Satellites. What’s the driving key? As explained in this post...
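As a hedged sketch of what such a virtualised Driving Key Link Satellite could look like, assuming a Persistent Staging Area table that already carries the hash keys (all names are illustrative):

    -- The expiry date is derived with LEAD() over the Driving Key, so a new
    -- relationship for the same employee automatically end-dates the old one.
    CREATE VIEW dv.vw_lsat_employee_department AS
    SELECT
        src.employee_department_hsh,
        src.employee_hsh,                      -- the Driving Key
        src.load_datetime AS effective_datetime,
        LEAD(src.load_datetime) OVER (
            PARTITION BY src.employee_hsh      -- per Driving Key, not per full key
            ORDER BY src.load_datetime
        ) AS expiry_datetime                   -- NULL marks the current relationship
    FROM psa.employee_department src;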

Virtualising your Data Vault – Hubs and Links

With Data Vault, the Hub ETLs are usually the first to be developed – they are very easy to build once your model is complete! This was the case when creating these virtualised ETL templates as well. Because Hubs and Links are so similar, I covered them both in this post. In this virtualisation Proof of Concept I used the automation metadata I normally use for automating SSIS, Data Services and PowerCenter ETL development. Using...
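As an illustration of the idea (a sketch only; table and column names are assumptions), a virtualised Hub combines every source that supplies the business key into a single view and keeps the earliest appearance per key:

    CREATE VIEW dv.vw_hub_employee AS
    SELECT
        employee_hsh,
        employee_code,
        MIN(load_datetime) AS load_datetime,  -- first time the key was seen
        MIN(record_source) AS record_source   -- simplification; ideally tied to the earliest row
    FROM (
        SELECT employee_hsh, employee_code, load_datetime, record_source
        FROM psa.hr_employee
        UNION ALL
        SELECT employee_hsh, employee_code, load_datetime, record_source
        FROM psa.payroll_staff                -- a second source feeding the same Hub
    ) src
    GROUP BY employee_hsh, employee_code;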

Virtualising your Data Vault – Satellites

Once you have nailed the fundamental metadata requirements and prerequisites for Data Vault ETL automation, changing the automation output in terms of target platform or (ETL) tool is relatively easy. This is especially true for Satellites, as their implementation in a virtualised setting is usually 1-to-1 with their instantiated ETL counterparts. To make the distinction: Hubs are handled differently for virtualisation, as you are essentially combining various ETLs into a single Hub view. For example: an ‘Employee’ Hub...
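A minimal sketch of such a 1-to-1 Satellite view, assuming a Persistent Staging Area table with a precomputed row checksum (all names are illustrative):

    -- Keep only true changes by comparing each row's checksum with the
    -- previous row for the same key.
    CREATE VIEW dv.vw_sat_employee AS
    SELECT employee_hsh, load_datetime, record_source, employee_name
    FROM (
        SELECT
            employee_hsh, load_datetime, record_source, employee_name,
            hash_full_record,
            LAG(hash_full_record) OVER (
                PARTITION BY employee_hsh
                ORDER BY load_datetime
            ) AS previous_hash
        FROM psa.hr_employee
    ) chg
    WHERE previous_hash IS NULL                -- first row for the key
       OR previous_hash <> hash_full_record;   -- a genuine change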

Minimal metadata requirements for ETL Automation / Virtualisation (and prerequisites)

At the Worldwide Data Vault conference in Vermont, USA, I presented the steps to automate the ETL development for your end-to-end Data Warehouse. We put a lot of thought into what would be the absolute minimum of metadata you would need to provide to the automation logic, as most of the details are already ‘readable’ from the data model (and corresponding data dictionary or system tables). Data Vault 2.0 defines a complete solution architecture covering...
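As a hedged sketch of what such minimal metadata could look like (structure and names are my own illustration, not the conference material), a single source-to-target mapping table often suffices when everything else is derived from the model itself:

    CREATE TABLE md.source_to_target_mapping (
        mapping_id    INT          PRIMARY KEY,
        source_table  VARCHAR(100) NOT NULL, -- e.g. 'stg.hr_employee'
        target_table  VARCHAR(100) NOT NULL, -- e.g. 'dv.hub_employee'
        business_key  VARCHAR(100) NOT NULL, -- column(s) that drive the hash keys
        target_type   VARCHAR(20)  NOT NULL  -- 'Hub', 'Link' or 'Satellite'
    );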