Roelant Vos: An expert view on Agile Data Warehousing

A brief history of time in Data Vault

To quote Ronald Damhof in yesterday’s Twitter conversation: ‘There are no best practices. Just a lot of good practices and even more bad practices’. Sometimes I feel Data Vault lacks a centrally managed, visible, open forum to monitor its standards and, more importantly, the evolution of these standards over time. Even more importantly, it lacks a place that explains why these standards change over time. Where sensible discussions about these standards take place varies (in space and time), but lately...

Virtualising your Data Vault – regular and driving key Link Satellites

Virtualising the EDW core integration layer by applying Data Vault concepts turned out to be a very useful and achievable exercise. So achievable, in fact, that it only takes three posts to present an idea of how this all works. The Hubs and Links were covered in the first post, and the Satellites in the second. It’s now time for the remaining primary entities: the Link Satellites. What’s the driving key? As explained in this post...

Virtualising your Data Vault – Hubs and Links

With Data Vault, the Hub ETLs are usually the first to be developed – they are very easy to build once your model is complete! This was also the case when creating these virtualised ETL templates. Because Hubs and Links are so similar, I have covered them both in this post. In this virtualisation Proof of Concept I used the automation metadata I normally use for automating SSIS, Data Services and PowerCenter ETL development. Using...

Virtualising your Data Vault – Satellites

Once you have nailed the fundamental metadata requirements and prerequisites for Data Vault ETL automation, changing the automation output in terms of target platform or (ETL) tool is relatively easy. This is especially true for Satellites, as their implementation in a virtualised setting is usually 1-to-1 with their ETL-instantiated counterparts. To make the distinction: Hubs are handled differently for virtualisation, as you are essentially combining various ETLs into a single Hub view. For example: an ‘Employee’ Hub...
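The idea of combining several source feeds into a single Hub view can be sketched in Python. This is a minimal, hypothetical illustration, not the virtualised template from the post. It assumes each source feed delivers rows with a business key, a load date/time and a record source (the `SourceRow`, `HubRow`, `normalise`, `hash_key` and `build_hub_view` names are invented for this example), and it assumes a Data Vault 2.0-style MD5 hash key over the trimmed, upper-cased business key. The resulting Hub holds one row per distinct business key, taken from whichever source delivered it first.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class SourceRow:
    """A row from one hypothetical source feed that carries an Employee key."""
    business_key: str
    load_datetime: datetime
    record_source: str


@dataclass(frozen=True)
class HubRow:
    """One row of the virtual Hub: exactly one per distinct business key."""
    hub_hash_key: str
    business_key: str
    load_datetime: datetime
    record_source: str


def normalise(business_key: str) -> str:
    # Assumed convention: trim and upper-case before comparing or hashing.
    return business_key.strip().upper()


def hash_key(business_key: str) -> str:
    # Assumed Data Vault 2.0-style MD5 hash of the normalised business key.
    return hashlib.md5(normalise(business_key).encode("utf-8")).hexdigest()


def build_hub_view(*sources: Iterable[SourceRow]) -> List[HubRow]:
    """Union all source feeds and keep the first arrival of each business key."""
    first_seen: Dict[str, SourceRow] = {}
    for source in sources:
        for row in source:
            key = normalise(row.business_key)
            current = first_seen.get(key)
            # Keep the earliest load date/time; on a tie the row seen first wins.
            if current is None or row.load_datetime < current.load_datetime:
                first_seen[key] = row
    return sorted(
        (HubRow(hash_key(key), key, row.load_datetime, row.record_source)
         for key, row in first_seen.items()),
        key=lambda hub_row: hub_row.business_key,
    )
```

In a database the same logic would typically be a view over a union of the source tables; the Python version only makes the "one key, earliest arrival" rule explicit.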

Minimal metadata requirements for ETL Automation / Virtualisation (and prerequisites)

At the Worldwide Data Vault Conference in Vermont, USA, I presented the steps to automate the ETL development for your end-to-end Data Warehouse. We put a lot of thought into what would be the absolute minimum of metadata you would need to provide to the automation logic, as most of the details can already be ‘read’ from the data model (and the corresponding data dictionary or system tables). Data Vault 2.0 defines a complete solution architecture covering...

Virtualising your (Enterprise) Data Warehouse – what do you need?

For a while I have been promoting the concept of defining a Historical (raw) Staging Area / archive to complement the Data Warehouse architecture. A quick recap: the Historical Staging Area is really just an insert-only persistent archive of all original data delta that has been received. One of the great things is that you can deploy this from day one, start capturing changes (data delta) and never have to do an initial load again. In...
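The insert-only behaviour of a Historical Staging Area can be illustrated with a small in-memory sketch. The class and method names below (`HistoricalStagingArea`, `receive_delta`, `state_as_of`) are hypothetical, and two assumptions are made that the excerpt does not state: deltas arrive in load order, and a delta identical to the latest archived version of the same key is skipped. Rows are only ever appended, so any earlier state can be rebuilt from the archive instead of running a new initial load from the source.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class ArchiveRow:
    """One archived delta: the source key, when it arrived, and its attribute values."""
    key: str
    load_datetime: datetime
    attributes: Tuple[Tuple[str, str], ...]


@dataclass
class HistoricalStagingArea:
    """Hypothetical in-memory stand-in for an insert-only archive table."""
    rows: List[ArchiveRow] = field(default_factory=list)

    def _latest(self, key: str) -> Optional[ArchiveRow]:
        # Assumes deltas arrive in load order, so the last appended row is the latest.
        for row in reversed(self.rows):
            if row.key == key:
                return row
        return None

    def receive_delta(self, key: str, attributes: Dict[str, str],
                      load_datetime: datetime) -> bool:
        """Append a delta; never update or delete. Returns True if a row was inserted."""
        frozen = tuple(sorted(attributes.items()))
        latest = self._latest(key)
        if latest is not None and latest.attributes == frozen:
            return False  # Assumed rule: identical to the latest version, so skip it.
        self.rows.append(ArchiveRow(key, load_datetime, frozen))
        return True

    def state_as_of(self, moment: datetime) -> Dict[str, Dict[str, str]]:
        """Rebuild the source snapshot at a point in time by replaying the archive."""
        snapshot: Dict[str, Dict[str, str]] = {}
        for row in sorted(self.rows, key=lambda r: r.load_datetime):
            if row.load_datetime <= moment:
                snapshot[row.key] = dict(row.attributes)
        return snapshot
```

Because nothing is ever updated or deleted, reloading a downstream layer amounts to replaying the archive, for example via `state_as_of` at the required point in time.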

Data Visualisation, Data Warehousing and Big Data: one pitch to rule them all

Of the concepts that have emerged over the last few years, the ‘Data Lake’ is not one of my favourites, although I have to say I had a lot of fun with the various parodies of Data Lakes (which I won’t repeat here!). While I am on board with the cheap, redundant storage concept, it is clear that data management is still needed in this day and age (more than ever, really) and that concepts...

Using an ETL platform for your Data Warehouse, is it still relevant?

When I started my career in Data Warehousing and Business Intelligence 15 years ago, there was a massive push towards adopting ETL software. Traditionally, specialised ETL software such as Informatica PowerCenter, Oracle Warehouse Builder (later superseded by Oracle Data Integrator), Microsoft DTS (later superseded by SSIS) and many similar platforms were very successful for two main reasons: ETL software provided a way to ‘explain’ or document what was happening in a way that made...

Data Modeling Zone Europe 2014 – #DMZone

The site of the European Data Modeling Zone (DMZ) is up, and it looks really good! I look forward to being in Europe again for this event, which will be held in Hamburg on the 29th and 30th of September 2014. On behalf of Analytics8, I will present the ‘model driven design’ approach: how the metadata embedded in the data model can be used to forward-engineer ETL for various platforms. Feel free...

Data Vault 2.0 – how to handle Referential Integrity?

I was working on adding some of the automation code to support Data Vault 2.0, and this got me thinking about Referential Integrity (RI) in relation to the modifications that Data Vault 2.0 requires. With Data Vault ‘1.0’, Referential Integrity is always enabled (except for very big systems; let’s leave that one out of scope for now, see this older post), and according to the specifications this hasn’t changed in Data Vault 2.0. For Data...