Tagged: Virtualisation


NoETL – Data Vault Satellite tables

The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier: regular, composite and concatenated business keys with hashing; zero record provision; and reuse of the objects for ETL purposes if required. As this is another process going...
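As a rough illustration of what such a virtualised Satellite could look like, below is a minimal T-SQL sketch. It assumes a hypothetical PSA table PSA.PSA_CRM_CUSTOMER with a CUSTOMER_CODE business key and a few descriptive attributes; the hashing and zero record handling follow the general pattern described in the post, not a definitive implementation.

```sql
-- Minimal sketch of a virtualised Satellite as a view (T-SQL); all object names are illustrative.
CREATE VIEW vw_SAT_CUSTOMER AS
SELECT
    -- Hub hash key calculated over the sanitised business key
    CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(LTRIM(RTRIM(CUSTOMER_CODE)))), 2) AS CUSTOMER_HSH,
    PSA_LOAD_DATETIME AS LOAD_DATETIME,
    PSA_RECORD_SOURCE AS RECORD_SOURCE,
    CUSTOMER_NAME,
    CUSTOMER_SEGMENT
FROM PSA.PSA_CRM_CUSTOMER
UNION ALL
-- Zero record ('ghost' row) so joins from Links and Dimensions always resolve
SELECT
    REPLICATE('0', 32),
    CAST('1900-01-01' AS DATETIME2),
    'Data Warehouse',
    'Unknown',
    'Unknown';
```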

 

NoETL – Data Vault Hub tables

In the previous posts we have loaded a proper data delta (Staging Area) and archived this in the Persistent Staging Area (PSA). In my designs, the PSA is the foundation for any form of upstream virtualisation – both for the Integration Layer (Data Vault) and subsequently the Presentation Layer (Dimensional Model, or anything fit-for-purpose). The Presentation Layer sits ‘on top of’ the Data Vault just as it would in a physical implementation, so you...
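To give an impression of a Hub virtualised directly on top of the PSA, here is a minimal T-SQL sketch. It reuses the hypothetical PSA.PSA_CRM_CUSTOMER table from the Satellite sketch above; the column names and the MD5 hashing approach are illustrative assumptions, not the definitive template.

```sql
-- Minimal sketch of a virtualised Hub as a view over the PSA (T-SQL); names are illustrative.
CREATE VIEW vw_HUB_CUSTOMER AS
SELECT
    CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(LTRIM(RTRIM(CUSTOMER_CODE)))), 2) AS CUSTOMER_HSH,
    CUSTOMER_CODE,
    MIN(PSA_LOAD_DATETIME) AS LOAD_DATETIME,   -- the first time this business key was received
    MIN(PSA_RECORD_SOURCE) AS RECORD_SOURCE
FROM PSA.PSA_CRM_CUSTOMER
GROUP BY CUSTOMER_CODE;
```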

 

NoETL – Persistent (History) Staging Area (PSA)

After setting up the initial data staging in the previous post we can load the detected data delta into the historical archive: the Persistent Staging Area (PSA). The PSA is the foundation of the Virtual Enterprise Data Warehouse because all upstream modelling and representation essentially read from this ‘archive of (data) changes’. This is because the PSA has all the information that was ever presented to the Data Warehouse, in either structured or unstructured format....
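A minimal sketch of such an insert-only PSA load is shown below (T-SQL). The Staging and PSA table names, the row checksum column and the ‘compare against the most recent PSA version’ guard are assumptions for illustration; the essential point is simply that changes are only ever appended.

```sql
-- Minimal sketch of an insert-only PSA load from the Staging Area (T-SQL); names are illustrative.
INSERT INTO PSA.PSA_CRM_CUSTOMER
    (CUSTOMER_CODE, CUSTOMER_NAME, CUSTOMER_SEGMENT,
     PSA_LOAD_DATETIME, PSA_RECORD_SOURCE, PSA_CDC_OPERATION, PSA_ROW_HASH)
SELECT
    stg.CUSTOMER_CODE,
    stg.CUSTOMER_NAME,
    stg.CUSTOMER_SEGMENT,
    stg.LOAD_DATETIME,
    stg.RECORD_SOURCE,
    stg.CDC_OPERATION,
    stg.ROW_HASH
FROM STG.STG_CRM_CUSTOMER stg
LEFT JOIN
(
    -- Most recent PSA version per business key, to avoid re-inserting an unchanged row
    SELECT CUSTOMER_CODE, PSA_ROW_HASH,
           ROW_NUMBER() OVER (PARTITION BY CUSTOMER_CODE ORDER BY PSA_LOAD_DATETIME DESC) AS rn
    FROM PSA.PSA_CRM_CUSTOMER
) cur ON cur.CUSTOMER_CODE = stg.CUSTOMER_CODE AND cur.rn = 1
WHERE cur.CUSTOMER_CODE IS NULL           -- key not seen before
   OR cur.PSA_ROW_HASH <> stg.ROW_HASH;   -- attributes have changed
```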

 

NoETL (Not Only ETL) – virtualization revisited

For the last couple of weeks I have been working on a simple tool to support the Data Warehouse virtualisation concepts in practice. This is based on the idea that if you can generate the ETL you need, you can also virtualise these processes if performance requirements and/or relevant constraints allow for it. This is why I was looking for a way to virtualise where possible (performance-wise), and instantiate (generate ETL) where...
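The sketch below illustrates the ‘virtualise where possible, instantiate where needed’ idea using the hypothetical vw_HUB_CUSTOMER view from the earlier Hub sketch: the same generated logic either remains a view, or is persisted into a physical table when performance demands it. Table and column names are illustrative only.

```sql
-- Sketch of 'virtualise or instantiate': the same generated logic (here the vw_HUB_CUSTOMER
-- view sketched earlier) can simply be persisted when performance requires it (T-SQL).
-- Table and column names are illustrative.
CREATE TABLE dbo.HUB_CUSTOMER
(
    CUSTOMER_HSH   CHAR(32)      NOT NULL PRIMARY KEY,
    CUSTOMER_CODE  NVARCHAR(100) NOT NULL,
    LOAD_DATETIME  DATETIME2     NOT NULL,
    RECORD_SOURCE  NVARCHAR(100) NOT NULL
);

-- Instantiated load: insert only the keys not yet present in the physical Hub
INSERT INTO dbo.HUB_CUSTOMER (CUSTOMER_HSH, CUSTOMER_CODE, LOAD_DATETIME, RECORD_SOURCE)
SELECT v.CUSTOMER_HSH, v.CUSTOMER_CODE, v.LOAD_DATETIME, v.RECORD_SOURCE
FROM vw_HUB_CUSTOMER v
WHERE NOT EXISTS
(
    SELECT 1 FROM dbo.HUB_CUSTOMER h WHERE h.CUSTOMER_HSH = v.CUSTOMER_HSH
);
```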

 

Do we still want to automate against ETL tools?

In the various Data Warehouse initiatives I have been involved with recently, I tend to use ETL software less and less. Over the last few years I have spent a lot of time figuring out how to automate/generate ETL in various tools – most notably in SSIS, Pentaho, Powercenter and recently SAS (but various others as well) – to the point that most of the intended solution can be generated from metadata. But as recently outlined in this...

 

Virtualising your Data Vault – regular and driving key Link Satellites

Virtualising the EDW core integration layer by applying Data Vault concepts turned out to be a very useful and achievable exercise. So achievable, in fact, that it only requires three posts to present an idea of how this all works. The Hubs and Links are already covered in the first post, and the Satellites in the second. It’s now time for the remaining primary entities: the Link Satellites. What’s the driving key? As explained in this post...
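As a rough sketch of the driving key mechanism in a virtualised Link Satellite, the T-SQL below derives an end date by looking at the next relationship recorded for the same driving key. It assumes a hypothetical Link view vw_LNK_CUSTOMER_EMPLOYEE (a sketch of which appears further down this page) in which the Customer side is the driving key; this illustrates the pattern rather than the post's exact template.

```sql
-- Minimal sketch of a virtualised driving-key Link Satellite (T-SQL); names are illustrative.
-- The Customer is assumed to be the driving key: when a new Employee is recorded for the
-- same Customer, the previous relationship is (virtually) end-dated.
CREATE VIEW vw_LSAT_CUSTOMER_EMPLOYEE AS
SELECT
    lnk.CUSTOMER_EMPLOYEE_HSH,
    lnk.CUSTOMER_HSH,                    -- driving key
    lnk.EMPLOYEE_HSH,
    lnk.LOAD_DATETIME,
    -- Effective until the next relationship recorded for the same driving key
    LEAD(lnk.LOAD_DATETIME, 1, CAST('9999-12-31' AS DATETIME2))
        OVER (PARTITION BY lnk.CUSTOMER_HSH ORDER BY lnk.LOAD_DATETIME) AS END_DATETIME,
    CASE
        WHEN LEAD(lnk.LOAD_DATETIME)
             OVER (PARTITION BY lnk.CUSTOMER_HSH ORDER BY lnk.LOAD_DATETIME) IS NULL
        THEN 'Y' ELSE 'N'
    END AS CURRENT_RECORD_INDICATOR
FROM vw_LNK_CUSTOMER_EMPLOYEE lnk;       -- assumed Link view over the PSA (sketched further below)
```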

 

Virtualising your Data Vault – Hubs and Links

With Data Vault, the Hub ETLs are usually the first to be developed – they are very easy to build once your model is complete! And that was the case when creating these virtualised ETL templates as well. Because Hubs and Links are so similar, I covered them both in this post. In this virtualisation Proof of Concept I used the automation metadata I normally use for automating SSIS, Data Services and Powercenter ETL development. Using...
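Because the Link pattern is so close to the Hub pattern, a minimal T-SQL sketch of a virtualised Link is shown below. It assumes a hypothetical PSA table PSA.PSA_CRM_PORTFOLIO that carries both the CUSTOMER_CODE and EMPLOYEE_CODE business keys; the Link hash over the concatenated keys is an illustrative convention.

```sql
-- Minimal sketch of a virtualised Link (T-SQL); it follows the same pattern as the Hub view.
-- It assumes a hypothetical PSA table PSA.PSA_CRM_PORTFOLIO carrying both business keys.
CREATE VIEW vw_LNK_CUSTOMER_EMPLOYEE AS
SELECT
    -- Link hash over the concatenated (sanitised) business keys
    CONVERT(CHAR(32), HASHBYTES('MD5',
        UPPER(LTRIM(RTRIM(CUSTOMER_CODE))) + '|' + UPPER(LTRIM(RTRIM(EMPLOYEE_CODE)))), 2)
                                                                               AS CUSTOMER_EMPLOYEE_HSH,
    CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(LTRIM(RTRIM(CUSTOMER_CODE)))), 2) AS CUSTOMER_HSH,
    CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(LTRIM(RTRIM(EMPLOYEE_CODE)))), 2) AS EMPLOYEE_HSH,
    MIN(PSA_LOAD_DATETIME) AS LOAD_DATETIME,
    MIN(PSA_RECORD_SOURCE) AS RECORD_SOURCE
FROM PSA.PSA_CRM_PORTFOLIO
GROUP BY CUSTOMER_CODE, EMPLOYEE_CODE;
```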

 

Virtualising your Data Vault – Satellites

Once you have nailed the fundamental metadata requirements and prerequisites for Data Vault ETL automation, changing the automation output in terms of target platform or (ETL) tool is relatively easy. This is especially true for Satellites, as their implementation in a virtualised setting is usually 1-to-1 with their instantiated ETL counterparts. To make the distinction: Hubs are handled differently for virtualisation, as you are essentially combining various ETLs into a single Hub view. For example, an ‘Employee’ Hub...
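The multi-source aspect mentioned above can be sketched as a single Hub view that unions the business key from each feeding system (T-SQL below). The two PSA source tables and the mapping of STAFF_NUMBER onto EMPLOYEE_CODE are hypothetical examples of the pattern, not actual sources.

```sql
-- Minimal sketch of the multi-source Hub pattern (T-SQL); the two PSA feeds are hypothetical.
-- A single virtualised 'Employee' Hub view combines the business key from every source,
-- where an instantiated equivalent would typically have one ETL process per source.
CREATE VIEW vw_HUB_EMPLOYEE AS
SELECT
    CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(LTRIM(RTRIM(EMPLOYEE_CODE)))), 2) AS EMPLOYEE_HSH,
    EMPLOYEE_CODE,
    MIN(LOAD_DATETIME) AS LOAD_DATETIME,
    MIN(RECORD_SOURCE) AS RECORD_SOURCE
FROM
(
    SELECT EMPLOYEE_CODE, PSA_LOAD_DATETIME AS LOAD_DATETIME, PSA_RECORD_SOURCE AS RECORD_SOURCE
    FROM PSA.PSA_HR_EMPLOYEE                -- hypothetical HR feed
    UNION ALL
    SELECT STAFF_NUMBER, PSA_LOAD_DATETIME, PSA_RECORD_SOURCE
    FROM PSA.PSA_PAYROLL_STAFF              -- hypothetical payroll feed; same key, different name
) sources
GROUP BY EMPLOYEE_CODE;
```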

 

Virtualising your (Enterprise) Data Warehouse – what do you need?

For a while I have been promoting the concept of defining a Historical (raw) Staging Area / archive to complement the Data Warehouse architecture. A quick recap: the Historical Staging Area is really just an insert-only persistent archive of all original data delta that has been received. One of the great things is that you can deploy this from day one, start capturing changes (data delta) and never have to do an initial load again. In...
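For reference, a minimal sketch of what such an insert-only archive table could look like is shown below (T-SQL), matching the hypothetical PSA.PSA_CRM_CUSTOMER table used in the sketches higher up this page. Column names, data types and the choice of primary key are illustrative assumptions.

```sql
-- Minimal sketch of an insert-only Persistent / Historical Staging table (T-SQL);
-- names and data types are illustrative. Rows are only ever added, never updated or deleted,
-- so the complete history of received data delta is preserved.
CREATE TABLE PSA.PSA_CRM_CUSTOMER
(
    CUSTOMER_CODE      NVARCHAR(100) NOT NULL,   -- source business key
    CUSTOMER_NAME      NVARCHAR(200) NULL,
    CUSTOMER_SEGMENT   NVARCHAR(100) NULL,
    PSA_LOAD_DATETIME  DATETIME2     NOT NULL,   -- when this delta was received
    PSA_RECORD_SOURCE  NVARCHAR(100) NOT NULL,
    PSA_CDC_OPERATION  CHAR(1)       NOT NULL,   -- I(nsert) / U(pdate) / D(elete)
    PSA_ROW_HASH       CHAR(32)      NOT NULL,   -- checksum over the attributes, for delta detection
    CONSTRAINT PK_PSA_CRM_CUSTOMER PRIMARY KEY (CUSTOMER_CODE, PSA_LOAD_DATETIME)
);
```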