Roelant Vos – An expert view on Agile Data Warehousing

NoETL – Data Vault Link tables

 

Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as support for (optional) degenerate attributes. To make things a bit more interesting, I created some metadata that requires different Business Key ‘types’ so that this can be shown and tested in the virtualisation program. For the example in this post I created three Link definitions (the metadata), one of which (LNK_CUSTOMER_COSTING) has a three-way relationship with the following...
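The key construction for a Link like this can be sketched in a few lines. This is a minimal illustration only, assuming an MD5 hash over pipe-delimited, sanitised business keys; the actual delimiter, sanitisation rules and key values below are assumptions, not taken from the post:

```python
import hashlib

def link_hash_key(*business_keys, delimiter="|"):
    """Derive a deterministic Link hash key from the business keys of
    the participating Hubs: trim, upper-case, concatenate, then hash."""
    composite = delimiter.join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(composite.encode("utf-8")).hexdigest().upper()

# A three-way relationship combines three business keys into one Link key
# (the key values here are made up for illustration):
key = link_hash_key("CUST-001", "PROJ-42", "2016-Q2")
```

Because the hash is computed over the sanitised concatenation, the same combination of keys always yields the same Link key, regardless of which system delivered the data.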

Quick and easy referential integrity validation (for dynamic testing)

 

This post is in a way related to the recent post about generating test data. In a similar way, I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI), specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...
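Because hash keys are derived deterministically from the business keys, referential integrity can be validated with a simple set comparison instead of a join on natural keys. A minimal sketch of the idea; the table names in the comments are hypothetical:

```python
def find_orphans(child_keys, parent_keys):
    """Hash keys present in a child table (e.g. a Satellite or Link)
    but missing from the parent Hub -- i.e. RI violations."""
    return set(child_keys) - set(parent_keys)

hub_keys = {"A1B2", "C3D4", "E5F6"}         # keys loaded into e.g. HUB_CUSTOMER
sat_keys = {"A1B2", "E5F6", "FFFF"}         # keys found in e.g. SAT_CUSTOMER
orphans = find_orphans(sat_keys, hub_keys)  # {"FFFF"}
```

An empty result means every child record resolves to a parent key, which is the dynamic test you want to run after each load.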

NoETL – Data Vault Satellite tables

 

The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First up is the Satellite processing, which supports the typical elements we have seen earlier: regular, composite and concatenated business keys with hashing; zero record provision; and reuse of the objects for ETL purposes if required. As this is another process going...
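One element of Satellite processing that lends itself to a quick sketch is change detection via a checksum over the descriptive attributes (a ‘hash diff’ in Data Vault 2.0 terms). This is a simplified illustration, assuming MD5 and a pipe delimiter; the attribute names are invented, not taken from the post:

```python
import hashlib

def hash_diff(record, attributes, delimiter="|"):
    """Checksum across a Satellite's descriptive attributes, used to
    decide whether an incoming record represents a genuine change."""
    payload = delimiter.join(str(record.get(a, "")).strip() for a in attributes)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()

attrs = ["customer_name", "city"]
stored   = {"customer_name": "Jane Doe", "city": "Sydney"}
incoming = {"customer_name": "Jane Doe", "city": "Brisbane"}
changed = hash_diff(stored, attrs) != hash_diff(incoming, attrs)  # True
```

Comparing two checksums is cheaper than comparing every attribute column individually, which is what makes this pattern attractive for both ETL and virtualised Satellites.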

Zero / ghost records in Data Vault Satellites versus Point In Time (PIT) tables

 

As posted earlier, recent evolutions of the Data Vault 2.0 conventions aim to remove the creation of zero records (or ‘ghost records’) in Satellites. Zero records have the sole aim of ensuring that every business key in a Satellite has a complete timeline (e.g. 1900-01-01 to 9999-12-31) so that a record is always returned when you query the state of the world at any given date. For instance, if a certain record is created in...
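The effect of a zero record on point-in-time queries can be shown with a small sketch; the attribute names and dates here are invented for the example:

```python
from datetime import date

def state_at(satellite_rows, as_of):
    """Return the satellite row effective at `as_of`: the most recent
    row whose effective date is on or before the query date."""
    candidates = [r for r in satellite_rows if r["effective_date"] <= as_of]
    return max(candidates, key=lambda r: r["effective_date"], default=None)

rows = [{"effective_date": date(2016, 5, 1), "status": "active"}]
state_at(rows, date(2010, 1, 1))   # None -- the timeline has a gap

# A zero record completes the timeline, so a row is always returned:
rows.append({"effective_date": date(1900, 1, 1), "status": "unknown"})
state_at(rows, date(2010, 1, 1))   # now returns the zero record
```

The alternative the post discusses is handling this completeness in Point In Time (PIT) tables instead of materialising zero records in every Satellite.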

World Wide Data Vault Consortium key takeaways

 

Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC), hosted by Dan Linstedt in his home state of Vermont. It was great to experience the uptake of Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year. Especially engaging was the day of in-depth discussions prior to the conference about various use-cases, technical solutions and improvements...

Quick and easy test data for even more agile development

 

This post provides a quick and easy way to generate test records for your unit or system integration testing. The reason I initially started working on a ‘virtualisation’ approach for Data Vault was to pre-test the ETLs that were generated using whichever automation technique (e.g. SQL, BIML and many others). Using the virtual Data Warehouse – the views I have been posting about recently – you get a representation of the Data Warehouse...
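A generator along these lines can be sketched in a few lines of code. The column names and key formats below are invented for the example; a fixed seed keeps the data reproducible between test runs:

```python
import random
import string

def make_test_records(n, seed=42):
    """Generate reproducible dummy source records for unit or system
    integration testing (seeded so every run yields the same data)."""
    rng = random.Random(seed)
    return [
        {
            "customer_id": "CUST-" + "".join(rng.choices(string.digits, k=6)),
            "customer_name": "".join(rng.choices(string.ascii_uppercase, k=8)),
            "balance": round(rng.uniform(0, 10_000), 2),
        }
        for _ in range(n)
    ]

records = make_test_records(5)
```

Reproducibility matters here: when a generated ETL fails, you want to re-run it against exactly the same input to verify the fix.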

Staging and interpreting XMLs, BLOBs and similar

 

Recently I had a couple of conversations about the ‘staging’ (loading data delta into your environment) of multi-structured datatypes such as JSON, XML and some BLOBs. Today I had one of these conversations with my esteemed ex-colleagues Bruce and Glenn, which made me think it would be a good idea to add some information and considerations to the recent post on parsing XML using XQuery. These conversations focused on where the interpretation of XML should happen: storing the unmodified (raw) XML...
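The ‘store raw, interpret later’ option can be illustrated with Python’s standard library; the payload below is a toy example, not taken from the original post:

```python
import xml.etree.ElementTree as ET

# The Staging Area keeps the payload exactly as it was received...
raw_payload = "<customer><id>CUST-001</id><name>Jane Doe</name></customer>"

# ...while interpretation (shredding the XML into columns) is deferred
# to a downstream step that can be re-run from the archive at any time:
def shred(raw_xml):
    root = ET.fromstring(raw_xml)
    return {child.tag: child.text for child in root}

record = shred(raw_payload)  # {'id': 'CUST-001', 'name': 'Jane Doe'}
```

The trade-off is the usual one: interpreting at staging time gives you queryable columns immediately, while storing the raw XML preserves everything the source ever sent and lets you change the interpretation later.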

Data Virtualisation and DWH Virtualisation

 

I presented a new style of content today at the Sydney DAMA March event, and thought it might be worthwhile to post the transcript as a paper. It is all about virtualising your entire Data Warehouse: what is needed to achieve this, what it means for ETL, and what role emerging Data Virtualisation techniques play. In a way it’s the overarching story that supports the recent blog posts about NoETL. The premise is that we...

NoETL – Data Vault Hub tables

 

In the previous posts we have loaded a proper data delta (Staging Area) and archived it in the Persistent Staging Area (PSA). In my designs, the PSA is the foundation for any form of upstream virtualisation – both for the Integration Layer (Data Vault) and subsequently the Presentation Layer (Dimensional Model, or anything fit-for-purpose). The Presentation Layer sits ‘on top of’ the Data Vault just as it would in a physical implementation, so you...
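As a rough illustration of what ‘virtualising’ a Hub on top of the PSA means, a view definition can be assembled from metadata along these lines. The templating and the table/column names are simplified assumptions for this sketch, not the actual generator from the post:

```python
def hub_view_sql(hub_name, business_key_column, hash_key_column, psa_table):
    """Assemble a simplified virtual Hub view: the distinct business
    keys (and their hash keys) as recorded in the PSA."""
    return (
        f"CREATE VIEW {hub_name} AS\n"
        f"SELECT DISTINCT {hash_key_column}, {business_key_column}\n"
        f"FROM {psa_table};"
    )

sql = hub_view_sql("HUB_CUSTOMER", "CUSTOMER_ID", "CUSTOMER_HSH", "PSA_CUSTOMER")
```

Because the view reads straight from the PSA, dropping and regenerating it is cheap: no ETL has to be re-run when the model changes, which is the core of the NoETL idea.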

NoETL – Persistent (History) Staging Area (PSA)

 

After setting up the initial data staging in the previous post, we can load the detected data delta into the historical archive: the Persistent Staging Area (PSA). The PSA is the foundation of the Virtual Enterprise Data Warehouse because all upstream modelling and representation essentially reads from this ‘archive of (data) changes’. This is because the PSA holds all the information that was ever presented to the Data Warehouse, in either structured or unstructured format....
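Loading only the detected changes into the PSA boils down to comparing each incoming record against the most recent stored version of the same key. A minimal sketch using per-record checksums; the field names and checksum values are hypothetical:

```python
def detect_delta(incoming, psa_latest):
    """Keep only incoming records that are new, or whose checksum
    differs from the most recent PSA version of the same key."""
    return [
        rec for rec in incoming
        if psa_latest.get(rec["key"]) != rec["checksum"]
    ]

psa_latest = {"CUST-001": "AAA", "CUST-002": "BBB"}  # key -> last checksum
incoming = [
    {"key": "CUST-001", "checksum": "AAA"},  # unchanged -- skipped
    {"key": "CUST-002", "checksum": "CCC"},  # changed   -- loaded
    {"key": "CUST-003", "checksum": "DDD"},  # new       -- loaded
]
delta = detect_delta(incoming, psa_latest)  # CUST-002 and CUST-003
```

Only the surviving records are appended to the PSA, so the archive grows with changes rather than full copies of every load.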