Roelant Vos An expert view on Agile Data Warehousing

1

World Wide Data Vault Consortium key takeaways

 

 Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC) as hosted by Dan Linstedt in his home state Vermont. It was great to experience the uptake in Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year. Especially engaging was a day prior to the conference of in-depth discussions about various use-cases and technical solutions and improvements...

1

Quick and easy test data for even more agile development

 

 This post provides a quick and easy way to generate some test records for your unit or system integration testing. The reason I initially started to work on a ‘virtualisation’ approach for Data Vault was to pre-test the ETLs that were generated using whatever automation technique (e.g. SQL, BIML and many others). Using the virtual Data Warehouse – the views I have been posting about recently – you get a representation of the Data Warehouse...

0

Staging and interpreting XMLs, BLOBs and similar

 

 Recently I had a couple of conversations about the ‘staging’ (loading data delta into your environment) of multi-structured datatypes such as JSON, XML (and some blobs). Today I had one of these conversations with my esteemed ex-colleagues Bruce and Glenn which made me think some additional information and considerations about the recent post to parse XML using XQuery would be a good idea. These conversations focused on where the interpretation of XML should happen: storing the unmodified (raw) XML...

0

Data Virtualisation and DWH Virtualisation

 

 I presented a new style of content today at the Sydney DAMA March event, and thought it may be worthwhile to post the transcript as a paper. It is all about virtualisation your entire Data Warehouse, what is needed to achieve this and what it means for ETL and what role emerging Data Virtualisation techniques play here. In a way it’s the overarching story that supports recent blog posts about NoETL. The premise is that we...

1

NoETL – Data Vault Hub tables

 

 In the previous posts we have loaded a proper data delta (Staging Area) and archived this in the Persistent Staging Area (PSA). In my designs, the PSA is the foundation for any form of upstream virtualisation – both for the Integration Layer (Data Vault) and subsequently the Presentation Layer (Dimensional Model, or anything fit-for-purpose). The Presentation Layer sits ‘on top off’ the Data Vault the same as it would be in the physical implementation so you...

0

NoETL – Persistent (History) Staging Area (PSA)

 

 After setting up the initial data staging in the previous post we can load the detected data delta into the historical archive: the Persistent Staging Area (PSA). The PSA is the foundation of the Virtual Enterprise Data Warehouse because all upstream modelling and representation essentially reads from this ‘archive of (data) changes’. This is because the PSA has all the information that was ever presented to the Data Warehouse, either in structured or unstructured format....

1

NoETL (Not Only ETL) – virtualization revisited

 

 For the last couple of weeks I have been working on a simple tool to support the Data Warehouse virtualisation concepts in practice. This is based on the idea that if you can generate the ETL you need, you can also virtualise these processes if performance requirements and / or relevant constraints allow for it. This is why I was looking for a way to virtualise where it would be possible (performance wise), and instantiate (generate ETL) where...

1

Denormalise (flatten) XML using Xquery

 

 ETL software have ways to interpret XML files or structures that vary in maturity, but recently I had to find another way to import XML structures that are received as blobs using (transact) SQL only. Because the upstream patterns (Staging to Integration) can easily handle denormalised source data sets the easiest way was to cast each blob as the XML file it is anyway and completely denormalise it (‘flatten’, more rows…). This is opposed to normalising...

1

Hashing in SQL Server and Oracle for the same output

 

 Thanks go out to my colleagues for testing this out further, but we were testing out a way to get Oracle to generate hash values with the same output as the default SQL Server MD5 hashbytes function. In previous projects I used to rely on the DBMS_CRYPTO package in Oracle but this requires additional privileges. So if there is a some red tape to cut through the following code using the DBMS_OBFUSCATION_TOOLKIT works as well, and you...

0

Data Vault 2.0 Staging Area learnings & suggestions

 

 With Data Vault 2.0 the Data Vault methodology introduces (amongst other things)  a more formalised solution architecture which includes a Staging Area. In the designs as advocated in this blog this Staging Area is part of a conceptual Staging Layer that also cover the Persistent Staging Area (PSA). While updating the documentation in general I updated various sections in the Staging Layer definition and this prompted me to highlight some experiences specifically with implementing the...