Roelant Vos An expert view on Agile Data Warehousing

Data Vault in Brisbane (Australia): the new user group is up!

 Happy New Year! We have just started a platform for local Brisbane Data Vault enthusiasts to get together, share information and improve the methodology. If you live or work close to Brisbane, definitely check out the Brisbane Data Vault user group. I look forward to meeting anyone interested through this new portal!

Data Vault implementation A-Z: Staging data (the conceptual side)

 This is the first of a planned series of implementation designs for Data Vault in an end-to-end Data Warehouse environment. The positioning of the Data Vault concepts and techniques in the greater design of the system (reference architecture) is documented elsewhere on this site, mainly in the ‘papers’ section. Data Vault in itself does not provide a complete solution for most Data Warehouse purposes, but it provides a great set of modeling techniques to design the...

Data Vault implementation preparations – fundamental ETL requirements

 Before I work my way through the end-to-end ETL solution for Data Vault, certain fundamentals must be in place. The reference architecture is one of them, and it is largely documented on this site and the corresponding Wiki. The other main component from an implementation perspective is the set of requirements for ETL. As all concepts have their place in the reference architecture for good reasons, they also have tight relationships, and changes to one concept...

Designing reference tables for the Data Warehouse

 In a typical Data Warehouse it is common to introduce additional descriptive information that is not provided by the operational systems feeding data into the Data Warehouse. However, the exact positioning and implementation of this reference data can cause confusion, especially when applying this concept in the Integration Layer. Reference data is additional contextual or descriptive information that is not provided by the source system. Examples are descriptions for industry standard codes. This information can...

Implementing User Managed data (User Managed Staging Tables)

 Related to the handling of reference data, it is sometimes necessary to feed information into the Data Warehouse that does not have a formal source in the organisation’s information landscape. This information can be needed in the form of reference data (additional information about attributes provided by a source), relationships or really anything that is required to ultimately meet reporting requirements. It can be a vital element to ‘glue’ information together or to provide details...

Data Vault comparisons

 I have drafted a comparison between Data Vault, normalised (3NF) and denormalised (Kimball) models for reference. This comparison applies when these models are used as the core Data Warehouse model, as opposed to modelling for reporting purposes (i.e. data marts).

Data Vault versus the persistent Staging Area

 One of the questions I regularly get during presentations is what the benefits of Data Vault are over a persistent Staging Area. In other words: why go through the effort of defining a Data Vault model when you can get the same ‘regeneration / recreation’ capabilities from a persistent Staging Area that directly feeds a Dimensional Model or similar presentation model? First off: in my reference architecture I use both the Data Vault (in the...

Changes to the site(s)

 In preparation for enabling an online (proof of concept) ETL generation site, I have updated various sections of this weblog to conform to the implementations (templates) used for ETL generation. Moreover, most documentation has been moved to the corresponding Wiki and consequently removed from this site. Ultimately the Wiki is better suited for more detailed documentation and examples, and as such only the high level concepts and positioning are available on this site. To complete the...

ETL generation in SSIS (adult steps)

 Full generation of ETL is the missing component in the model-driven design of the Data Warehouse, and I am still pursuing various methods for various ETL suites to add to this concept. Some time ago I looked into ETL generation for Microsoft SSIS using the available DTS libraries (see the post), which really were baby steps. At some point one of my colleagues suggested looking into ‘BIML’ (Business Intelligence Markup Language) and the accompanying...

Handling logical deletes in the Data Warehouse

 While working on a recent project I had a brief discussion on the implementation of logical deletes. This prompted me to define once and for all how ETL should handle these occurrences. This is, of course, assuming that some CDC mechanism provides the required details or you have the capacity to compare full sets of data to derive the logical deletes yourself. For this purpose I drafted a Design Pattern which should explain this...
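
As an illustration of the "compare full sets of data" option mentioned in the excerpt above, here is a minimal sketch in Python. It is not the Design Pattern referred to in the post; it only assumes that two complete extracts of the same source are available and that each record can be identified by a business key. The function name, variable names and sample keys are hypothetical.

```python
def derive_logical_deletes(previous_keys, current_keys):
    """Return the business keys that were present in the previous full
    extract but are missing from the current one (assumed logical deletes)."""
    return sorted(set(previous_keys) - set(current_keys))


# Hypothetical sample data: business keys from two consecutive full extracts.
previous_extract = ["CUST-001", "CUST-002", "CUST-003"]
current_extract = ["CUST-001", "CUST-003", "CUST-004"]

deleted_keys = derive_logical_deletes(previous_extract, current_extract)
print(deleted_keys)  # ['CUST-002']
```

A key that disappears between extracts is only a candidate for a logical delete; how it is then flagged in the Data Warehouse depends on the chosen design.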