Category: Architecture

Staging and interpreting XMLs, BLOBs and similar

Recently I had a couple of conversations about the ‘staging’ (loading data delta into your environment) of multi-structured datatypes such as JSON, XML (and some BLOBs). Today I had one of these conversations with my esteemed ex-colleagues Bruce and Glenn, which made me think it would be a good idea to add some information and considerations to the recent post on parsing XML using XQuery. These conversations focused on where the interpretation of XML should happen: storing the unmodified (raw) XML...

 
NoETL (Not Only ETL) – virtualization revisited

For the last couple of weeks I have been working on a simple tool to support the Data Warehouse virtualisation concepts in practice. This is based on the idea that if you can generate the ETL you need, you can also virtualise these processes if performance requirements and/or relevant constraints allow for it. This is why I was looking for a way to virtualise where it would be possible (performance-wise), and instantiate (generate ETL) where...

 
Do we still want to automate against ETL tools?

In the various Data Warehouse initiatives I have been involved with recently, I tend to use ETL software less and less. Over the last few years I have spent a lot of time figuring out how to automate/generate ETL in various tools, most notably in SSIS, Pentaho, PowerCenter and recently SAS (but various others as well), to the point that most of the intended solution can be generated from metadata. But as recently outlined in this...

 
Zero records, time-variance and point-in-time selection

While finalising the posts for the overall Data Vault ETL implementation I have also started thinking about how to document the next steps: the loading patterns for the Presentation Layer. From here on I will refer to the Presentation Layer, Data Marts and Information Marts simply as ‘Information Marts’. This reminded me that I haven’t yet properly covered the ‘zero record’ concept. This is a timely consideration: the whole reason that zero records exist is to make the...

 
Data Vault implementation overview

I’m almost at the end of the basic outlines for ETL implementation in the context of Data Vault. For a (hopefully) tidy overview I created a page that lists all relevant posts for Data Vault implementation here. I’m working towards writing up the last few topics now, including Zero Records and Link Satellites before focusing on Data Mart automation from a core Data Vault model. I think this covers the essential elements for implementation, but...

 
Virtualising your (Enterprise) Data Warehouse – what do you need?

For a while I have been promoting the concept of defining a Historical (raw) Staging Area / archive to complement the Data Warehouse architecture. A quick recap: the Historical Staging Area is really just an insert-only persistent archive of all original data delta that has been received. One of the great things is that you can deploy this from day one, start capturing changes (data delta) and never have to do an initial load again. In...

 
Comparisons between Data Warehouse modelling techniques

This post provides an overview of the main pros and cons for various Data Modelling techniques: Third Normal Form (3NF) – The Corporate Data Model. Dimensional Modelling – Facts and Dimensions. Hybrids – Best of both worlds? Data Vault, Anchor Modelling and similar. It has become a bit of a large post but then again, there is a lot of ground to cover. Third Normal Form (3NF) The pros for 3NF are: Most IT professionals...

 
When is Data Vault a suitable solution?

Over the years I have had mixed experiences pitching Data Vault in various situations and have come to a couple of conclusions with regard to applying (or proposing) Data Vault. It is related to various discussions, such as having a 2-tiered (layered) or 3-tiered Data Warehouse architecture, which in itself is subject to various considerations. And once you have decided you do need a 3-tiered approach, the discussion between (typically) 3NF and Data Vault as modelling...

 
Data Vault implementation A-Z: Staging data (the conceptual side)

This is the first of a planned series of implementation designs for implementing Data Vault in an end-to-end Data Warehouse environment. The positioning of the Data Vault concepts and techniques in the greater design of the system (reference architecture) is documented elsewhere on this site, mainly in the ‘papers’ section. Data Vault in itself does not provide a complete solution for most Data Warehouse purposes but provides a great set of modeling techniques to design the...

 
Data Vault implementation preparations – fundamental ETL requirements

Prior to working my way through the end-to-end ETL solution for Data Vault, certain fundamentals must be in place. The reference architecture is one of them and this is largely documented as part of this site and corresponding Wiki. The other main component from an implementation perspective is the set of requirements for ETL. As all concepts have their place in the reference architecture for good reasons, they also have tight relationships and changes to one concept...