Roelant Vos An expert view on Agile Data Warehousing

0

Data Visualisation, Data Warehousing and Big Data: one pitch to rule them all

 

 Of the concepts that have emerged over the last few years, the ‘Data Lake’ is not one of my favourites. Although it has to be said I had a lot of fun out of various parodies on Data Lakes – which I’ll not repeat here! While I am on board with the cheap redundant storage concept it is clear that data management is still needed in this day and age (more than ever, really) and that concepts...

1

Using an ETL platform for your Data Warehouse, is it still relevant?

 

 When I started my career in Data Warehousing and Business Intelligence 15 years ago there was a massive push towards adopting ETL software. Traditionally, specialised ETL software such as Informatica Powercenter, Oracle Warehouse Builder (later superseded by Oracle Data Integration), Microsoft DTS (later superseded by SSIS) and many similar platforms were very successful because of two main reason: ETL software provided a way to ‘explain’ or document what was happening in a way that made...

0

Data Modeling Zone Europe 2014 – #DMZone

 

 The site of the European Data Modeling Zone (DMZ) is up, and it looks really good! I look forward to being in Europe again for this event, which is hosted in Hamburg on the 29th and 30th of September 2014. On behalf of Analytics8 I will present the ‘model driven design’ approach of how the metadata that is embedded in the data model can be used to forward engineer ETL in various platforms. Feel free...

2

Data Vault 2.0 – how to handle Referential Integrity?

 

 I was working on adding some of the automation code to support Data Vault 2.0 and this got me thinking about Referential Integrity (RI)  related to the modifications that Data Vault 2.0 requires. With Data Vault ‘1.0’ Referential Integrity is always enabled (except for very big systems – let’s leave that one out of the scope for now – see this older post) and in Data Vault 2.0 this hasn’t changed according to the specifications. For Data...

1

Data Vault 2.0 – Introduction and (technical) differences with 1.0

 

 While finalizing the content covering Data Vault implementation it is time to start looking forward towards Data Vault 2.0. For this purpose it makes sense to provide an overview of the changes between the two data modelling approaches. To limit the scope to implementation I’ll consider it sufficient to mention that Data Vault 2.0 (DV2.0) is a complete approach covering not only the modelling (that was already part of DV1.0) but also the layered DWH...

0

Data Vault ETL Implementation using SSIS: Step 7 – Link Satellite ETL – part 1 – Regular History

 

 From an ETL development perspective the Data Vault Link Satellite (LSAT) ETL template is the most complex, and also conceptually the most challenging. This is mainly because the behavior of ETL depends on the type of history that is tracked. To define proper LSAT ETL templates for automation I classify this as follows: Regular / Historical Link Satellites that track the changes for attributes of a relationship over time. This is very similar to Satellite ETL...

4

Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 2 – SQL selection (wide table exception)

 

 There is more to say on exactly how Satellite selections are made as originally documented in the Satellite ETL post. In many cases not all records from a Staging Area table are selected, but rather a subset. This creates an interesting issue to which I usually refer to as the ‘wide table exception’. It leads to a minor change in the SQL selection for Satellite entities. The outline and solution are described in this post. Image the scenario where...

0

Data Vault ETL Implementation using SSIS: Step 3 – Hub ETL – part 2 – SQL selection

 

 The outline ETL for Data Vault Hub style ETL using SSIS has been documented in ‘Hub implementation’ and ‘Hub exceptions’. As a minor addition I would like to dive a bit deeper into the selection logic associated with Hub ETL. Following the experiences ‘from the trenches’ there are basically two ways to create SQL for a Hub selection from the Staging Area; Straight-up selection of the business key Selection including a composite business key (e.g. assembled from...

0

Takeaways from the World Wide Data Vault Conference

 

 There were some really great sessions at the World Wide Data Vault Conference (#WWDVC), and I really need to take a bit of time to upgrade some of my templates to DV2.0. Thankfully that’s pretty straightforward with the framework in place already; the changes, or upgrade, can be automated as well using the same metadata. My takeaways are; There is a push towards virtualisation for the Presentation Layer / Dimensional Model (or similar); using views directly...

0

User Defined Properties for ETL Automation – the final piece of the puzzle

 

 At the end of the development efforts to support true Model Driven Design there are some elements you just don’t want to store in another metadata table. But somehow you need specific information which isn’t available in the Data Dictionary either (or directly derivable that way). In my case I wanted to label attributes within a Dimensional Model to behave following specific paradigms for presenting history – Type0, Type1 or Type2 – and essentially use this information...