Roelant Vos An expert view on Agile Data Warehousing

0

Takeaways from the World Wide Data Vault Conference

 

 There were some really great sessions at the World Wide Data Vault Conference (#WWDVC), and I really need to take a bit of time to upgrade some of my templates to DV2.0. Thankfully that’s pretty straightforward with the framework in place already; the changes, or upgrade, can be automated as well using the same metadata. My takeaways are; There is a push towards virtualisation for the Presentation Layer / Dimensional Model (or similar); using views directly...

0

User Defined Properties for ETL Automation – the final piece of the puzzle

 

 At the end of the development efforts to support true Model Driven Design there are some elements you just don’t want to store in another metadata table. But somehow you need specific information which isn’t available in the Data Dictionary either (or directly derivable that way). In my case I wanted to label attributes within a Dimensional Model to behave following specific paradigms for presenting history – Type0, Type1 or Type2 – and essentially use this information...

0

Referential Integrity in Data Vault

 

 Traditionally, Referential Integrity (RI) is not enforced for a Data Warehouse in some approaches. This may result that the Foreign Key (FK) constraint is either not generated at all, or generated but disabled on database level. The latter can improve the efficiency of the RDBMS optimiser depending on the platform used (and should by a case-by-case consideration). Unlike operational (transactional) systems, data for a DWH is prepared (scrubbed) before it is inserted and this happens in...

1

World Wide Data Vault Consortium (User Group) meeting

 

 Hi everyone, I will attend the first World Wide DV Consortium (User Group) meeting in St. Albans (Vermont / USA) and talk about the Data Vault automation approaches in general, but SSIS in particular. Looking forward to it and hopefully with so many Data Vault experts we’ll get the opportunity to build & expand on this. Hope you see you there Roelant

1

Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 3 – End Dating

 

 For various reasons including (but not limited to) ease of maintenance, speed, reusability, parallelism and reducing of complexity in general, End Dating is implemented as a stand-alone generic ETL process to support the Data Vault Satellite ETL. Conceptually, End Dating is a redundant step because the information is derived from already available information: the Effective Date. However, in most situations it makes upstream ETL processing easier to manage and may be worth the storage/performance trade-off. But,...

0

Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 1 – overview

 

 Satellite processing is one of the types of ETL at the core of implementing Data Vault; this article focuses on the ‘vanilla’  handling of Satellite ETL. Exceptions such as Multi-Active/Multi-Variant implementations (depending on which flavour of Data Vault you use) or isolating Satellites from wide (denormalised) source tables or files will be documented in a separate post. Satellite processing for Data Vault is in many ways similar to the process that is defined for the...

0

Data Vault Role Playing in Links and Link Satellites

 

 Recently I have been involved in some (very lively) discussions related to implementation of ‘role playing’ in Data Vault. In other words: how to handle (model) different types of relationships. Over the years my response has been that from the perspective of Data Vault modelling it doesn’t really matter if you create multiple Link relationships between Hubs / business entities, or if you create a single relationship with multiple Link Satellites each handling a relationship...

0

Comments on ‘Modeling the Agile Data Warehouse with Data Vault’

 

 I just finished reading ‘Modeling the Agile Data Warehouse with Data Vault’ by Hans Hultgren. I think it provides a good and clear overview of the Data Warehouse design using Data Vault as technique for the core DWH layer as well and, most importantly, covers some practical aspects that have been implemented but not documented in (practical) detail such as; Key Satellites Identical Business Keys (primarily solved through defining a concatenated key) Points that are up...

1

Data Vault ETL Implementation: Potential exceptions in Hub ETL

 

 The previous blog posting regarding Hub ETL processes (Implementation of Hub ETL) covered the standard (cookie cutter) functionality. However, there are some exceptions that may occur, and perhaps some additional explanations are required of some of the operations: Record Source: in most of the scenarios the Record Source will be the same between the various Hub ETL processes as you’re working towards a core enterprise wide business key. This is why the Record Source is an optional component...

0

Data Vault ETL Implementation using SSIS: Step 4 – Link ETL

 

 In the Data Vault workflow the Link is the next object to typically be loaded after the Hub has been processed. This can be done in parallel with the standard Satellites but this will be covered in a future posting about the workflow/scheduling of Data Vault ETL. The Link ETL comes down to select the distinct pairs of business keys from the respective Staging Area table, check if they’re there and insert them if they’re...