I’ve completed a fairly large body of work that I’ve been meaning to do for a long time: how to automatically version the Data Warehouse data model in sync with the version of the ETL automation metadata. Although versioning models and code is relevant (but rarely implemented) in the traditional ETL area, this requirement becomes very real when moving to a virtualised Data Warehouse / integrated model approach (Data Vault 2.0 in my case)....
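As a minimal sketch of what such versioning could look like – the table and column names below are my own hypothetical examples, not the actual structures from the post – the idea is to snapshot each generated model definition together with the metadata version it was derived from:

```sql
-- Hypothetical versioning table: each generated model snapshot is stored
-- alongside the ETL automation metadata version it was derived from.
CREATE TABLE MD_MODEL_VERSION (
    MODEL_VERSION_ID    INT IDENTITY(1,1) PRIMARY KEY,
    METADATA_VERSION_ID INT NOT NULL,                  -- the metadata version in effect
    VERSION_DATETIME    DATETIME2(7) NOT NULL DEFAULT SYSDATETIME(),
    MODEL_DEFINITION    NVARCHAR(MAX) NOT NULL         -- serialised DDL for the model
);
```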
Lately I have had a bit more head space to work on some ideas I find interesting, and these are now intended to culminate in ‘version 1.2’ of the Virtual EDW tool I have been developing. I’ve been using this tool extensively for various Data Warehouses and am generally very happy with it as a quick prototyping tool. But what really starts to play up is the requirement for a physical Data Vault (Integration Layer), as...
Let’s start by clarifying that this concerns the RDBMS world, not the Hadoop world 😉 It’s a good problem to have – loading data too quickly. So quickly that, even at high precision, multiple changes for the same key end up being inserted with the same Load Date/Time Stamp (LDTS). What happens here? A quick recap: in Data Vault the Load Date/Time Stamp (LDTS, LOAD_DTS, or INSERT_DATETIME) is defined as the moment data is recorded...
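The excerpt above describes the collision itself; as a hedged illustration (the staging table and columns are hypothetical examples), a query along these lines will surface business keys that received more than one change with an identical LDTS:

```sql
-- Hypothetical staging table STG_CUSTOMER; lists business keys that
-- received multiple changes with the exact same Load Date/Time Stamp,
-- even at DATETIME2(7) precision.
SELECT
    CUSTOMER_ID,
    LOAD_DATETIME,
    COUNT(*) AS CHANGES_AT_SAME_LDTS
FROM STG_CUSTOMER
GROUP BY
    CUSTOMER_ID,
    LOAD_DATETIME
HAVING COUNT(*) > 1;
```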
Recent discussions around Data Warehouse virtualisation made me realise I forgot to post one of the important requirements: version control. In the various recent presentations this was discussed at length, but somehow it didn’t make it to the transcript. Data Warehouse virtualisation needs versioning. Think of it this way – if you can drop and refactor your Data Warehouse based on (the changes in your) metadata, then your downstream reports and analytics are very likely...
This is the second part of the Link Satellite virtualisation overview (the first post on this topic is here), and it dives deeper into the logic behind Driving Key based Link Satellites. The Driving Key is arguably one of the more complex concepts to implement in Data Vault – and you (still) need to ensure you can cover reloads (deterministic outputs!), zero records / time variance and things such as re-opening closed relationships. In the example...
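To make the Driving Key mechanics a bit more concrete – and this is only a sketch with hypothetical table and column names, not the post’s full logic – the essence is that the next change for the same Driving Key deterministically closes the previous relationship:

```sql
-- Minimal Driving Key sketch (hypothetical names): LEAD() finds the next
-- change for the same Driving Key (the Customer side of the relationship),
-- which becomes the expiry of the current row; the latest row stays open.
SELECT
    lsat.LINK_CUSTOMER_OFFER_HSH,
    lnk.CUSTOMER_HSH AS DRIVING_KEY_HSH,
    lsat.LOAD_DATETIME AS EFFECTIVE_DATETIME,
    LEAD(lsat.LOAD_DATETIME, 1, CAST('9999-12-31' AS DATETIME2(7)))
        OVER (PARTITION BY lnk.CUSTOMER_HSH
              ORDER BY lsat.LOAD_DATETIME) AS EXPIRY_DATETIME
FROM LSAT_CUSTOMER_OFFER lsat
JOIN LNK_CUSTOMER_OFFER lnk
    ON lnk.LINK_CUSTOMER_OFFER_HSH = lsat.LINK_CUSTOMER_OFFER_HSH;
```

Because the expiry is derived purely from the recorded history, re-running the query over the same data always yields the same output – which is what makes reloads safe.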
The final post (for now at least) in the planned series about Data Warehouse Virtualisation is all about Link Satellites. As with some of the earlier posts there are various similarities to the earlier approaches – most notably the Satellite virtualisation and processing. Concepts such as zero records and ‘virtual’ or computed end-dating are all there again, as is the construction of using subqueries to do attribute mapping and outer queries to calculate hash...
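That subquery / outer query construction, in simplified and hedged form (all object names here are hypothetical), looks roughly like this – the inner query maps source attributes to target names, and the outer query calculates the hash over the mapped keys:

```sql
-- Inner query: attribute mapping. Outer query: hash calculation over the
-- mapped Business Keys (MD5 shown for brevity; names are hypothetical).
SELECT
    HASHBYTES('MD5',
        ISNULL(RTRIM(CONVERT(NVARCHAR(100), sub.CUSTOMER_ID)), 'NA') + '|' +
        ISNULL(RTRIM(CONVERT(NVARCHAR(100), sub.OFFER_ID)),    'NA')
    ) AS LINK_CUSTOMER_OFFER_HSH,
    sub.LOAD_DATETIME,
    sub.OFFER_STATUS
FROM (
    SELECT
        CustomerNumber AS CUSTOMER_ID,   -- source-to-target attribute mapping
        OfferNumber    AS OFFER_ID,
        InsertDateTime AS LOAD_DATETIME,
        StatusCode     AS OFFER_STATUS
    FROM STG_CUSTOMER_OFFER
) sub;
```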
One of the last items to write about regarding Data Warehouse virtualisation (and any other form of ETL generation) is the handling of the metadata itself. In a previous post I covered what metadata needs to be captured at a minimum for ETL automation, and this post is all about how to incorporate this metadata from various locations. One technique in particular I tend to use is the user defined properties (or extended properties) of...
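For extended properties specifically, the standard SQL Server interface is sp_addextendedproperty to write and fn_listextendedproperty to read; the property name and value below are just hypothetical examples of ETL automation metadata, not the post’s actual conventions:

```sql
-- Attach a piece of ETL automation metadata to a table as an
-- extended property (property name/value are hypothetical examples).
EXEC sys.sp_addextendedproperty
    @name       = N'BusinessKey',
    @value      = N'CUSTOMER_ID',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE',  @level1name = N'HUB_CUSTOMER';

-- Read it back, for instance as input to the ETL generation logic.
SELECT objname, name, value
FROM sys.fn_listextendedproperty(
    N'BusinessKey', N'SCHEMA', N'dbo',
    N'TABLE', N'HUB_CUSTOMER', NULL, NULL);
```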
Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as the support for (optional) degenerate attributes. To make things a bit more interesting I created some metadata that requires different Business Key ‘types’ so this can be shown and tested in the virtualisation program. For the example in this post I created three Link definitions (the metadata), one of which (LNK_CUSTOMER_COSTING) has a three-way relationship with the following...
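As a sketch of what the three-way case could generate – only the LNK_CUSTOMER_COSTING name comes from the post; the Business Key names are invented for illustration – the Link hash key simply spans all three participating keys:

```sql
-- Hypothetical three-way Link: the Link hash key is calculated over the
-- sanitised concatenation of all three participating Business Keys.
SELECT DISTINCT
    HASHBYTES('MD5',
        ISNULL(RTRIM(CONVERT(NVARCHAR(100), CUSTOMER_ID)), 'NA') + '|' +
        ISNULL(RTRIM(CONVERT(NVARCHAR(100), COST_CENTRE)), 'NA') + '|' +
        ISNULL(RTRIM(CONVERT(NVARCHAR(100), PLAN_CODE)),   'NA')
    ) AS LNK_CUSTOMER_COSTING_HSH,
    CUSTOMER_ID,
    COST_CENTRE,
    PLAN_CODE
FROM STG_CUSTOMER_COSTING;
```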
This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI) specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...
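A hedged example of why the hash keys make this validation easy (the table names are hypothetical): because a Satellite and its Hub derive the same deterministic hash from the same Business Key, Referential Integrity can be checked after the fact with a simple anti-join, even though the tables are loaded independently:

```sql
-- Any rows returned are RI violations: Satellite hash keys that have
-- no corresponding Hub entry (names are hypothetical examples).
SELECT sat.CUSTOMER_HSH
FROM SAT_CUSTOMER sat
LEFT JOIN HUB_CUSTOMER hub
    ON hub.CUSTOMER_HSH = sat.CUSTOMER_HSH
WHERE hub.CUSTOMER_HSH IS NULL;
```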
The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier:

- Regular, composite and concatenated business keys with hashing
- Zero record provision
- Reuse of the objects for ETL purposes if required

As this is another process going...
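Of those elements, the zero record provision is perhaps the least self-explanatory; as a minimal sketch (with hypothetical names), it amounts to unioning a placeholder row per key, dated at the earliest possible LDTS, underneath the real history:

```sql
-- Zero record sketch: one placeholder row per Hub key, dated to the start
-- of time, so every key has a known state before its first real change.
SELECT
    hub.CUSTOMER_HSH,
    CAST('1900-01-01' AS DATETIME2(7)) AS LOAD_DATETIME,
    CAST(NULL AS NVARCHAR(100)) AS GIVEN_NAME,
    CAST(NULL AS NVARCHAR(100)) AS SURNAME
FROM HUB_CUSTOMER hub
UNION ALL
SELECT
    sat.CUSTOMER_HSH,
    sat.LOAD_DATETIME,
    sat.GIVEN_NAME,
    sat.SURNAME
FROM SAT_CUSTOMER sat;
```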