Category: General

Remarks, expressions and thoughts. And everything that does not fit in the other categories :)

Early-bird for upcoming training (22-24 October) about to expire

If you are interested in understanding and discussing the intricacies of developing a Data Vault based data platform / Data Warehouse, then please consider the upcoming Data Vault implementation training course. It is a practical training focused on understanding the impacts of various design decisions on the (Data Vault) patterns and overall solution architecture. Regardless of whether you develop your solution using custom scripting or an ‘off-the-shelf’ vendor application, you need a detailed understanding of how...

 
How to be sure you load 100% of the data 100% of the time!

How can you be sure you load 100% of the data 100% of the time? This is an article that expands on a topic that often comes up in the Data Vault implementation training: applying Referential Integrity and consistency checks in Data Vault. It sounds straightforward, but in reality these are interesting topics for any Data Warehouse (DWH) and Data Vault practitioner. I started to write a blog post about it, but before I knew...

 
Update on development & collaboration efforts

Development update: This is a status update for the small community of open source developers and collaborators that use, or contribute to, the growing ecosystem of metadata management and Data Vault generation tooling. I’ve worked hard on creating a more stable set of tooling around ETL generation and as a result the TEAM and VEDW applications (as well as some others) have now been completely separated with a view to increasing interoperability with other...

 
Upcoming training and events

Hi everyone, sorry it has been a while! I’m done travelling for now (for both work and leisure) and have been a bit quiet while working on new training materials and code updates for the various collaboration areas. As a result, and in the short term, I will resume posting plenty of articles on the weblog to capture various lessons learned and ideas as well as open-source code releases – so watch this...

 
Using (and moving to) raw data types for hash keys

Making hash keys smaller: A few months ago I posted an article explaining the merits of the ‘natural business key’, which can make sense in certain situations, and, from a more generic perspective, why this is something the Data Warehouse management system (‘the engine’) would be able to figure out automatically and change on the fly when required. That article used the common approach of storing the hash values in character fields (i.e. CHAR(32) for...

 
Registration now working!

I’ve finally properly (I think) configured the website to allow registration and the adding of comments in a user-friendly way, without the burden of endless spambots. Registration, the creation of an account, will allow commenting and discussing content on the site itself, which is a big improvement over the current email-based correspondence. Once the account has been set up you will receive a welcome email and be able to log in to the site using the...

 
Adopting GitHub for documentation, and resulting blog changes

After having used Git(Hub) to work and collaborate on code for a long time, I have recently spent some time merging and moving various documentation artefacts to GitHub as well. This covers the Data Integration framework and Enterprise Data Warehouse (EDW) architecture documentation, most importantly the various Design Patterns and Solution Patterns. These patterns form the central body of content that actually tries to explain how things work in practice. I think it makes a...

 
Is Data Vault becoming obsolete?

What value do we get from having an intermediate hyper-normalised layer? Let me start by stating that a Data Warehouse is a necessary evil at the best of times. In the ideal world, there would be no need for it, as optimal governance and near real-time multidirectional data harmonisation would have created an environment where it is easy to retrieve information without any ambiguity across systems (including its history of changes). Ideally, we would not...

 
Some Q&A on Data Warehouse Virtualisation

I receive a fair number of questions on the Data Warehouse Virtualisation ideas and wanted to respond to and discuss these via this post. I don’t have all the answers but can share my views and expectations. When it comes to DWH Virtualisation and the Persistent Staging Area (PSA), the questions generally fall into two categories: Isn’t it too slow? How about performance? Surely users don’t want to wait for hours to see results? Why bother...

 
Biml Express 2017 tests, comments and work-arounds

The new version of Biml Express, the free script-based ETL generation plug-in for Visual Studio provided by Varigence, has been out for a few months: since mid-July 2017, to be precise. However, only recently have I been able to find some time to properly regression-test this new release against my library of patterns / scripts. The driver is the upcoming Data Modelling Zone event and Data Vault Implementation & Automation training sessions – better keep up...