Roelant Vos An expert view on Agile Data Warehousing

1

Biml Express 2017 tests, comments and work-arounds

 

 The new version of Biml Express, the free script-based ETL generation plug-in for Visual Studio provided by Varigence, has been out for a few months. Mid-July 2017 to be precise. However only recently I have been able to find some time to properly regression-test this new release against my library of patterns / scripts. The driver is the upcoming Data Modelling Zone event and Data Vault Implementation & Automation training sessions – better keep up...

0

Updated sample and metadata models for Data Vault generation and virtualisation

 

 After a bit of a pause in working on the weblog and technology (caused by an extended period of high pressure in the day job) I am once again working on some changes in the various concepts I’m writing about on this site. Recently I was made aware of this great little tool that supports easy creation and sharing of simple data models: Quick Database Diagrams (‘QuickDBD’). The tool is 100% online and can be...

3

Using a Natural Business Key – the end of hash keys?

 

 Do we still need Hash Keys? Now there is a controversial topic! I have been thinking about the need for hash keys for almost a year now, ever since I went to the Data Vault Day in Germany (Hamburg) end of 2016. During this one-day community event, the topic of stepping away from hash keys was raised in one of the discussions after a case study. Both the presentation and following discussion were in German,...

0

When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables

 

 When changes are just too many When you construct a Point-In-Time (PIT) table or Dimension from your Data Vault model, do you sometimes find yourself in the situation where there are too many change records present? This is because, in the standard Data Vault design, tiny variations when loading data may result in the creation of very small time slices when the various historised data sets (e.g. Satellites) are combined. There is such a thing as too...

0

Updated the Data Vault implementation & automation training for 12-14 June in Germany

 

 On the 12th-14th of June I will be delivering the newly styled and updated Data Vault implementation and automation training together with Doerffler & Partner. I am really looking forward to continue the collaboration after last year’s awesome Data Vault Day (organised by Doerffler as well). Working really hard to wrap up the next layer of virtualisation to discuss there and I’m really excited about it: imagine having multiple versions of not only the Data...

1

When is a change a ‘change’?

 

 This is a post that touches on what I think is one the essential best-practices for ETL design: the ability to process multiple changes for the same key in a single pass. This is specifically relevant for typical ETL processes that load data to a time-variant target (PSA, Satellite, Dimension etc.). For non-time variant targets (Hubs, Links etc.) the process is a bit easier as this is essentially built-in the patterns already :-). In a given...

1

Some insights about … Insights

 

 Can I get some insights, please? Over the years, I have come to somewhat dislike the term ‘insights’ almost to the same level as, say, a ‘Data Lake’. And that’s saying something. Not because these concepts themselves are related that much (they are to some extent, of course). But, because to me personally, they both conjure the same feeling: a mixture of annoyance and desperation. One of the reasons is that since the word ‘insights’...

0

Creating Data Vault Point-In-Time and Dimension tables: merging historical data sources

 

 Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a specific business entity (Hub, Link or groups of those). If a given Hub has on average 3 or 4 Satellites, is it useful at the very least to see the full history of changes for that specific Hub across all Satellites. How to...

1

Tech tip: making SSIS Project Connections generate correctly using BIML Express

 

 A bit more of a technical view on things today. In order to stay up to date with the latest when it comes to generating ETL for the Microsoft stack (SSIS), I recently upgraded from Visual Studio 2013 with BIDS Helper 1.6.6. to Visual Studio 2015 with BIML Express. And this means a lot of regression testing for years and years of increasingly complex BIML and C# scripts. As it turns out it wasn’t too...

2

Advanced row condensing for Satellites

 

 When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...