Category: Architecture

This category contains all documents regarding BI, DWH, ETL and front-end architecture. It also includes data modelling (Data Vault, Inmon, Kimball).

Running SSIS packages continuously without scheduling

No more Batch ETL. A few weeks ago I wrote a post about the concept of continuous execution of individual ETL processes to achieve ‘eventual consistency’. In that post I made the case to step away from ‘Batch’ execution of ETLs, where related processes are executed as a mini workflow, in favour of fully independent execution of the individual (modular) ETL processes. I have spent some time developing this concept in SQL Server using...
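As a rough illustration of the idea (not the tooling the post itself develops, which is cut off above), the sketch below keeps re-running a single SSIS package from the SSISDB catalog in an open-ended loop, with no scheduler involved. The folder, project and package names are hypothetical placeholders.

    -- Minimal sketch: continuously re-run one SSIS package without a scheduler.
    -- Folder, project and package names are hypothetical placeholders.
    DECLARE @execution_id BIGINT;
    DECLARE @sync SQL_VARIANT = 1;

    WHILE 1 = 1
    BEGIN
        EXEC SSISDB.catalog.create_execution
            @folder_name     = N'DataWarehouse',
            @project_name    = N'Staging',
            @package_name    = N'stg_customer.dtsx',
            @use32bitruntime = 0,
            @execution_id    = @execution_id OUTPUT;

        -- Run synchronously so each iteration waits for the package to finish.
        EXEC SSISDB.catalog.set_execution_parameter_value
            @execution_id,
            @object_type     = 50,
            @parameter_name  = N'SYNCHRONIZED',
            @parameter_value = @sync;

        EXEC SSISDB.catalog.start_execution @execution_id;

        WAITFOR DELAY '00:00:05'; -- short breather before the next run
    END;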

 
Beyond ETL Generation & DWH Virtualisation – what’s next?

Bridging the gap. At the recent Data Modelling Zone (DMZ) in Germany I presented an overview of the ideas around Data Warehouse Virtualisation and the thought processes leading up to this. In this post I wanted to elaborate on some of these ideas a bit further, as together they can be combined to allow something even more powerful. This post provides an overview of how various (technical) concepts together can help faster delivery of meaningful...

 
Embrace your Persistent Staging Area for Eventual Consistency

If you like your PSA so much… A colleague of mine asked me this: ‘if you like the Persistent Staging Area (PSA) concept so much, why not embrace it all the way?’. By this, he meant loading upstream layers such as the Data Vault directly from the PSA instead of from a Staging Area. I was a bit resistant to the idea at first, because this would require incorporation of the PSA as a mandatory...
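As a hedged illustration of what ‘loading directly from the PSA’ could look like, the sketch below loads a Hub straight from a persistent staging table rather than from a transient Staging Area. All object and column names (psa.customer, dv.hub_customer, customer_number) are hypothetical, not taken from the post.

    -- Minimal sketch: load a Hub directly from the PSA (hypothetical names).
    INSERT INTO dv.hub_customer (hub_customer_hash_key, customer_number, load_datetime, record_source)
    SELECT
        HASHBYTES('MD5', UPPER(LTRIM(RTRIM(psa.customer_number)))) AS hub_customer_hash_key,
        psa.customer_number,
        MIN(psa.load_datetime)                                     AS load_datetime,
        MIN(psa.record_source)                                     AS record_source
    FROM psa.customer AS psa
    WHERE NOT EXISTS
          ( SELECT 1
            FROM dv.hub_customer AS hub
            WHERE hub.customer_number = psa.customer_number )
    GROUP BY psa.customer_number;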

 
Using a Natural Business Key – the end of hash keys?

Do we still need Hash Keys? Now there is a controversial topic! I have been thinking about the need for hash keys for almost a year now, ever since I went to the Data Vault Day in Germany (Hamburg) at the end of 2016. During this one-day community event, the topic of stepping away from hash keys was raised in one of the discussions after a case study. Both the presentation and the following discussion were in German,...
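To make the trade-off concrete, here is a small hedged sketch (with hypothetical table and column names) of the two styles of Hub key derivation the discussion is about: a hash calculated over the business key versus the sanitised business key itself.

    -- Option 1: hash key -- derive a fixed-length surrogate from the business key.
    SELECT
        HASHBYTES('MD5', UPPER(LTRIM(RTRIM(customer_number)))) AS hub_customer_hash_key,
        customer_number
    FROM stg.customer;

    -- Option 2: natural business key -- the (sanitised) business key is the key itself,
    -- so no hash needs to be calculated or joined on.
    SELECT
        UPPER(LTRIM(RTRIM(customer_number))) AS hub_customer_key,
        customer_number
    FROM stg.customer;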

 
Some insights about … Insights

Can I get some insights, please? Over the years, I have come to somewhat dislike the term ‘insights’, almost to the same degree as, say, a ‘Data Lake’. And that’s saying something. Not because these concepts themselves are that closely related (they are to some extent, of course), but because to me personally they both conjure up the same feeling: a mixture of annoyance and desperation. One of the reasons is that since the word ‘insights’...

 
Creating Data Vault Point-In-Time and Dimension tables: merging historical data sources

Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a specific business entity (Hub, Link or groups of those). If a given Hub has on average 3 or 4 Satellites, it is useful at the very least to see the full history of changes for that specific Hub across all Satellites. How to...
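As a sketch of the general pattern (not necessarily the exact approach the post develops), the query below merges the change history of two hypothetical Satellites on a Customer Hub: it collects every load date/time across both Satellites and, for each of those moments, looks up the Satellite rows that were valid at that point in time.

    -- Minimal Point-In-Time sketch for a hypothetical HUB_CUSTOMER with two Satellites.
    WITH change_moments AS
    (
        SELECT hub_customer_hash_key, load_datetime FROM dv.sat_customer_details
        UNION
        SELECT hub_customer_hash_key, load_datetime FROM dv.sat_customer_address
    )
    SELECT
        cm.hub_customer_hash_key,
        cm.load_datetime  AS snapshot_datetime,
        det.load_datetime AS sat_customer_details_load_datetime,
        adr.load_datetime AS sat_customer_address_load_datetime
    FROM change_moments AS cm
    OUTER APPLY
    (
        SELECT MAX(s.load_datetime) AS load_datetime
        FROM dv.sat_customer_details AS s
        WHERE s.hub_customer_hash_key = cm.hub_customer_hash_key
          AND s.load_datetime <= cm.load_datetime
    ) AS det
    OUTER APPLY
    (
        SELECT MAX(s.load_datetime) AS load_datetime
        FROM dv.sat_customer_address AS s
        WHERE s.hub_customer_hash_key = cm.hub_customer_hash_key
          AND s.load_datetime <= cm.load_datetime
    ) AS adr;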

 
Advanced row condensing for Satellites

When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages, as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...
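A minimal sketch of the kind of condensing this is about, assuming a hypothetical staging table with a per-row checksum over the descriptive attributes: instead of DISTINCT, compare each row with the previous row for the same key and only keep genuine changes.

    -- Minimal row condensing sketch (hypothetical stg.customer with a record_checksum).
    WITH ordered AS
    (
        SELECT
            customer_number,
            load_datetime,
            record_checksum,                           -- hash over all descriptive attributes
            LAG(record_checksum) OVER
                (PARTITION BY customer_number
                 ORDER BY load_datetime) AS previous_checksum
        FROM stg.customer
    )
    SELECT
        customer_number,
        load_datetime,
        record_checksum
    FROM ordered
    WHERE previous_checksum IS NULL                    -- first row for this key
       OR record_checksum <> previous_checksum;        -- a genuine change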

 
Why you really want a Persistent Staging Area in your Data Vault architecture

Recently, at the Worldwide Data Vault Conference in Vermont, USA (WWDVC), I had many conversations about the Persistent Staging Area (PSA) concept, also known as the Historical Staging Area. I have been using this idea for years and really can’t do without it. I would even go as far as saying you really want a PSA in your architecture. However, there is a common opinion that having a PSA isn’t the best idea, as it introduces a ‘2nd...

 
Staging and interpreting XMLs, BLOBs and similar

Recently I had a couple of conversations about the ‘staging’ (loading data delta into your environment) of multi-structured data types such as JSON, XML (and some BLOBs). Today I had one of these conversations with my esteemed ex-colleagues Bruce and Glenn, which made me think it would be a good idea to add some information and considerations to the recent post on parsing XML using XQuery. These conversations focused on where the interpretation of XML should happen: storing the unmodified (raw) XML...
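As a small hedged sketch of the ‘store raw first, interpret later’ option (payload structure and element names invented for the example), the snippet below lands an unmodified XML payload in a staging structure and only afterwards shreds it into rows with XQuery.

    -- Minimal sketch: land the raw XML unmodified, interpret it downstream with XQuery.
    DECLARE @stg_raw_xml TABLE
    (
        load_datetime DATETIME2 NOT NULL,
        payload       XML       NOT NULL
    );

    INSERT INTO @stg_raw_xml (load_datetime, payload)
    VALUES (SYSDATETIME(), N'<order id="42"><line sku="A1" qty="3"/><line sku="B2" qty="1"/></order>');

    -- Downstream interpretation: shred the stored XML into relational rows.
    SELECT
        r.payload.value('(/order/@id)[1]', 'INT') AS order_id,
        l.line.value('@sku', 'NVARCHAR(20)')      AS sku,
        l.line.value('@qty', 'INT')               AS quantity
    FROM @stg_raw_xml AS r
    CROSS APPLY r.payload.nodes('/order/line') AS l(line);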

 
NoETL (Not Only ETL) – virtualization revisited

For the last couple of weeks I have been working on a simple tool to support the Data Warehouse virtualisation concepts in practice. This is based on the idea that if you can generate the ETL you need, you can also virtualise these processes if performance requirements and/or relevant constraints allow for it. This is why I was looking for a way to virtualise where possible (performance-wise), and instantiate (generate ETL) where...
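To illustrate the virtualise-or-instantiate choice (with hypothetical object names, not the tool itself), the same Hub logic can be delivered as a view when performance allows, or generated as a physical load when it does not:

    -- Virtualised: the Hub is just a view over the PSA, resolved at query time.
    CREATE VIEW dv.hub_customer AS
    SELECT DISTINCT
        HASHBYTES('MD5', UPPER(LTRIM(RTRIM(customer_number)))) AS hub_customer_hash_key,
        customer_number
    FROM psa.customer;
    GO

    -- Instantiated: the same logic generated as a physical (incremental) load instead.
    INSERT INTO dv.hub_customer_persisted (hub_customer_hash_key, customer_number)
    SELECT DISTINCT
        HASHBYTES('MD5', UPPER(LTRIM(RTRIM(psa.customer_number)))),
        psa.customer_number
    FROM psa.customer AS psa
    WHERE NOT EXISTS
          ( SELECT 1
            FROM dv.hub_customer_persisted AS hub
            WHERE hub.customer_number = psa.customer_number );

Because both variants share the same transformation logic, switching between them is a deployment decision rather than a redesign, which is the point the post builds on.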