Roelant Vos An expert view on Agile Data Warehousing


Some insights about … Insights

 

 Can I get some insights, please? Over the years, I have come to dislike the term ‘insights’ almost as much as, say, ‘Data Lake’. And that’s saying something. Not because the concepts themselves are closely related (they are to some extent, of course), but because to me personally they both conjure the same feeling: a mixture of annoyance and desperation. One of the reasons is that since the word ‘insights’...


Creating Data Vault Point-In-Time and Dimension tables: merging historical data sources

 

 Beyond creating Hubs, Links and Satellites and current-state (Type 1) views off the Data Vault, one of the most common requirements is the ability to represent a complete history of changes for a specific business entity (Hub, Link or groups of those). If a given Hub has on average 3 or 4 Satellites, it is useful at the very least to see the full history of changes for that specific Hub across all Satellites. How to...


Tech tip: making SSIS Project Connections generate correctly using BIML Express

 

 A bit more of a technical view on things today. In order to stay up to date with the latest when it comes to generating ETL for the Microsoft stack (SSIS), I recently upgraded from Visual Studio 2013 with BIDS Helper 1.6.6 to Visual Studio 2015 with BIML Express. And this means a lot of regression testing for years and years of increasingly complex BIML and C# scripts. As it turns out it wasn’t too...


Advanced row condensing for Satellites

 

 When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...
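
 To illustrate why DISTINCT falls short here, the following is a minimal sketch and not the template referred to in the post. It assumes condensing means dropping a row when its tracked attributes are identical to the previous row for the same key, in load order. The function name `condense` and the column names `customer`, `load_ts` and `city` are hypothetical and chosen only for this example.

```python
from itertools import groupby
from operator import itemgetter


def condense(rows, key, order_by, tracked):
    """Keep only rows whose tracked attributes differ from the
    previous row for the same key (hypothetical helper)."""
    result = []
    # Order by key first, then by the load order within each key.
    ordered = sorted(rows, key=lambda r: (r[key], r[order_by]))
    for _, group in groupby(ordered, key=itemgetter(key)):
        previous = None
        for row in group:
            current = tuple(row[c] for c in tracked)
            # The first row per key is always kept; later rows only on change.
            if current != previous:
                result.append(row)
            previous = current
    return result


# Assumed sample data: the value changes and later changes back.
rows = [
    {"customer": 1, "load_ts": 1, "city": "Brisbane"},
    {"customer": 1, "load_ts": 2, "city": "Brisbane"},  # redundant repeat
    {"customer": 1, "load_ts": 3, "city": "Sydney"},
    {"customer": 1, "load_ts": 4, "city": "Brisbane"},  # genuine change back
]

condensed = condense(rows, "customer", "load_ts", ["city"])
distinct = {(r["customer"], r["city"]) for r in rows}
```

 In this sample, `condensed` keeps the rows with `load_ts` 1, 3 and 4, whereas the DISTINCT-style set collapses to two combinations and loses the fact that the value changed back.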


Why you really want a Persistent Staging Area in your Data Vault architecture

 

 Recently at the Worldwide Data Vault Conference in Vermont, USA (WWDVC) I had many conversations about the Persistent Staging Area (PSA) concept, also known as Historical Staging Area. I have been using this idea for years and really can’t do without it. I would even go as far as saying you really want a PSA in your architecture. However, there is a common opinion that having a PSA isn’t the best idea as it introduces a ‘2nd...


Unknown keys (zero keys or ghost keys) in Hubs for DV2.0

 

 I am still working towards capturing the generation (using BIML in SSIS) and virtualisation (using views / SQL) of the Presentation Layer (in a Dimensional Model). But before we get there, some topics need to be addressed first. One of these is the requirement to have ‘unknown’ keys available in the Hubs. Thankfully, this is one of the easiest concepts to implement. The basic idea is that you create a dummy record in the Hub which...


Data Vault ETL Implementation using SSIS: Step 7 – Link Satellite ETL – part 3 – End Dating

 

 I’m catching up on old drafts within WordPress, and in the spirit of being complete on the older SSIS series felt I should pick this one up and complete it. While most of my focus is on developing the virtualisation concepts I still work a lot with more traditional ETL tools, one of which is Microsoft SSIS. Recently I merged the metadata models that underpin the virtualisation and SSIS automation and I am retesting everything...


Best practices on developing Data Vault in SQL Server (including SSIS)

 

 Sharing is caring, so today’s post covers some technical details for the Microsoft world: implementing Data Vault models on the SQL Server database and corresponding ETL using SSIS and technologies such as BIML. This is based on experiences gained developing many Data Warehouses (both Data Vault based as well as using other methodologies). Physical modelling (for Data Vault-based Integration Layers): Don’t use clustered indexes on Primary Keys! This is the single biggest tip to be aware of...


Foreign Keys in the Staging Layer – joining or not?

 

 Warning – this is another post in the ‘options and considerations’ context, meaning that some people will probably disagree with this based on their personal convictions or ideas! One or two Satellites? The case in question is how to handle complexities that may arise if you want to simplify loading by joining tables in the Staging Layer. You may want to do this depending on the design choices made for the source system you are...


The DWH Time Machine: synchronising model and automation metadata versions

 

 I’ve completed a fairly large body of work that I’ve been meaning to do for a long time: how to automatically version the Data Warehouse data model in sync with the version of the ETL automation metadata. Although versioning models and code is relevant (but rarely implemented) in the traditional ETL area, this requirement becomes very real when moving to a virtualised Data Warehouse / integrated model approach (Data Vault 2.0 in my case)....