Category: General

Remarks, expressions and thoughts. And everything that does not fit in the other categories :)

0

Some Q&A on Data Warehouse Virtualisation

I receive a fair bit of questions on the Data Warehouse Virtualisation ideas and wanted to respond and discuss this via this post. I don’t have all the answer but can share my views and expectations. When it comes to DWH Virtualisation and the Persistent Staging Area (PSA), the questions generally fall into two categories: Isn’t it too slow? How about performance? Surely users don’t want to wait for hours to see results? Why bother...

 
1

Biml Express 2017 tests, comments and work-arounds

The new version of Biml Express, the free script-based ETL generation plug-in for Visual Studio provided by Varigence, has been out for a few months. Mid-July 2017 to be precise. However only recently I have been able to find some time to properly regression-test this new release against my library of patterns / scripts. The driver is the upcoming Data Modelling Zone event and Data Vault Implementation & Automation training sessions – better keep up...

 
0

Updated sample and metadata models for Data Vault generation and virtualisation

After a bit of a pause in working on the weblog and technology (caused by an extended period of high pressure in the day job) I am once again working on some changes in the various concepts I’m writing about on this site. Recently I was made aware of this great little tool that supports easy creation and sharing of simple data models: Quick Database Diagrams (‘QuickDBD’). The tool is 100% online and can be...

 
0

When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables

When changes are just too many When you construct a Point-In-Time (PIT) table or Dimension from your Data Vault model, do you sometimes find yourself in the situation where there are too many change records present? This is because, in the standard Data Vault design, tiny variations when loading data may result in the creation of very small time slices when the various historised data sets (e.g. Satellites) are combined. There is such a thing as too...

 
0

Updated the Data Vault implementation & automation training for 12-14 June in Germany

On the 12th-14th of June I will be delivering the newly styled and updated Data Vault implementation and automation training together with Doerffler & Partner. I am really looking forward to continue the collaboration after last year’s awesome Data Vault Day (organised by Doerffler as well). Working really hard to wrap up the next layer of virtualisation to discuss there and I’m really excited about it: imagine having multiple versions of not only the Data...

 
2

Advanced row condensing for Satellites

When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...

 
0

Unknown keys (zero keys or ghost keys) in Hubs for DV2.0

I am still working towards capturing the generation (using BIML in SSIS) and virtualisation (using views / SQL) of the Presentation Layer (in a Dimensional Model). But before we get there, some topics need to be addressed first. One of these is the requirement to have ‘unknown’ keys available in the Hubs. Thankfully, this is one of the easiest concepts to implement. The basic idea is that you create a dummy record in the Hub which...

 
0

Data Vault ETL Implementation using SSIS: Step 7 – Link Satellite ETL – part 3 – End Dating

I’m catching up on old drafts within WordPress, and in the spirit of being complete on the older SSIS series felt I should pick this one up and complete it. While most of my focus is on developing the virtualisation concepts I still work a lot with more traditional ETL tools, one of which is Microsoft SSIS. Recently I merged the metadata models that underpin the virtualisation and SSIS automation and I am retesting everything...

 
0

Best practices on developing Data Vault in SQL Server (including SSIS)

Sharing is caring, so today’s post covers some technical details for the Microsoft world: implementing Data Vault models on the SQL Server database and corresponding ETL using SSIS and technologies such as BIML. This is based on experiences gained developing many Data Warehouses (both Data Vault based as well as using other methodologies). Physical modelling (for Data Vault-based Integration Layers): Don’t use clustered indexes on Primary Keys! This is the single biggest tip to be aware of...

 
1

Foreign Keys in the Staging Layer – joining or not?

Warning – this is another post in the ‘options and considerations’ context, meaning that some people will probably disagree with this based on their personal convictions or ideas! One or two Satellites? The case in question is how to handle complexities that may arise if you want to simplify loading by joining tables in the Staging Layer. You may want to do this depending on the design choices made for the source system your are...