Roelant Vos An expert view on Agile Data Warehousing

BIML 0

Tech tip: making SSIS Project Connections generate correctly using BIML Express

 

 A bit more of a technical view on things today. In order to stay up to date with the latest when it comes to generating ETL for the Microsoft stack (SSIS), I recently upgraded from Visual Studio 2013 with BIDS Helper 1.6.6. to Visual Studio 2015 with BIML Express. And this means a lot of regression testing for years and years of increasingly complex BIML and C# scripts. As it turns out it wasn’t too...

Flattening 1

Advanced row condensing for Satellites

 

 When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...

Rocket_Tanks_Separation 0

Why you really want a Persistent Staging Area in your Data Vault architecture

 

 Recently at the Worldwide Data Vault Conference in Vermont USA (WWDVC) I had many conversations about the Persistent Staging Area (PSA) concept, also known as Historical Staging Area. I have been using this idea for years and really can’t do without it. I would even go as far as saying you really want a PSA in your architecture. However there is a common opinion that having a PSA isn’t the best idea as it introduces a ‘2nd...

Unknown Hub 0

Unknown keys (zero keys or ghost keys) in Hubs for DV2.0

 

 I am still working towards capturing the generation (using BIML in SSIS) and virtualisation (using views / SQL) of the Presentation Layer (in a Dimensional Model). But before we get there, some topics need to be addressed first. One of these is the requirement to have ‘unknown’ keys available in the Hubs. Thankfully, this is one of the easiest concepts to implement. The basic idea is that you create a dummy record in the Hub which...

Data Vault 0

Data Vault ETL Implementation using SSIS: Step 7 – Link Satellite ETL – part 3 – End Dating

 

 I’m catching up on old drafts within WordPress, and in the spirit of being complete on the older SSIS series felt I should pick this one up and complete it. While most of my focus is on developing the virtualisation concepts I still work a lot with more traditional ETL tools, one of which is Microsoft SSIS. Recently I merged the metadata models that underpin the virtualisation and SSIS automation and I am retesting everything...

Microsoft SQL Server and SSIS 0

Best practices on developing Data Vault in SQL Server (including SSIS)

 

 Sharing is caring, so today’s post covers some technical details for the Microsoft world: implementing Data Vault models on the SQL Server database and corresponding ETL using SSIS and technologies such as BIML. This is based on experiences gained developing many Data Warehouses (both Data Vault based as well as using other methodologies). Physical modelling (for Data Vault-based Integration Layers): Don’t use clustered indexes on Primary Keys! This is the single biggest tip to be aware of...

Link 1

Foreign Keys in the Staging Layer – joining or not?

 

 Warning – this is another post in the ‘options and considerations’ context, meaning that some people will probably disagree with this based on their personal convictions or ideas! One or two Satellites? The case in question is how to handle complexities that may arise if you want to simplify loading by joining tables in the Staging Layer. You may want to do this depending on the design choices made for the source system your are...

ETL Automation 0

The DWH Time Machine: synchronising model and automation metadata versions

 

 I’ve completed a fairly large body of work that I’ve been meaning to do for a long time: how to automatically version the Data Warehouse data model in sync with the version of the ETL automation metadata. Although versioning models and code is relevant (but rarely implemented) in the traditional ETL area, this requirement to becomes very real when moving to a virtualised Data Warehouse / integrated model approach (Data Vault 2.0 in my case)....

Virtualisation 0

Virtual Enterprise Data Warehouse ideas & updates (towards 1.2)

 

 Lately I have had a bit more head space to work on some ideas I find interesting, and these are now intended to culminate into ‘version 1.2’ of the Virtual EDW tool I have been developing. I’ve been using this tool extensively for various Data Warehouses and am generally very happy with it as a quick prototyping tool. But what really starts to play up is the requirement for a physical Data Vault (Integration Layer), as...

Driving Key 0

Loading too fast for unique date/time stamps – what to do?

 

 Let’s start by clarifying that this concerns the RDBMS world, not the Hadoop world 😉 It’s a good problem to have – loading data too quickly. So quickly that, even at high precision, multiple changes for the same key end up being inserted with the same Load Date/Time Stamp (LDTS). What happens here? A quick recap: in Data Vault the Load Date/Time Stamp (LDTS, LOAD_DTS, or INSERT_DATETIME) is defined as the moment data is recorded...