Category: ETL

Mappings, code, guidelines and ideas about Extracting, Transformation and Loading.

0

Running SSIS packages continuously without scheduling

No more Batch ETL A few weeks ago I wrote a post about the concept of having continuous execution of ETL individual processes to achieve ‘eventual consistency‘. In that post I made the case to step away from ‘Batch’ execution of ETLs, where related processes are executed as a mini workflow, in favour if fully independent execution of the individual (modular) ETL processes. I have spend some time developing this concept in SQL Server using...

 
1

Embrace your Persistent Staging Area for Eventual Consistency

If you like your PSA so much… A colleague of mine asked me this: ‘if you like the Persistent Staging Area (PSA) concept so much, why not embrace it all the way?’. By this, he meant loading upstream layers such as the Data Vault directly from the PSA instead of from a Staging Area. I was a bit resistant to the idea at first, because this would require incorporation of the PSA as a mandatory...

 
0

When a full history of changes is too much: implementing abstraction for Point-In-Time (PIT) and Dimension tables

When changes are just too many When you construct a Point-In-Time (PIT) table or Dimension from your Data Vault model, do you sometimes find yourself in the situation where there are too many change records present? This is because, in the standard Data Vault design, tiny variations when loading data may result in the creation of very small time slices when the various historised data sets (e.g. Satellites) are combined. There is such a thing as too...

 
2

When is a change a ‘change’?

This is a post that touches on what I think is one the essential best-practices for ETL design: the ability to process multiple changes for the same key in a single pass. This is specifically relevant for typical ETL processes that load data to a time-variant target (PSA, Satellite, Dimension etc.). For non-time variant targets (Hubs, Links etc.) the process is a bit easier as this is essentially built-in the patterns already :-). In a given...

 
1

Tech tip: making SSIS Project Connections generate correctly using BIML Express

A bit more of a technical view on things today. In order to stay up to date with the latest when it comes to generating ETL for the Microsoft stack (SSIS), I recently upgraded from Visual Studio 2013 with BIDS Helper 1.6.6. to Visual Studio 2015 with BIML Express. And this means a lot of regression testing for years and years of increasingly complex BIML and C# scripts. As it turns out it wasn’t too...

 
2

Advanced row condensing for Satellites

When it comes to record condensing, DISTINCT just doesn’t cut it. I’ve been meaning to post about this for ages as the earliest templates (as also posted on this site) were not flexible enough to work in all cases (which is what we strive for). Record condensing Record condensing is making sure that the data delta (differential) you process is a true delta for the specific scope. It’s about making sure no redundant records are processed into...

 
0

NoETL – Data Vault Link tables

Virtualising Data Vault Link structures follows a similar process to that of the virtual Hubs, with some small additions such as the support for (optional) degenerate attributes. To make things a bit more interesting I created some metadata that requires different Business Key ‘types’ so this can be shown and tested in the virtualisation program. For the example in this post I created three Link definitions (the metadata), one of which (LNK_CUSTOMER_COSTING) has a three-way relationship with the following...

 
0

Quick and easy referential integrity validation (for dynamic testing)

This post is in a way related to the recent post about generating some test data. In a similar way I was looking for ways to make life a bit easier when it comes to validating the outputs of Data Vault ETL processes. Some background is provided in an earlier post on the topic of Referential Integrity (RI) specifically in the context of Data Vault 2.0. In short, by adopting the hash key concepts it...

 
1

NoETL – Data Vault Satellite tables

The recent presentations provides a push to wrap up the development and release of the Data Vault virtualisation initiative, so now everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier: Regular, composite, concatenated business keys with hashing Zero record provision Reuse of the objects for ETL purposes if required As this is another process going...

 
1

Quick and easy test data for even more agile development

This post provides a quick and easy way to generate some test records for your unit or system integration testing. The reason I initially started to work on a ‘virtualisation’ approach for Data Vault was to pre-test the ETLs that were generated using whatever automation technique (e.g. SQL, BIML and many others). Using the virtual Data Warehouse – the views I have been posting about recently – you get a representation of the Data Warehouse...