NoETL – Data Vault Satellite tables
The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier:
- Regular, composite, concatenated business keys with hashing
- Zero record provision
- Reuse of the objects for ETL purposes if required
As this is another process going from ‘staging’ to ‘integration’, the approach is very similar to the NoETL Hub post from a while ago. The pattern used is the virtual Satellite as explained here. As with the virtual Hubs, the really cool concept is that you can use the view to act as a normal table. This is because the information visible in the view is 100% the same as if you were to physically create the table, create the ETL and load the table with data using that ETL process. The view is a true mirror of the end result, hence the virtualisation terminology. Even more interesting: you can use the same object (the view) to act as ETL again, so you can literally swap between virtualised and physical without any rework or downtime.
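To make the idea concrete, here is a minimal sketch of a virtual Satellite as a database view, runnable with Python's built-in sqlite3 module. The table and column names are illustrative placeholders (not the tool's actual output), and the hash is stubbed, but it shows the core point: the view is queryable exactly as the physical Satellite table would be, end-dating included.

```python
import sqlite3

# Minimal sketch (hypothetical names, hash stubbed): a view over a staging
# table that presents Satellite-shaped output and is queryable like a table.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE STG_CUSTOMER (
    CustomerID    TEXT,
    GivenName     TEXT,
    LOAD_DATETIME TEXT
);
INSERT INTO STG_CUSTOMER VALUES
    ('C1', 'Ada',   '2016-01-01'),
    ('C1', 'Grace', '2016-02-01');

-- The 'virtual Satellite': end-dates each row with the next row's load
-- date per business key (LEAD), open-ended for the latest row.
CREATE VIEW SAT_CUSTOMER AS
SELECT
    CustomerID    AS CUSTOMER_HSH,   -- hash placeholder for the sketch
    GivenName,
    LOAD_DATETIME AS LOAD_START_DATETIME,
    COALESCE(
        LEAD(LOAD_DATETIME) OVER
            (PARTITION BY CustomerID ORDER BY LOAD_DATETIME),
        '9999-12-31'
    )             AS LOAD_END_DATETIME
FROM STG_CUSTOMER;
""")

rows = con.execute(
    "SELECT * FROM SAT_CUSTOMER ORDER BY LOAD_START_DATETIME"
).fetchall()
for r in rows:
    print(r)
```

Querying the view returns the two history rows with correct start/end dates, without any physical Satellite table or ETL run existing.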
For good reference the example model is shown here as well:
As you can see this example model contains various testcases I used to validate the automation logic including the code built in to the virtualisation tool. Specifically the following are relevant for this post:
- SAT_CUSTOMER (has fewer attributes than the underlying Staging Area table)
- SAT_INCENTIVE_OFFER (normal, easiest one)
- SAT_MEMBERSHIP_PLAN_DETAIL (one out of two Satellites off the same Hub table)
- SAT_MEMBERSHIP_PLAN_VALUATION (two out of two Satellites off the same Hub table, with a multi-active attribute)
- SAT_SEGMENT (based on a user-managed staging table, i.e. without a true source system)
As with the Hub view, the first thing to do is to configure the required metadata. In this case this covers the mapping between the source and target (Satellite) tables, the interpretation of the Business Key and any filtering criteria.
As an example I have set up the following metadata:
This shows that, for instance, the SAT_CUSTOMER table is sourced from the STG_PROFILER_CUSTOMER_PERSONAL table using the CustomerID source attribute as the (column to be mapped to the) Business Key.
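This kind of metadata could be represented along the following lines; the dictionary structure and helper function are my own hypothetical sketch (the tool's internal schema may differ), while the table and column names come from the example above.

```python
# Hypothetical sketch of source-to-target metadata driving Satellite view
# generation. The structure is illustrative, not the tool's actual schema;
# the table/column names are from the SAT_CUSTOMER example.
satellite_mappings = [
    {
        "target": "SAT_CUSTOMER",
        "source": "STG_PROFILER_CUSTOMER_PERSONAL",
        "business_key": ["CustomerID"],  # column(s) mapped to the Business Key
        "filter": None,                  # optional filtering criteria
    },
]

def business_key_clause(mapping):
    # A composite or concatenated key would simply list more columns here.
    return ", ".join(mapping["business_key"])

stub = (
    f"SELECT {business_key_clause(satellite_mappings[0])} "
    f"FROM {satellite_mappings[0]['source']}"
)
print(stub)
```

From entries like this, the generator can derive the key selection, the attribute mapping and any filter for each Satellite view.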
The generated logic is visible below, together with the information that 5 views have been generated as expected.
The views have been directly generated in the database, which now contains 5 view objects for the Satellites. In terms of structure and contents these are identical to the physical tables including end-dating and the optional current record indicator.
This is visible in the screenshot below although there is not enough space to show the various attributes.
Because there is only one data delta available in the PSA there are two records for every Satellite key: the zero record and the normal record. Subsequent deltas in the PSA will add records to this view accordingly. Bear in mind that the ideas around zero records have changed slightly since (more options are available). In any case this behaves 100% identically to the generated ETL processes we have used for Satellites, so rather than going into detail about the record output it is probably more interesting to look at the view logic.
I should mention the following two things:
- Kent Graziano commented recently that the Current Row Indicator is better derived using the LEAD function (similar to the LOAD_END_DATETIME) to improve performance, as RANK is more expensive. I haven’t made this change yet but will update this shortly.
- The SELECT DISTINCT in the subquery still needs to be updated to a GROUP BY or similar, because DISTINCT doesn’t necessarily condense timelines properly. When only some of the available attributes are mapped to the Satellite, removing the unmapped attributes from the selection can leave consecutive rows that are identical except for their load date; these timelines need to be condensed in this pass, and DISTINCT is not sufficient to achieve this. When I have some time I intend to update this as well.
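The LEAD-based Current Row Indicator from the first point can be sketched as follows; the table and column names are placeholders. A row is current when LEAD finds no later row for the same key, which falls out of the same window function already used for LOAD_END_DATETIME instead of a separate RANK pass.

```python
import sqlite3

# Sketch of a LEAD-based Current Row Indicator (hypothetical names): the
# row with no successor in its key's timeline is the current one.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SAT_ROWS (HSH TEXT, LOAD_DATETIME TEXT);
INSERT INTO SAT_ROWS VALUES
    ('K1', '2016-01-01'),
    ('K1', '2016-02-01'),
    ('K2', '2016-01-15');
""")
rows = con.execute("""
SELECT
    HSH,
    LOAD_DATETIME,
    CASE WHEN LEAD(LOAD_DATETIME) OVER
              (PARTITION BY HSH ORDER BY LOAD_DATETIME) IS NULL
         THEN 1 ELSE 0 END AS CURRENT_ROW_INDICATOR
FROM SAT_ROWS
ORDER BY HSH, LOAD_DATETIME
""").fetchall()
for r in rows:
    print(r)
```

Each key's latest row gets indicator 1 and all earlier rows get 0, with a single window-function scan.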
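The timeline condensation issue from the second point can be illustrated with a small Python sketch (illustrative data, not the tool's code). After dropping the unmapped attributes, two consecutive deltas can carry the same Satellite payload but different load dates, so a DISTINCT-style comparison of whole rows keeps both; condensing means dropping a row whose payload matches the previous row's payload for the same key.

```python
# Why DISTINCT is not enough once unmapped attributes are removed:
# the dates still differ, so whole-row dedup keeps redundant rows.
rows = [
    # (key, load_datetime, payload after dropping unmapped attributes)
    ("C1", "2016-01-01", "Ada"),
    ("C1", "2016-02-01", "Ada"),    # only an unmapped attribute changed
    ("C1", "2016-03-01", "Grace"),
]

# DISTINCT-style dedup compares full rows, dates included: all 3 survive.
distinct = list(dict.fromkeys(rows))
print(len(distinct))

# Condensed timeline: keep a row only when the payload changes per key.
condensed, last = [], {}
for key, ts, payload in rows:
    if last.get(key) != payload:
        condensed.append((key, ts, payload))
        last[key] = payload
print(condensed)
```

The condensed result keeps only the first row of each unchanged run, which is what a GROUP BY (or equivalent) pass needs to achieve.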
One of the really cool features of this concept is that now the view is there and ready to be used for upstream interpretation, you can also use the same view to act as an ETL process. For this purpose I created the INSERT INTO feature in the tool. This generates an INSERT INTO statement that selects from the view, while checking that information is not inserted more than once.
There you go: the Virtual Satellite and ETL generated from metadata and supported using a single database object!