Data Vault ETL Implementation using SSIS: Step 5 – Satellite ETL – part 3 – End Dating

For various reasons including (but not limited to) ease of maintenance, speed, reusability, parallelism and reducing of complexity in general, End Dating is implemented as a stand-alone generic ETL process to support the Data Vault Satellite ETL.

Conceptually, End Dating is a redundant step because the information is derived from already available information: the Effective Date. However, in most situations it makes upstream ETL processing easier to manage and may be worth the storage/performance trade-off. But, because its redundant nature End Dating has to be considered as an optional step and therefore needs to be designed to be removed (or optional) if required.

End Dating in this context is defined as the process that ‘closes off’ changed records by setting the Expiry/End Date and sometimes even an Active Record Indicator (if implemented). The End Dating process is defined for each Satellite entity and is therefore generic; it can be shared between the various ETL workflows / batches if for some reason a Satellite entity is populated from different data sets (exceptions, but it can happen). It goes without saying that it needs to be run after the regular Satellite ETL process.

In SSIS the End Dating process for Data Vault Satellites is created as a dedicated Package with an Execute SQL Task. Following standard ETL conventions it requires the same logging of records as the other processes do which means updated records will be flagged with the Process ID of the (instance of the) job that actually did the update. Earlier ventures to define End Dating were based on using a normal Data Flow, but this creates a row based operation which is quite performance intensive for SSIS to handle. Using an Execute SQL Task forces execution on database level which is set based. Thanks to SSIS’ ability to pass parameters into the Execute SQL Task we are able to provide the Process ID information into this SQL before execution, so it will be a relatively efficient task to run (for a redundant step, that is).

In any case it is a very straightforward process with only one operation in the SSIS Control Flow:

Satellite End Dating

All the logic is buried in the SQL query which identifies any record that needs to be end-dated and also does the corresponding updates; the selected Expiry Date (the Effective Date from the next record). This means that this process can be run at any time and closes off all records which need to be closed. An example SQL is provided below. Please note the ‘?’ symbols are mapped to parameters via the ‘Parameter Mapping’ in the Execute SQL Task properties. Also, in my view of the world the high end date/times are always set to ‘9999-12-31’ so the logic is built around that. These high dates are inserted by the regular Satellite ETL and already in place from the perspective of End Dating.

With MyCTE AS (
SELECT
A.<Hub Key>,
-- Add additional key for multi-active satellites here!
A.<EFFECTIVE_DATETIME>,
B.<EFFECTIVE_DATETIME> AS EXPIRY_DATETIME
FROM
<Satellite table> A
INNER JOIN <Satellite table> B
ON A.<Hub Key>=B.<Hub Key>
-- Add additional join condition for multi-active satellites here!
AND B.EFFECTIVE_DATE =
(
SELECT TOP 1 EFFECTIVE_DATETIME
FROM <Satellite table> C
WHERE A.<Hub Key> = C.<Hub Key>
-- Add additional where clauses for multi-active satellites here!
AND C.EFFECTIVE_DATETIME > EFFECTIVE_DATETIME
)
JOIN
(
SELECT
<Hub Key>
-- Add additional key for multi-active satellites
FROM <Satellite table>
WHERE EXPIRY_DATETIME = '9999-12-31'
GROUP BY
<Hub Key
-- Add additional key in grouping condition for multi-active satellites
HAVING COUNT(*) > 1
) dupquery ON dupquery.<Hub Key>=A.<Hub Key>
-- AND additional key for multi-active satellites
AND A.EXPIRY_DATETIME='9999-12-31'
)
UPDATE A
SET A.EXPIRY_DATETIME = B.EXPIRY_DATETIME,
A.ETL_PROCESS_ID = ?, -- If required
A.ACTIVE_RECORD_INDICATOR = 'N' -- If required
FROM <Satellite table>
INNER JOIN MyCTE B ON A.<Hub Key>=B.<Hub Key>
AND A.EFFECTIVE_DATETIME = B.EFFECTIVE_DATETIME
 
Roelant Vos

Roelant Vos

You may also like...

1 Response

  1. April 18, 2016

    […] the same reasons as outlined for the SSIS Satelllite End Dating mechanisms, Link Satellites End Dating is always implemented as a stand-alone generic ETL process. […]

     

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.