Easy generation of Data Vault Satellites using standard SQL

This new entry in the ‘simple ETL generation series‘ leaves the Data Vault Hub concepts behind. Much of the logic shared in earlier Hub ETL generation posts will be reused for the purpose of generating a new type of ETL process: loading a Data Vault Satellite table.

In Data Vault, a Satellite provides context for the Core Business Concept (CBC) – embodied by the Hub and its Business Key. Or simply put: a Satellite contains the descriptive information (the attributes) that describe a Hub / Business Key.

Technically, the Hub Surrogate Key acts as a Foreign Key in the corresponding Satellite table. As a side note: this Foreign Key constraint is usually not enforced. A Hub table can refer to many Satellites.

The Hub Surrogate Key is in many cases a ‘hashed’ version of the Business Key, but can also be the Business Key itself, usually as a concatenated key to guarantee uniqueness (see the article on the Natural Business Key). In any case, because the Satellite ‘inherits’ the Surrogate Key as a Foreign key from the Hub it is no surprise that a lot of the generation logic required to generate the Business Key for the Satellite loading process can be reused from the Hub generation code.

The generation of the Business Key was one of the hardest parts to automate so far (see generating complex Business Keys). Thankfully, this evaluation is essentially the same for the Satellite as for the Hub and can pretty much be copied-and-pasted.

The Satellite pattern

At a high level, the Satellite load process pattern is as follows:

  • Select the distinct set of applicable attributes. An explanation of what ‘distinct’ means in this context can be read in the ‘when is a change a change?‘ paper about row compacting / condensing.
  • Calculate full rows checksums for change detection and derive the data delta.
  • Lookup the Business Key. This is meant as ‘functionally’. If you distribute Data Warehouse keys as sequence values in your Hub then a lookup is required, otherwise (i.e. using Hash Keys or Natural Business Keys) the ‘lookup’ is inferred.
  • Map the attributes (source-to-target).

I always add a ‘prevent reprocessing’ check to the Satellite pattern as well because this meets an, in my view, fundamental requirement for ETL processes: a process should detect whether it can insert records or not, i.e. it should not fail on constraints.

This enables the ability for an ETL process to avoid inserting incorrect (duplicate) values by preventing loading the same data set multiple times. While this seems a fairly inconspicuous and perhaps redundant function, it makes administering the overall data solution very easy later on.

I highly recommend it for any load process targeting a time-variant (historised) table such as a Satellite, Dimension or Persistent Staging table.

In the context of the Satellite pattern this requirement translates as ‘Lookup the Business Key and Load Date / Time Stamp to assert whether this combination of values already exists in the the target, and ignore if it does’. I have mentioned this separately form the ‘standard’ bullet points as this is a personal preference that I like to add to my interpretation of the load pattern.

What metadata do I need?

Two sets of metadata are needed to generate Satellite load processes:

  • A source-to-target table mapping, including the Business Key definition. This is essentially the same metadata as is required for a Hub (except that the target is now a Satellite table). If you use the TEAM metadata this is available from the [interface].[INTERFACE_SOURCE_SATELLITE_XREF] view.
  • The individual attribute mappings for this particular source-to-target table mapping. If you use the TEAM metadata this is available from the [interface].[INTERFACE_SOURCE_SATELLITE_ATTRIBUTE_XREF] view.

The metadata views and sample data can be generated using the free TEAM application for metadata management. The ‘repository’ menu option allow the creation of a metadata repository including sample data. This is the same data that has been used for this Satellite pattern. Within TEAM, these are interface views that expose the metadata for ETL generation.

For the purpose of sharing these examples in a way that doesn’t require to have TEAM installed I have stored the content of these views in corresponding tables, and scripted them for download.

The Satellite generation process & metadata

As with earlier posts on these topics I have used an UML Activity Diagram to explain what is happening. The diagram contains the parameters and variables used in the generation process, and also shows the metadata queries that are provide the input.

As you can see, the Satellite process includes an ‘outer’ loop that retrieves the table mapping, as well as two ‘inner’ loops.

The first inner loop retrieves the attribute mappings that belong to the table mapping and constructs the SELECT and alias (‘AS’) sections of the target query.

The second inner loop evaluates the Business Key by deconstructing the key components. This process supports concatenated keys, composite keys and hard-coded values. This part of the logic constructs the Surrogate Key and type-casts concerning the attributes required for the Business Key.

Running the Satellite ETL generation process

This process can be implemented as a simple SQL cursor that iterates over the INTERFACE_SOURCE_SATELLITE_XREF metadata row set and generates a Satellite load process for each row. The source table and target table, including their corresponding Business Key definitions are retrieved directly from each row in the metadata set.

The Surrogate Key name is also available from metadata directly. This is a change from earlier Hub generation processes, where conventions were used to derive the name of the Surrogate Key. Both approaches can be used interchangeably based on personal preferences.

The SQL script at the bottom of this post (in the ‘code’ section) can be used to run the examples. The code is basically a big cursor (loop) over the table mapping metadata and within this a smaller loop to add the attribute mappings. The Business Key evaluation logic is the same as was used for the Hub (the complex Business Key example). Easy!

The resulting SQL can be copy / pasted and executed straight away if the target tables exist in the database. As always, these examples can also be generated using TEAM. The generated SQL will load the target Satellite tables from the designated sources. If you only want to see the SELECT component the initial INSERT INTO statement can be removed from the template.

An example of the output is listed below:

-- Working on mapping to SAT_CUSTOMER from source table STG_PROFILER_CUSTOMER_PERSONAL

USE [200_Integration_Layer];

INSERT INTO [200_Integration_Layer].[dbo].[SAT_CUSTOMER]
(
  [CUSTOMER_HSH],
  [HASH_FULL_RECORD],
  [LOAD_DATETIME],
  [LOAD_END_DATETIME],
  [CURRENT_RECORD_INDICATOR],
  [ETL_INSERT_RUN_ID],
  [ETL_UPDATE_RUN_ID],
  [RECORD_SOURCE],
  [CDC_OPERATION],
  [SOURCE_ROW_ID],
  [COUNTRY],
  [DATE_OF_BIRTH],
  [GENDER],
  [GIVEN_NAME],
  [POSTCODE],
  [REFERRAL_OFFER_MADE_INDICATOR],
  [Suburb],
  [SURNAME]
)
SELECT main.* FROM (
SELECT
  HASHBYTES('MD5',
    ISNULL(RTRIM(CONVERT(NVARCHAR(100),[CUSTOMER_ID])),'NA')+'|' 
  ) AS [CUSTOMER_HSH],
  HASHBYTES('MD5',
    ISNULL(RTRIM(CONVERT(NVARCHAR(100),[CDC_OPERATION])),'NA')+'|'+
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[COUNTRY])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[DOB])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[GENDER])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[Given])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[POSTCODE])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[Referee_Offer_Made])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[Suburb])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[SURNAME])),'NA')+'|' 
  ) AS [HASH_FULL_RECORD],
  DATEADD(mcs,[SOURCE_ROW_ID],[LOAD_DATETIME]) AS [LOAD_DATETIME],
  '9999-12-31'[LOAD_END_DATETIME],
  'Y'[CURRENT_RECORD_INDICATOR],
  -1 AS [ETL_INSERT_RUN_ID],
  -1 AS [ETL_UPDATE_RUN_ID],
  [RECORD_SOURCE],
  [CDC_OPERATION],
  [SOURCE_ROW_ID],
  [COUNTRY] AS [COUNTRY],
  [DOB] AS [DATE_OF_BIRTH],
  [GENDER] AS [GENDER],
  [Given] AS [GIVEN_NAME],
  [POSTCODE] AS [POSTCODE],
  [Referee_Offer_Made] AS [REFERRAL_OFFER_MADE_INDICATOR],
  [Suburb] AS [Suburb],
  [SURNAME] AS [SURNAME]
FROM
(
 SELECT 
    [LOAD_DATETIME],
    [RECORD_SOURCE],
    [CDC_OPERATION],
    [SOURCE_ROW_ID],
    [EVENT_DATETIME],
    [COMBINED_VALUE],
    [CUSTOMER_ID],
    [COUNTRY],
    [DOB],
    [GENDER],
    [Given],
    [POSTCODE],
    [Referee_Offer_Made],
    [Suburb],
    [SURNAME],
    CASE 
      WHEN LAG([COMBINED_VALUE],1,'N/A') OVER (PARTITION BY 
     [CUSTOMER_ID]
       ORDER BY [LOAD_DATETIME] ASC, [EVENT_DATETIME] ASC, [CDC_OPERATION] DESC) = [COMBINED_VALUE]
      THEN 'Same' ELSE 'Different'
    END AS [VALUE_CHANGE_INDICATOR],
    CASE WHEN LAG([CDC_OPERATION],1,'') OVER (PARTITION BY
     [CUSTOMER_ID]
     ORDER BY [LOAD_DATETIME] ASC, [EVENT_DATETIME] ASC, [CDC_OPERATION] ASC) = [CDC_OPERATION]
      THEN 'Same' ELSE 'Different'
      END AS [CDC_CHANGE_INDICATOR],
    CASE WHEN LEAD([LOAD_DATETIME],1,'9999-12-31') OVER (PARTITION BY 
     [CUSTOMER_ID]
       ORDER BY [LOAD_DATETIME] ASC, [EVENT_DATETIME] ASC, [CDC_OPERATION] ASC)= [LOAD_DATETIME]
      THEN 'Same' ELSE 'Different'
    END AS [TIME_CHANGE_INDICATOR]
FROM (
SELECT
    [LOAD_DATETIME],
    [EVENT_DATETIME],
    [RECORD_SOURCE],
    [SOURCE_ROW_ID],
    [CDC_OPERATION],
    CAST(CustomerID AS NVARCHAR(100)) AS CUSTOMER_ID,
    [COUNTRY],
    [DOB],
    [GENDER],
    [Given],
    [POSTCODE],
    [Referee_Offer_Made],
    [Suburb],
    [SURNAME],
    CONVERT(CHAR(32),HASHBYTES('MD5',
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[COUNTRY])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[DOB])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[GENDER])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[Given])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[POSTCODE])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[Referee_Offer_Made])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[Suburb])),'NA')+'|' +
       ISNULL(RTRIM(CONVERT(NVARCHAR(100),[SURNAME])),'NA')+'|' 
    ),2) AS COMBINED_VALUE
  FROM [100_Staging_Area].[dbo].[STG_PROFILER_CUSTOMER_PERSONAL]
  ) sub
) combined_value
WHERE ([VALUE_CHANGE_INDICATOR] ='Different' AND [CDC_OPERATION] IN ('Insert', 'Change'))
OR ([CDC_CHANGE_INDICATOR] = 'Different' AND [TIME_CHANGE_INDICATOR] = 'Different')) main
LEFT OUTER JOIN [200_Integration_Layer].[dbo].[SAT_CUSTOMER] sat
 ON sat.[CUSTOMER_HSH] = main.[CUSTOMER_HSH]
AND sat.[LOAD_DATETIME] = main.[LOAD_DATETIME]
WHERE sat.[CUSTOMER_HSH] IS NULL

The code

Since the SQL has become fairly extensive I have moved it here to the bottom of the post. The code can be copy / pasted and run or simply downloaded from the link below.

USE []

-- Parameters
DECLARE @targetDatabase				VARCHAR(100) = '[Integration_Layer]';
DECLARE @targetSchema				VARCHAR(100) = '[dbo]';
DECLARE @sourceDatabase				VARCHAR(100) = '[Staging_Area]';
DECLARE @sourceSchema				VARCHAR(100) = '[dbo]';
DECLARE @loadDateTimeAttribute			VARCHAR(100) = '[LOAD_DATETIME]';
DECLARE @etlProcessIdAttribute			VARCHAR(100) = '[ETL_INSERT_RUN_ID]';
DECLARE @etlProcessIdUpdateAttribute		VARCHAR(100) = '[ETL_UPDATE_RUN_ID]';
DECLARE @recordSourceAttribute			VARCHAR(100) = '[RECORD_SOURCE]';
DECLARE @loadEndDateTimeAttribute		VARCHAR(100) = '[LOAD_END_DATETIME]';
DECLARE @currentRecordIndicatorAttribute	VARCHAR(100) = '[CURRENT_RECORD_INDICATOR]';
DECLARE @eventDateTimeAttribute			VARCHAR(100) = '[EVENT_DATETIME]';
DECLARE @checksumAttribute			VARCHAR(100) = '[HASH_FULL_RECORD]';
DECLARE @cdcAttribute				VARCHAR(100) = '[CDC_OPERATION]';
DECLARE @sourceRowIdAttribute			VARCHAR(100) = '[SOURCE_ROW_ID]';

-- Variables input / metadata (from the metadata database)
DECLARE @targetTable				VARCHAR(100);	-- The Satellite (target) table name
DECLARE @sourceTable				VARCHAR(100);	-- The source table name
DECLARE @sourceBusinessKey			VARCHAR(MAX);
DECLARE @sourceAttributeName		        VARCHAR(MAX);	-- A local variable for use in the attribute cursor
DECLARE @targetAttributeName			VARCHAR(MAX);	-- A local variable for use in the attribute cursor

-- Variabels local / helper
DECLARE @pattern				VARCHAR(MAX);	-- The complete selection / generated output 
DECLARE @targetHashKeyName			VARCHAR(100);	-- The derived name of the Hash Key
DECLARE @hubHashKeySelectPart			VARCHAR(MAX);   -- The selection inside the hashbytes function (to define the Hash Key)
DECLARE @keyPartSource				VARCHAR(100);	-- The key component (source name), for composite / concatenated keys
DECLARE @keyPartTarget				VARCHAR(100);	-- The target key component, in case of composite or concatenated keys
DECLARE @keyPartConvertPart			VARCHAR(100);	-- The key component block that needs to be VARCHAR for lead/lag purposes in the row condensing
DECLARE @attributeSelectPartSource		VARCHAR(MAX);	-- The listing of all in-scope attributes for the SELECT statement (inserted as a block)
DECLARE @attributeSelectPartTarget	        VARCHAR(MAX);	-- The listing of all in-scope attributes for the SELECT statement (inserted as a block)
DECLARE @attributeSelectPartSourceAsTarget	VARCHAR(MAX);	-- The listing of all in-scope attributes for the SELECT statement (inserted as a block)
DECLARE @attributesBasePart			VARCHAR(MAX);	-- The listing of attributes without the business key for change evaluation (combined value)
DECLARE @attributeChecksumPart			VARCHAR(MAX);	-- The constructed hashbytes for all attributes (inserted as a block)
DECLARE @keyPartSelectPart			VARCHAR(MAX);	-- The listing of all business key (part) attributes for the SELECT statement (inserted as a block)


-- The cursor is designed to 'pull' into the target Satellite, as opposed to 'push' from the source. This provides better control of which metadata gets used.
-- This cursor is the main / 'outer' cursor which cycles through the tables and creates the source-to-staging pattern
DECLARE sat_cursor CURSOR FOR   
  SELECT [SOURCE_NAME], [TARGET_NAME], [SOURCE_BUSINESS_KEY_DEFINITION], [SURROGATE_KEY]
  FROM [interface].[INTERFACE_SOURCE_SATELLITE_XREF]
  WHERE [TARGET_TYPE] = 'Normal'-- AND [TARGET_NAME]='SAT_MEMBERSHIP_PLAN_VALUATION'

OPEN sat_cursor  

FETCH NEXT FROM sat_cursor   
INTO @sourceTable, @targetTable, @sourceBusinessKey, @targetHashKeyName

WHILE @@FETCH_STATUS = 0  
BEGIN

	--Clear out local variables where required for each new iteration
	SET @attributeSelectPartSource='';
	SET @attributeSelectPartTarget='';
	SET @attributeSelectPartSourceAsTarget='';
	SET @attributesBasePart='';
	SET @attributeChecksumPart='';
	SET @hubHashKeySelectPart='';
	SET @keyPartConvertPart='';
	SET @keyPartSelectPart='';

	--1st inner Cursor: retrieve the attributes (from the target table)
	DECLARE attribute_cursor CURSOR FOR
		SELECT [SOURCE_ATTRIBUTE_NAME], [TARGET_ATTRIBUTE_NAME]
		FROM [interface].[INTERFACE_SOURCE_SATELLITE_ATTRIBUTE_XREF]
		WHERE [TARGET_NAME] = ''+@targetTable+''
		AND [SOURCE_NAME] = ''+@sourceTable+''

	OPEN attribute_cursor
	  
	FETCH NEXT FROM attribute_cursor INTO @sourceAttributeName, @targetAttributeName

	WHILE @@FETCH_STATUS = 0  
	BEGIN  

	    -- Construct the block of attributes for use in SELECT statements
		SET @attributeSelectPartSource = @attributeSelectPartSource+'    ['+@sourceAttributeName+'],'+CHAR(13);
		SET @attributesBasePart = @attributesBasePart+'    ['+@sourceAttributeName+'],'+CHAR(13);
		SET @attributeSelectPartTarget = @attributeSelectPartTarget+'  ['+@targetAttributeName+'],'+CHAR(13);

		-- Construct the checksum across all attributes
		SET @attributeChecksumPart = @attributeChecksumPart+'       ISNULL(RTRIM(CONVERT(NVARCHAR(100),['+ @sourceAttributeName +'])),''NA'')+''|'' +'+CHAR(13);

		-- Create the aliases
		SET @attributeSelectPartSourceAsTarget = @attributeSelectPartSourceAsTarget+'  ['+@sourceAttributeName+'] AS ['+@targetAttributeName+'],'+CHAR(13);

		FETCH NEXT FROM attribute_cursor INTO @sourceAttributeName, @targetAttributeName
	END  

	CLOSE attribute_cursor;  
	DEALLOCATE attribute_cursor;  
	--End of attribute cursor

	-- 2nd inner cursor - understand the key configuration
	DECLARE keypart_cursor CURSOR FOR
	WITH [MAINQUERY] AS
	(
	SELECT
		[TARGET_NAME],
		[SOURCE_NAME],
		[BUSINESS_KEY_COMPOSITION],
		CASE
			WHEN [BUSINESS_KEY_COMPOSITION]='Concatenate' THEN REPLACE(SOURCE_BUSINESS_KEY_DEFINITION,',','+')
			ELSE [SOURCE_BUSINESS_KEY_DEFINITION]
		END AS [SOURCE_BUSINESS_KEY_DEFINITION],
		[TARGET_BUSINESS_KEY_DEFINITION]
	FROM 
	(
	  SELECT 
	    [TARGET_NAME],
	    [SOURCE_NAME],
	    CASE 
	  	WHEN CHARINDEX('COMPOSITE(',[SOURCE_BUSINESS_KEY_DEFINITION], 1) > 0 THEN 'Composite' 
	  	WHEN CHARINDEX('CONCATENATE(',[SOURCE_BUSINESS_KEY_DEFINITION], 1) > 0 THEN 'Concatenate' 
	  	ELSE 'Regular'
	    END AS [BUSINESS_KEY_COMPOSITION],
	    REPLACE(
	  	REPLACE(
	  		REPLACE(
	  			REPLACE([SOURCE_BUSINESS_KEY_DEFINITION],'COMPOSITE(','')
	  		,'CONCATENATE(','')
	  	,')','')
	    ,';',',') AS [SOURCE_BUSINESS_KEY_DEFINITION], -- Strip out any metadata information i.e. classification, commas and brackets
	    [TARGET_BUSINESS_KEY_DEFINITION]
	  FROM interface.[INTERFACE_SOURCE_SATELLITE_XREF]
	  WHERE [TARGET_NAME] = ''+@targetTable+'' AND [SOURCE_NAME] = ''+@sourceTable+'' AND [SOURCE_BUSINESS_KEY_DEFINITION] = ''+@sourceBusinessKey+''
	) sub
	-- Define the source business key as XML
	), [SOURCEKEY] AS
	(   
	SELECT 
	  [TARGET_NAME],
	  [SOURCE_NAME],
	  [BUSINESS_KEY_COMPOSITION],
	  [SOURCE_BUSINESS_KEY_DEFINITION], 
	  CAST ('<M>' + REPLACE([SOURCE_BUSINESS_KEY_DEFINITION], ',', '</M><M>') + '</M>' AS XML) AS [BUSINESS_KEY_SOURCE_XML]
	FROM [MAINQUERY] 
	-- Define the target business key as XML
	), [TARGETKEY] AS
	(     
	SELECT 
	  [TARGET_NAME],
	  [SOURCE_NAME],
	  [BUSINESS_KEY_COMPOSITION],
	  [TARGET_BUSINESS_KEY_DEFINITION],
	  CAST ('<M>' + REPLACE([TARGET_BUSINESS_KEY_DEFINITION], ',', '</M><M>') + '</M>' AS XML) AS [BUSINESS_KEY_TARGET_XML]
	FROM [MAINQUERY] 
	-- Break up the source business key in parts to support composite keys
	), [SOURCEKEYPARTS] AS
	( 
	SELECT 
	  [TARGET_NAME],
	  [SOURCE_NAME],
	  [BUSINESS_KEY_COMPOSITION],
	  LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS [BUSINESS_KEY_PART],
	  ROW_NUMBER() OVER (PARTITION BY [TARGET_NAME], [SOURCE_NAME] ORDER BY (SELECT 100)) AS [ROW_NR]
	FROM [SOURCEKEY]
	CROSS APPLY [SOURCEKEY].[BUSINESS_KEY_SOURCE_XML].nodes ('/M') AS Split(a)
	-- Break up the target business key to match the composite keys on ordinal position
	), [TARGETKEYPARTS] AS
	( 
	SELECT
	  [TARGET_NAME],
	  [SOURCE_NAME],
	  [BUSINESS_KEY_COMPOSITION],
	  LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS [BUSINESS_KEY_PART],
	  ROW_NUMBER() OVER (PARTITION BY [TARGET_NAME], [SOURCE_NAME] ORDER BY (SELECT 100)) AS [ROW_NR]
	FROM [TARGETKEY]
	CROSS APPLY [TARGETKEY].[BUSINESS_KEY_TARGET_XML].nodes ('/M') AS Split(a)
	)
	SELECT
	  SOURCEKEYPARTS.BUSINESS_KEY_PART AS SOURCE_BUSINESS_KEY_PART
	 ,TARGETKEYPARTS.BUSINESS_KEY_PART AS TARGET_BUSINESS_KEY_PART
	FROM SOURCEKEYPARTS
	JOIN TARGETKEYPARTS ON SOURCEKEYPARTS.[TARGET_NAME]=TARGETKEYPARTS.[TARGET_NAME] AND SOURCEKEYPARTS.[SOURCE_NAME] = TARGETKEYPARTS.[SOURCE_NAME]
	WHERE SOURCEKEYPARTS.ROW_NR = TARGETKEYPARTS.ROW_NR

	OPEN keypart_cursor
	  
	FETCH NEXT FROM keypart_cursor INTO @keyPartSource, @keyPartTarget
	
	WHILE @@FETCH_STATUS = 0  
	BEGIN  

	SET @hubHashKeySelectPart = @hubHashKeySelectPart+'    ISNULL(RTRIM(CONVERT(NVARCHAR(100),['+ @keyPartTarget +'])),''NA'')+''|'' +'+CHAR(13);

	SET @keyPartConvertPart = @keyPartConvertPart + '    CAST('+@keyPartSource+' AS NVARCHAR(100)) AS '+@keyPartTarget+','+CHAR(13);

	SET @keyPartSelectPart = @keyPartSelectPart + '     [' + @keyPartTarget+'],'+CHAR(13);

	    -- Remove the business key from the attribute base array
	SET @attributesBasePart = REPLACE(@attributesBasePart,@keyPartSource,''); 

	FETCH NEXT FROM keypart_cursor   
	INTO  @keyPartSource, @keyPartTarget
	END  

	CLOSE keypart_cursor;  
	DEALLOCATE keypart_cursor;  
	--End of key part cursor

	-- Remove trailing commas and delimiters	
	--SET @attributeSelectPartSource = LEFT(@attributeSelectPartSource,DATALENGTH(@attributeSelectPartSource)-2)+CHAR(13);
	SET @attributeSelectPartTarget = LEFT(@attributeSelectPartTarget,DATALENGTH(@attributeSelectPartTarget)-2)+CHAR(13);
	SET @attributeSelectPartSourceAsTarget = LEFT(@attributeSelectPartSourceAsTarget,DATALENGTH(@attributeSelectPartSourceAsTarget)-2)+CHAR(13);
	SET @hubHashKeySelectPart = LEFT(@hubHashKeySelectPart,DATALENGTH(@hubHashKeySelectPart)-2)+CHAR(13);
	SET @attributeChecksumPart = LEFT(@attributeChecksumPart,DATALENGTH(@attributeChecksumPart)-2)+CHAR(13);
	SET @keyPartSelectPart = LEFT(@keyPartSelectPart,DATALENGTH(@keyPartSelectPart)-2)+CHAR(13);

	-- Source to Staging Full Outer Join Pattern
 	SET @pattern = '-- Working on mapping to ' +  @targetTable + ' from source table ' + @sourceTable+CHAR(13)+CHAR(13);
	SET @pattern = @pattern+'USE '+@targetDatabase+';'+CHAR(13)+CHAR(13);
	SET @pattern = @pattern+'INSERT INTO '+@targetDatabase+'.'+@targetSchema+'.['+@targetTable+']'+CHAR(13);
	SET @pattern = @pattern+'('+CHAR(13)
	SET @pattern = @pattern+'  ['+@targetHashKeyName+'],'+CHAR(13);	
	SET @pattern = @pattern+'  '+@checksumAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@loadDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@loadEndDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@currentRecordIndicatorAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@etlProcessIdAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@etlProcessIdUpdateAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@recordSourceAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@cdcAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@sourceRowIdAttribute+','+CHAR(13);
	SET @pattern = @pattern+''+@attributeSelectPartTarget;-- Add the attribtes to insert into the target table
	SET @pattern = @pattern+')'+CHAR(13); 

	-- Outer selection
	SET @pattern = @pattern+'SELECT main.* FROM ('+CHAR(13);

	-- Start of the SELECT statement
	SET @pattern = @pattern+'SELECT'+CHAR(13);
	SET @pattern = @pattern+'  HASHBYTES(''MD5'','+CHAR(13);
	SET @pattern = @pattern+''+@hubHashKeySelectPart;
	SET @pattern = @pattern+'  ) AS ['+@targetHashKeyName+'],'+CHAR(13);
	SET @pattern = @pattern+'  HASHBYTES(''MD5'','+CHAR(13);
	SET @pattern = @pattern+'    ISNULL(RTRIM(CONVERT(NVARCHAR(100),'+@cdcAttribute+')),''NA'')+''|''+'+CHAR(13);
	SET @pattern = @pattern+@attributeChecksumPart;
	SET @pattern = @pattern+'  ) AS '+@checksumAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  DATEADD(mcs,'+@sourceRowIdAttribute+','+@loadDateTimeAttribute+') AS '+@loadDateTimeAttribute+','+CHAR(13); 
	SET @pattern = @pattern+' '+' ''9999-12-31'''+@loadEndDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+' '+' ''Y'''+@currentRecordIndicatorAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  -1 AS '+@etlProcessIdAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  -1 AS '+@etlProcessIdUpdateAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@recordSourceAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@cdcAttribute+','+CHAR(13);
	SET @pattern = @pattern+'  '+@sourceRowIdAttribute+','+CHAR(13);
	SET @pattern = @pattern+''+@attributeSelectPartSourceAsTarget;
    SET @pattern = @pattern+'FROM'+CHAR(13);
	-- Start of the first sub query
	SET @pattern = @pattern+'('+CHAR(13);
	SET @pattern = @pattern+' SELECT '+CHAR(13);
	SET @pattern = @pattern+'    '+@loadDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@recordSourceAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@cdcAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@sourceRowIdAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@eventDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    [COMBINED_VALUE],'+CHAR(13);
	SET @pattern = @pattern+@keyPartSelectPart+',';
	SET @pattern = @pattern+@attributeSelectPartSource;
	SET @pattern = @pattern+'    CASE '+CHAR(13);
	SET @pattern = @pattern+'      WHEN LAG([COMBINED_VALUE],1,''N/A'') OVER (PARTITION BY '+CHAR(13);
	SET @pattern = @pattern+@keyPartSelectPart;
	SET @pattern = @pattern+'       ORDER BY '+@loadDateTimeAttribute+' ASC, '+@eventDateTimeAttribute+' ASC, '+@cdcAttribute+' DESC) = [COMBINED_VALUE]' +CHAR(13);
	SET @pattern = @pattern+'      THEN ''Same'' ELSE ''Different'''+CHAR(13);
	SET @pattern = @pattern+'    END AS [VALUE_CHANGE_INDICATOR],'+CHAR(13);
	SET @pattern = @pattern+'    CASE WHEN LAG('+@cdcAttribute+',1,'''') OVER (PARTITION BY'+CHAR(13);
	SET @pattern = @pattern+@keyPartSelectPart;	
	SET @pattern = @pattern+'     ORDER BY '+@loadDateTimeAttribute+' ASC, '+@eventDateTimeAttribute+' ASC, '+@cdcAttribute+' ASC) = '+@cdcAttribute+''+CHAR(13);
	SET @pattern = @pattern+'      THEN ''Same'' ELSE ''Different'''+CHAR(13);
	SET @pattern = @pattern+'      END AS [CDC_CHANGE_INDICATOR],'+CHAR(13);
	SET @pattern = @pattern+'    CASE WHEN LEAD('+@loadDateTimeAttribute+',1,''9999-12-31'') OVER (PARTITION BY '+CHAR(13);
	SET @pattern = @pattern+@keyPartSelectPart;
	SET @pattern = @pattern+'       ORDER BY '+@loadDateTimeAttribute+' ASC, '+@eventDateTimeAttribute+' ASC, '+@cdcAttribute+' ASC)= '+@loadDateTimeAttribute+''+CHAR(13);
	SET @pattern = @pattern+'      THEN ''Same'' ELSE ''Different'''+CHAR(13);
	SET @pattern = @pattern+'    END AS [TIME_CHANGE_INDICATOR]'+CHAR(13);
	SET @pattern = @pattern+'FROM ('+CHAR(13);
	SET @pattern = @pattern+'SELECT'+CHAR(13);
	SET @pattern = @pattern+'    '+@loadDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@eventDateTimeAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@recordSourceAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@sourceRowIdAttribute+','+CHAR(13);
	SET @pattern = @pattern+'    '+@cdcAttribute+','+CHAR(13);
	SET @pattern = @pattern+@keyPartConvertPart;
	SET @pattern = @pattern+''+@attributeSelectPartSource;
	SET @pattern = @pattern+'    CONVERT(CHAR(32),HASHBYTES(''MD5'','+CHAR(13);
	SET @pattern = @pattern+@attributeChecksumPart;
	SET @pattern = @pattern+'    ),2) AS COMBINED_VALUE'+CHAR(13);
	SET @pattern = @pattern+'  FROM '+@sourceDatabase+'.'+@sourceSchema+'.['+@sourceTable+']'+CHAR(13);
	SET @pattern = @pattern+'  ) sub'+CHAR(13);
	SET @pattern = @pattern+') combined_value'+CHAR(13);
	SET @pattern = @pattern+'WHERE ([VALUE_CHANGE_INDICATOR] =''Different'' AND '+@cdcAttribute+' IN (''Insert'', ''Change''))' +CHAR(13);
	SET @pattern = @pattern+'OR ([CDC_CHANGE_INDICATOR] = ''Different'' AND [TIME_CHANGE_INDICATOR] = ''Different'')'
	SET @pattern = @pattern+') main'+CHAR(13);
	SET @pattern = @pattern+'LEFT OUTER JOIN '+@targetDatabase+'.'+@targetSchema+'.['+@targetTable+'] sat'+CHAR(13);
	SET @pattern = @pattern+' ON sat.['+@targetHashKeyName+'] = main.['+@targetHashKeyName+']'+CHAR(13);
	SET @pattern = @pattern+'AND sat.'+@loadDateTimeAttribute+' = main.'+@loadDateTimeAttribute+''+CHAR(13);
	SET @pattern = @pattern+'WHERE sat.['+@targetHashKeyName+'] IS NULL'+CHAR(13);
			  
	-- Spool the pattern to the console
   	PRINT @pattern+CHAR(13);     
		 
    FETCH NEXT FROM sat_cursor   
	INTO @sourceTable, @targetTable, @sourceBusinessKey, @targetHashKeyName
END  
 
CLOSE sat_cursor;  
DEALLOCATE sat_cursor;  

Roelant Vos

Ravos Business Intelligence admin

You may also like...

Leave a Reply