User Defined Properties for ETL Automation – the final piece of the puzzle
At the end of the development efforts to support true Model Driven Design there are some elements you just don’t want to store in another metadata table. But somehow you need specific information which isn’t available in the Data Dictionary either (or directly derivable that way). In my case I wanted to label attributes within a Dimensional Model to behave following specific paradigms for presenting history – Type0, Type1 or Type2 – and essentially use this information to be queried by the ETL automation logic.
The goal is to automate the ETL from a Data Vault model into a typical Star Schema for different use-cases related to time variance. We were already able to generate the rest of the Data Warehouse using the defined patterns for Data Vault and any sourcing patterns we encountered.
The best solution in this scenario is to extend the table and/or column definition in the database with configurable elements, which can be defined as part of the modelling effort and consequently be read by querying the Data Dictionary. This would allow the Model Driven Design to be completed; different ETLs on the Data Vault model for a Star Schema could be generated using information captured in the model itself.
For this purpose I used ERwin 9.5 (Community Edition) with SQL Server 2012, although this works on similar products such as Embarcadero ER Studio as well. In ERwin, go to the Column Properties and navigate to the UDP (User Defined Properties) tab. In this tab you can create a type for anything you want, which in our case is a ‘HistoryType’ label for each column with the default value of ‘None’ and a type of ‘Text’ as the contents are string values. Creating this UDP allows you to set a HistoryType for each individual column.
In ERwin, don’t forget to set the following two properties or the logic will not forward-engineer:
- The table needs a schema to be defined
- The flag Database Property in the UDP definition needs to be set to True (the default is False)
After setting the desired values for each column (e.g. Type0, Type1 or Type2) you can forward engineer the logic to add these labels to the database. In SQL Server these are generated as ‘Extended Properties’, which is the SQL Server terminology for this. If the UDP is applied per attribute the forward engineering process will generate a Stored Procedure per column to add the label to the Data Dictionary. The (example) code for this is as follows:
@name = 'HistoryType', @value = 'Type1',
@level0type = 'SCHEMA', @level0name = 'dbo',
@level1type = 'TABLE', @level1name = 'DIM_MEMBER',
@level2type = 'COLUMN', @level2name = 'GIVEN_NAME'
As you can see the T-SQL code is not very intuitive, but thankfully this is all handled by the modelling tool. Running this code, which is generated as part of the table DDL updates the SQL Server dictionary. This information can be retrieved for further use, for automation the ETL in our case, using a query such as the following:
DECLARE @TableName AS VARCHAR(MAX)='DIM_MEMBER'
DECLARE @ColumnName AS VARCHAR(MAX)='GIVEN_NAME'
FROM sys.tables st
INNER JOIN sys.columns sc on st.object_id = sc.object_id
LEFT JOIN sys.extended_properties sep on st.object_id = sep.major_id
AND sc.column_id = sep.minor_id
AND sep.name = 'HistoryType'
-- WHERE st.name = @TableName
-- AND sc.name = @ColumnName
This gives you the result back in SQL:
You can use similar model metadata as well for other information you would otherwise store in a repository, examples are:
- Multi-active and/or degenerate attributes (although these can be detected by patterns in the Data Dictionary as well without additional reference)
- Driving Keys in relationships
- Business Keys
- Defining certain types of tables (e.g. Facts/Dimensions, Regular LSAT versus Driving Key LSAT etc.)
Lastly, UDPs / Extended Properties can be defined for a large variety of objects including, but not limited to Indexes, Views, Tables, Schemas, Functions and even databases.
For ETL automation and development, this saves a massive amount of time otherwise spent creating Source-To-Target mappings or repositories to store this type of content.