Tagged: ETL Generation


The DWH Time Machine: synchronising model and automation metadata versions

I’ve completed a fairly large body of work that I’ve been meaning to do for a long time: automatically versioning the Data Warehouse data model in sync with the version of the ETL automation metadata. Although versioning models and code is relevant (but rarely implemented) in the traditional ETL world, this requirement becomes very real when moving to a virtualised Data Warehouse / integrated model approach (Data Vault 2.0 in my case)...
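As a minimal sketch of the idea (the table and column names are invented for illustration and are not the actual implementation), the two versions can be registered as a pair, so that any deployment can be traced back to a consistent model/metadata combination:

  -- Hypothetical version registry: each row pairs a data model version
  -- with the automation metadata version it was generated from.
  CREATE TABLE OMD_VERSION (
      VERSION_ID        INT IDENTITY(1,1) PRIMARY KEY,
      MODEL_VERSION     NVARCHAR(50)  NOT NULL,  -- e.g. '1.4.0'
      METADATA_VERSION  NVARCHAR(50)  NOT NULL,  -- matching automation metadata tag
      SNAPSHOT_DATETIME DATETIME2     NOT NULL DEFAULT SYSDATETIME(),
      VERSION_COMMENT   NVARCHAR(255) NULL
  );

  -- Looking up the metadata version that belongs to a given model version:
  SELECT METADATA_VERSION FROM OMD_VERSION WHERE MODEL_VERSION = '1.4.0';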

 

NoETL – Data Vault Satellite tables

The recent presentations provide a push to wrap up the development and release of the Data Vault virtualisation initiative, so now that everything is working properly the next few posts should be relatively quick to produce. First off is the Satellite processing, which supports the typical elements we have seen earlier:

- Regular, composite and concatenated business keys with hashing
- Zero record provision
- Reuse of the objects for ETL purposes if required

As this is another process going...
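As a rough illustration of what such a virtualised Satellite can look like (all names and the MD5 hashing convention are assumptions for this sketch, not the actual templates), a view computes the hash key on the fly and unions in the zero record:

  -- Hypothetical virtual Satellite: a view instead of a physically loaded table.
  CREATE VIEW vw_SAT_CUSTOMER AS
  SELECT
      -- hashed business key (composite keys would be concatenated before hashing)
      CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(ISNULL(CustomerCode, 'NA'))), 2) AS CUSTOMER_HSH,
      LoadDateTime AS EFFECTIVE_DATETIME,
      RecordSource AS RECORD_SOURCE,
      CustomerName,
      CustomerAddress
  FROM HSTG_Customer
  UNION ALL
  SELECT  -- zero record provision: a ghost row that completes the timeline
      REPLICATE('0', 32), '1900-01-01', 'Data Warehouse', NULL, NULL;

Because the view exposes the same structure a persisted Satellite would, the same logic can also be reused to load a physical table if ETL is required.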

 

Changes to the site(s)

In preparation for enabling an online (proof of concept) ETL generation site I have updated various sections of this weblog to conform to the implementations (templates) used for ETL generation. Moreover, most documentation has been moved to the corresponding Wiki and consequently removed from this site. Ultimately the Wiki is better suited for adding more detailed documentation and examples, so only the high-level concepts and positioning remain available on this site. To complete the...

 

ETL generation in SSIS (adult steps)

Full generation of ETL is the missing component on the road to a model-driven design of the Data Warehouse, and I am still pursuing various methods for various ETL suites to add to this concept. Some time ago I looked into ETL generation for Microsoft SSIS using the available DTS libraries (see the post), which really were baby steps. At some point one of my colleagues suggested looking into ‘BIML’ (Business Intelligence Markup Language) and the accompanying...

 

Current mapping generation improvement points

I’ve demoed the posted processes for mapping generation and Data Vault a couple of times for varying audiences and (lucky me) no one has noticed the existing flaws 🙂 So it’s time to make a list of them, to keep me from forgetting to fix them. It also serves as a heads-up for the few people I know who are using the scripts. There is currently no way to keep the ‘transactional’ attributes out of...

 

Mapping generation for Data Vault demo: part 6 (links)

Getting the Data Vault link entities right is probably the hardest part of the mapping generation algorithm. The following script requires user input at critical moments in the process. It helps to have the target data model at hand so you know why (and how) these choices are made. By reading from the history area tables and using the Hub entities created in the previous step, this algorithm creates the relationship tables (links), including link satellites...
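As a sketch of the shape of the output (a hypothetical Customer/Order relationship; every name here is an assumption for illustration), a Link row combines the hash keys of the participating Hubs with a hash over the combined business keys:

  -- Hypothetical Link load between Customer and Order, read from a history
  -- table that holds both business keys.
  WITH src AS (
      SELECT DISTINCT
          CONVERT(CHAR(32), HASHBYTES('MD5',
              UPPER(CustomerCode) + '|' + UPPER(OrderNumber)), 2) AS CUSTOMER_ORDER_HSH,
          CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(CustomerCode)), 2) AS CUSTOMER_HSH,
          CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(OrderNumber)),  2) AS ORDER_HSH
      FROM HSTG_Order
  )
  INSERT INTO LNK_CUSTOMER_ORDER
      (CUSTOMER_ORDER_HSH, CUSTOMER_HSH, ORDER_HSH, LOAD_DATETIME, RECORD_SOURCE)
  SELECT s.CUSTOMER_ORDER_HSH, s.CUSTOMER_HSH, s.ORDER_HSH, SYSDATETIME(), 'HSTG.Order'
  FROM src s
  WHERE NOT EXISTS  -- only record relationships that have not been seen before
        (SELECT 1 FROM LNK_CUSTOMER_ORDER l
         WHERE l.CUSTOMER_ORDER_HSH = s.CUSTOMER_ORDER_HSH);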

 

Mapping generation for Data Vault demo: part 5 (hubs and satellites)

After the successful creation of a history layer, it is time to focus on the core part of the data warehouse: the Data Vault. In this example a ‘raw’ Data Vault is created: no changes or transformations are applied here. The first script in this layer creates the Hubs and the corresponding Satellites. Currently this is done based on the history layer, which is probably not the best approach. It would be better to have the...
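In essence a Hub load boils down to collecting the distinct business keys; a minimal sketch, assuming illustrative table and column names:

  -- Hypothetical Hub load: distinct business keys taken from the history layer.
  INSERT INTO HUB_CUSTOMER (CUSTOMER_HSH, CUSTOMER_CODE, LOAD_DATETIME, RECORD_SOURCE)
  SELECT DISTINCT
      CONVERT(CHAR(32), HASHBYTES('MD5', UPPER(h.CustomerCode)), 2),
      h.CustomerCode,
      SYSDATETIME(),
      'HSTG.Customer'
  FROM HSTG_Customer h
  WHERE NOT EXISTS  -- a Hub records each business key exactly once
        (SELECT 1 FROM HUB_CUSTOMER hub WHERE hub.CUSTOMER_CODE = h.CustomerCode);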

 

Mapping generation for Data Vault demo: part 4 (history area)

Now that the staging area tables and the mappings from source to staging have been created, it’s time for the next step: the history area. In this step the source data is archived in an SCD2 (Slowly Changing Dimension type 2) fashion. The source Tcl file for the staging-to-history mappings can be downloaded here: 2_staging_to_history_generation. When run (with the source command!) the script asks the familiar questions about replacing ETL metadata. The script executes the following steps; the specific elements are...
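The core of the SCD2 pattern is that a new history row is only written when something actually changed; a minimal sketch, assuming invented table, column and checksum names:

  -- Hypothetical SCD2 archive step: insert a new version of a row whenever the
  -- attribute checksum differs from the most recent history row for that key.
  INSERT INTO HSTG_Customer
      (CustomerCode, CustomerName, CustomerAddress, EFFECTIVE_DATETIME, CHECKSUM_VALUE)
  SELECT
      s.CustomerCode, s.CustomerName, s.CustomerAddress, SYSDATETIME(),
      CONVERT(CHAR(32), HASHBYTES('MD5',
          ISNULL(s.CustomerName, 'NA') + '|' + ISNULL(s.CustomerAddress, 'NA')), 2)
  FROM STG_Customer s
  WHERE NOT EXISTS (  -- skip rows whose latest history version is unchanged
      SELECT 1
      FROM HSTG_Customer h
      WHERE h.CustomerCode = s.CustomerCode
        AND h.EFFECTIVE_DATETIME = (SELECT MAX(h2.EFFECTIVE_DATETIME)
                                    FROM HSTG_Customer h2
                                    WHERE h2.CustomerCode = s.CustomerCode)
        AND h.CHECKSUM_VALUE = CONVERT(CHAR(32), HASHBYTES('MD5',
              ISNULL(s.CustomerName, 'NA') + '|' + ISNULL(s.CustomerAddress, 'NA')), 2));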