Mapping generation for Data Vault demo: part 3 (staging area)

Right now we’ve got the basics for the mapping generation in order: source data and a configured Oracle Warehouse Builder project. The next step would be to generate the staging area based on the source definitions.

First we need to import the source definitions metadata so OWB has something to work on. The source definitions can be imported as source definition metadata to the 00_SOURCE_SYSTEM_WORLD module. The scripts will use this source folder to select the initial metadata.

When the source table definitions are available in OWB  it’s time to run the tcl script 1_source_to_staging_generation.tcl (download: 1_source_to_staging_generation) to generate the staging area for these new sources. The following steps are taken for this:
1. Run the script using the source command in OMB (source 1_source_to_staging_generation.tcl)
2. The script will ask if it’s all right to truncate the entire staging area, in a green field like this demo that’s ok.
3. The script will cycle through the source definitions and create target tables and mappings with the prefixes as defined in the settings.tcl script. In the current supplied script the source module (00_SOURCE_SYSTEM_WORLD is hardcoded).

The script executes the following code. Specific elements are highlighted and commented. I will not copy the entire script every time in the future posts (comments are added to the scripts as well) since they get progressively more complicated and copying them as a whole will only complicate things. Many conceptes and bits of code appear in all the scripts so it would only be more of the same. I will however try to explain the thoughts that are behind it as much as possible.
                                                                
# Name:       OWB Staging Area generation TCL                         
# Author:      Roelant Vos                                          
# Purpose:    Generate staging mappings                               
                                                             
source C:/TCL/settings.tcl
puts “”
puts “nConnecting to OWB Metadata Repository, project: $projectn”

##################################
# General connection information #
##################################
OMBDISCONNECT
OMBCONNECT $OWB_repository/$OWB_password@$OWB_server:1521:$environment
OMBCC ‘$project’

set ModList “00_SOURCE_SYSTEM_WORLD” 
The modules that are processed are coded here, this can be done using an array as well.
 
# Truncate and create Staging Area  
puts -nonewline stdout “Do you want to drop the staging area? (Y/N)  => “; gets stdin StgYN;
puts “”

if { $StgYN == “j” || $StgYN == “J” || $StgYN == “y” || $StgYN == “Y”  } {
  puts “The existing Staging Area $StgMod will be dropped and recreated…”
  puts “”

  OMBDROP ORACLE_MODULE ‘$StgMod’
  OMBCREATE ORACLE_MODULE ‘$StgMod’ SET PROPERTIES (DESCRIPTION)
            VALUES (‘This is the Staging Area module in which all the records from the different sources are stored for further integration.
                     The current SA contains full import data from the following source systems: $ModList’)
            SET REFERENCE LOCATION ‘$StgLocation’
}
puts “Processing source module $ModListn”

###########################
# Staging Area generation #
###########################
Here the overall loop starts with the cycling through multiple modules, currently there is only one (hardcoded)
OMBCC ‘$StgMod’
foreach ModName $ModList {

The prefixes are displayed, but they are stored in the settings.tcl file.
  puts “Working on Staging Area: $ModNamen”
  puts “The staging prefix is set to: $StagingPrefix”
  puts “The source system prefix is set to: $SourceSystemn”

# Table generation #

  # Change context to module <$ModName>
  OMBCC ‘..’  OMBCC ‘$ModName’

  # Process all the tables in module <$ModName>
  set TabList [ OMBLIST TABLES ]

  OMBCC ‘..’  OMBCC ‘$StgMod’
  set AllTableListStg [ OMBLIST TABLES ]
  set AllMappingListStg [ OMBLIST MAPPINGS ]

For every table in the array TabList (so all the tables within a module) all the next steps, including mapping generation are executed.
  foreach TabName $TabList {

    set TabName [string toupper $TabName]
The name of the target table is defined here using the prefix supplied by the settings file and the source table name. Addition checks are in place to drop the table if it already exists.
    set TgtTable $StagingPrefix$TabName
    set TgtTable [string range $TgtTable 0 29]

    if { [ lsearch $AllTableListStg $TgtTable ] >= 0 } {
        puts “Table: $TgtTable already exists and will not be replaced –> SKIPPEDn”
    } else {

    if { [ lsearch $AllTableListStg $TgtTable ] >= 0 } {
      OMBDROP TABLE ‘$TgtTable’
      puts “Replacing table $TgtTable”
    }

    puts “Working on Table: $TabNamen”
    puts “$project/$ModName/$TabNamen”
    puts ”   —> $project/$StgMod/$TgtTablen”

    # Back one level …
    OMBCC ‘..’
    OMBCC ‘$StgMod’
    OMBCREATE TABLE ‘$TgtTable’
    puts ”   —> … table created”

    # Back one level …
    OMBCC ‘..’    OMBCC ‘$ModName’

The next part of code will cycle through the present attributes for the particular table. In other words this is a nested loop within the table loop. Every column will be streamlined according to the staging area conventions. The datatype char or varchar is taken as an example.

    set columnList [ OMBRETRIEVE TABLE ‘$TabName’ GET COLUMNS ]

    foreach columnName $columnList {

      set columnName [string toupper $columnName]  

      set columnDataType [ OMBRETRIEVE TABLE ‘$TabName’ COLUMN ‘$columnName’ GET PROPERTIES (DATATYPE) ]
      set columnDataNull [ OMBRETRIEVE TABLE ‘$TabName’ COLUMN ‘$columnName’ GET PROPERTIES (NOT_NULL) ]
      set columnDataLength ‘N/A’

      if {“$columnDataType” == “VARCHAR2” || “$columnDataType” == “CHAR” || “$columnDataType” == “NVARCHAR”} {
        set columnDataLength [ OMBRETRIEVE TABLE ‘$TabName’ COLUMN ‘$columnName’ GET PROPERTIES (LENGTH) ]

    OMBCC ‘..’       
    OMBCC ‘$StgMod’

If the length is less than 100 then the datatype will be set to 100, higher than 100 but less than 1000 will be 1000 and otherwise the source length will be used.

        if {$columnDataLength <= 100} {
            set columnDataLength ‘100’
        } elseif {$columnDataLength > 100 && $columnDataLength <= 1000} {
            set columnDataLength ‘1000’
        } else {
            set columnDataLength ‘$columnDataLength’
        }
 
        OMBALTER TABLE ‘$TgtTable’ ADD COLUMN ‘$columnName’ SET PROPERTIES (DATATYPE, LENGTH, NOT_NULL)
            VALUES (‘VARCHAR2’, $columnDataLength, $columnDataNull)
        puts ”   —> … added column $columnName to $TgtTable”    
      } elseif {“$columnDataType” == “NUMBER” || “$columnDataType” == “INT” || “$columnDataType” == “MONEY” || “$columnDataType” == “SMALLINT”} {

       set columnDataPrec ‘0’
       set columnDataScale ‘0’

        OMBCC ‘..’
        OMBCC ‘$StgMod’

        OMBALTER TABLE ‘$TgtTable’ ADD COLUMN ‘$columnName’ SET PROPERTIES (DATATYPE, PRECISION, SCALE, NOT_NULL)
             VALUES (‘NUMBER’, $columnDataPrec, $columnDataScale, $columnDataNull)
        puts ”   —> … added column $columnName to $TgtTable”

      } elseif {“$columnDataType” == “UNIQUEIDENTIFIER”} {

        OMBCC ‘..’
        OMBCC ‘$StgMod’

        OMBALTER TABLE ‘$TgtTable’ ADD COLUMN ‘$columnName’ SET PROPERTIES (DATATYPE, LENGTH, NOT_NULL)
            VALUES (‘VARCHAR2’, ‘1000’, $columnDataNull)
        puts ”   —> … added column $columnName to $TgtTable”
       
      } elseif {“$columnDataType” == “DATE” || “$columnDataType” == “DATETIME”} {

        # Back one level …
        OMBCC ‘..’
        OMBCC ‘$StgMod’

        OMBALTER TABLE ‘$TgtTable’ ADD COLUMN ‘$columnName’ SET PROPERTIES (DATATYPE)
                 VALUES (‘DATE’)
        puts ”   —> … added column $columnName to $TgtTable”
      } elseif  { “$columnDataType” == “MDSYS.SDO_GEOMETRY”} {

        # Back one level …
        OMBCC ‘..’
        OMBCC ‘$StgMod’

        OMBALTER TABLE ‘$TgtTable’ ADD COLUMN ‘$columnName’ SET PROPERTIES (DATATYPE)
                 VALUES (‘$columnDataType’)

If the source contains a datatype that is not yet defined in this script an error will be thrown.
        puts ”        ERROR: unsupported datatype ($columnDataType)!”;
      }

      # Back one level …
      OMBCC ‘..’
      OMBCC ‘$ModName’

    }

    # Back one level …
    OMBCC ‘..’
    OMBCC ‘$StgMod’

At this point the metadata attributes are added to the staging area table definition.

    OMBALTER TABLE ‘$TgtTable’ ADD COLUMN ‘PROCESS_CODE’ SET PROPERTIES (DATATYPE, LENGTH)
                                  VALUES (‘VARCHAR2’, 30)
                               ADD COLUMN ‘LOAD_DATE’ SET PROPERTIES (DATATYPE)
                                  VALUES (‘DATE’)
                 ADD COLUMN ‘SOURCE_SYSTEM’ SET PROPERTIES (DATATYPE, LENGTH)
                                  VALUES (‘VARCHAR2’, 30)
                 ADD COLUMN ‘WORKFLOW_ID’ SET PROPERTIES (DATATYPE)
                                  VALUES (‘NUMBER’) 

    puts ”   —> … added metadata column PROCESS_CODE,to $TgtTable”
    puts ”   —> … added metadata column LOAD_DATE to $TgtTable”
    puts ”   —> … added metadata column SOURCE_SYSTEM to $TgtTable”
    puts ”   —> … added metadata column WORKFLOW_ID to $TgtTable”

    # Back one level …
    OMBCC ‘..’
    OMBCC ‘$ModName’

The following bit of code retrieves the primary key from the source table and creates it on the staging area table. This could be made optional since there are valid arguments against having a primary key in a staging area.

    set PrimaryKey [OMBRETRIEVE TABLE ‘$TabName’ GET PRIMARY_KEY]
    set PrimaryKeyFinal “”

    if {$PrimaryKey !=[] } {
      set PrimaryKeyCols [OMBRETRIEVE TABLE ‘$TabName’ PRIMARY_KEY ‘$PrimaryKey’ GET COLUMNS]
    }

    # Back one level …
    OMBCC ‘..’
    OMBCC ‘$StgMod’

    # Determine primary key parts for possible composite primary key
    if {$PrimaryKey !=[] } {
      foreach KeyPart $PrimaryKeyCols {      
        set PrimaryKeyFinal “$PrimaryKeyFinal ‘$KeyPart'”         
      }
     
      set PrimaryKeyCols [string trim $PrimaryKeyFinal]
      set PrimaryKeyCols [string map {” ” ,} $PrimaryKeyCols]
      set PrimaryKeyCols [string map {, “, “} $PrimaryKeyCols]
           
      OMBALTER TABLE ‘$TgtTable’
            ADD PRIMARY_KEY ‘$PrimaryKey’ SET REF COLUMNS ($PrimaryKeyCols)
           
      puts ”   —> … added Primary Key (columns: $PrimaryKeyCols) $PrimaryKey to $TgtTable”

      OMBCOMMIT    
    }
  }

The name of the mapping is defined here with the hardcoded prefix M_ and the name of the target (staging area) table.
  set MappingName M_$TgtTable
  set MappingName [string range $MappingName 0 29]
  if { [ lsearch $AllMappingListStg $MappingName ] >= 0 } {
      puts “Mapping: $MappingName does exist and will not be replaced –> SKIPPINGn”
  } else {

    if { [ lsearch $AllTableListStg $TgtTable ] >= 0 } {
      OMBDROP MAPPING ‘$MappingName’
      puts “Replacing mapping $MappingName”
    }

    OMBCC ‘..’
    OMBCC ‘$ModName’
      set PrimaryKey [OMBRETRIEVE TABLE ‘$TabName’ GET PRIMARY_KEY]
      set PrimaryKeyCols [OMBRETRIEVE TABLE ‘$TabName’ PRIMARY_KEY ‘$PrimaryKey’ GET COLUMNS]
    OMBCC ‘..’
    OMBCC ‘$StgMod’
   
    ######################
    # Mapping generation #
    ######################

In this step the actual mapping is created, between the stored source definition and the newly created staging area definition. The mapping is done based on the fact that source and target attributes are the same and the process attributes always follow the same rule for every mapping.

     OMBCREATE MAPPING ‘$MappingName’ SET PROPERTIES (DESCRIPTION)
                                          VALUES (‘Import new full load data’)
         ADD TABLE OPERATOR ‘$TabName’ BOUND TO TABLE ‘/$project/$ModName/$TabName’
         ADD TABLE OPERATOR ‘$TgtTable’ SET PROPERTIES (LOADING_TYPE) VALUES (‘TRUNCATE/INSERT’)
                            BOUND TO TABLE ‘/$project/$StgMod/$TgtTable’
         ADD CONSTANT OPERATOR ‘PROCESS_CONSTANTS’
             SET PROPERTIES (DESCRIPTION)
             VALUES (‘Process run information for mapping $MappingName’)
         ADD ATTRIBUTE ‘PROCESS_CODE’ OF GROUP ‘OUTGRP1’ OF OPERATOR ‘PROCESS_CONSTANTS’
             SET PROPERTIES (DATATYPE, LENGTH, EXPRESSION)
             VALUES (‘VARCHAR2′, 30, ”’$MappingName”’)
         ADD ATTRIBUTE ‘LOAD_DATE’ OF GROUP ‘OUTGRP1’ OF OPERATOR ‘PROCESS_CONSTANTS’
             SET PROPERTIES (DATATYPE, EXPRESSION)
             VALUES (‘DATE’, ‘sysdate’)
         ADD ATTRIBUTE ‘SOURCE_SYSTEM’ OF GROUP ‘OUTGRP1’ OF OPERATOR ‘PROCESS_CONSTANTS’
             SET PROPERTIES (DATATYPE, LENGTH, EXPRESSION)
             VALUES (‘VARCHAR2′, 30, ”’$SourceSystem”’)
         ADD ATTRIBUTE ‘WORKFLOW_ID’ OF GROUP ‘OUTGRP1’ OF OPERATOR ‘PROCESS_CONSTANTS’
             SET PROPERTIES (DATATYPE, EXPRESSION)
             VALUES (‘NUMBER’, ‘DWH_INTEGRATION_AREA.”FNC_GET_WORKFLOW_ID”()’)
             
      OMBALTER MAPPING ‘$MappingName’
           ADD CONNECTION FROM GROUP ‘INOUTGRP1’ OF OPERATOR ‘$TabName’
               TO GROUP ‘INOUTGRP1’ OF OPERATOR ‘$TgtTable’ BY NAME

      OMBALTER MAPPING ‘$MappingName’
           ADD CONNECTION FROM GROUP ‘OUTGRP1’ OF OPERATOR ‘PROCESS_CONSTANTS’
               TO GROUP ‘INOUTGRP1’ OF OPERATOR ‘$TgtTable’ BY NAME

      OMBCOMMIT
     
      puts ”   —> … mapping $MappingName generated”
      puts “”
    }
  }
}

puts “nGeneration ready”

OMBSAVE
OMBDISCONNECT

 
Roelant Vos

Roelant Vos

You may also like...

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.