Updated the open source Data Vault metadata tool TEAM to v1.5.5.0

We have just finalised the new and improved version of the Taxonomy of ETL Automation Metadata (‘TEAM’) and published it as version 1.5.5.0. There are some major changes in this version that are not backwards compatible, so if you are using the interface views please have a look at the changes below. I have released the updated VDW code and components as well, and will do so for the ‘simple ETL generation series’ shortly.

This TEAM version is a big step towards making sure all metadata required to automate Data Vault development is in a single place and easy to manage. We have generated all kinds of SQL and ETL artefacts to verify that we can cover all the scenarios we know of :-).

Repository changes were needed to achieve this. Downloads are available from GitHub (send me a note if you would like to be added) or directly from here.

Don’t forget, there is an upcoming training session on how to develop your own Virtual Data Warehouse using Data Vault techniques and code generation (as developed in TEAM) in Utrecht, The Netherlands (June 24-26). Check out the new site for this if you are interested: www.virtualdwh.com. Hope to see you there!

Changes in this version are:

  • Updated the back-end repository to version 1.5.
  • Across-the-board name changes for repository tables and attributes (e.g. ATTRIBUTE_FROM_NAME => SATELLITE_ATTRIBUTE_NAME).
  • Addition of LOAD_VECTOR to capture the inferred direction of the ETL (to support connection changes), e.g. Data Vault to Data Vault or source to Data Vault.
  • A snapshot of the physical model is now created during activation (MD_PHYSICAL_MODEL) and exposed as an interface (INTERFACE_PHYSICAL_MODEL). This prevents any database calls at generation time and enables a deterministic generation approach that can be integrated into a DevOps pipeline (see the query sketch after this list).
  • Various cosmetic changes (auto-resize, removing versioning where not required).
  • Added support for saving interface outputs to disk as JSON files. This is another step towards becoming database-less.
  • Refactoring of the activation logic to reduce code duplication, via the introduction of new metadata handling classes.
  • Added support for schema and database names in the table names (fully qualified names), for instance [bdv].SAT_CUSTOMER_DERIVED.
  • Various underlying changes to enable true virtualisation based on the reverse-engineered physical model. If ‘ignore version’ is unchecked, everything properly runs off the internal physical model (grid).
  • Minor changes to the sample data to resolve the record source when generating sample code for DIRECT.
  • Fixed some issues around case-sensitive databases. The metadata should be case-sensitive, but was accidentally wrapped in an UPPER statement. This works for Azure as well now.
  • Added INTERFACE_SOURCE_LINK_ATTRIBUTE_XREF interface view to expose degenerate attributes for links.
  • Many tiny bugs have been squashed!
  • Extended validation checks to work in true virtual mode.
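
As a quick illustration of the physical model snapshot mentioned above: code generation can now be driven entirely from the interface view, without any live database calls. The query below is only a minimal sketch; the column names (DATABASE_NAME, SCHEMA_NAME, TABLE_NAME, COLUMN_NAME, DATA_TYPE) are assumptions for illustration and may differ from the actual interface definition.

-- Read the snapshot of the physical model captured during activation.
-- Column names are assumed for this sketch and may not match the real view.
SELECT
  DATABASE_NAME,
  SCHEMA_NAME,
  TABLE_NAME,
  COLUMN_NAME,
  DATA_TYPE
FROM INTERFACE_PHYSICAL_MODEL
ORDER BY DATABASE_NAME, SCHEMA_NAME, TABLE_NAME, COLUMN_NAME;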

Roelant Vos

Ravos Business Intelligence admin


4 Responses

  1. Martin Ekeblad says:

    Hi Roelant
    Great post, I did see you mention Azure there!

    I am interested to know whether you made your framework work in Azure: were you able to lift it over with minor changes, or did you build a new one?
    Have you tried to implement your framework on Azure SQL Database with elastic queries, or did you try PolyBase and Azure SQL Data Warehouse?
    And if you tried your framework with Azure SQL Database, did it work with elastic queries (external tables) performance-wise?
    And if you tried PolyBase and Azure SQL Data Warehouse, how much of your framework remained intact (the META databases and so on)?

    Cheers
    /Martin

  2. Ravos says:

    Hi Martin,

    Thanks for reaching out. While I use SQL Server for day-to-day development, I use Azure SQL databases in the training workshops and this has been working fine without any changes.

    I have examples of using PolyBase to read/write to Azure blob storage, but haven’t implemented that at scale. All I can say is that the logic works, which was my main focus. Once the connectivity works and the tools function properly, the adaptation of the templates is relatively easy.
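
    As a rough sketch, the pattern is along the lines shown below; all names, locations and credentials are placeholders rather than the actual implementation:

    -- Minimal PolyBase setup against Azure blob storage (placeholder names throughout).
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

    CREATE DATABASE SCOPED CREDENTIAL BlobStorageCredential
    WITH IDENTITY = 'user', SECRET = '<storage account access key>';

    CREATE EXTERNAL DATA SOURCE StagingBlobStorage
    WITH
    (
      TYPE = HADOOP,
      LOCATION = 'wasbs://<container>@<storage account>.blob.core.windows.net',
      CREDENTIAL = BlobStorageCredential
    );

    CREATE EXTERNAL FILE FORMAT CsvFileFormat
    WITH (FORMAT_TYPE = DELIMITEDTEXT, FORMAT_OPTIONS (FIELD_TERMINATOR = ','));

    -- The external table reads the staged files as if they were a regular table.
    CREATE EXTERNAL TABLE dbo.CUSTOMER_EXTERNAL
    (
      CUSTOMER_ID NVARCHAR(100),
      CUSTOMER_NAME NVARCHAR(100),
      LOAD_DATETIME DATETIME2
    )
    WITH
    (
      LOCATION = '/customer/',
      DATA_SOURCE = StagingBlobStorage,
      FILE_FORMAT = CsvFileFormat
    );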

    Kind regards,
    Roelant

  3. Martin Ekeblad says:

    Hello Roelant

    My framework looks similar to yours; take for example your hub generation code (see below, with placeholders in angle brackets), mine looks almost the same.

    Here you insert into the target (database), select from the source (database) and do a left join between source and target (I use NOT IN).
    For this to work you need to define external tables (elastic queries) in either the source or the target database.
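
    For reference, this is roughly the setup I mean; the server, database and credential names are placeholders:

    -- Elastic query: expose a table from another Azure SQL database as a local external table.
    CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong password>';

    CREATE DATABASE SCOPED CREDENTIAL RemoteSourceCredential
    WITH IDENTITY = '<sql login>', SECRET = '<password>';

    CREATE EXTERNAL DATA SOURCE RemoteSourceDatabase
    WITH
    (
      TYPE = RDBMS,
      LOCATION = '<server>.database.windows.net',
      DATABASE_NAME = '<source database>',
      CREDENTIAL = RemoteSourceCredential
    );

    -- The external table is queried locally, but every scan goes over the wire.
    CREATE EXTERNAL TABLE dbo.STG_CUSTOMER
    (
      CUSTOMER_ID NVARCHAR(100),
      LOAD_DATETIME DATETIME2,
      RECORD_SOURCE NVARCHAR(100)
    )
    WITH (DATA_SOURCE = RemoteSourceDatabase);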

    And this is what is taking forever for me. In my Wide World Importers test environment the loading of the hubs takes one minute or less when I run it locally on an on-premises SQL Server, but when running the code in an Azure SQL Database I had to stop it after 2 hours, with the loading of the hubs only 30% complete.

    I have tested on a Basic Azure SQL Database with 5 DTUs and no indexes on the external tables, but still…
    Have you had a similar experience when running your framework in Azure, or have you circumvented the creation of external tables somehow?

    As always, your site is a treasure trove of information.
    Cheers
    /Martin

    INSERT INTO <target database>.<target schema>.<hub table>
    (
      <hub key>,
      <business key>,
      <load datetime>,
      <etl process id>,
      <record source>
    )
    SELECT
      HASHBYTES('MD5', ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.<business key>)), 'NA') + '|') AS <hub key>,
      stg.<business key> AS <business key>,
      MIN(stg.<load datetime>) AS <load datetime>,
      -1 AS <etl process id>,
      stg.<record source>
    FROM <staging database>.<staging schema>.<staging table> stg
    LEFT OUTER JOIN <target database>.dbo.<hub table> hub
      ON stg.<business key> = hub.<business key>
    WHERE stg.<business key> IS NOT NULL
      AND hub.<business key> IS NULL
    GROUP BY
      HASHBYTES('MD5', ISNULL(RTRIM(CONVERT(NVARCHAR(100), stg.<business key>)), 'NA') + '|'),
      stg.<business key>,
      stg.<record source>

    • Ravos says:

      Mmh, I haven’t encountered that, very odd. I take it the source and target are both plain Azure SQL databases, right? An alternative could perhaps be to move everything into a single database (and use schemas) for processing? I have to admit, that’s what I usually do for the demos.
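
      Something along these lines, with the layers separated by schema instead of by database (the schema names are just an example):

      -- One Azure SQL database, with the separate 'databases' reduced to schemas.
      CREATE SCHEMA stg;   -- staging area
      GO
      CREATE SCHEMA dv;    -- raw Data Vault
      GO
      CREATE SCHEMA bdv;   -- business / derived Data Vault
      GO

      -- The hub load then becomes a plain local insert, without any external tables, e.g.:
      -- INSERT INTO dv.HUB_CUSTOMER (...) SELECT ... FROM stg.STG_CUSTOMER ...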

      The patterns do look the same :-).

