The new version of the data automation metadata schema is ready!

by Roelant Vos · Published July 8, 2024 · Updated March 1, 2025

The latest version of the schema for data solution automation has been published on GitHub. This comprehensive collection includes examples, functions, and definitions crucial for metadata management in data solutions, particularly data warehouse systems.

You can find the details, code, and documentation here:

What has changed?

One significant update focuses on the Business Key Definition. Initially, it featured a property called ‘Business Key Component Mappings,’ which listed data item mappings for various business key components from source to target. However, it became evident through various Data Vault projects that not all Data Object Mappings required or had a target data item to map to. For instance, while mapping business key components in a Hub table was practical, it often didn’t apply in a Satellite due to the absence of a consistent target.

To refine this concept, the Business Key Definition now includes an ordered list of data items or queries known as Business Key Components. These components explicitly define the business key and replace the previous ‘Business Key Component Mappings.’

Here’s an example illustrating these changes (specifically focusing on the ‘businessKeyDefinitions’ segment):

{
  "name": "HUB_CUSTOMER",
  "dataObjectMappings": [
    {
      "name": "PSA_PROFILER_CUST_MEMBERSHIP to HUB_CUSTOMER",
      "sourceDataObjects": [
        {
          "name": "PSA_PROFILER_CUST_MEMBERSHIP"
        }
      ],
      "targetDataObject": {
        "name": "HUB_CUSTOMER"
      },
      "businessKeyDefinitions": [
        {
          "name": "PSA_PROFILER_CUST_MEMBERSHIP to HUB_CUSTOMER for CUSTOMER_SK",
          "surrogateKey": "CUSTOMER_SK",
          "businessKeyComponents": [
            {
              "ordinalPosition": 1,
              "dataItem":
                {
                  "name": "CODE"
                }
            },
            {
              "ordinalPosition": 2,
              "dataItem":
                {
                  "name": "SUFFIX"
                }
            }
          ]
        }
      ],
      "enabled": true
    }
  ]
}

The addition of ordering now allows for maintaining the sequence of business key components in complex keys. This is particularly useful when concatenating values where the order of concatenated elements holds significance.
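To illustrate why the ordering matters, here is a minimal Python sketch that reads a business key definition shaped like the example above, sorts its components by ordinalPosition, and builds a concatenation expression. The helper names, the '~' delimiter, and the SQL-style '||' operator are assumptions made for illustration; they are not part of the schema.

```python
import json

# Fragment shaped like the 'businessKeyDefinitions' example above.
# The components are deliberately listed out of order.
business_key_definition = json.loads("""
{
  "surrogateKey": "CUSTOMER_SK",
  "businessKeyComponents": [
    {"ordinalPosition": 2, "dataItem": {"name": "SUFFIX"}},
    {"ordinalPosition": 1, "dataItem": {"name": "CODE"}}
  ]
}
""")


def ordered_component_names(definition):
    """Return the data item names sorted by their ordinalPosition."""
    components = sorted(
        definition["businessKeyComponents"],
        key=lambda component: component["ordinalPosition"],
    )
    return [component["dataItem"]["name"] for component in components]


def concatenation_expression(definition, delimiter="~"):
    """Build a SQL-style concatenation (delimiter and operator are assumptions)."""
    separator = f" || '{delimiter}' || "
    return separator.join(ordered_component_names(definition))


print(concatenation_expression(business_key_definition))
```

Because the components are sorted before they are joined, the resulting expression stays the same no matter how the array happens to be ordered in the JSON file.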

At the Data Object Mapping level, individual business keys within the Business Keys array are now numbered and ordered using an ordinalPosition property. This simplifies mappings involving multiple business keys, such as those in Link tables.
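As a rough sketch of how this could be used for a Link, the Python below walks a mapping with two business keys and returns them in their numbered order. The mapping, its object names, and the placement of ordinalPosition directly on each business key definition are assumptions based on the description above, not a copy of a sample from the repository.

```python
# Hypothetical Link mapping; all object and key names are invented for illustration.
link_mapping = {
    "name": "PSA_EXAMPLE to LNK_CUSTOMER_PRODUCT",
    "businessKeyDefinitions": [
        {
            "ordinalPosition": 2,
            "surrogateKey": "PRODUCT_SK",
            "businessKeyComponents": [
                {"ordinalPosition": 1, "dataItem": {"name": "PRODUCT_CODE"}},
            ],
        },
        {
            "ordinalPosition": 1,
            "surrogateKey": "CUSTOMER_SK",
            "businessKeyComponents": [
                {"ordinalPosition": 2, "dataItem": {"name": "SUFFIX"}},
                {"ordinalPosition": 1, "dataItem": {"name": "CODE"}},
            ],
        },
    ],
}


def by_position(item):
    """Sort key shared by business keys and their components."""
    return item["ordinalPosition"]


def ordered_business_keys(mapping):
    """Return (surrogate key, [component names]) pairs in ordinal order."""
    result = []
    for definition in sorted(mapping["businessKeyDefinitions"], key=by_position):
        components = sorted(definition["businessKeyComponents"], key=by_position)
        names = [component["dataItem"]["name"] for component in components]
        result.append((definition["surrogateKey"], names))
    return result


for surrogate_key, names in ordered_business_keys(link_mapping):
    print(surrogate_key, names)
```

A code generation template could rely on this order to emit the Link's key columns consistently, for example when hashing the combined keys.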

You can now explore these enhancements in Agnostic Data Labs as well. The provided samples have been updated to showcase the latest advancements in data warehouse automation capabilities.

Relationships and cardinality

New additions to the schema include the Relationship and Cardinality objects. The Relationship class conceptually replaces the Related Data Objects property in the Data Object Mapping, which was formerly a plain list of Data Objects.

The Relationships within the Data Object Mapping are now numbered to facilitate sorting and maintain object order, which can be crucial depending on how the relationships are used.

A relationship signifies a connection between two Data Objects, applicable at conceptual, logical, and physical levels. This class supports lineage relationships (e.g., parent/child), foreign keys, and sub- and supertypes.

One of the most notable features of a relationship is its cardinality. This object defines the uniqueness of data values within a column of a database table and specifies the number of occurrences of one entity associated with another through the relationship.

Each cardinality includes an ordinality aspect, captured through from- and to-range specifications, detailing the minimum and maximum allowed appearances of objects on both sides of the relationship.

For example, a one-to-one relationship could be defined in metadata as:

{
  "fromRange": {
    "min": "1",
    "max": "1"
  },
  "toRange": {
    "min": "1",
    "max": "1"
  }
}
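To show how these ranges might be interpreted, the following Python sketch turns a cardinality into a label such as one-to-one or one-to-many. It assumes min and max are strings, as in the example above, and that any max other than "1" (for instance "N") means many; the unbounded marker and the function names are assumptions, not definitions from the schema.

```python
def side_label(range_spec):
    """Label one side of a relationship as 'one' or 'many' based on its max."""
    # Assumption: any max other than "1" (for example "N") means 'many'.
    return "one" if range_spec["max"] == "1" else "many"


def is_optional(range_spec):
    """A side is treated as optional when its minimum is zero."""
    return range_spec["min"] == "0"


def describe_cardinality(cardinality):
    """Combine both sides into a label such as 'one-to-many'."""
    from_label = side_label(cardinality["fromRange"])
    to_label = side_label(cardinality["toRange"])
    return f"{from_label}-to-{to_label}"


one_to_one = {
    "fromRange": {"min": "1", "max": "1"},
    "toRange": {"min": "1", "max": "1"},
}

# Hypothetical optional one-to-many relationship; "N" as the unbounded
# marker is an assumption.
one_to_many = {
    "fromRange": {"min": "1", "max": "1"},
    "toRange": {"min": "0", "max": "N"},
}

print(describe_cardinality(one_to_one))
print(describe_cardinality(one_to_many), is_optional(one_to_many["toRange"]))
```

A generator could use labels like these to decide, for instance, whether a foreign key column should be nullable or whether a separate link object is needed.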

This makes it possible to define conceptual, logical, and physical models using the schema, and use your code generation templates to forward-engineer the complete solution.

The Git repository contains various examples on how to make this work. Give it a go!
