Remedies for Driving Keys

In the recent post on end-dating in Data Vault I referred to alternatives for the driving key pattern, but these were not yet documented. So, this post contains an overview of various ways to think about the implementation of driving keys.

The ‘driving key’ term comes from Data Vault methodology. It is a special mechanism to track relationship validity, by ‘forcing’ a relationship between two, or more, Hubs to be end-dated when a new relationship becomes known.

For example, if you take a ‘Driver’ Hub and a ‘Car’ Hub you might want to implement that a Car can only ever have one active Driver at a point in time. When a new relationship arrives where the Car has a new Driver, then the relationship from the Car to the previous Driver is end-dated.

The Car is the driving key in this example, the ‘static’ Hub in the relationship.

An example is provider below.

Note that the relationship is ‘reopened’ in this example. After all, after a while it’s perfectly reasonable that Sam is the driver of the car again.

Driving key implementations are used to ‘close’ and/or ‘reopen’ relationships when there is no suitable data point otherwise available to do so. Ideally, the data would inform you when certain relationships are no longer valid. But data is not always available for this, in which case a business rule (interpretation) can be implemented to interpret the validity of a relationship – the driving key mechanism.

Driving key (Link)Satellites can co-exist with ‘regular’ (Link)Satellites, as the context of each individual relationship may still change over time. For example, if we would monitor the position that the driver has in the car seat this might create multiple data points where the driver shifts in the seat as part of the drive. These are tracked separately from the relationship validity.

In the aforementioned post driving keys were considered in the context of end-dating, to create a date range for understanding between which points in time a relationship is valid. That can be somewhat complicated, because you have to ‘know’ what driving key to apply to correctly assert the end date.

Consider the ‘regular’ (Link)Satellite logic for end-dating:

SELECT
  <date> AS EFFECTIVE_DATEIME
 ,LEAD(<date>,1,'9999-12-31') OVER (PARTITION BY <key> ORDER BY <date) AS EXPIRY_DATETIME
FROM <(Link)Satellite>

This is a pretty straightforward window function, which can be used either to calculate the end date at runtime, or as part of a procedure that updates the table with the end dates.

Now, consider the same logic for a driving key (Link)Satellite:

SELECT
  <date> AS EFFECTIVE_DATEIME
 ,LEAD(<date>,1,'9999-12-31') OVER (PARTITION BY <driving key (from Link)> ORDER BY <date) AS EXPIRY_DATETIME
FROM <Link> lnk
JOIN <(Link)Satellite> lsat ON lnk.<Link key> = lsat.<Link key>

For the larger model, using the Car/Driver example and in a more ‘traditional’ Data Vault setup, this looks as follows:

A traditional Data Vault approach.

Driving keys continue to cause problems even for experienced Data Vault practitioners. Needless to say, keeping this open for interpretation and mistakes is an area of concern, and requires a better solution than simply doing ‘insert-only’.

Avoiding Link Satellites

‘Data Vault’ means many things to many people. Ideas change over time, and the understanding of what Data Vault is, and how it’s supposed to work usually depends on when people have been exposed to the methodology.

Traditionally, the idea has been that descriptive properties of a certain concept go into a Satellite and that any descriptive properties that describe a relationship go into a Link-Satellite.

Over the years, my appreciation of this has changed. Nowadays, I feel this still makes sense when talking about data at a more logical level – but that it is not ideal as a physical implementation. For the physical model, I believe the Ensemble Logical Modeling (ELM) interpretation on Data Vault works better.

In the ELM Data Vault philosophy, there are no Link-Satellites. Instead, relationships worth tracking things for are represented by a ‘relationship-describing Hub’ – formerly known as a ‘keyed instance’. This Hub represents the relationship, and any ‘relationship context’ goes into a regular Satellite for that Hub. The Hub would connect to the other Hubs through a Link, but no Link-Satellites are needed here.

This is an evolution of the Data Vault thinking that makes sense to me overall, but what about driving keys? There would be a Car/Driver relationship-describing Hub, but it would still be necessary to ‘know’ the driving key.

Consider the example below. Even though the relationship context is moved to a relationship-describing Hub, the issue is still fundamentally the same.

An Ensemble Logical Model (ELM) representation.

There is an advantage to this approach from an implementation perspective though. In the earlier example, you would need two Link-Satellite patterns and one Satellite one to develop the solution. In this approach you only need two Satellite patterns to do the job.

Can it be made easier still?

Foreign key Links

Building on the ELM concepts, there is another idea that can be used – a ‘foreign key link’ (pending better name):

Remodelling the driving key into a foreign key Link.

This example still implements a relationship-describing Hub for the Car/Driver relationship, but only regular context is tracked here. The driving key mechanism, however, is moved to the ‘Car’ Hub. This makes sense, because all it really does is track the (changes of) the other (foreign) keys for the car over time.

The result is that the driving key is made explicit. Users don’t need know know what key to partition by anymore for creating the date ranges. On top of that, the regular Satellite pattern can be used so there really is only one Satellite pattern required for all contextual data everywhere.

Are there other ways? Definitely, there are multiple variations on these themes. For example, you can use a convention-based approach and add a redundant driving key to the Satellite – so that users have a visual of which key to replace. And so on.

But, if you are going all-in on insert-only then this is one reliable way to achieve this while avoiding driving key complexities.

Does this break standards? Yes, some would say, of course. But where’s progress if we don’t challenge the status quo from time to time :-).

Roelant Vos

Ravos Business Intelligence admin

You may also like...

Leave a Reply