Joining tables in the Persistent Staging Area
Joining tables in the Persistent Staging Area (PSA) can be a practical solution that avoids downstream complexity. This post explains the pattern for doing so.
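As a rough, hypothetical illustration of the general idea (not necessarily the exact pattern the post describes): two historised staging tables can be joined by first building a combined change timeline per key and then carrying each table’s most recent value forward. All table and column names below are invented for the example.

```python
import pandas as pd

# Hypothetical PSA tables: each row is a change record, stamped with the
# timestamp at which it arrived in the staging area.
customer = pd.DataFrame({
    "customer_id": [1, 1],
    "name": ["Ann", "Ann B."],
    "inscription_ts": pd.to_datetime(["2023-01-01", "2023-02-01"]),
})
address = pd.DataFrame({
    "customer_id": [1, 1],
    "city": ["Brisbane", "Utrecht"],
    "inscription_ts": pd.to_datetime(["2023-01-15", "2023-03-01"]),
})

# Combine both change timelines per key, then carry the latest known value
# of each attribute forward (an as-of join at every change point).
timeline = (
    pd.concat([customer[["customer_id", "inscription_ts"]],
               address[["customer_id", "inscription_ts"]]])
    .drop_duplicates()
    .sort_values("inscription_ts")
)
joined = pd.merge_asof(timeline, customer.sort_values("inscription_ts"),
                       on="inscription_ts", by="customer_id")
joined = pd.merge_asof(joined, address.sort_values("inscription_ts"),
                       on="inscription_ts", by="customer_id")
print(joined)
```

The default ‘backward’ as-of direction is what carries the latest known value forward at each change point; the first timeline row simply has no city yet, which is exactly the behaviour you want from a history-preserving join.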
Plotting bitemporal and historised data sets on a Cartesian plane, and then combining them, makes bitemporal behaviour much easier to understand.
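To make the plotting idea concrete, here is a small hypothetical sketch in Python (matplotlib) that draws each record as a rectangle, with the assertion timeline on one axis and the state timeline on the other. The intervals and values are made up for the illustration.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Each record spans two intervals: the assertion (technical) timeline on
# the x-axis and the state (business) timeline on the y-axis.
records = [
    {"value": "v1", "assertion": (0, 4), "state": (0, 2)},
    {"value": "v2", "assertion": (2, 4), "state": (2, 4)},
]

fig, ax = plt.subplots()
for rec in records:
    (x0, x1), (y0, y1) = rec["assertion"], rec["state"]
    # Draw the record as the rectangle covered by its two intervals.
    ax.add_patch(patches.Rectangle((x0, y0), x1 - x0, y1 - y0, fill=False))
    ax.annotate(rec["value"], ((x0 + x1) / 2, (y0 + y1) / 2))
ax.set_xlim(0, 5)
ax.set_ylim(0, 5)
ax.set_xlabel("assertion (technical) time")
ax.set_ylabel("state (business) time")
plt.show()
```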
When delivering data from the integration layer (e.g. a Data Vault model) to the presentation layer (anything, but usually a dimensional model or wide table), a key requirement is re-organising the data along the selected ‘business’ timeline.
During this process, we leave the safety of the assertion (technical) timeline behind and start using the real-world state timeline for delivery. This may create some unexpected results!
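A minimal sketch, using invented rows, of why the results can differ: a late-arriving correction is the ‘latest’ record on the assertion timeline, but not on the state timeline.

```python
from datetime import date

# Hypothetical change rows for one business key; each change carries both
# a load (assertion) timestamp and a business-effective (state) date.
rows = [
    {"value": "A", "loaded": date(2023, 1, 10), "effective": date(2023, 1, 1)},
    # A late-arriving correction: loaded later, but effective earlier.
    {"value": "B", "loaded": date(2023, 1, 20), "effective": date(2022, 12, 1)},
]

# On the assertion timeline the latest arrival wins...
by_assertion = max(rows, key=lambda r: r["loaded"])
# ...but on the state timeline the latest effective date wins.
by_state = max(rows, key=lambda r: r["effective"])

print(by_assertion["value"])  # B: the last thing we learned
print(by_state["value"])      # A: the latest real-world state
```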
The data is wrong! No, it’s not wrong; we’re just looking at it from different points in time. This post shows how a data warehouse helps to manage this common data interpretation issue.
Data Vault / ETL / General
by Roelant Vos · Published February 5, 2023 · Last modified February 7, 2023
When preparing Data Vault content for consumption in a dimensional model, dimension keys can be created to join the resulting fact and dimension tables in a performant way. But what about a truly virtual data mart? This post covers approaches to issuing dimension keys that are fully deterministic.
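One common deterministic approach, sketched here with invented names rather than the post’s actual implementation, is to derive the dimension key by hashing the business key together with the change timestamp. Because the key is a pure function of its inputs, fact and dimension queries can compute the same key independently, with no key-issuing table.

```python
import hashlib

def dimension_key(business_key: str, effective_ts: str) -> str:
    """Derive a deterministic dimension key by hashing the business key
    together with the change timestamp (illustrative approach only)."""
    payload = f"{business_key}|{effective_ts}".encode("utf-8")
    return hashlib.md5(payload).hexdigest()

# A fact row can derive the identical key on its own, so fact and
# dimension views join without any sequence or lookup step.
print(dimension_key("CUST-001", "2023-02-05T00:00:00"))
```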
To facilitate ongoing research into tweaking Data Vault patterns for various use cases, I recently updated the open source data warehouse automation environments TEAM (source-to-target mapping management) and Virtual Data Warehouse (code generation). These updated versions make playing around with patterns even easier. If you’re interested in seeing how different patterns work, or what it would mean to deploy a fully virtual data warehouse, have a look at the provided examples. With...
Having a certain model structure does not necessarily mean that a given methodology has been implemented correctly, and the chosen modelling perspective has a significant impact on the resulting Data Vault.
Here is a recording of my presentation for the Global Data Summit 2021 on using BimlFlex for code generation based on a business model, with a brief intro by Hans Hultgren.
A data logistics control framework tracks everything that happens in a data solution. A small number of key metrics can be used to keep informed of important exceptions.
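As a loose illustration of the idea (the field names are assumptions for the example, not the framework’s actual design), a run record plus a simple exception filter might look like this:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class RunRecord:
    """One execution of a data logistics module (invented structure)."""
    module: str
    started: datetime
    ended: datetime
    rows_selected: int
    rows_inserted: int
    outcome: str  # e.g. 'succeeded', 'failed', 'cancelled'

def exceptions(runs: list[RunRecord]) -> list[RunRecord]:
    """Key-metric check: surface runs that failed or changed nothing."""
    return [r for r in runs if r.outcome != "succeeded" or r.rows_inserted == 0]
```

The point is that you don’t need to inspect every run: a handful of metrics such as outcome and rows affected is enough to flag the exceptions worth looking at.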
Automation / Data Vault / ETL
by Roelant Vos · Published November 28, 2021 · Last modified November 11, 2024
This post shows how to configure an Azure DevOps pipeline to generate code using the schema for Data Warehouse Automation.
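As a heavily simplified sketch of what schema-driven generation means in practice: read a source-to-target mapping and render code from a template. The JSON below follows the general shape of the schema for Data Warehouse Automation but is reduced for the example, and the template itself is invented.

```python
import json
from string import Template

# A minimal source-to-target mapping in the style of the schema for
# Data Warehouse Automation (simplified; not the full schema).
mapping = json.loads("""{
  "dataObjectMappings": [{
    "sourceDataObjects": [{"name": "STG_CUSTOMER"}],
    "targetDataObject": {"name": "HUB_CUSTOMER"}
  }]
}""")

# Render a statement per mapping; a real pipeline would run a generator
# step like this and publish the output as a build artifact.
template = Template("INSERT INTO $target SELECT * FROM $source;")
for m in mapping["dataObjectMappings"]:
    print(template.substitute(
        source=m["sourceDataObjects"][0]["name"],
        target=m["targetDataObject"]["name"],
    ))
```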