Joining tables in the Persistent Staging Area
Joining tables in the Persistent Staging Area (PSA) can be a practical solution that avoids downstream complexity. This post explains the pattern for doing so.
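A minimal sketch of what such a join can look like, assuming a simple PSA layout where each table persists its full change history per business key with a load timestamp (the table names, keys, and data here are illustrative, not from the post itself):

```python
from datetime import datetime

# Hypothetical PSA contents: the full change history per source table,
# keyed by business key with a load timestamp per recorded change.
psa_customer = [
    {"customer_id": 1, "load_ts": datetime(2024, 1, 1), "name": "Ann"},
    {"customer_id": 1, "load_ts": datetime(2024, 3, 1), "name": "Anne"},
]
psa_address = [
    {"customer_id": 1, "load_ts": datetime(2024, 2, 1), "city": "Berlin"},
]

def as_of(history, key_field, key, moment):
    """Return the most recent record for a key as of a point in time."""
    rows = [r for r in history
            if r[key_field] == key and r["load_ts"] <= moment]
    return max(rows, key=lambda r: r["load_ts"], default=None)

def join_as_of(key, moment):
    """Join the two PSA histories for one key at one point in time."""
    cust = as_of(psa_customer, "customer_id", key, moment)
    addr = as_of(psa_address, "customer_id", key, moment)
    return {**(cust or {}), **(addr or {})}

# As of mid-February, the March name change is not yet visible.
print(join_as_of(1, datetime(2024, 2, 15)))
```

Resolving each table to a point in time before joining is what keeps the complexity out of the downstream layers: the consumer sees one consistent row instead of having to reconcile two independent timelines.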
Recording a deleted flag is essential to delivering the correct data, and this post explains why.
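A minimal sketch of one common way to derive such a flag, assuming full snapshots are received and compared against the keys the Persistent Staging Area already knows (the key values and field names are illustrative):

```python
# Keys the PSA has seen so far, and keys in today's full source snapshot.
psa_keys = {"A", "B", "C"}
snapshot_keys = {"A", "C", "D"}

deleted = psa_keys - snapshot_keys  # in the PSA, gone from the source
new = snapshot_keys - psa_keys      # never seen before

# Record a logical delete instead of physically removing history, so
# downstream delivery can still show that the record once existed.
changes = [{"key": k, "change_type": "delete", "deleted_flag": True}
           for k in sorted(deleted)]
changes += [{"key": k, "change_type": "insert", "deleted_flag": False}
            for k in sorted(new)]
print(changes)
```

Without the flag, a key that disappears from the source would either linger forever in the delivery or vanish along with its history; the logical delete keeps both the history and the current truth intact.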
An overview of the end-to-end process video demonstrating Confluent Kafka pub/sub, as available on YouTube.
Example code for creating a Confluent Kafka consumer using the .NET libraries.
A new set of improvements has been committed to the data logistics control (‘ETL process control’) framework on GitHub. This framework, referred to as DIRECT (thanks to acronimify.com), takes the spot of the Data Logistics Process Control in the engine metaphor for flexible data platform management. In case you were wondering, it’s the Data Integration Run-time Execution Control Tool! This is one of my favourite components, as it’s so simple yet so hard to truly get...
The engine represents an ecosystem of data warehouse automation tooling and ideas, making it easier to shape data into the desired delivery formats.
The repository that contains work on the generic Data Warehouse Automation interface has been rebuilt.
A Persistent Staging Area is often associated with a database, but this is not the way it should be thought of. This post covers alternative ways of thinking about what a PSA can be.
How to capture date/time information received from different international locations, across different time zones, is a challenge that comes up from time to time. Recently I was involved in some conversations about this again, which prompted me to capture this once and for all and share it here. As outlined in the pattern for Data Mart delivery, you should be able to deliver information according to the timeline that the...
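One way this challenge is commonly handled, sketched here under assumed naming (the record fields and the Brisbane example are illustrative, not from the post): store the event time in UTC alongside the original local time, its offset, and the source time zone, so any delivery timeline can be reconstructed later.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library from Python 3.9 onward

# An event arriving from a source system in Brisbane (UTC+10, no DST).
local_event = datetime(2024, 6, 10, 9, 30,
                       tzinfo=ZoneInfo("Australia/Brisbane"))

# Persist the unambiguous UTC instant plus everything needed to
# rebuild the original local view of that instant.
record = {
    "event_time_utc": local_event.astimezone(timezone.utc),
    "event_time_local": local_event.replace(tzinfo=None),
    "utc_offset": local_event.utcoffset(),
    "source_time_zone": "Australia/Brisbane",
}
print(record["event_time_utc"].isoformat())
```

Keeping the zone name (not just the offset) matters for zones with daylight saving: the offset alone cannot tell you which local timeline the event belongs to.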
by Ravos · Published January 4, 2018 · Last modified May 13, 2019
What value do we get from having an intermediate hyper-normalised layer? Let me start by stating that a Data Warehouse is a necessary evil at the best of times. In the ideal world, there would be no need for it, as optimal governance and near real-time multidirectional data harmonisation would have created an environment where it is easy to retrieve information without any ambiguity across systems (including its history of changes). Ideally, we would not...
Data Vault Meetup - Germany (June 10, 2024)