Category: Architecture

Designing reference tables for the Data Warehouse 1

Designing reference tables for the Data Warehouse

In a typical Data Warehouse it is common to introduce additional descriptive information that is not provided by the operational systems feeding data into the Data Warehouse. However, the exact positioning and implementation of this reference data can cause confusion. Especially when applying this concept in the Integration Layer. Reference data is additional contextual or descriptive information that is not provided by the source system. Examples are descriptions for industry standard codes. This information can...

 
Implementing User Managed data (User Managed Staging Tables) 0

Implementing User Managed data (User Managed Staging Tables)

Related to the handling of reference data it is sometimes required to feed information into the Data Warehouse that does not have a formal source in the organisation’s information landscape. This information can be needed in the form of reference data (additional information about attributes provided by a source), relationships or really anything that is required to ultimately meet reporting requirements. It can be a vital element to ‘glue’ information together or to provide details...

 
Kimball versus Inmon: a peace offer? 5

Kimball versus Inmon: a peace offer?

Lately there were some interesting updates in the ever-existing ‘Kimball versus Inmon’ discussion. This this Bill Inmon wrote an article expressing his views on these things: http://www.b-eye-network.com/view/14115. A quote here sums it up for me: ‘If you want a tactical solution, then Kimball is the way to go. If you want a strategic solution, then Inmon is the way to go.’ All in all I think the Kimball approach is a good one if the goal is a stand...

 
0

Why a separate surrogate key table?

A question I often (and recently) get is why you should create a separate table to manage the surrogate keys. This is compared to star- or snowflaked models where the surrogate key (and distribution) is part of the table. For Data Vault the Hub entity is the place where the keys are managed. There is rationale for splitting this process into separate tables: A surrogate key table makes it easy to identify the records in...

 
Types of errors 0

Types of errors

There are two main categories of errors: technical errors and business rule errors (functional). Both types of errors should be classified separately because both provide different information about the nature of the error. A few examples are listed here:  –          Technical errors: invalid data type conversion, duplicate insert, unable to delete or move files or missing primary or foreign keys –          Functional errors: ‘the sum of attribute A and B does not equal C’ or...

 
Ideas for general error handling, why need error tables? 3

Ideas for general error handling, why need error tables?

The possible scenarios regarding error and exception handling are limited. You can either: –          Detect an error, stop the process and present the error code. –          Detect an error and write the record in an error table with the corresponding  code. –          Detect an error and write the record in both the target and the error table with the error code. –          Detect an error, flag the record but write it to the DWH table...