Kimball versus Inmon: a peace offer?
Lately there were some interesting updates in the ever-existing ‘Kimball versus Inmon’ discussion. This this Bill Inmon wrote an article expressing his views on these things: http://www.b-eye-network.com/view/14115. A quote here sums it up for me: ‘If you want a tactical solution, then Kimball is the way to go. If you want a strategic solution, then Inmon is the way to go.’ All in all I think the Kimball approach is a good one if the goal is a stand alone lightweight DWH or datamart, if there are no big plans to expand other than replace existing source systems it is a quick and effective option. If there is a big roadmap in time this model will be hard to maintain and support and the EDW approach will be better.
I like to think that by using Data Vault as the data modelling technique for the EDW removes some of the objections for creating an EDW for smaller implementations like the tactical, standalone datamart. A large part of the counter arguments for creating an EDW straight away is based on the fact that a lot of work is done (= creating an extra layer in the Data Warehouse) for little apparent value.
This can be omitted by using the design / architecture patterns and automating them using ETL generation. Following the same rules and guidelines (ETL framework) the EDW structure can be set up automatically, including data models and ETL mappings.
The Kimball methodology does not exactly specify how a staging area should be constructed but only that everything has to be done there, before the data is forwarded to the star schemas. So you will be perfectly compliant by pitching a Data Vault based EDW as the Kimball staging ‘layer’. Kimball refers to the staging area as ‘everything between the source and presentation layer’… Pending smarter tooling and more precise framework / algorithms the 3NF approach would still be a costly exercise which I think is best avoided currently. 3NF has definite advantages in terms of data storage (Data Vault ‘costs more space’) but requires active data modelling and is therefore harder to generate (presently).
By combining these concepts this way, you can still have a fast and sustainable Data Warehouse which uses the best of all worlds:
You still have your star or snowflake schemas
You have an automated and flexible ‘data staging area’ in line with the Kimball approach
You also have a full EDW following Data Vault rules
You also have a single source of information and history tracking which enables you to regenerate data marts if required
- Business rules are still calculated ‘on the way out’ from the EDW perspective and ‘into the star schemas’ from the Kimball perspective
Will this be the bridge which brings the two major approaches in harmony?