Data Warehouse versioning… for virtualisation
Recent discussions around Data Warehouse virtualisation made me realise I forgot to post one of the important requirements: version control. In the various recent presentations this was discussed at length but somehow it didn’t make it to the transcript.
Data Warehouse virtualisation needs versioning. Think of it this way – if you can drop and refactor your Data Warehouse based on (the changes in your) metadata then your upstream reports and analytics are very likely to not only change structurally but also report different numbers. To remain auditable, you need to make sure your metadata is versioned so you can ‘roll out’ your Data Warehouse in versions that correspond to Business Intelligence or analytics output.
Managing end-to-end version control which ties the model, the ETL and the output together has been in place in some environments but when you are able to deliver a new ‘view’ with the speed of a click this becomes a mandatory requirement.
Concepts can be borrowed from other areas such as message formats / canonicals or SOA processes. These architectures struggle with the same concepts. Most of these environments are able to allow at least a prior version active, allowing all subscribers some time to become compatible with the newer version. In any case, it is important to be able to ‘go back’ to previous versions in your virtualized Data Warehouse if numbers need explaining and this all comes down to metadata version control.
In any case, I added this to the overall story.