World Wide Data Vault Consortium key takeaways
Last week I attended the second iteration of the World Wide Data Vault Consortium (WWDVC), hosted by Dan Linstedt in his home state of Vermont. It was great to see the uptake of Data Vault, going from a small group of practitioners last year to a bigger group with lots of new faces this year.
Especially engaging was the day before the conference, which was dedicated to in-depth discussions of various use cases and of technical solutions and improvements in implementation and design.
My personal key takeaways are:
- Virtualisation, at least in the Data Mart / Information Mart area, is commonly adopted. This means the Presentation Layer is a view (or series of views) instead of a physical table populated by ETL. It makes so much sense!
- From Scott Ambler: a lot of ETL that implements ‘business rules’ really indicates that something is wrong in the data and / or the data architecture / governance. Instead of fixing things in ETL, why not fix them at the source and keep things simple? At least some effort should be directed towards this higher goal.
- This ties in with the need for a greater focus on data quality. Is data really treated as a corporate asset? Is sufficient testing in place?
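To make the virtualised Presentation Layer concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table, column and view names (`hub_customer`, `sat_customer`, `dim_customer`, `load_datetime`) are hypothetical, and it assumes a single Hub/Satellite pair where the mart simply exposes the most recent Satellite record per business key. Real implementations will vary by platform and by the standards you follow.

```python
import sqlite3

# Hypothetical minimal Raw Data Vault: one Hub and one Satellite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,   -- hash key
    customer_bk   TEXT NOT NULL,      -- business key
    load_datetime TEXT NOT NULL
);
CREATE TABLE sat_customer (
    customer_hk   TEXT NOT NULL,
    load_datetime TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_datetime)
);

-- The Information Mart is a view, not an ETL-populated table:
-- it always reflects the current contents of the Data Vault.
CREATE VIEW dim_customer AS
SELECT h.customer_bk, s.name, s.city
FROM hub_customer h
JOIN sat_customer s
  ON s.customer_hk = h.customer_hk
WHERE s.load_datetime = (
    SELECT MAX(s2.load_datetime)
    FROM sat_customer s2
    WHERE s2.customer_hk = h.customer_hk
);
""")

conn.executemany("INSERT INTO hub_customer VALUES (?, ?, ?)", [
    ("hk1", "C001", "2015-01-01"),
    ("hk2", "C002", "2015-01-01"),
])
conn.executemany("INSERT INTO sat_customer VALUES (?, ?, ?, ?)", [
    ("hk1", "2015-01-01", "Alice", "Burlington"),
    ("hk1", "2015-03-01", "Alice", "Montpelier"),
    ("hk2", "2015-01-01", "Bob", "Stowe"),
])

rows = conn.execute("SELECT * FROM dim_customer ORDER BY customer_bk").fetchall()
print(rows)  # [('C001', 'Alice', 'Montpelier'), ('C002', 'Bob', 'Stowe')]
```

Because the mart is only a query, changing its logic means redefining the view; there is no separate ETL process to rebuild or reload.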
Design / implementation focused:
- Zero records, or ‘ghost records’, in the Satellite can also be implemented in a Point In Time (PIT) table or, of course, in a dedicated time-sliced Satellite. Usually zero records are implemented in each Satellite to create an end-to-end timeline for each business key. This means that every business key in the Satellite gets at least one additional ghost record, increasing data volume and processing effort. With the new approach you only build complete timelines for the information (business keys) that is actually required, which saves storage and improves performance. And in many cases the current state of the information (‘Type 1 SCD’) is sufficient anyway. Not loading zero records into the Satellites is the new standard, but it goes without saying that this is just one more option in the array of design / architecture decisions to choose from.
- Point In Time (PIT) tables can now be part of the ‘business’ Data Vault, which means you can store all kinds of information in them.
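The idea of moving zero records out of the Satellite and into the PIT table can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `build_pit`, the input shape and the zero-record load date `1900-01-01` are all hypothetical. For each required key and snapshot date, the PIT row points to the latest Satellite load date on or before the snapshot. If no Satellite record exists yet, it points to the ghost record instead, so the Satellite itself stores no per-key zero records.

```python
from bisect import bisect_right

# Hypothetical zero-record load date: PIT rows point here when the Satellite
# has no record yet for that key at the snapshot date.
GHOST_LOAD_DATE = "1900-01-01"


def build_pit(sat_load_dates, business_keys, snapshot_dates):
    """Build PIT rows (key, snapshot, sat_load_date) for the requested keys only.

    sat_load_dates maps each hash key to the load dates present in the Satellite.
    ISO-8601 date strings sort chronologically, so string comparison suffices.
    """
    pit = []
    for key in business_keys:
        dates = sorted(sat_load_dates.get(key, []))
        for snapshot in snapshot_dates:
            # Number of Satellite load dates <= snapshot; the last of them is
            # the record that was valid at the snapshot.
            i = bisect_right(dates, snapshot)
            sat_date = dates[i - 1] if i > 0 else GHOST_LOAD_DATE
            pit.append((key, snapshot, sat_date))
    return pit


sat = {
    "hk1": ["2015-01-01", "2015-03-01"],
    "hk2": ["2015-02-15"],
}
for row in build_pit(sat, ["hk1", "hk2"], ["2015-01-31", "2015-02-28", "2015-03-31"]):
    print(row)
```

Here `hk2` has no Satellite record at the first snapshot, so that PIT row points to the ghost record. Only keys passed in `business_keys` get rows at all, which matches the "only build full time periods for what is required" idea.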
Many thanks in particular to Dirk Lerner and Christian Haedrich for their great explanations of time series and the developments in this area, and to Kent Graziano for providing improvements to the virtualisation logic (LEAD instead of RANK). I presented the approach of virtualising not only the Data Marts but also the complete Data Vault; hopefully it was entertaining enough.
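As a rough illustration of why LEAD helps in virtualised Satellites, the sketch below derives end dates in a view. My reading of the improvement is this: with a RANK (or ROW_NUMBER) approach you number the records per key and then self-join record n to record n + 1 to find the end date. LEAD reads the next row's load date directly in a single pass, without the self-join. The names and the `9999-12-31` "end of time" convention are assumptions, and the example needs SQLite 3.25 or later (the first version with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sat_customer (
    customer_hk   TEXT NOT NULL,
    load_datetime TEXT NOT NULL,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_datetime)
);
INSERT INTO sat_customer VALUES
    ('hk1', '2015-01-01', 'Burlington'),
    ('hk1', '2015-03-01', 'Montpelier'),
    ('hk1', '2015-06-01', 'Stowe'),
    ('hk2', '2015-02-15', 'Stowe');

-- Virtual end-dating: each record is valid until the next record for the
-- same key arrives; the open-ended record gets an assumed 'end of time' date.
CREATE VIEW sat_customer_timeline AS
SELECT customer_hk,
       load_datetime AS effective_from,
       COALESCE(
           LEAD(load_datetime) OVER (
               PARTITION BY customer_hk ORDER BY load_datetime),
           '9999-12-31') AS effective_to,
       city
FROM sat_customer;
""")

rows = conn.execute(
    "SELECT * FROM sat_customer_timeline ORDER BY customer_hk, effective_from"
).fetchall()
for row in rows:
    print(row)
```

Since the end dates are computed at query time, the Satellite stays insert-only and no update pass is needed to close off the previous record.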
Thanks, Dan, for organising a great get-together!