Why you need an Enterprise BI framework
Over the last few months I have done a number of pre-sales presentations regarding a framework for designing and developing data integration programs. While the framework in question is relatively small in scale (mainly ETL), I encountered numerous discussions about why such a framework is necessary in the first place. What would be achieved by using it? Because this blog serves as my own personal framework and collection of best practices, I thought it would be a good idea to put my thoughts on this in writing. I found a great quote in one of the academic ETL papers my employer sent to me:
‘If we want better performance we can buy better hardware, unfortunately we cannot buy a more maintainable or more reliable system’.
My answer to the question would be this: the design and implementation of data integration and ETL is largely a labour-intensive (manual) activity and typically consumes a large fraction of the effort in data warehousing projects. Over time, as requirements change and enterprises become more data-driven, the EBI architecture faces challenges in the complexity, consistency and flexibility of the design (and maintenance) of its data integration flows. Such changes include changes in loading frequency and grain (latency), additional sources, or the introduction of parallel processing into a previously rigid serial pipeline. All of this occurs while data warehousing and BI become more and more mission critical and their information is integrated into the operational decision-making process. A well-thought-out and flexible approach to data integration meets these challenges by providing structure, flexibility and scalability for the design of data integration flows. Working in a more structured and consistent environment greatly reduces the manual effort involved, which in turn leads to improved delivery times and consistency.
Today’s EBI architecture is designed to store structured data for strategic decision making, where a small number of (expert) users analyse (historical) data and reports. Data is typically extracted periodically from a heterogeneous set of sources, then cleansed, integrated and transformed in a data warehouse. The focus for ETL has been on correct functionality and adequate performance, but this focus misses key elements that are equally important for success. Elements such as consistency, degree of atomicity, the ability to rerun, scalability and robustness are addressed by using the EBI framework.
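To make the rerunability and atomicity point concrete, here is a minimal sketch of an idempotent load step. Python and SQLite are used purely for illustration, and the table and column names are invented: each batch first deletes any rows left behind by an earlier partial run, then inserts, all inside one transaction, so a rerun after a failure never produces duplicates.

```python
import sqlite3

def load_batch(conn, batch_id, rows):
    """Idempotently (re)load one batch: remove any rows from a previous
    partial run, then insert, inside a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM sales_stg WHERE batch_id = ?", (batch_id,))
        conn.executemany(
            "INSERT INTO sales_stg (batch_id, product, amount) VALUES (?, ?, ?)",
            [(batch_id, product, amount) for product, amount in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_stg (batch_id, product, amount)")
load_batch(conn, 42, [("widget", 10.0), ("gadget", 5.0)])
# Rerunning the same batch leaves exactly the same two rows, no duplicates:
load_batch(conn, 42, [("widget", 10.0), ("gadget", 5.0)])
count = conn.execute("SELECT COUNT(*) FROM sales_stg").fetchone()[0]
```

The delete-then-insert pattern is one of several ways to achieve this; the point is that the framework prescribes one such pattern so every mapping behaves the same way when rerun.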
Future ETL solutions, for instance, should be able to send cleansed or interpreted data back to the operational systems, allowing the data warehouse to grow into a more central and mission-critical role. They should be able to cope with unstructured data alongside structured data, and be able to fall back on a clean data source. To be ready for future changes, the next generation of data integration designs must support a methodology that sets the foundation for a flexible approach to data integration design. Without such a structured approach, the solution risks becoming the spaghetti of code and rules that it was meant to replace in the first place. An EBI framework provides a structured approach to data modelling and data integration design for an easy, flexible and affordable development cycle. By setting the stage with document and mapping templates, design decisions, and built-in error handling and process control, the framework provides the consistency and structure for future-proof ETL on any platform.
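As an illustration of what "built-in error handling and process control" can mean in practice, the sketch below (Python, with invented step names) wraps every ETL step in one uniform runner that records start time, end time, outcome and any error in a shared audit log. This is the kind of plumbing a framework supplies once, rather than each developer reinventing it per mapping.

```python
import datetime
import traceback

def run_step(name, fn, audit_log):
    """Run one ETL step under uniform process control: record start,
    end, status and any error in a shared audit log."""
    entry = {"step": name, "started": datetime.datetime.now(), "status": "running"}
    audit_log.append(entry)
    try:
        fn()
        entry["status"] = "succeeded"
    except Exception as exc:
        entry["status"] = "failed"
        entry["error"] = repr(exc)
        raise  # let the orchestrator decide whether to abort or continue
    finally:
        entry["ended"] = datetime.datetime.now()

audit = []
run_step("extract_orders", lambda: None, audit)   # audit[0]["status"] == "succeeded"
try:
    run_step("load_sales", lambda: 1 / 0, audit)  # audit[1]["status"] == "failed"
except ZeroDivisionError:
    pass
```

Because every step passes through the same runner, operational questions ("which step failed last night, and when?") can be answered from one audit log in one consistent format, on any platform.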