EAI, ESB, ETL and the ETL generation tool

It has been very quiet on the weblog / idea repository since November. This was partly due to the summer holiday period (in Australia) and the fact that I’ve finally picked up the work to develop the ETL generation tool based on the described EDW architecture. Work-wise I’ve recently done a lot of strategy work with very little practical exposure to the technologies, and this is likely to continue for some time to come.

Developing this tool has a great number of prerequisites that need to be in place, not least my ASP.NET skills. So far the front-end has been created to collect the required information, based on the OWB Data Vault demo. My first attempt is to create SSIS packages following the same development patterns. It will not be a data modelling tool, only a pure ETL generator (for a variety of tools) following the Data Vault and EDW concepts as demonstrated in earlier proofs of concept. In the following weeks I’ll post the prerequisites, such as the metadata model and updated design patterns.
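
To give a flavour of what metadata-driven generation means here, below is a minimal sketch: a mapping record drives the generation of a Data Vault hub load statement. The mapping structure, table and column names are invented for illustration; they are not the actual metadata model of the tool (which I’ll post later).

```python
# Minimal sketch of metadata-driven ETL generation. The mapping model and
# all table/column names are hypothetical, not the tool's actual metadata.

HUB_MAPPING = {
    "target_hub": "HUB_CUSTOMER",
    "source_table": "STG_CUSTOMER",
    "business_key": "CUSTOMER_NUMBER",
    "hub_key": "CUSTOMER_HK",
    "load_date_column": "LOAD_DTS",
    "record_source_column": "RECORD_SOURCE",
}

def generate_hub_load_sql(mapping: dict) -> str:
    """Generate the SQL for a Data Vault hub load from mapping metadata:
    insert business keys that are not yet present in the hub."""
    return (
        f"INSERT INTO {mapping['target_hub']} "
        f"({mapping['hub_key']}, {mapping['load_date_column']}, "
        f"{mapping['record_source_column']})\n"
        f"SELECT DISTINCT stg.{mapping['business_key']}, "
        f"stg.{mapping['load_date_column']}, stg.{mapping['record_source_column']}\n"
        f"FROM {mapping['source_table']} stg\n"
        f"WHERE NOT EXISTS (\n"
        f"  SELECT 1 FROM {mapping['target_hub']} hub\n"
        f"  WHERE hub.{mapping['hub_key']} = stg.{mapping['business_key']}\n"
        f")"
    )

print(generate_hub_load_sql(HUB_MAPPING))
```

The same mapping record could just as well drive the generation of an SSIS package definition instead of plain SQL; the point is that the development pattern lives in the generator, not in hand-written ETL.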

Within the current work I’ve had a number of discussions about the use of EAI, ESB and Master Data Management concepts. In some cases comparisons between the concepts and underlying technologies are made unjustly, and this is an attempt to structure my thoughts. ETL is just reading and transforming data (and putting it somewhere), while the ESB is a concept which includes a set of standards and technologies. They’re difficult to compare and I don’t think they should be put up against each other. There is no merit in creating an ETL vs ESB or EAI comparison, since they are complementary concepts and technologies.

The way I would state it is that ETL processes are used in various situations, which include (but are not limited to) the following; a minimal sketch of the second case follows the list:

  • receiving and processing data from a message bus
  • moving and transforming data between databases / platforms (i.e. implementing a DWH)
  • reading an application and mapping the data to the canonical format of the ESB
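
A bare-bones sketch of the second case, moving and transforming data between databases. The in-memory SQLite databases, tables and the transformation are stand-ins chosen purely to keep the example self-contained; the point is the extract, transform and load steps themselves.

```python
import sqlite3

# Hypothetical source and target databases (in-memory for the example)
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.execute("CREATE TABLE orders (id INTEGER, amount_cents INTEGER)")
source.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 1250), (2, 990)])

target.execute("CREATE TABLE fact_orders (id INTEGER, amount REAL)")

# Extract: read the rows from the source system
rows = source.execute("SELECT id, amount_cents FROM orders").fetchall()

# Transform: convert cents to a decimal amount
transformed = [(order_id, cents / 100.0) for order_id, cents in rows]

# Load: write the result to the target platform
target.executemany("INSERT INTO fact_orders VALUES (?, ?)", transformed)
target.commit()

print(target.execute("SELECT * FROM fact_orders").fetchall())
```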

These are all valid cases of ETL, since they involve reading, transforming and writing data. ETL is usually associated with Data Warehousing, but I’ve seen many uses which have no relation whatsoever. For instance, I’ve seen implementations of the standard PowerCenter and OWB/script ‘ETL’ tools for the purpose of creating a service bus. The ETL tool would create the message, which would be sent forward using various technologies and picked up by the ETL tool again.

A good overall definition would be: an ESB is an abstraction layer on top of messaging infrastructure that uses message oriented, event driven and service oriented approaches to integrate applications, using XML for data exchange.
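
To make the ‘message oriented, event driven’ part of that definition concrete, here is a toy in-process sketch of publish/subscribe exchange. A real ESB sits on dedicated messaging middleware rather than in one process, and all names and topics here are invented for illustration.

```python
from collections import defaultdict
from typing import Callable

class ToyBus:
    """Toy stand-in for a message bus: topics map to subscriber callbacks."""

    def __init__(self):
        self._subscribers: dict[str, list[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[str], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> None:
        # Event driven: every subscriber reacts when a message arrives,
        # without the publisher knowing who the consumers are
        for handler in self._subscribers[topic]:
            handler(message)

bus = ToyBus()
bus.subscribe("customer.updated", lambda msg: print("CRM received:", msg))
bus.subscribe("customer.updated", lambda msg: print("Billing received:", msg))
bus.publish("customer.updated", "<Customer><Id>42</Id></Customer>")
```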

Probably the only thing up for discussion is the XML format. While it’s still the de facto standard, there are other, less verbose standards around for specific purposes which can use the same concept. Additionally, you can argue that a central data integration platform does not always require a separate message broker, since the actual load/unload is done by the data integration tool.

Regarding EAI, I’ve always thought of it as an advanced form of integration with direct connections, so data can be presented and used across applications because they’re essentially intertwined. In a way it’s a view over various applications. This has definite advantages, but does not solve the issue of distributing the data in a way that does not involve changing many similar interfaces when changes occur.

To make it as practical as possible:

  • ETL: implemented whenever data is transformed from one environment/system to another (this includes preparing the ESB adapter)
  • ESB: implemented to improve data distribution and definition enterprise-wide, when bits of information are required by many other systems
  • EAI: implemented where there is a tight relationship between systems which requires complete, up-to-date information to be available at any time. It’s the ideal point-to-point interface.

The perfect solution for any situation would require an implementation of all of the above to some degree, depending on the applications used, the requirements for data availability, the information applications use from others, latency and volumes. To give an example: in some cases an ESB would work for various situations where the same data is sent all over the place using point-to-point interfaces. That particular data would be a good candidate for a canonical / message model, so it is distributed the same way in terms of frequency and definition. ‘ETL’ is required to create the adapter and message.
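
As a sketch of that last step, the adapter below maps an application-specific record onto a canonical XML customer message for the bus. The source field names and the canonical schema are invented for illustration; in practice the canonical model would be agreed enterprise-wide.

```python
import xml.etree.ElementTree as ET

def to_canonical_customer(app_record: dict) -> str:
    """Map an application record onto a (hypothetical) canonical XML model."""
    root = ET.Element("Customer")
    ET.SubElement(root, "Id").text = str(app_record["cust_no"])
    ET.SubElement(root, "Name").text = app_record["cust_name"].strip().title()
    ET.SubElement(root, "Country").text = app_record.get("ctry_cd", "UNKNOWN")
    return ET.tostring(root, encoding="unicode")

# The same data is now distributed with one frequency and one definition,
# instead of many slightly different point-to-point interfaces
message = to_canonical_customer({"cust_no": 42, "cust_name": "  SMITH  "})
print(message)
```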

Roelant Vos