Error handling, updated for the reference architecture
Over the last couple of weeks I’ve posted some general ideas about error handling, and about doing as little of it as possible. Today I would like to go into more detail with the reference architecture in mind. One of the earlier posts argued against using error tables at all if possible, and I still favor that idea. However, there are occasions where you just can’t get around them. There are some underlying assumptions here, all of which are described in the reference EBI architecture. One of these is that you should load as much as possible into the integration layer, so you stay flexible enough to present the data any which way. Loading as much as possible conflicts with rejecting records, so it’s time for an overview of how error handling, rejection and recycling work in combination with the described reference EBI architecture.
Reject tables are only really mandatory in the staging layer, when loading data from dodgy sources on which you can’t depend for correct datatypes. For instance, if the source is a manually updated Excel sheet or text file, the process from source to staging should include a datatype conversion. If this conversion fails (32nd of January?), the faulty record should be rejected and put in a recycling table. All other records can continue for further processing into the integration layer. Don’t forget to notify your users!
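A minimal sketch of this source-to-staging step in Python. The field names (`order_date`, `amount`) and the in-memory "recycling table" are made up for illustration; a real implementation would write rejects to an actual table and raise a notification.

```python
from datetime import datetime

def load_staging(rows):
    """Split raw source rows into staged rows and rejects.

    Each row is a dict of raw strings, e.g. from a CSV or Excel export.
    """
    staged, recycled = [], []
    for row in rows:
        try:
            # Datatype conversion: fails on bad dates or amounts.
            converted = {
                "order_date": datetime.strptime(row["order_date"], "%Y-%m-%d"),
                "amount": float(row["amount"]),
            }
        except (ValueError, KeyError) as err:
            # Faulty record: keep the raw row plus the reason in the
            # recycling table so it can be corrected and resubmitted.
            recycled.append({**row, "reject_reason": str(err)})
        else:
            staged.append(converted)
    return staged, recycled

rows = [
    {"order_date": "2024-01-15", "amount": "99.95"},
    {"order_date": "2024-01-32", "amount": "10.00"},  # 32nd of January
]
staged, recycled = load_staging(rows)
```

All other records continue on; only the faulty one ends up in recycling, with its reject reason attached for the user notification.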
In the integration layer there is no need for any error tables when using the provided EBI architecture approach. Timing issues aside, there is no reason to reject a record, because the main goal is to store as much data as possible without delay. In a normal process the surrogate key tables like the Hubs would be loaded first, then the Satellites and Links. A design decision could be to remove this (small bit of) dependency by having the ETL processes add surrogate keys when a lookup fails. You want to keep going and need this data anyway. This also solves timing issues like early arriving transactions or late arriving reference data. It does require additional ETL work, but that can be generated automatically, as we have seen before. A word of warning: depending on the ETL tool that is used, be aware of different processes inserting the same keys; this can be solved in a number of ways.
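The add-key-on-failed-lookup idea could be sketched as follows. The class name and in-memory key map are my own stand-ins for a Hub table plus sequence, purely for illustration:

```python
class HubKeyer:
    """In-memory stand-in for a Hub surrogate-key lookup.

    When a business key is not found, a new surrogate key is issued on
    the spot, so Satellite and Link loads never have to reject or wait
    for the Hub load (early-arriving transactions, late reference data).
    """
    def __init__(self):
        self._keys = {}
        self._next = 1

    def key_for(self, business_key):
        if business_key not in self._keys:
            # Lookup failed: add the surrogate key instead of rejecting.
            self._keys[business_key] = self._next
            self._next += 1
        return self._keys[business_key]

hub = HubKeyer()
k1 = hub.key_for("CUST-001")  # lookup fails, new key issued
k2 = hub.key_for("CUST-001")  # same key returned
```

In a real parallel ETL run, two processes could race to insert the same business key; this is where the warning above applies, and a database-side upsert or sequence is one way to solve it.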
Error handling in the datamart layer is optional and depends on the business requirements. This is where the error handling concepts really start to pay off. Because no records have been rejected so far, you have the option to deliver multiple views on the data, including:
- ‘Raw’ data including errors (which you can identify and report easily as an extra).
- An all-or-nothing approach where you don’t load your datamart if any error occurs. Sometimes this is a business requirement.
- A detect and reject approach where you basically use the same method as for the staging area datatype detection.
- Selecting data based on weighted errors. Because you don’t reject records on error in the integration layer, but rather tag them with an error bitmap, you can load your datamart for a certain level of ‘quality’. Or at least skip the serious errors and only let the correct and mostly correct records through.
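The error-bitmap filtering in that last option could look like this. The flag names and which errors count as ‘serious’ are hypothetical; each quality check in the integration layer would set one bit:

```python
# Hypothetical error flags; each data-quality check sets one bit.
ERR_MISSING_NAME   = 1 << 0  # minor
ERR_BAD_POSTCODE   = 1 << 1  # minor
ERR_NEGATIVE_PRICE = 1 << 2  # serious

SERIOUS = ERR_NEGATIVE_PRICE  # bitmask of errors that block loading

records = [
    {"id": 1, "error_bitmap": 0},
    {"id": 2, "error_bitmap": ERR_BAD_POSTCODE},
    {"id": 3, "error_bitmap": ERR_NEGATIVE_PRICE | ERR_MISSING_NAME},
]

# Let correct and mostly-correct records through; skip serious errors.
datamart = [r for r in records if r["error_bitmap"] & SERIOUS == 0]
```

Changing the `SERIOUS` mask changes the quality level of the datamart load without touching the integration layer, which is exactly the flexibility the detect-don’t-reject approach buys you.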
These are some of the possibilities you have when you choose not to reject data but only detect errors and implement the error bitmap concept. It works both ways: you stay flexible when loading the datamarts, and you get automatic tracking of history on errors and corrections of the source data. Of course some errors, especially environmental errors, can’t be avoided, but when you make all your ETL rerunnable (the first design decision) this won’t be a problem.