Collaboration

Areas of collaboration

The following diagram provides an overview of the areas that are being addressed on an ongoing basis. If you are interested, feel free to reach out at roelant@ravos.nl. For now, most collaborations are open source only in a limited way (the repositories are private) until we’re certain all IP has been sorted out properly.

The intent of sharing this code is to foster increased meritocracy in the BI/DWH community and generally work (together) on something that can be combined using agreed APIs. The idea is that various people / teams can chase their passion while knowing the work will fit in somewhere in the overall scope. This overview may drastically change over time, as will the composition and scope of the projects – but that’s the nature of the work.

Data Integration Framework

The Data Integration Framework provides the context of the collaboration, as well as an overview of the options & considerations across the various areas and layers.

Contents / functionality of the GitHub repository:

  • Overview and context for Enterprise Data Warehouse
  • Design Patterns (conceptual how-to’s)
  • Solution Patterns (implementation in a specific technical context, i.e. tooling and environments)

ETL Control Framework (DIRECT)

The Data Integration Run-time Execution Control Tool (DIRECT) is a generic execution and control framework that orchestrates the execution of ETL processes. It provides various hooks into an ETL process to manage topics such as restartability, recovery from failure, logging, ETL classification and event handling.

There are many ETL control frameworks, as they are needed in every project. Let’s make this the best one! Ideally this becomes a commodity.

  • The datamodel and sample code can be found here: http://bit.ly/2jR7xkJ
  • Documentation for the ETL Control Framework can be found here; this is a generic process control framework (happy reading!)
  • The DIRECT code and content are managed on GitHub here. This is a private repository for the time being, but I am more than happy to expand the circle of collaborators. Send me an email if you are interested.

Contents / functionality of the tool:

  • Runtime execution monitoring & logging
  • Disabling / enabling ETL in the control framework
  • Recovery, retries
  • Managing dependencies and parallelism
  • GUI
  • Supporting automation code (re-initialisation, zero key generation, generating process registration records)
  • Exception reports (SQL – currently integrated in Confluence)
  • SSIS, PowerCenter and Oracle wrappers (the SSIS wrapper is fully up to date; the others are available). There is probably some Pebble code as well.
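
To make the control ideas listed above concrete (enabling / disabling, retries and run-time logging), here is a minimal sketch in Python. It is not DIRECT code: `ProcessRegistration`, `RunLog` and `run_controlled` are hypothetical names invented for illustration, and the in-memory log stands in for whatever logging tables the real framework uses.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

log = logging.getLogger("etl_control")


@dataclass
class ProcessRegistration:
    """Hypothetical registration record for one ETL process."""
    name: str
    enabled: bool = True
    max_retries: int = 2  # retries after the first attempt


@dataclass
class RunLog:
    """In-memory stand-in for an execution log table (assumption)."""
    entries: List[Tuple[str, int, str]] = field(default_factory=list)

    def record(self, name: str, attempt: int, status: str) -> None:
        self.entries.append((name, attempt, status))
        log.info("%s attempt %d: %s", name, attempt, status)


def run_controlled(reg: ProcessRegistration,
                   etl: Callable[[], None],
                   run_log: RunLog) -> str:
    """Run `etl` under control: skip if disabled, retry on failure, log each attempt."""
    if not reg.enabled:
        run_log.record(reg.name, 0, "SKIPPED")
        return "SKIPPED"
    # One initial attempt plus `max_retries` retries.
    for attempt in range(1, reg.max_retries + 2):
        try:
            etl()
        except Exception:
            run_log.record(reg.name, attempt, "FAILED")
        else:
            run_log.record(reg.name, attempt, "SUCCEEDED")
            return "SUCCEEDED"
    return "FAILED"
```

A wrapper for a specific tool (SSIS, PowerCenter and so on) would play the role of the `etl` callable here, with the control framework deciding whether and how often it is invoked.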

Metadata Management (TEAM)

The Taxonomy for ETL Automation Metadata (TEAM) is a management tool for Data Vault metadata, and a component that is also integrated in the VEDW software. It offers metadata mapping validation, data entry and visualisation. The metadata within TEAM is used to generate ETL (e.g. using Biml or SQL) through its interface / APIs.
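
As a rough illustration of metadata-driven ETL generation, the sketch below renders a SQL statement from a single mapping row. It does not use the TEAM API: `HubMapping` and `generate_hub_insert` are hypothetical names, and the hub pattern is deliberately simplified (no hash keys, load dates or record sources).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HubMapping:
    """Hypothetical source-to-target mapping row for a Data Vault hub."""
    source_table: str
    target_hub: str
    business_key: str


def generate_hub_insert(m: HubMapping) -> str:
    """Render a simplified insert-new-keys statement from one mapping row."""
    return (
        f"INSERT INTO {m.target_hub} ({m.business_key})\n"
        f"SELECT DISTINCT src.{m.business_key}\n"
        f"FROM {m.source_table} src\n"
        f"LEFT JOIN {m.target_hub} hub "
        f"ON hub.{m.business_key} = src.{m.business_key}\n"
        f"WHERE hub.{m.business_key} IS NULL;"
    )


if __name__ == "__main__":
    # Example mapping; table and column names are made up.
    print(generate_hub_insert(
        HubMapping("STG_CUSTOMER", "HUB_CUSTOMER", "CUSTOMER_ID")))
```

The point of the sketch is the separation of concerns: the mapping metadata is plain data, and any generator (SQL, Biml or otherwise) can consume the same rows.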

  • The datamodel and sample code can be found here: http://bit.ly/2A609Nq
  • The API structure is available here: http://bit.ly/2kn4PqZ (WIP)
  • The TEAM code and content are managed on GitHub here. This is a private repository for the time being, but I am more than happy to expand the circle of collaborators. Send me an email if you are interested.

Contents / functionality of the tool:

  • Connectivity settings and configuration, including opening and saving to file
  • Metadata management (grid, Source-to-Target Mapping (STM)), including exporting to a graph model etc.
  • Activation and validation of metadata (checking and pushing into target DV generation data model)
  • Repository creation, including standard interfaces for ETL automation
  • Physical model versioning and reverse-engineering, to enable virtualisation
  • Test data generator (also to support referential integrity (RI) checks, but that’s in the virtualisation tool)
  • Source system register (NOT DEVELOPED YET)
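
The validation step in the list above can be pictured as checking each mapping against the known physical model before it is activated. The following is a minimal sketch under that assumption; `validate_mappings` and the shape of its inputs are hypothetical and do not reflect how TEAM stores or checks its metadata.

```python
from typing import Dict, List, Set, Tuple

# (source_table, target_table, business_key): a hypothetical mapping shape
Mapping = Tuple[str, str, str]


def validate_mappings(mappings: List[Mapping],
                      model: Dict[str, Set[str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid.

    `model` maps each known table name to its set of column names.
    """
    errors: List[str] = []
    for source, target, key in mappings:
        if source not in model:
            errors.append(f"unknown source table: {source}")
        elif key not in model[source]:
            errors.append(f"{source} has no column {key}")
        if target not in model:
            errors.append(f"unknown target table: {target}")
    return errors
```

Collecting all problems rather than stopping at the first one suits a grid-style editor, where the user will want to see every invalid row at once.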

Virtual Enterprise Data Warehouse (VEDW)

VEDW is the virtualisation and rapid prototyping software for Data Vault that can be downloaded from this site. More information is available here.

I am currently fixing it up after separating out the TEAM component, and will publish it on GitHub when this is done.