Generating SQL using templating engines

In this second post of the ‘simple ETL generation series’ we will expand on the Hub SQL example from the first post, and work towards achieving the same result with less code and greater flexibility: using a templating engine (library).

An introduction to templating engines

Templating engines are often used in web development as a way to separate logic from presentation (the user interface). They enable developers to (re)use standard template files and let the engine replace any variables with runtime values.

By letting the code provide the variables as data, the presentation of that data is fully managed by the template. This way, no business rules are required in the front-end template (client-side templating). You just add the libraries for the selected engine to your code and ‘compile’ a template using these libraries. ‘Compile’ in this context means preparing the template to accept the input provided by the code.

The input data can be defined and provided as a class. You can instantiate the class with the required values (data) and send it to the templating engine to create the output using the selected template.

Examples of templating engines are Mustache, Pebble and Handlebars (an extension of Mustache). Syntax and breadth of functionality differ somewhat between these. Some are also more strict about the principle of separating logic and presentation. Mustache, for instance, will not allow control flow within the templates (i.e. if-then-else or for-each constructs) and by virtue of this considers itself ‘logic-less’: everything is data-driven.

For the examples here I have used Handlebars, which is an extension of Mustache. The reason I have opted for Handlebars is that I am OK with having a little bit of logic in the templates; they don’t need to be 100% logic-less for me. Handlebars also offers more expressions (helpers) to use, and is quite easy to extend to support your own.
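As a minimal sketch of that extensibility (the helper name ‘upperCase’ is purely illustrative, and newer versions of the library use slightly different parameter types), a custom helper can be registered with Handlebars.Net like this:

// Register a custom helper that upper-cases its first argument.
// Usage in a template: {{upperCase table_name}}
Handlebars.RegisterHelper("upperCase", (writer, context, parameters) =>
{
    writer.WriteSafeString(parameters[0].ToString().ToUpper());
});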

A simple example of a Handlebars template for HTML is shown below. The variables / expressions are enclosed in double ‘curly brackets’, i.e. {{ }} – ‘the handlebars’. Also note that a verbatim string literal (@) and doubled double-quotes are used to handle the quoted values in C#.

var template = 
@"<div class=""Example"">
<h1>{{header}}</h1>
<h2>{{subheader}}</h2>

<div class=""body"">
{{body}}
</div>
</div>";

The input (called the ‘context’ in templating) can be defined as an object, for example the C# anonymous type shown below:

var context = new
{
    header = "Hello World",
    subheader = "Today is about...",
    body = "Examples of using templating engines to generate ETL."
};

Compiling and running this template with the context merges the two, producing output in which content and presentation are combined.

// Compiling the template
var templateScript = Handlebars.Compile(template);

// Applying the context to the compiled template
var result = templateScript(context);

// Displaying the result
Console.WriteLine(result);
Console.ReadKey();

With the final output of:

<div class="Example">   
<h1>Hello World</h1>
<h2>Today is about...</h2>

<div class="body">
Examples of using templating engines to generate ETL.
</div>
</div>

These examples have been created in C# using the Handlebars.Net library (available via NuGet). It is easy to create from scratch in Visual Studio, but you can also download this example as a C# console application below. The project can be opened with Visual Studio (it was created in VS 2017). The file is a rar archive, so you need WinRar or similar to unpack it.

What does this have to do with ETL generation?

Even though the templating engines mentioned here have their origins in JavaScript / web development, the output is essentially plain text.

This means we can use this to generate (any kind of) text output, such as SQL!

Even better: in applications (code) we can capture the output that is generated and use it for various further purposes. For instance, we can feed the generated output into various Software Development Kits (SDKs), Application Programming Interfaces (APIs), 3rd-party compilers (e.g. Varigence Biml) or even native markup code.

What this means is that, in principle, we can generate the required artefacts for pretty much every ETL platform with relatively little effort.

What output format makes sense depends on the individual tool, but generating parameter files (PowerCenter, ODI), SQL (as we have seen in the Hub SQL cursor example), Biml files or even the XML that constitutes PowerCenter mappings or SSIS packages becomes an option.

Essentially, anything with a structured format in which the ETL objects are described can be ‘templated’ once the syntax is understood.

An example of generated SQL

In the introduction to templating engines we used a simple text (HTML) template and combined it with a context that essentially contained string values.

Now we can take this a step further and replace the hard-coded in- and outputs with files that contain the template and the metadata. We can use a semi-structured format such as JSON or Yaml to store this information. Both formats have readily available (C#) deserialisation libraries for easy loading into the templating engine.

We can use this to generate SQL output.

The example metadata in Yaml format is as follows:

#The list of queries to generate (sourceMetadata.yaml)

queries:
-
  table_schema: dbo
  table_name: CUSTOMER_PERSONAL
  columns:
  - column_name: CustomerId
  - column_name: Customer_Name
  - column_name: Phone_Number
-
  table_schema: tst
  table_name: OFFER_DETAILS
  columns:
  - column_name: OfferId
  - column_name: Offer_Description
  - column_name: Offer_Reference_Number
-
  table_schema: dbo
  table_name: ADDITIONAL_DETAILS
  columns:
  - column_name: CustomerId
  - column_name: Contact_Preferences
  - column_name: Demographic_Segment

Yaml is in many ways similar to JSON, but uses indentation to indicate nesting. The above sample contains the metadata for three data sources, categorised as ‘queries’. Each query object contains a schema, a name and a list of columns. Pretty much everything we need for a simple SELECT statement.
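For comparison, the first query object from the Yaml sample would look as follows when expressed as JSON (an equivalent rendering for illustration, not one of the example files):

{
  "queries": [
    {
      "table_schema": "dbo",
      "table_name": "CUSTOMER_PERSONAL",
      "columns": [
        { "column_name": "CustomerId" },
        { "column_name": "Customer_Name" },
        { "column_name": "Phone_Number" }
      ]
    }
  ]
}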

The next step is to define the structure in a class / object model to load (deserialise) the Yaml file into.

To capture the above sample metadata we can define a simple class (which I have named ‘QueryList’) that holds the list of queries to generate. The QueryList instance contains a list collection of individual query objects.

Each Individual Query has a schema and name property, as well as a list (collection) of columns. This is shown as code below.

using System.Collections.Generic;

class QueryList
{
    public List<IndividualQueries> queries { get; set; }
}

class IndividualQueries
{
    public string table_schema { get; set; }
    public string table_name { get; set; }
    public List<ColumnListing> columns { get; set; }
}

class ColumnListing
{
    public string column_name { get; set; }
    public string column_data_type { get; set; }
}

To use the metadata in the code we need to load (deserialise) the Yaml file into this class, which we can do using the standard YamlDotNet library (available via NuGet).

We can do this in one go by creating an instance of our newly defined QueryList class from the Yaml file. In other words, we deserialise the Yaml file into an instance of the QueryList class.

// Load the metadata from the Yaml file
// (requires: using System.IO; and using YamlDotNet.Serialization;)
var yamlInput = File.ReadAllText(AppDomain.CurrentDomain.BaseDirectory + @"..\..\SourceMetadata.yaml");

var deserializer = new DeserializerBuilder().Build();
var queryOutput = deserializer.Deserialize<QueryList>(yamlInput);

At this stage we have a class instance of the type QueryList containing all objects as specified in the Yaml file; you could use Visual Studio’s debugger to inspect the contents.
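Alternatively, a quick sanity check in code could look like this (an illustrative addition, not part of the downloadable example):

// Print a summary of the deserialised metadata
foreach (var query in queryOutput.queries)
{
    Console.WriteLine($"{query.table_schema}.{query.table_name} ({query.columns.Count} columns)");
}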

This class can be applied to / merged with the template, so the next step is to define what the template looks like. We’re creating a simple SQL example template, which can look like this:

{{#each queries }}
-- Working on {{table_schema}}.{{table_name}}
SELECT
{{#each columns }}
{{column_name}}{{#unless @last}},{{/unless}}
{{/each}}
FROM {{table_schema}}.{{table_name}}
{{/each}}

This Handlebars example introduces a few new components, most notably the #each expression, which iterates through the objects in ‘queries’ (the name of the List<IndividualQueries> property in the object model).

For each item in the list, the table_schema and table_name are applied to the template. Also, for each item the list of columns is added through its own #each expression.

Note that column_name has a specific expression added to prevent a trailing comma after the last column. In the template this reads as ‘unless it is the last column, add a comma’ (#unless @last).

Similar to the HTML example at the start of this article, the template can be compiled in code to accept the metadata. After this, the results can be displayed back to the user.
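A minimal sketch of that compile step, assuming the template is saved in a file next to the metadata (the file name ‘SelectStatement.handlebars’ is illustrative):

// Load and compile the Handlebars template from file
var templateContent = File.ReadAllText(AppDomain.CurrentDomain.BaseDirectory + @"..\..\SelectStatement.handlebars");
var template = Handlebars.Compile(templateContent);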

// Return the result back to the user
var result = template(queryOutput);
Console.WriteLine(result);
Console.ReadKey();

Adjusting the template – i.e. what you want to do with the metadata – is easy now. No compiling is required anymore; the Handlebars template file can be edited and rerun at any time. Executing this code creates the output below.
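For the sample metadata, the generated SQL looks along these lines (reconstructed by applying the template to the metadata above; exact whitespace may differ):

-- Working on dbo.CUSTOMER_PERSONAL
SELECT
CustomerId,
Customer_Name,
Phone_Number
FROM dbo.CUSTOMER_PERSONAL

-- Working on tst.OFFER_DETAILS
SELECT
OfferId,
Offer_Description,
Offer_Reference_Number
FROM tst.OFFER_DETAILS

-- Working on dbo.ADDITIONAL_DETAILS
SELECT
CustomerId,
Contact_Preferences,
Demographic_Segment
FROM dbo.ADDITIONAL_DETAILS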

This code can be downloaded below as well. The example has again been compressed with WinRar – I needed some better compression.

It is very easy to extend this approach, for instance by adding a data type to the input metadata and changing the template into a CREATE TABLE statement. Or, for example, by changing the template into a DELETE FROM statement.
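As a sketch of the first idea, a CREATE TABLE template could look like this, using the column_data_type property that is already present in the ColumnListing class (illustrative only):

{{#each queries}}
CREATE TABLE {{table_schema}}.{{table_name}}
(
{{#each columns}}
{{column_name}} {{column_data_type}}{{#unless @last}},{{/unless}}
{{/each}}
);
{{/each}}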

In most cases the class / object model is quite stable, so it is easy to provide flexibility to users by letting them create their own templates based on their own personal preferences. The metadata itself is in most cases not the concern.

Next up

In the planned next post for the ‘simple ETL generation series’ we will combine the templating engine concept with the Hub pattern – and see how easy it is to replicate what we have done with the SQL cursor in simple template code.

 
