Denormalise (flatten) XML using Xquery

ETL software have ways to interpret XML files or structures that vary in maturity, but recently I had to find another way to import XML structures that are received as blobs using (transact) SQL only. Because the upstream patterns (Staging to Integration) can easily handle denormalised source data sets the easiest way was to cast each blob as the XML file it is anyway and completely denormalise it (‘flatten’, more rows…). This is opposed to normalising each segment into a separate table (either in memory or physical) and (optionally) joining them back together.

There is not that much out there on the internet to accomplish this, so I thought I’d post an example here. I’ve tried this with fairly large XMLs and it works for me as the data delta is not too big – so the performance is fit for purpose. In this example a sample policy file is denormalised from a selected starting point. Essentially once the root node is defined every segment that needs to be included is added using an OUTER APPLY function. This is needed to return records for empty leaf nodes. In a given XML file you can basically navigate to your starting (root) node and denormalise from there.

declare @XML xml
set @XML ='
   <plan_name>CANCELLATION COVER - INTERNATIONAL</plan_name>
   <internal_name>CANCELLATION - SINGLE</internal_name>
   <plan_name>COMPREHENSIVE - INTERNATIONAL</plan_name>
   <internal_name>PLAN A - COMPREHENSIVE</internal_name>
   --segment1.premiums.value('local-name(.)','varchar(100)') as [RootTagName],
   segment2.premium.value('plan_id[1]','varchar(100)') as [PlanID],
   segment2.premium.value('plan_name[1]','varchar(100)') as [PlanName], 
   segment2.premium.value('internal_name[1]','varchar(100)') as [InternalName], 
   segment2.premium.value('cover_level_brief[1]','varchar(100)') as [CoverLevel], 
   segment2.premium.value('cover_type_brief[1]','varchar(100)') as [CoverType], 
   segment2.premium.value('value[1]','numeric(38,2)') as [Value]
 @XML.nodes ('premiums') segment1 (premiums) -- to set the root node
 OUTER APPLY premiums.nodes ('premium') segment2 (premium) -- needed for empty leaf nodes
--Add deeper segments here…

You can basically add as many segments as you want, either in one go (as above) or as separate segments (with varying levels) and join them in a CTE or something.

This example uses a variable to select from, but if you load from a XML stored as text or the XML datatype you can select from the table as well. In this case you need to start with a CROSS APPLY to get all the nodes you want. The FROM clause will then look like this:

   POLICY_ID -- for instance, 
 FROM <table>
) sub
 CROSS APPLY XML_OUTPUT.nodes ('premiums') segment1 (premiums) -- to set the root node
 OUTER APPLY premiums.nodes ('premium') segment2 (premium) -- needed for empty leaf nodes
--Add deeper segments here…

The root node (segment1) in this example needs to be specified completely. In the above example premiums is in fact the top-level node, but if you want to extract data with a deeper node as starting node the full path needs to be provided. E.g. if I want to start collecting data from the ‘bulk premiums’ node onward the segment 1 node needs to be fully pathed as for instance ‘policy_response/calculated_multi_values/bulk_premiums’.

In both cases the output needs to be declared so you need to know what information you want to retrieve from your XML and what the structure is. A screenshot from the results is posted below.

XML output

Roelant Vos

Roelant Vos

You may also like...

1 Response

  1. April 10, 2015

    […] Bruce and Glenn which made me think some additional information and considerations about the recent post to parse XML using XQuery would be a good […]


Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.