ETL generation using SSIS (baby steps)
Following last week’s discussions on the metadata models I’ve put the last prerequisites in the ETL generation tool and had a more detailed look at actually generating ETL for SSIS based on the available metadata. The ultimate goal is to generate ETL for a variety of ETL tools which have some sort of generation capability (Informatica, Data Stage, SSIS and ODI/OWB) but I’ve picked SSIS to start with creating similar results as demonstrated in the earlier OWB proof of concepts.
The ETL generation uses the data model and constraints to generate a Data Vault integration layer and optional staging and history areas. Last week I’ve made some small gains in getting SSIS packages created via .Net from what is essentially a completely separate web application. The ETL generation tool itself is developed using Visual Studio 2010 and VB .Net and the trick was to get access to the SSIS APIs through this generic Visual Studio development suite, as opposed to SQL Server’s own Visual Studio development tool for SSIS (Business Intelligence Development Studio).
Packages are aimed to be developed for SQL Server 2008.
Below is the process of creating SSIS elements separate from the BIVS. First the references to the following assemblies will need to be added to the Visual Studio project:
These DLL files can be found in the client installation of SQL Server 2008. After that the following code is all that is required to generate a dummy package and a sample dataflow task. I’ve chosen to export the result as a dtsx package file, but you can also save it to SQL Server directly. This is a small step, but it can be used to gradually add more components which are derived from the available metadata. And of course it gives you the complete flexibility in deciding when elements are added to the package.
The code is as follows:
'Import assemblies Imports Microsoft.SqlServer.Dts.Runtime Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Pipeline Protected Sub Generate_SSIS() Dim package As Microsoft.SqlServer.Dts.Runtime.Package package = New Package package.Name = "MyCreatedPackage" Dim testapp As Microsoft.SqlServer.Dts.Runtime.Application testapp = New Application 'Sources and targets (destinations in dataflow) Dim oledbSource As IDTSComponentMetaData100 Dim oledbDestination As IDTSComponentMetaData100 'Add connection Dim conMgr As ConnectionManager = package.Connections.Add("OLEDB") conMgr.Name = "OLEDBConnection" conMgr.ConnectionString = "Provider=SQLOLEDB.1;Integrated Security=SSPI;Persist Security Info=False;Initial Catalog=<database>k;Data Source=<server name>;Auto Translate=False;" 'Add dataflow objects Dim th As TaskHost = package.Executables.Add("DTS.Pipeline") th.Name = "DataFlow" Dim dataFlow As MainPipe = th.InnerObject 'Source component info oledbSource = dataFlow.ComponentMetaDataCollection.New() oledbSource.ComponentClassID = "DTSAdapter.OLEDBSource" oledbSource.Name = "OLEDBSource" 'Generate and output oledbSource.Instantiate() oledbSource.RuntimeConnectionCollection.New() Dim currentDirectory As String = "C:\ETLoutput" testapp.SaveToXml(currentDirectory + "\test.dtsx", package, Nothing) End Sub