Copy Data Dynamically with BimlScript

Written by Varigence Blog on 5.31.2011

TAGS: Biml, BI, SSIS

This post is part 3 of a series on using Biml in BIDS Helper. It builds on the information and the sample from the previous posts.

BimlScript enables some interesting scenarios for generating large numbers of SSIS packages automatically. This can come in handy when you need to copy most or all of the data in one database to a different one. In this case, you could use something like the Transfer SQL Server Objects task, but it has a few problems. You can roll your own, but that might mean a fair amount of custom scripting. Or you could use the Import / Export Wizard. But in all these cases, you don’t have complete control of how the packages are produced. You could create all the packages by hand, which does give you full control, but then you are stuck doing a lot of repetitive work in SSIS.

BimlScript provides an alternative that lets you fully control the output, while automating the rote work of producing lots of packages that use the same pattern. Let’s take a look at a sample of this, using the scenario above (copying the data from one database to another).

<#@ template hostspecific="true"#>
<#@ import namespace="System.Data" #>
       
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
            <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
      </Connections>
      <Packages>
            <# 
                string metadataConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;";
                DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString, 
                    "SELECT '[' + s.name + '].[' + t.name + ']' FROM sys.tables t INNER JOIN sys.schemas s on t.schema_id = s.schema_id");
                foreach (DataRow row in tables.Rows)
                { #>
            <Package Name="Copy <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                  <Tasks>
                        <Dataflow Name="Copy Data"> 
                              <Transformations>
                                    <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                                          <DirectInput>SELECT * FROM <#=row[0]#></DirectInput>
                                    </OleDbSource>
                                    <OleDbDestination Name="Insert Data" ConnectionName="Target">
                                          <ExternalTableOutput Table="<#=row[0]#>"/>
                                    </OleDbDestination>
                              </Transformations>
                        </Dataflow>
                  </Tasks>
            </Package>
                <# } #>
      </Packages>
</Biml>

This script is set up to copy all the data in the AdventureWorksDW2008R2 database to a second database named Target (very inventive, I know). One note – the script is not creating the tables in the target database. We could actually automate that portion as well, but it’s beyond the scope of this post. To ensure you are set up properly to run this script, you should create an exact structural copy of your source database under a different name. You can use the Generate Scripts Wizard to do this. Just script the entire database, and then update the generated script to use a different database name (don’t forget to change the USE statement to the new name).

The script will produce a package per table, with a simple data flow that copies all the data using an OLE DB Source and OLE DB Destination. The script leverages the metadata already contained in the database, in the sys.tables view, to drive the loop that creates the packages.
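
To make the loop concrete, here is roughly what a single iteration should expand to, using [dbo].[DimAccount] from AdventureWorksDW2008R2 as the example table. Treat this as a sketch rather than exact compiler output: the Name values are illustrative assumptions (Biml needs a name on each of these elements), and everything else is taken from the script above.

```xml
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Packages>
        <!-- One Package element like this is emitted per row returned by the sys.tables query. -->
        <Package Name="Copy [dbo].[DimAccount]" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
            <Tasks>
                <Dataflow Name="Copy Data">
                    <Transformations>
                        <!-- row[0] has been replaced with the schema-qualified table name. -->
                        <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                            <DirectInput>SELECT * FROM [dbo].[DimAccount]</DirectInput>
                        </OleDbSource>
                        <OleDbDestination Name="Insert Data" ConnectionName="Target">
                            <ExternalTableOutput Table="[dbo].[DimAccount]"/>
                        </OleDbDestination>
                    </Transformations>
                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>
```

Every other table in the source database gets an equivalent Package element, differing only in the table name.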

What if you don’t want to select all the rows from each table? Instead, perhaps you want to specify a WHERE clause to use to filter some of the tables. To handle this, we can create a table in the target database that holds our WHERE information.

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
    </Connections>
    <Tables>
        <Table Name="WhereClause" ConnectionName="Target">
            <Columns>
                <Column Name="TableName" DataType="String" Length="255"/>
                <Column Name="WhereSql" DataType="String" Length="4000"/>
            </Columns>
        </Table>
    </Tables>
</Biml>

You can use the steps shown in Part 2 of this series to create this table in the Target database. Once it’s been created, populate it with some data. Because the script identifies each table by its schema-qualified, bracketed name (for example, [dbo].[FactInternetSales]), the TableName column must hold the name in exactly that format. There’s an example of data for this table that will work with AdventureWorksDW2008R2 below. This will filter the rows down to only sales where the amount is at least 1000.

TableName                    WhereSql
[dbo].[FactInternetSales]    WHERE [SalesAmount] >= 1000
[dbo].[FactResellerSales]    WHERE [SalesAmount] >= 1000

Now we need to alter the script to use the new information in this table. At the beginning of the block of script after the <Packages> element, add the following code:

string targetConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;";
DataTable whereClauses = ExternalDataAccess.GetDataTable(targetConnectionString, "SELECT TableName, WhereSql FROM WhereClause");

This retrieves the WHERE clauses from the WhereClause table, and stores them in the whereClauses variable.

Next, replace the <DirectInput> line in the OleDbSource with this:

<# 
  var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", row[0]));
  string whereSql = dataRow.Length == 0 ? string.Empty : dataRow[0][1].ToString();    
  string sql = string.Format("SELECT * FROM {0} {1}", row[0], whereSql);
#>
<DirectInput><#=sql#></DirectInput>

This code determines whether the whereClauses table has a row for the current table. If it does, that row's WHERE clause is appended to the end of the SELECT statement; if not, the statement is left unfiltered. The complete, final script looks like this:

<#@ template hostspecific="true"#>
<#@ import namespace="System.Data" #>
       
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
      <Connections>
            <OleDbConnection Name="Source" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;"/>
            <OleDbConnection Name="Target" ConnectionString="Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;"/>
      </Connections>
      <Packages>
            <# 
                string targetConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=Target;Integrated Security=SSPI;";
                DataTable whereClauses = ExternalDataAccess.GetDataTable(targetConnectionString, "SELECT TableName, WhereSql FROM WhereClause");
                 
                string metadataConnectionString = "Provider=SQLNCLI10;Server=.;Initial Catalog=AdventureWorksDW2008R2;Integrated Security=SSPI;";
                DataTable tables = ExternalDataAccess.GetDataTable(metadataConnectionString, 
                    "SELECT '[' + s.name + '].[' + t.name + ']' FROM sys.tables t INNER JOIN sys.schemas s on t.schema_id = s.schema_id");
                foreach (DataRow row in tables.Rows)
                { #>
            <Package Name="Copy <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
                  <Tasks>
                        <Dataflow Name="Copy Data"> 
                              <Transformations>
                                    <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                                        <# 
                                            var dataRow = whereClauses.Select(string.Format("TableName = '{0}'", row[0]));
                                            string whereSql = dataRow.Length == 0 ? string.Empty : dataRow[0][1].ToString();    
                                            string sql = string.Format("SELECT * FROM {0} {1}", row[0], whereSql);
                                        #>
                                          <DirectInput><#=sql#></DirectInput>
                                    </OleDbSource>
                                    <OleDbDestination Name="Insert Data" ConnectionName="Target">
                                          <ExternalTableOutput Table="<#=row[0]#>"/>
                                    </OleDbDestination>
                              </Transformations>
                        </Dataflow>
                  </Tasks>
            </Package>
                <# } #>
      </Packages>
</Biml>

You can see the results of this script by right-clicking on the Biml file, and choosing Expand. It may take a minute or two to process, but when it finishes, you should see a package for each table in your source database. The data flows will copy the data from Source to Target, and any WHERE clauses you add to the WhereClause table will be used.
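
As a rough illustration, assuming the WhereClause table holds the sample row for [dbo].[FactInternetSales] shown earlier, the source component in that table's package should expand to something like the fragment below. The Name value is an illustrative assumption, not something the script requires.

```xml
<!-- Expected expansion of the OleDbSource for [dbo].[FactInternetSales]. -->
<!-- The WHERE text comes straight from the WhereSql column of the WhereClause table. -->
<OleDbSource Name="Retrieve Data" ConnectionName="Source">
    <DirectInput>SELECT * FROM [dbo].[FactInternetSales] WHERE [SalesAmount] >= 1000</DirectInput>
</OleDbSource>
```

Tables with no row in WhereClause keep the plain SELECT * FROM statement. One thing to watch: because the clause is written directly into the Biml, a clause containing a < character would need to be XML-escaped (as &lt;) or wrapped in a CDATA section to keep the file well-formed.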

There’s a lot more that could be done with this script (automating the recreation of the tables in the destination, or deleting existing data, for example), but it’s still a good example of what BimlScript can do. Instead of spending your time writing tens or hundreds of repetitive packages, automate it with BimlScript.
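
As one possible starting point for the "deleting existing data" idea, here is a minimal sketch of a Package element that could replace the one inside the foreach loop of the script above. It assumes an Execute SQL task that empties the target table before the data flow runs. The task and component names are hypothetical, and foreign keys in the target database are not handled here (a DELETE can fail while child rows still exist in another table).

```xml
<Package Name="Copy <#=row[0]#>" ConstraintMode="Linear" AutoCreateConfigurationsType="None">
    <Tasks>
        <!-- Assumption: clear the target table first. With ConstraintMode="Linear", -->
        <!-- tasks run in the order they are listed, so this runs before the data flow. -->
        <ExecuteSQL Name="Clear Target Table" ConnectionName="Target">
            <DirectInput>DELETE FROM <#=row[0]#></DirectInput>
        </ExecuteSQL>
        <Dataflow Name="Copy Data">
            <Transformations>
                <OleDbSource Name="Retrieve Data" ConnectionName="Source">
                    <DirectInput>SELECT * FROM <#=row[0]#></DirectInput>
                </OleDbSource>
                <OleDbDestination Name="Insert Data" ConnectionName="Target">
                    <ExternalTableOutput Table="<#=row[0]#>"/>
                </OleDbDestination>
            </Transformations>
        </Dataflow>
    </Tasks>
</Package>
```

The same change can be combined with the WHERE clause lookup by keeping the script block and the <#=sql#> expression inside the OleDbSource.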

[cross-posted from http://agilebi.com/jwelch/2011/05/31/copy-data-dynamically-with-bimlscript/]
