Moving data from one SQL Server to another is a fairly trivial task these days. You have many options – backup/restore the database, SSIS, detach/copy/attach, ADO.Net, BCP, blah blah blah. Moving data to SQL Azure is more challenging. Here is my experience and some details of the solution.
The Scenario
The application we are creating has a master application database and database per ‘client’. The per client database will be partitioned into many smaller databases – each being a replica of the source database with a filter on one or two tables. The creation of the partitions will require the client application to be shut down for a period of time – not ideal but acceptable.
SQL Azure does not (yet) include any support for partitioning or sharding so we knew we would have to create the solution ourselves.
Options
We wanted to have a flexible partitioning solution so from the outset I thought that SSIS would provide a mechanism that would allow the client to create their own partition rules as required. When I started creating the solution BCP was not available.
Trials and Tribulations
Before partitioning of the database could be performed it was necessary to move the source database to SQL Azure. Before moving the database it’s necessary to have a script that will create the empty database schema. When I first did this there where no tools so I used the database tools in VSTS to import from the local database and then manually tweaked the script to make it compatible with SQL Azure. These days there is the Migration Wizard on CodePlex which does pretty good (but not perfect) job of migrating your schema and optionally data.
Problem 0
Executing the schema script is very very slow. Locally it takes seconds. In SQL Azure it can take up to fifteen minutes. We have tested this from several connections (inside and outside our network and with different service providers) and it’s never been less than eight minutes. Our contacts at Microsoft can run the same script in three minutes. I have some ideas why this is slow and a possible solution… more on this later if I solve it… but for now this is a big problem – 15 x 42 (the number of partitions) = way too long!
It’s been a long time since I’ve had to do anything with SQL Server beyond creating databases for SharePoint or EPiServer and SSIS was completely new to me. However, I managed to piece together a package that copied the tables from one database to another. This package called other packages – 1 for each table – that would truncate the destination table and copy the records.
Problem 1
Because there we 30 or so packages I used xml package configuration files to specify the database connections so that I could change the source and destination in one place. When running the packages in Visual Studio this would constantly remind me that the connection had changed (when it hadn’t) so I had to turn this off. Not a big problem but annoying.
Problem 2
Running each separate table copy worked fine most of the time but running all of the packages in sequence (see problem 3) would fail in different places for various reasons. Sometimes I would get duplicate key errors – which was theoretically impossible.
Problem 3
All but two of the tables have identity columns. To move the data I had to SET IDENTITY_INSERT ON for the destination table. OLE DB Destination tasks in the dataflow let you specify identity inserts but these don't work in SQL Azure – you must currently use ADO.Net source and destinations tasks.
You can only have a single SET IDENTITY_INSERT ON per database so this means I had to run all these identity table packages sequentially which makes the whole process very slow.
Problem 4
Reliability. SSIS in Visual Studio seems to me to be very unreliable and flaky. I was plagued with performance issues and crashes.
In the end, frustration got the better of me and I gave up on SSIS. Thankfully, BCP for SQL Azure became available last week so that provided an option with more control and visibility. ADO.Net includes a SQLBulkCopy class. This will let you move data from many different sources to a database table. It provides a few options to control the behaviour of the operation and this seemed like the best option – the last time I used command line BCP was back in ‘95 with SQL Server 6.5 I think – or it might have been Sybase (which in those days was pretty much the same thing).
I whipped up a simple unit test and some classes to test SQLBulkCopy and it didn’t take long to realise that this wasn’t going to work either.
Problem 5
I kept getting errors about not having SET XACT_ABORT set ON. Binging for this led me down the DTC path. This is the same path that leads to hell! I couldn’t understand why DTC was getting involved because I had not invoked any transactions. I thought that BCP or my source data reader query was doing something funky so I just added the following SQLCommand query before I started the bulk copy:
SqlCommand xact = new SqlCommand("set xact_abort on", dst);
xact.ExecuteNonQuery();
This seemed to work, for a while.
(It turned out that the base class for my unit test was declaring a TransactionScope that wrapped everything in my unit test. I didn’t spot this until I have up with SQLBulkCopy so it may well be that this is not necessary now).
Problems 6, 7 & 8
I’m not exactly sure when I decided to give up on SQLBulkCopy but it was somewhere around the problems with timeouts, lockups and random results.
My options were now reduced to a short list of one – command line BCP. After a few false starts (I don’t read instructions as well as I could) I managed to get something working quite well.
Success! Moistly…
I created a batch file to test the various options with BCP. It’s really pretty simple to use but I wanted the simplest possible solution that would be easy to support and maintain. Using the native option (-n) provided this but it does mean that should and destination tables need to be almost identical. When creating the schema script I had re-ordered the columns on one table. This was enough to cause a problem for native mode which generated some cryptic errors. Thankfully the error file (-e option) provided enough clues to figure out what was wrong.
Throttling
SQL Azure will drop connections when/if there is a heavy load on a database. This it to enable high availability – which sounds silly – how can it be highly available if the connections keep dropping? – but in reality this just means you need to be prepared for a connection to drop and transactions to fail if the load on the database / server is exceeded.
Determining what the threshold is for throttling of the connections is difficult. So difficult in fact that one might say impossible. Throttling is triggered by activity by all users of a server. The server is also virtual (not as in a virtual machine but as in a virtual connection to one or more virtual machines).
My BCP commands on one table (the largest) would consistently cause the throttling to kick in. This table has about 230,000 rows – not huge by and stretch of the imagination. Thankfully, BCP allows to to specify the batch size and the first and last rows to use. For this table I found that 10,000 rows per batch worked most of the time and executing BCP on chucks of 50,000 records seems to have stopped the throttling. Using the first and last row options also provided the opportunity to retry a chuck of records should throttling kill the connection.
The final (so far) code for running command line BCP in a mode that SQL Azure can handle looks like this. This code is far from complete but should be sufficient to explain the process. Please feel free to adopt and enhance as you see fit.
private void copyTableWithCmdLineBCP(string sourceServer, string sourceDatabase,
string sourceTable, string sourceRule, string sourceUserName, string sourcePassword,
string destinationServer, string destinationDatabase, string destinationTable, string destinationUserName,
string destinationPassword, int rowsToCopy)
{
string ofile = Path.GetTempFileName();
string sourceQ = getSourceQuery(sourceDatabase + ".dbo." + sourceTable, sourceRule);
string countQ = getCountQuery(sourceDatabase + ".dbo." + sourceTable, sourceRule);
string args;
// export to a temp file
if (sourceUserName == null)
{
// no user name so use trusted security
args = string.Format("\"{0}\" queryout \"{1}\" -S {2} -T -n -E", sourceQ, ofile, sourceServer);
}
else
{
args = string.Format("\"{0}\" queryout \"{1}\" -S {2} -U {3} -P {4} -n -E",
sourceQ, ofile, sourceServer, sourceUserName, sourcePassword);
}
execProcessAndCheck(Properties.Settings.Default.BCPCMDPath, args, "Bulk copy export failed for table " +
sourceTable);
// import the file one chuck at a time
for (int startRow = 1; startRow <= rowsToCopy; startRow += MAX_ROWS_PER_BATCH)
{
int tries = 0;
if (destinationUserName == null)
{
// no user name so use trusted security
args = string.Format("{0}.dbo.{1} in {2} -S {3} -T -n -b 10000 -E -F {4}",
destinationDatabase, destinationTable, ofile, destinationServer, startRow);
}
else
{
args = string.Format("{0}.dbo.{1} in {2} -S {3} -U {4} -P {5} -n -b 10000 -E -F {6}",
destinationDatabase, destinationTable, ofile, destinationServer, destinationUserName,
destinationPassword, startRow);
}
int endRow = startRow + MAX_ROWS_PER_BATCH - 1;
if (endRow < rowsToCopy)
{
args += " -L " + endRow.ToString();
}
// execute with retries
while (!execProcessAndCheck(Properties.Settings.Default.BCPCMDPath, args,
"Bulk copy import failed for table " + destinationTable) && ++tries <= MAX_RETRIES)
{
Logger.WriteLog(LogCategory.Error, "Operation failed due to connection failure. Will retry.");
Thread.Sleep(WAIT_BETWEEN_RETRIES);
}
}
}
This uses a couple of helper methods. getSourceQuery and getCountQuery simply return queries for a specified table name and ‘rule’ The rule is a string which defines how to partition the table. A switch statement is used to determine the WHERE clause for a table based on the table name and rule value.
execProcessAndCheck executes a command line with arguments and captures the output.
private bool execProcessAndCheck(string cmdLine, string args, string errorMessage)
{
_processOutput = new StringBuilder();
ProcessStartInfo psi = new ProcessStartInfo(cmdLine, args);
psi.RedirectStandardError = true;
psi.RedirectStandardOutput = true;
psi.UseShellExecute = false;
psi.CreateNoWindow = true;
Process prs = new Process();
prs.OutputDataReceived += new DataReceivedEventHandler(prs_OutputDataReceived);
prs.StartInfo = psi;
prs.Start();
prs.BeginOutputReadLine();
if (prs != null)
{
prs.WaitForExit(MAX_PROCESS_TIME);
if (!prs.HasExited)
{
prs.Kill();
throw new Exception(errorMessage + " - command has exceeded the maximum allowable time and was exterminated.");
}
}
return checkProcessResults(prs, errorMessage);
}
The output is captured by a simple event handler:
void prs_OutputDataReceived(object sender, DataReceivedEventArgs e)
{
if (!string.IsNullOrEmpty(e.Data))
_processOutput.AppendLine(e.Data);
}
_processOutput is a class scoped StringBuilder.
The results from the process are checked by checkProcessResults:
private bool checkProcessResults(Process prs, string message)
{
if (prs.ExitCode != 0)
{
// look for error 10054 - this means that SQL Azure has throttled and we should retry
string output = _processOutput.ToString();
if (output.Contains("NativeError = 10054"))
{
return false;
}
message += ": Error Code = " + prs.ExitCode.ToString() + " : Message = " + output;
throw new Exception(message);
}
return true;
}
This is all very specify to the way BCP.exe and SQLCMD.exe work – if you want to use this for other command line tools then some tweaking will no doubt be required.
Summary
SQL Azure is an extremely powerful platform - despite the issues described above. It will definitely get better as the product gets closer to release. Most of my problems are caused by my recent inexperience with SQL tools and the general nature of an unfinished product. Learning how to do this stuff now will provide a better understanding of the realities of cloud based computing as opposed to hosting a SQL Server on some internet connected remote server.
Performance of the database creation, upload and partitioning could be enhanced in several ways. Threading could help enormously when creating the partition databases but throttling may reduce any benefit when it comes to uploading the data. Using multiple SQL Azure servers would also help a little. Moving the functionality into the cloud could be possible if SQLBulkCopy would work betterer and if I had a mechanism for creating the database schema other than SQLCmd.exe.
All of these options are possible given more time – but PDC09 is our hard deadline - so what I have so far will hopefully be good enough for a wicked kick-arse sexed-up demo of what is possible with this awesome platform! I’ll talk more about this application when I can… :)