Blog Home  Home Feed your aggregator (RSS 2.0)  
.Net Jonesie - Azure
A simple programmers blog
 
# Thursday, November 19, 2009

The worlds most expensive hair dryer!

http://jonesie.net.nz/video/M2U00303.MPG

Thursday, November 19, 2009 4:58:41 AM (New Zealand Daylight Time, UTC+13:00)  #    Comments [0]   Azure | General | PDC09  | 
# Wednesday, November 18, 2009

I’m sitting in a session at PDC 09 on SQL Azure futures.  Here’s some notes for me and anyone else who cares.

Backups and Clones

  • Backups are coming.  Can now(?) create a clone of a database.  Something like CREATE DATABASE fred AS CLONE OF bob.

Operations

  • API for provisioning – useful for SaaS providers.  Great for Multi-tenant too.
  • Better deployment and upgrades.
  • Data Sync.
  • Upgrade and downgrade options between db sizes  (1 & 10 GB).

Scale-Out

  • Woa!  TicketDirect mentioned again!
  • Dynamic database splits.  And merge back of course.
  • Additional db size options. Maybe up to 50 GB.
  • Multiple db connections.
  • Fan out query for multiple db’s.

Other

  • Support for profiler
  • Full Text Search
  • CLR
  • BI

Sensitive Data - Codename Vidalia – woosh – this went over my head.

Wednesday, November 18, 2009 12:50:52 PM (New Zealand Daylight Time, UTC+13:00)  #    Comments [0]   Azure | General | PDC09  | 

SQL Azure will throttle your connections to protect resources in the data centre.  When SQL Azure does this is non-deterministic – or at least, Microsoft cannot specifically tell you when it will do this because there are many complicated factors. It could be the number of concurrent requests, the CPU or IO usages, the weather – who knows!  In the end it doesn’t really matter – you, THE DEVELOPER, must but be aware of this and deal with it.

SQL Throttling is manifested by a dropped connection.  This may appear as a Native Error 10054 when you try to open a connection or some other error if the connection is dropped after being opened.

To cope with this throttle a retry will normally be all that is required.  At the connection and command level this is easy but it can be complicated by unstructured code.  So, having a nice clean data layer will make the solution much easier to implement, but this applies equally to ‘messy’ code :)

We reused the Retry Policy mechanism that is provided with the Storage Client.  In the old sample storage client the syntax was a lot simpler but in the interest of less confusion I wont talk about that.  The new November SDK works just as well, but you need to do a tad more work.  The following sample is a connection manager that is used by the data layer.

using System;

using System.Data;

using System.Data.SqlClient;

using Microsoft.WindowsAzure.StorageClient;

 

namespace MyApp.DataLayer

{

  public class ConnectionManager

  {

 

    /// <summary>

    /// Maximum number of retries

    /// </summary>

    private const int MAX_RETRIES = 3;

    /// <summary>

    /// The retry policy for operations that can be retried (sql etc)

    /// </summary>

 

    private RetryPolicy RetryPolicy { get; set; }

 

    /// <summary>

    /// Wait 5 seconds between retries

    /// </summary>

    private const int WAIT_BETWEEN_RETRIES = 5000;

 

 

    public ConnectionManager()

    {

      this.RetryPolicy = RetryPolicies.Retry(MAX_RETRIES, TimeSpan.FromSeconds(5));

    }

 

 

    private SqlConnection OpenConnection(string connectionString)

    {

      SqlConnection con = null;

      try

      {

        DieHard.RequestWithRetry(RetryPolicy, () =>

            {

 

              SqlConnection connection = new SqlConnection(connectionString);

              connection.Open();

              con = connection;

            });

        return con;

      }

      catch (Exception ex)

      {

        throw new Exception("Error connecting to " + connectionString, ex);

      }

    }

  }

}

The interesting part is the DieHard helper class.

using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using Microsoft.WindowsAzure.StorageClient;

using System.Threading;

 

namespace MyApp.DataLayer

{

  public static class DieHard

  {

    public static T RequestWithRetry<T>(RetryPolicy retryPolicy, Func<T> action)

    {

      ShouldRetry shouldRetry = retryPolicy();

 

      int retryCount = 0;

      while (true)

      {

        TimeSpan delay = TimeSpan.FromSeconds(0);

 

        try

        {

          return action();

        }

        catch (Exception ex)

        {

          if (!shouldRetry(retryCount, ex, out delay))

            throw;

        }

        retryCount++;

        Thread.Sleep(delay);

      }

    }

 

    public static void RequestWithRetry(RetryPolicy retryPolicy, Action action)

    {

      ShouldRetry shouldRetry = retryPolicy();

 

      int retryCount = 0;

      while (true)

      {

        TimeSpan delay = TimeSpan.FromSeconds(0);

 

        try

        {

          action();

          return;

        }

        catch (Exception ex)

        {

          if (!shouldRetry(retryCount, ex, out delay))

            throw;

        }

 

        retryCount++;

        Thread.Sleep(delay);

      }

    }

  }

}

 

DieHard has 2 variants of the same Retry invoker – 1 for void actions and the other for a generic return value.

There is a couple of points to mention.  The timeout and the number of retries to attempt may vary depending on your situation and other factors – don’t use my 5 second 5 retries settings as the best possible option for you.  You need to be aware of transactions.  Retries should encapsulate the transaction as dropping a connection will rollback the transaction.  So, except for this example of retrying the open connection you should use the retry mechanism at the higher domain level.

Enjoy!

Wednesday, November 18, 2009 10:20:39 AM (New Zealand Daylight Time, UTC+13:00)  #    Comments [0]   Azure | General | SQL  | 
# Tuesday, October 27, 2009

This is something I have wanted for a while now.  Today I finally got around to doing something about it. I created a simple console application that can execute a SQL script file that you would normally use in Management Studio or SQLCMD.  You might well ask why I would do such a thing. 

  1. SSMS and SQLCMD is not always available or in the right place
  2. I don't have the code for SSMS and SQLCMD so I cant change the way they work

In you read my previous post on deploying to SQL Azure then the first point will make sense.  The next step is to make this script processing code work in Azure so I don't have to create databases from my client.  Hopefully this will speed things up a little or a LOT.

Anyways, since I've always wanted someone to write this code and never found any, here it is in case you are in the same boat.

class Program

  {

    static int Main(string[] args)

    {

 

      string connectionstring = args[0];

      string sql;

      try

      {

        using (FileStream fs = new FileStream(args[1], FileMode.Open, FileAccess.Read))

        {

          byte[] bytes = new byte[fs.Length];

          fs.Read(bytes, 0, (int)fs.Length);

          sql = Encoding.UTF8.GetString(bytes);

        }

      }

      catch (Exception ex)

      {

        Console.WriteLine("Unable to read sql script: " + ex.Message);

        return 1;

      }

 

      string batchTerminator = @"##GO##";

 

      //Regex regex = new Regex(@"(^|\s+)" + batchTerminator + @"\s", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

      string[] strings = sql.Split(new string[] {batchTerminator}, StringSplitOptions.None);

 

      using (SqlConnection con = new SqlConnection(connectionstring))

      {

        con.Open();

 

        SqlCommand cmd = new SqlCommand();

        cmd.CommandType = System.Data.CommandType.Text;

        cmd.Connection = con;

 

        for (int i = 0; i < strings.Length; i++)

        {

 

          Console.Title = "Processing " + i.ToString() + " of " + strings.Length.ToString();

 

          //if ((!regex.IsMatch(strings[i])) && (!string.IsNullOrEmpty(strings[i].Trim())))

          if (!string.IsNullOrEmpty(strings[i].Trim()))

          {

            Console.WriteLine(strings[i]);

            try

            {

              cmd.CommandText = strings[i];

              cmd.ExecuteNonQuery();

              Console.ForegroundColor = ConsoleColor.Green;

              Console.WriteLine("O.K.");

              Console.ForegroundColor = ConsoleColor.Gray;

            }

            catch (Exception ex)

            {

 

              Console.ForegroundColor = ConsoleColor.Red;

              Console.WriteLine("Error: " + ex.Message);

              Console.ForegroundColor = ConsoleColor.Gray;

 

 

              return 2;

            }

            Console.WriteLine();

            Console.WriteLine();

          }

        }

        return 0;

        //Console.WriteLine("Press ENTER when complete");

        //Console.ReadLine();

 

      }

    }

  }

 

Ok, so it’s not the prettiest or most bullet proof code on the planet, but it works.  The RegEx is the key to the process.  It’s not perfect – it expects  GO commands to separate the batches and you need to have GO on a line by itself.  It’s easy to extend this should you wish. [Update] Regex don't work!  It’s just too hard to find all the GO’s accurately.  Instead, I manually search and replace Go with ##GO##.  This is a lot easier/safer to split with.  I also removed all PRINT statements from my scripts but you could easily parse these out when processing and have then display as progress messages (no need to send them to the server).

Enjoy!

Tuesday, October 27, 2009 2:20:37 PM (New Zealand Daylight Time, UTC+13:00)  #    Comments [0]   Azure | General | SQL  | 
# Monday, October 26, 2009

Moving data from one SQL Server to another is a fairly trivial task these days.  You have many options – backup/restore the database, SSIS, detach/copy/attach, ADO.Net, BCP, blah blah blah.  Moving data to SQL Azure is more challenging.  Here is my experience and some details of the solution.

The Scenario

The application we are creating has a master application database and database per ‘client’.  The per client database will be partitioned into many smaller databases – each being a replica of the source database with a filter on one or two tables. The creation of the partitions will require the client application to be shut down for a period of time – not ideal but acceptable. 

SQL Azure does not (yet) include any support for partitioning or sharding so we knew we would have to create the solution ourselves.

Options

We wanted to have a flexible partitioning solution so from the outset I thought that SSIS would provide a mechanism that would allow the client to create their own partition rules as required. When I started creating the solution BCP was not available. 

Trials and Tribulations

Before partitioning of the database could be performed it was necessary to move the source database to SQL Azure. Before moving the database it’s necessary to have a script that will create the empty database schema.  When I first did this there where no tools so I used the database tools in VSTS to import from the local database and then manually tweaked the script to make it compatible with SQL Azure.  These days there is the Migration Wizard on CodePlex which does pretty good (but not perfect) job of migrating your schema and optionally data.

Problem 0

Executing the schema script is very very slow.  Locally it takes seconds.  In SQL Azure it can take up to fifteen minutes. We have tested this from several connections (inside and outside our network and with different service providers) and it’s never been less than eight minutes. Our contacts at Microsoft can run the same script in three minutes.  I have some ideas why this is slow and a possible solution… more on this later if I solve it… but for now this is a big problem – 15 x 42 (the number of partitions) = way too long!

It’s been a long time since I’ve had to do anything with SQL Server beyond creating databases for SharePoint or EPiServer and SSIS was completely new to me.  However, I managed to piece together a package that copied the tables from one database to another.  This package called other packages – 1 for each table – that would truncate the destination table and copy the records. 

Problem 1

Because there we 30 or so packages I used xml package configuration files to specify the database connections so that I could change the source and destination in one place.  When running the packages in Visual Studio this would constantly remind me that the connection had changed (when it hadn’t) so I had to turn this off.  Not a big problem but annoying.

Problem 2

Running each separate table copy worked fine most of the time but running all of the packages in sequence (see problem 3) would fail in different places for various reasons.  Sometimes I would get duplicate key errors – which was theoretically impossible.

Problem 3

All but two of the tables have identity columns.  To move the data I had to SET IDENTITY_INSERT ON for the destination table.  OLE DB Destination tasks in the dataflow let you specify identity inserts but these don't work in SQL Azure – you must currently use ADO.Net source and destinations tasks.

You can only have a single SET IDENTITY_INSERT ON per database so this means I had to run all these identity table packages sequentially which makes the whole process very slow.

Problem 4

Reliability.  SSIS in Visual Studio seems to me to be very unreliable and flaky.  I was plagued with performance issues and crashes. 

In the end, frustration got the better of me and I gave up on SSIS.  Thankfully, BCP for SQL Azure became available last week so that provided an option with more control and visibility. ADO.Net includes a SQLBulkCopy class.  This will let you move data from many different sources to a database table.  It provides a few options to control the behaviour of the operation and this seemed like the best option – the last time I used command line BCP was back in ‘95 with SQL Server 6.5 I think – or it might have been Sybase (which in those days was pretty much the same thing).

I whipped up a simple unit test and some classes to test SQLBulkCopy and it didn’t take long to realise that this wasn’t going to work either.

Problem 5

I kept getting errors about not having SET XACT_ABORT set ON.  Binging for this led me down the DTC path.  This is the same path that leads to hell! I couldn’t understand why DTC was getting involved because I had not invoked any transactions.  I thought that BCP or my source data reader query was doing something funky so I just added the following SQLCommand query before I started the bulk copy:

    SqlCommand xact = new SqlCommand("set xact_abort on", dst);
    xact.ExecuteNonQuery();

 

This seemed to work, for a while.

(It turned out that the base class for my unit test was declaring a TransactionScope that wrapped everything in my unit test.  I didn’t spot this until I have up with SQLBulkCopy so it may well be that this is not necessary now).

Problems 6, 7 & 8

I’m not exactly sure when I decided to give up on SQLBulkCopy but it was somewhere around the problems with timeouts, lockups and random results.

My options were now reduced to a short list of one – command line BCP.  After a few false starts (I don’t read instructions as well as I could) I managed to get something working quite well.

Success! Moistly…

I created a batch file to test the various options with BCP.  It’s really pretty simple to use but I wanted the simplest possible solution that would be easy to support and maintain.  Using the native option (-n) provided this but it does mean that should and destination tables need to be almost identical. When creating the schema script I had re-ordered the columns on one table.  This was enough to cause a problem for native mode which generated some cryptic errors.  Thankfully the error file (-e option) provided enough clues to figure out what was wrong.

Throttling

SQL Azure will drop connections when/if there is a heavy load on a database.  This it to enable high availability – which sounds silly – how can it be highly available if the connections keep dropping? – but in reality this just means you need to be prepared for a connection to drop and transactions to fail if the load on the database / server is exceeded.

Determining what the threshold is for throttling of the connections is difficult.  So difficult in fact that one might say impossible. Throttling is triggered by activity by all users of a server. The server is also virtual (not as in a virtual machine but as in a virtual connection to one or more virtual machines). 

My BCP commands on one table (the largest) would consistently cause the throttling to kick in.  This table has about 230,000 rows – not huge by and stretch of the imagination.  Thankfully, BCP allows to to specify the batch size and the first and last rows to use.  For this table I found that 10,000 rows per batch worked most of the time and executing BCP on chucks of 50,000 records seems to have stopped the throttling.  Using the first and last row options also provided the opportunity to retry a chuck of records should throttling kill the connection.

The final (so far) code for running command line BCP in a mode that SQL Azure can handle looks like this.  This code is far from complete but should be sufficient to explain the process. Please feel free to adopt and enhance as you see fit.

private void copyTableWithCmdLineBCP(string sourceServer, string sourceDatabase,

  string sourceTable, string sourceRule, string sourceUserName, string sourcePassword,

  string destinationServer, string destinationDatabase, string destinationTable, string destinationUserName,

  string destinationPassword, int rowsToCopy)

    {

      string ofile = Path.GetTempFileName();

      string sourceQ = getSourceQuery(sourceDatabase + ".dbo." + sourceTable, sourceRule);

      string countQ = getCountQuery(sourceDatabase + ".dbo." + sourceTable, sourceRule);

      string args;

 

      // export to a temp file

      if (sourceUserName == null)

      {

        // no user name so use trusted security

        args = string.Format("\"{0}\" queryout \"{1}\" -S {2} -T -n -E", sourceQ, ofile, sourceServer);

      }

      else

      {

        args = string.Format("\"{0}\" queryout \"{1}\" -S {2} -U {3} -P {4} -n -E",

                 sourceQ, ofile, sourceServer, sourceUserName, sourcePassword);

      }

 

      execProcessAndCheck(Properties.Settings.Default.BCPCMDPath, args, "Bulk copy export failed for table " +

                sourceTable);

 

 

      // import the file one chuck at a time

      for (int startRow = 1; startRow <= rowsToCopy; startRow += MAX_ROWS_PER_BATCH)

      {

        int tries = 0;

 

        if (destinationUserName == null)

        {

          // no user name so use trusted security

          args = string.Format("{0}.dbo.{1} in {2} -S {3} -T -n -b 10000 -E -F {4}",

            destinationDatabase, destinationTable, ofile, destinationServer, startRow);

        }

        else

        {

          args = string.Format("{0}.dbo.{1} in {2} -S {3} -U {4} -P {5} -n -b 10000 -E -F {6}",

            destinationDatabase, destinationTable, ofile, destinationServer, destinationUserName,

            destinationPassword, startRow);

        }

        int endRow = startRow + MAX_ROWS_PER_BATCH - 1;

        if (endRow < rowsToCopy)

        {

          args += " -L " + endRow.ToString();

        }

 

        // execute with retries

        while (!execProcessAndCheck(Properties.Settings.Default.BCPCMDPath, args,

           "Bulk copy import failed for table " + destinationTable) && ++tries <= MAX_RETRIES)

        {

          Logger.WriteLog(LogCategory.Error, "Operation failed due to connection failure.  Will retry.");

          Thread.Sleep(WAIT_BETWEEN_RETRIES);

        }

      }

    }

This uses a couple of helper methods.  getSourceQuery and getCountQuery simply return queries for a specified table name and ‘rule’  The rule is a string which defines how to partition the table.  A switch statement is used to determine the WHERE clause for a table based on the table name and rule value.

execProcessAndCheck executes a command line with arguments and captures the output.

    private bool execProcessAndCheck(string cmdLine, string args, string errorMessage)

    {

      _processOutput = new StringBuilder();

 

      ProcessStartInfo psi = new ProcessStartInfo(cmdLine, args);

      psi.RedirectStandardError = true;

      psi.RedirectStandardOutput = true;

      psi.UseShellExecute = false;

      psi.CreateNoWindow = true;

 

      Process prs = new Process();

      prs.OutputDataReceived += new DataReceivedEventHandler(prs_OutputDataReceived);

      prs.StartInfo = psi;

      prs.Start();

      prs.BeginOutputReadLine();

 

 

      if (prs != null)

      {

        prs.WaitForExit(MAX_PROCESS_TIME);

        if (!prs.HasExited)

        {

          prs.Kill();

          throw new Exception(errorMessage + " - command has exceeded the maximum allowable time and was exterminated.");

        }

      }

 

      return checkProcessResults(prs, errorMessage);

 

    }

 

The output is captured by a simple event handler:

    void prs_OutputDataReceived(object sender, DataReceivedEventArgs e)

    {

      if (!string.IsNullOrEmpty(e.Data))

        _processOutput.AppendLine(e.Data);

 

    }

 

_processOutput is a class scoped StringBuilder.

The results from the process are checked by checkProcessResults:

    private bool checkProcessResults(Process prs, string message)

    {

      if (prs.ExitCode != 0)

      {

        // look for error 10054 - this means that SQL Azure has throttled and we should retry

        string output = _processOutput.ToString();

        if (output.Contains("NativeError = 10054"))

        {

          return false;

        }

        message += ": Error Code = " + prs.ExitCode.ToString() + " : Message = " + output;

        throw new Exception(message);

      }

 

      return true;

    }

 

This is all very specify to the way BCP.exe and SQLCMD.exe work – if you want to use this for other command line tools then some tweaking will no doubt be required.

Summary

SQL Azure is an extremely powerful platform - despite the issues described above.  It will definitely get better as the product gets closer to release.  Most of my problems are caused by my recent inexperience with SQL tools and the general nature of an unfinished product.  Learning how to do this stuff now will provide a better understanding of the realities of cloud based computing as opposed to hosting a SQL Server on some internet connected remote server.

Performance of the database creation, upload and partitioning could be enhanced in several ways.  Threading could help enormously when creating the partition databases but throttling may reduce any benefit when it comes to uploading the data.  Using multiple SQL Azure servers would also help a little. Moving the functionality into the cloud could be possible if SQLBulkCopy would work betterer and if I had a mechanism for creating the database schema other than SQLCmd.exe. 

All of these options are possible given more time – but PDC09 is our hard deadline - so what I have so far will hopefully be good enough for a wicked kick-arse sexed-up demo of what is possible with this awesome platform! I’ll talk more about this application when I can… :)

Monday, October 26, 2009 11:07:49 AM (New Zealand Daylight Time, UTC+13:00)  #    Comments [1]   Azure | General | SQL  | 
# Thursday, August 20, 2009

Here’s the slides and code from my user group session last night – Getting Started with Azure Development.

I covered the basics of Web and Worker roles, storage and a bit of the service bus.

Use as you see fit.

Drizle.zip (3.53 MB)

WindowsAzureStartingUp.zip (229.92 KB)
Thursday, August 20, 2009 2:00:33 PM (New Zealand Standard Time, UTC+12:00)  #    Comments [3]   Azure | NZ .Net User Group  | 
# Thursday, July 23, 2009

After posting about using TCP with the .Net Service Bus and making it look sooooo simple, I’ve lost most of today trying to get the fracking thing working!  I doesn’t matter what I do in the config, the service host keeps prompting me for a Card (from CardSpace) even though I had everything setup for UserNamePassword creds.

After much stuffing around I noticed that my solution had the wrong version of the service bus DLL (0.15.0.0 – March CTP I think).  This was my bad – I had copied the DLL to a solution folder.  So I grabbed the '”new” version (0.16.7.0) from the July CTP Assemblies folder and tried that.

Same result!  Arg!

Looking in the GAC I could see it had version 0.16.0.0.  What the ?!!

Time for a reinstall me think.

Download from MS.  Install. Chug chug….

Ok, now I have version 0.16.71.1.  Right. Fine.

But the GAC still has 0.16.0.0 and my app still is broked.  Lord - give me strength!

I re-added the project assembly reference to my local non gac copy and set copy local and now it all works again.

I can’t delete the old version from the GAC without uninstalling the SDK completely so it will have to stay there.

Weird!  The Joy of CTP.

Thursday, July 23, 2009 2:54:08 PM (New Zealand Standard Time, UTC+12:00)  #    Comments [2]   Azure  | 
# Wednesday, July 08, 2009

Having a blog is like having a nagging wife – not blogging makes me feel guilty.  The thing is, I just don’t have the time to to write the type of blog posts I like to write – long and accurate.  Not that I’m ever very accurate, but well …

Anyways, I’ve been ranting about how stupid twitter is and what a complete time waster it is without actually trying it.  Then it occurred to me that twitter might actually be a useful alternative to long exhaustive blog posts.

So, I’ve signed up again and am tweeting a little about very development focused things.  You wont catch me tweeting about the weather or Michael Jackson etc.  Currently I’m into Azure dev so I’m finding lots of little things to chirp about.  I haven’t found a lot of tweets to follow yet apart from the obvious candidates – scottgu etc – but I did get my first reply today when I sent a message to a Microsoft tweet.  Very sweet!

Follow me if you love me :)  jonesienz

Wednesday, July 08, 2009 4:39:01 PM (New Zealand Standard Time, UTC+12:00)  #    Comments [1]   Azure | General  | 
# Tuesday, June 16, 2009

I’ve been doing quite a bit of fun work with Azure lately – and it’s great! – mostly :)  It’s early days yet and there are a couple of (hundred?) things to sort out before they go live I guess.  The main problem with anything new like this is the lack of documentation and real world expertise when you strike a problem.  So, I thought I should blog solutions to the problems we have struck lately.  Here’s a little one.

The .Net Service Bus allows you to expose in-house systems to the wide wide web without all that pesky firewall configuration.  Essentially, it sites between the client and server and routes (WCF) messages.  You can read all about it here.   It only took us a morning to figure out how to configure our WCF’s but we had problems when it came time to deploy the web-role client to Azure. 

Required permissions cannot be acquired.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

Exception Details: System.Security.Policy.PolicyException: Required permissions cannot be acquired.

So, after some research, the solution was very simple.  You just need to allow full trust for the web role which is achieved by setting the web role’s enableNativeCodeExecution to true in the ServiceDefinition.csdef file.  See here for the full story.

More on Azure soon…

Tuesday, June 16, 2009 10:46:01 AM (New Zealand Standard Time, UTC+12:00)  #    Comments [0]   Azure  | 
Copyright © 2010 Peter G Jones. All rights reserved.
DasBlog 'Portal' theme by Johnny Hughes.
Pick a theme: