Sunday, February 7, 2010

When Moving to the Cloud, Slow and Steady Wins the Race

For years, migrating mailboxes has involved a laser focus on three things: speed, speed, and speed. The faster you can migrate mailboxes, the more users you can migrate in an evening, which drives a perception of reduced costs. Regrettably, this mindset often leads to unmet customer expectations, compounded by unforeseen issues that result in disappointed end users and reduced overall productivity. More importantly, customers' confidence that crucial information remains easily available whenever needed is compromised by even the perception of one lost email. I will begin by discussing the factors that drive the speed of mailbox migration:
Complexity, Latency, Humanity.

COMPLEXITY
Complexity refers not only to a mailbox's structure, but also to the content of the individual mail messages. There are two stereotypical types of users who treat their inbox as digital formaldehyde: Pilers and Filers. My wife will argue there are actually three types, since she both piles and files depending upon the amount of email in a given day, the environment she is in, the number of interruptions she experiences while checking email, the need for actual replies, and her mood before and after reading her mail. While I acknowledge the "hybrid" Piler/Filer type and fully expect to see even more of it as people access their mail from a multitude of devices and locations, I shall stick with the two types for purposes of this discussion. Pilers tend to have fewer folders and pile up everything in their inbox, using the read flag to determine which mail to work on. Filers generally have a clean inbox with hundreds of folders in complex hierarchies that help them find items with "human powered" searches. You're probably guessing that Filers have the more complex mailbox, and you would be wrong. Pilers tend to have a higher quantity and complexity of mail in their inbox, dating back several years, whereas Filers delete more mail and save only what they identify as useful. As each user is some combination of a Piler and a Filer, we have to assume that they will have a fairly complex mailbox. Also, as mail formats have progressed to support richer content types, the complexity of the mail stored in a mailbox has only grown over the past few years. The main thing to consider is that complexity affects speed much more than raw mailbox size does, although larger mailboxes tend to be more complex and hence take more time to move.


LATENCY
Next, let's look at Latency, which governs the effective bandwidth available for migration. Migrations done between servers on the same network are generally restricted more by the horsepower of the servers than by the speed of the network, as latency is less than 10ms. However, as servers move to a more distributed model or the migration is done over a WAN, latency becomes the most important throttle on speed. Migrations to the cloud are heavily affected by latency, and the laws of physics do not allow us to do much to overcome this issue.

Effective bandwidth can be calculated with the following formula:

TCP-Window-Size-in-bits / Round-trip-latency-in-seconds = Bits-per-second-throughput

So for a standard Windows server with a 64KB TCP window transferring data over a link with 30ms of latency, the effective bandwidth is:

64KB = 65536 Bytes. 65536 * 8 = 524288 bits

524288 bits / 0.030 seconds = 17476266 bits per second throughput = approximately 17.5 Mbps maximum effective bandwidth
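
The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula: the function name is made up for this example, and it assumes a single TCP connection with a fixed window and no packet loss.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput for one TCP connection, in bits per second.

    Implements: window size in bits / round-trip latency in seconds.
    (Hypothetical helper name; not part of any library.)
    """
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds


# Same 64KB window at a few latencies, to show how quickly the ceiling drops.
for rtt_ms in (10, 30, 100):
    mbps = max_throughput_bps(65536, rtt_ms / 1000) / 1_000_000
    print(f"{rtt_ms} ms round trip: {mbps:.1f} Mbps ceiling")
```

Note that the link's rated speed never appears in the calculation. With a fixed window, latency alone sets the ceiling.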

One way to increase total throughput is to increase the TCP Window size using the following formula:

Bandwidth-in-bits-per-second * Round-trip-latency-in-seconds = TCP-window-size-in-bits

TCP-window-size-in-bits / 8 = TCP-window-size-in-bytes

Be careful when using this formula to increase the TCP window size, because the change will affect ALL applications on the server and will put a heavy load on system memory. A more effective way to leverage a larger TCP window size is to use a WAN accelerator on both ends of the connection, as the accelerators offload the work from the server and can adjust to things like jitter and retransmits.
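
As a rough sketch, the window-size formula can be turned around the same way in Python. The function name and the 100 Mbps example link are assumptions chosen for illustration, and the result is a sizing estimate only, not a recommended server setting.

```python
import math


def required_window_bytes(bandwidth_bps: float, rtt_seconds: float) -> int:
    """TCP window (in bytes) needed to keep a link full.

    Implements: bandwidth in bits per second * round-trip latency in seconds,
    divided by 8 to convert bits to bytes, rounded up to a whole byte.
    (Hypothetical helper name; not part of any library.)
    """
    if bandwidth_bps < 0 or rtt_seconds < 0:
        raise ValueError("bandwidth and latency must be non-negative")
    return math.ceil(bandwidth_bps * rtt_seconds / 8)


# Assumed example: a 100 Mbps link with 30ms of round-trip latency.
window = required_window_bytes(100_000_000, 0.030)
print(f"Window needed: {window} bytes (about {window / 1024:.0f} KB)")
```

The answer for that example link is several times the 64KB window used earlier, which is why the memory warning above matters.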

HUMANITY
One should not underestimate the effect Humanity has on migrations... before, during and after. Asking end users to perform "housekeeping" on their mailboxes or run tools before a migration only results in confusion and a failure rate of 5% or more for mailboxes that don't meet migration criteria. During the migration, humans are driving the tools and doing validation checking, which will result in a guaranteed 1% failure rate. After the migration, once errors from the previous steps and missing mail items are taken into account, there will be an additional 5-10% call-in rate to the help desk. As you can see, faster migrations result in errors, so the faster you go and the more mailboxes you migrate, the more errors you have to address. If you are migrating 200 users per night, then having fewer than 20 mailboxes with issues is not a big deal, and your help desk can handle the 20 calls the next day. But when you go to 2,000 or 20,000 mailboxes per night, the numbers overwhelm the support capabilities, and the overall speed of the migration will be dragged down by the human factors.
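
To put numbers on that scaling, here is a back-of-the-envelope sketch in Python. It assumes a flat 10% call-in rate, the top of the 5-10% range above, applied uniformly at every migration size. Real rates will vary, and the function name is made up for this example.

```python
def expected_helpdesk_calls(mailboxes_per_night: int, call_in_rate: float) -> int:
    """Estimate next-day help desk calls as mailboxes * call-in rate.

    Assumption: the rate stays constant as volume grows, which is a
    simplification made only for this illustration.
    """
    if not 0.0 <= call_in_rate <= 1.0:
        raise ValueError("call_in_rate must be between 0 and 1")
    return round(mailboxes_per_night * call_in_rate)


for nightly in (200, 2_000, 20_000):
    calls = expected_helpdesk_calls(nightly, 0.10)
    print(f"{nightly} mailboxes per night: about {calls} calls the next day")
```

The rate never changes, but the absolute number of calls grows in step with migration speed, and that absolute number is what the help desk has to absorb.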


Solution
The quest to go faster will consistently produce unnecessary errors. This will result in some great (and time-consuming) engineering to try to work around them, but why? These factors matter even more in migrations to the cloud, where they are magnified many-fold. So you might be wondering, "What can we do about it?" I have a proposed solution based upon sound, old-fashioned (and most likely, familiar) advice: Go slower - take the time to do it right the first time so you won't have to waste your time doing it over. The Outlook client has been doing this for years with the offline store. When faced with replicating hundreds of megabytes of data, the engineers at Microsoft had to decide which three components to replicate first, based on what the end user actually needed to stay productive. They came up with the following items: Calendar, Contacts, and 30 days' worth of mail in the inbox. If you think about it, this is a simple and elegant solution. Users probably won't notice that old mail is trickling in, as they will be dealing, for the most part, with existing and new mail in their inbox. Their calendar and contacts are critical items that they need to do their job. But that mail from three years ago squirreled away in a folder three levels down can wait a day or possibly two. The Outlook client also replicates mail down to the OST by folder, from the newest to the oldest. So, if we "take a page from the Outlook client OST replication" and follow its example, we can be much more successful in doing mass migrations with little to no impact on end users. I'll be writing more about how to take this approach in the future, as we need better tools and processes to make this feasible.
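
To make the idea concrete, here is a minimal Python sketch of that ordering: calendar first, then contacts, then recent inbox mail, then everything else from newest to oldest. It is not how Outlook or any migration tool is actually implemented. The `MailboxItem` type, the `migration_order` function, and the folder names are all hypothetical, and a real tool would also work folder by folder, as the OST replication does.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class MailboxItem:
    kind: str       # "calendar", "contact", or "mail" (hypothetical labels)
    folder: str     # e.g. "Inbox" or "Projects/2007/Archive"
    received: date  # date the item was received or created


def migration_order(items, today, recent_days=30):
    """Return items in the order they would be migrated.

    Pass 0: calendar, pass 1: contacts, pass 2: inbox mail from the last
    `recent_days` days, pass 3: everything else. Within each pass the
    newest items go first.
    """
    cutoff = today - timedelta(days=recent_days)

    def rank(item):
        if item.kind == "calendar":
            return 0
        if item.kind == "contact":
            return 1
        if item.kind == "mail" and item.folder == "Inbox" and item.received >= cutoff:
            return 2
        return 3

    # Negating the ordinal date sorts newest first inside each pass.
    return sorted(items, key=lambda item: (rank(item), -item.received.toordinal()))


mailbox = [
    MailboxItem("mail", "Projects/2007/Archive", date(2007, 1, 15)),
    MailboxItem("mail", "Inbox", date(2010, 2, 1)),
    MailboxItem("contact", "Contacts", date(2009, 6, 1)),
    MailboxItem("calendar", "Calendar", date(2010, 2, 10)),
]
for item in migration_order(mailbox, today=date(2010, 2, 7)):
    print(item.kind, item.folder, item.received)
```

With this ordering, the items a user needs on the first morning arrive first, and the three-year-old archive folder trickles in afterwards.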

We should take note of who won the race in Aesop's fable, The Tortoise and the Hare... slow and steady wins the race, folks.

Here are links to articles that I referenced in this BLOG

3 comments:

  1. Actually knowing about Cloud Computing is a heavenly information. I like to know much more about it. I am going to attend the upcoming cloud Conference too.

  2. Directly relating to your key points, we see one of the key challenges when it comes to cloud adoption as how to balance the contradicting requirements for speed and controlling risk. The higher the speed and pace of change, the less control and higher risk. On the other hand if you increase control that translates into slower speed of achieving change.

    When it comes to the cloud, this notion is amplified even further as enterprises are required to move even faster, and at the same time any failure or outage is highly visible - so control cannot afford to be an oversight.

    We recently had a webinar with a top analyst from EMA talking about this change dilemma:
    Learn How to Drive Change While Remaining in Control http://www.evolven.com/webinar-change-ema.html

    So, is achieving both control and speed at the same time realistic? Yes, but it requires a different approach. Especially when it comes to configuration management. The existing configuration management tools (e.g. CMDB) hit their limits: they were designed and architectured for physical, static environments and are too rigid to handle the pace of change and dynamics of modern IT including migrating to the cloud.

    Best,

    Alex Gutman
    Technology Evangelist
    Evolven Software, Inc.
    alexg@evolven.com
    http://www.evolven.com

  3. Great and really appreciable post. Cloud migration will help in transferring data easily from one place to another. If you have more ideas on cloud migration and cloud computing, please share them with us.
