This is a headshot of Jamie McBurney.

 

Depending on when you read this, I'm either just about to leave for my "Lights Out / Technology Off" vacation to discover my heritage in Newfoundland or I'm there right now, or I've already come back.

 

I'll provide some insights on the trip another time.

 

In the meantime, I'm consumed with the preparation tasks. As I write this article, I have two weeks until I leave.

 

It's another great time to review and practice our contingency plans.

 

Like any company worth its salt in the technology arena, REM has a simplistic set of redundancies on our core technology *and* our human components. Notice the choice of the word "simplistic" rather than "complex". One of my well-known standard operating procedures around REM is to remove complexity wherever possible, and nowhere is that more evident than in our failover procedures for when I'm gone.

 

We have clear resolutions for a vast array of predictable problems, ranging from "an email was accidentally erased by a user" to "the power supply blew up on a database server". Each one involves reading through a simple recovery procedure of 10 steps or fewer, designed to be followed by the least experienced person if required, and everyone knows their part from top to bottom.

 

The solutions don't rely on "vendor promises" or proprietary "Swiss Army knife" solutions. They are practical steps that can be understood by anyone with basic computer knowledge. They have recovery times that are up to 15% longer than those promised by vendors and cost a little more money, but my experience is that vendor promises of instant recovery rarely hold up when a disaster truly happens.

 

Want an example of one of our methods?

 

I have a complete "running" clone of our live server environment poised to take over any failing component.

 

What are the steps to recover from a completely failed database server - let's say the power supply blew up?

  1. Power off faulty database server.
  2. Log into the cloned emergency database server.
  3. Change the IP address of the emergency server to that of the powered off machine.
  4. Grab the latest database files for affected sites from any one of our 3 backup locations if newer files exist.
  5. Restore newer files.
  6. Done.
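
Steps 4 and 5 boil down to one comparison: find the most recent copy across the backup locations, and restore it only if it is newer than what the clone already holds. Here is a minimal sketch of that check in Python. It assumes the backups are plain files on mounted paths and that each location stores a file under the same name as the clone's copy; the function name and that layout are hypothetical, not a description of our actual tooling.

```python
from pathlib import Path
from typing import Iterable, Optional


def newest_backup(current: Path, backup_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the newest backup copy of `current`, or None if none is newer.

    Hypothetical helper: assumes each backup location stores the file
    under the same file name as the copy on the emergency server.
    """
    # A file that is missing on the clone counts as infinitely old.
    best_mtime = current.stat().st_mtime if current.exists() else float("-inf")
    best: Optional[Path] = None
    for directory in backup_dirs:
        candidate = directory / current.name
        # Keep the candidate only if it beats everything seen so far.
        if candidate.exists() and candidate.stat().st_mtime > best_mtime:
            best, best_mtime = candidate, candidate.stat().st_mtime
    return best
```

If the function returns None, the clone is already current and step 5 can be skipped. Comparing modification times is the simplest possible rule; it only works if the clocks on the backup hosts can be trusted.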

What are the steps to recover from a completely failed mail server - let's say all the hard drives crashed?

  1. Power off faulty mail server.
  2. Log into the cloned emergency mail server.
  3. Change the IP address of the emergency server to that of the powered off machine.
  4. Grab the latest email files from any one of our 3 backup locations if newer files exist.
  5. Restore newer files.
  6. Done.
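
Notice that the two procedures are the same six steps with a different server name. That sameness is the point, and it can be sketched as a single template. The Python below is only an illustration: the function name and the `MAX_STEPS` constant are made up for this example, and the step wording is paraphrased from the lists above.

```python
from typing import List

MAX_STEPS = 10  # the "10 steps or fewer" rule, as a hypothetical constant


def failover_checklist(component: str) -> List[str]:
    """Build the recovery checklist for a failed server type, e.g. "database" or "mail"."""
    steps = [
        f"Power off the faulty {component} server.",
        f"Log into the cloned emergency {component} server.",
        "Change the IP address of the emergency server to that of the powered off machine.",
        "Grab the latest files from any one of the 3 backup locations if newer files exist.",
        "Restore the newer files.",
        "Done.",
    ]
    # Guard against the procedure quietly growing past the limit.
    if len(steps) > MAX_STEPS:
        raise ValueError(f"checklist has {len(steps)} steps; the limit is {MAX_STEPS}")
    return steps


# Print the numbered checklist for a failed mail server.
for number, step in enumerate(failover_checklist("mail"), start=1):
    print(f"{number}. {step}")
```

One template for every component means there is only one procedure to rehearse in a fire drill.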

Obviously, I'm picking some cut-and-dried scenarios to illustrate my point, but suffice it to say that simple solutions are by far the safer bet when it really matters - getting things running again.

 

Now we'll spend the next 10 days or so with fire drills so that I can answer questions now, and not when I'm in the forests of Canada's most easterly province.

 

Photo courtesy Paul Shaw.

