That Other Single Point of Failure

Unless you’re in publishing or high-tech, it is likely that your entire organization is a single point of failure

When you’re a kid at the beach, you spend a lot of time and effort building a sand castle. It’s cool, a lot of fun, and doomed to destruction. When high tide, or random kids, or hot sun come along, the castle is going to fall apart. It doesn’t matter; kids build them every year by the thousands, probably by the millions across the globe. Each is special and unique, each took time and effort, and each will fall apart.

The thing is, they’re all over the globe, and seasons are different all over the globe, so it is conceivable that there is a sand castle built or being built every minute of every day. Not easily provable, but it doesn’t need to be for this discussion. When it is night and the middle of winter in the northern reaches of North America, it is summer and daytime in Australia. The opportunity for continuation of sand castles is amazing.

Unless you’re in publishing or high-tech, it is likely that your entire organization is a single point of failure. Distributed applications make sense so that you can minimize risk and maximize uptime, right? The cloud is often billed as more resistant to downtime precisely because it is distributed.

And your organization? Is it distributed? Really, spread out so that it can’t be impacted by something like Sandy?

There are a good number of organizations that are nearly 100% off-line right now because there is no power in the Northeast. That was not a possibility, it was an inevitability. Power outages happen, and they sometimes happen on a grand scale (remember the cascading Midwest/Northeast/Canada outage of 2003? That was not a natural disaster, it was design and operator error). And yet, even companies with a presence in the cloud clustered their employees in one geographic area. There is a tendency amongst some to want face-to-face meetings, assuming those are more productive, which leads to desiring everyone be on-site. With increasing globalization, and meetings held around the world, one would think this tendency would be minimized, but it does not seem to be. Long before I became a remote worker, I held meetings with staff in Africa, Russia, and California, all on the same (very long) day, and all from my home in Green Bay.

The result is predictable. I once worked as a Strategic Architect for a life insurance company. They had a complete replica of the datacenter in a different geographic region, on the grounds that a disaster so horrible as to take out the datacenter would be exactly the scenario in which that backup would be needed. But guess where the staff was? Yeah, at the primary. The systems would have been running fine, but the IT knowledge, business knowledge, and claims adjustment would all have been in the middle of a disaster.

Don’t make that mistake. Today, most organizations with multiple datacenters have DR plans that cover shifting all the load away from one of them should there be a problem, but those organizations with a single datacenter don’t have that luxury, and neither group necessarily has a plan for continuation of actual work. Consider your options, and consider how you will get actual business up to speed as quickly as possible. Having people lose their jobs because the business was not viable for weeks is not a great plan for helping them recover from disaster.

Even with the cloud, there is critical corporate knowledge out there that makes your organization tick. It needs to be geographically distributed. It matters not what systems are in the cloud if all of the personnel to make them work are in the middle of a blackout zone.

In short, think sand castles. If you have multiple datacenters, make certain your IT and business knowledge is split between them well enough to continue operations in a bad scenario. If you don’t have multiple locations, consider remote workers. Some people are just not cut out for telecommuting (I hate that phrase, since telecomm has little to do with the daily work, but it’s what we have); others do fine at it. Find some fine ones that have, or can be trained to have, the knowledge required to keep the organization’s doors open. It could save the company a lot of money, and people a lot of angst. And your customers will be pleased too.

The key is putting the right people and the right skills out there. Spread them across datacenters or geographies, so you’re as distributed as your apps. And while you’re at it, broadening the pool of available talent means you can get some hires you might never have gotten if relocation were required.

And all of that is a good thing.

Like sand castles.

Meanwhile, keep America’s northern east coast in your thoughts; that’s a lot of people in a little space without the amenities they’re accustomed to.

More Stories By Don MacVittie

Don MacVittie is Founder of Ingrained Technology, LLC, specializing in Development, Devops, and Cloud Strategy. Previously, he was a Technical Marketing Manager at F5 Networks. As an industry veteran, MacVittie has extensive programming experience along with project management, IT management, and systems/network administration expertise.

Prior to joining F5, MacVittie was a Senior Technology Editor at Network Computing, where he conducted product research and evaluated storage and server systems, as well as development and outsourcing solutions. He has authored numerous articles on a variety of topics aimed at IT professionals. MacVittie holds a B.S. in Computer Science from Northern Michigan University, and an M.S. in Computer Science from Nova Southeastern University.