Welcome!

Apache Authors: Pat Romanski, Liz McMillan, Elizabeth White, Christopher Harrold, Janakiram MSV

Blog Feed Post

Move over Reliability, Resilience has arrived

[This article was originally written as a guest post for Puppet Labs and published at their blog on January 9th, 2014.]

If you haven’t yet noticed that prioritization of non-functional requirements (NFRs) is changing amongst your user base, you will soon. For decades, we have held to the same familiar set of NFRs. Every team had its own definition and particular spin on NFRs, but the usual suspects are accessibility, availability, extensibility, interoperability, maintainability, performance, reliability, scalability, security, and usability.

But new priorities have surfaced, as IT has experienced a sea change over the past few years. Some organizations have even adopted completely new NFRs. The rise of DevOps has coincided with these changes, and the movement’s principles enable IT teams to more readily adapt to rapidly changing requirements.

Your grandfather’s mainframe was very reliable

Historically, IT system designs were praised for reliability. Robust and stable systems could “take a licking and keep on ticking.” As computing became more pervasive, scalability became the watchword. Systems should be able to grow and expand to meet increasing demands.

Scalability as an NFR priority represents just a slight shift from reliability as an NFR. Both operated off the mindset that the original system design was valid. Reliability ensures that the system continues to provide the stated functionality over time, and scalability ensures that you can do so for an increasing demand set.

Roughly 10 years ago, things began to shift as more and more organizations embraced movements like agile or XP, and architectural models like Service Oriented Architecture (SOA). These initiatives promoted adaptation and response to change as desirable system qualities. Next, cloud computing introduced us to the notion of elasticity, further promoting the values of flexibility and responsiveness to change.

A resilient system is a happy system

The state of the art for system design is always evolving, and we see noticeable leaps forward every few years. The current phase of evolution is toward resilient systems.

Legacy system designs relied upon expensive infrastructure with multiple-redundant-hot-swappable-live-backup-standby-continuity-generators (or whatever vendors are peddling lately). In contrast, resilient system designs embrace failure and promote the use of cheap, commodity hardware, coupled with distributed data management, parallel processing, eventual consistency, and self-healing operational nodes.

Some portion of your system is likely to go down at some point, and resilient systems are designed with that expectation. Resilient systems and resilient processes are able to continue operation (albeit at diminished capacity) in the face of failure.

The prioritization of resilience over reliability as an NFR can be seen within the DevOps movement, the development of the Netflix Simian Army, and the rise of NoSQL data management solutions.

DevOps and resiliency

DevOps is a multi-headed beast, more a movement guided by a set of principles than a tangible and well-defined construct. While organizations are free to adopt aspects of DevOps that suit their needs, one common thread is that of resilience. Failure is seen as an opportunity to improve processes and communication, rather than as a threat.

The principles of continuous integration and continuous delivery that are core to most DevOps practices exemplify a resilient mindset. Where the classic waterfall model relies upon detailed front-end design and planning with an all-or-nothing development phase and late-stage testing, DevOps teams are more agile, embracing a “fail early, fail often” model. This approach results in more resilient and adaptable applications.

Netflix Simian Army

Netflix gained world renown when the company broadcast details of its Simian Army work in 2010 and 2011. Through the automated efforts of Chaos Monkey, Chaos Gorilla, and a slew of other similar utilities, failure is simulated in order to develop more resilient processes, tools, and capabilities.

John Ciancutti of Netflix writes, “If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most — in the event of an unexpected outage.”

NoSQL

A third illustration of the growing fascination with resilient, self-healing systems is the transformation now going on in the data realm. Data and metadata management have evolved considerably from the relational databases of yore. Modern data management strategies tend to be distributed, fault-tolerant, and in some cases even self-heal by spawning new nodes as needed. Examples include Google FS / Bigtable, in-memory datastores like Hazelcast or SAP’s HANA, and distributed data management solutions like Apache Cassandra.

Miko Matsumura of Hazelcast notes, “Virtualization and scale-out power new ways of thinking about system stability, including a shift away from ‘reliability,’ where giant expensive systems never fail (until they do, catastrophically), and towards ‘resiliency,’ where thousands of inexpensive systems constantly fail—but in ways that don’t materially impact running applications.”

Keeping pace with the cool kids

It’s often said that the only constant is change. The DevOps movement positions organizations to embrace change, rather than fear it. Continuous integration, continuous delivery, and continuous feedback loops between dev teams and ops teams facilitate an enhanced degree of agility and responsiveness.

As business and society evolve, our system design priorities must adapt in parallel. The cool kids will change the game again at some point, but for right now, “change” means designing systems and supporting processes that are responsive and adaptable by prioritizing resilience over reliability.

Read the original blog entry...

More Stories By Kyle Gabhart

Kyle Gabhart is a subject matter expert specializing in strategic planning and tactical delivery of enterprise technology solutions, blending EA, BPM, SOA, Cloud Computing, and other emerging technologies. Kyle currently serves as a director for Web Age Solutions, a premier provider of technology education and mentoring. Since 2001 he has contributed extensively to the IT community as an author, speaker, consultant, and open source contributor.

@ThingsExpo Stories
Multiple data types are pouring into IoT deployments. Data is coming in small packages as well as enormous files and data streams of many sizes. Widespread use of mobile devices adds to the total. In this power panel at @ThingsExpo, moderated by Conference Chair Roger Strukhoff, panelists looked at the tools and environments that are being put to use in IoT deployments, as well as the team skills a modern enterprise IT shop needs to keep things running, get a handle on all this data, and deliver...
In his session at @ThingsExpo, Eric Lachapelle, CEO of the Professional Evaluation and Certification Board (PECB), provided an overview of various initiatives to certify the security of connected devices and future trends in ensuring public trust of IoT. Eric Lachapelle is the Chief Executive Officer of the Professional Evaluation and Certification Board (PECB), an international certification body. His role is to help companies and individuals to achieve professional, accredited and worldwide re...
The current age of digital transformation means that IT organizations must adapt their toolset to cover all digital experiences, beyond just the end users’. Today’s businesses can no longer focus solely on the digital interactions they manage with employees or customers; they must now contend with non-traditional factors. Whether it's the power of brand to make or break a company, the need to monitor across all locations 24/7, or the ability to proactively resolve issues, companies must adapt to...
IoT solutions exploit operational data generated by Internet-connected smart “things” for the purpose of gaining operational insight and producing “better outcomes” (for example, create new business models, eliminate unscheduled maintenance, etc.). The explosive proliferation of IoT solutions will result in an exponential growth in the volume of IoT data, precipitating significant Information Governance issues: who owns the IoT data, what are the rights/duties of IoT solutions adopters towards t...
With the introduction of IoT and Smart Living in every aspect of our lives, one question has become relevant: What are the security implications? To answer this, first we have to look and explore the security models of the technologies that IoT is founded upon. In his session at @ThingsExpo, Nevi Kaja, a Research Engineer at Ford Motor Company, discussed some of the security challenges of the IoT infrastructure and related how these aspects impact Smart Living. The material was delivered interac...
With major technology companies and startups seriously embracing Cloud strategies, now is the perfect time to attend 21st Cloud Expo October 31 - November 2, 2017, at the Santa Clara Convention Center, CA, and June 12-14, 2018, at the Javits Center in New York City, NY, and learn what is going on, contribute to the discussions, and ensure that your enterprise is on the right path to Digital Transformation.
No hype cycles or predictions of zillions of things here. IoT is big. You get it. You know your business and have great ideas for a business transformation strategy. What comes next? Time to make it happen. In his session at @ThingsExpo, Jay Mason, Associate Partner at M&S Consulting, presented a step-by-step plan to develop your technology implementation strategy. He discussed the evaluation of communication standards and IoT messaging protocols, data analytics considerations, edge-to-cloud tec...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists examined how DevOps helps to meet the de...
When growing capacity and power in the data center, the architectural trade-offs between server scale-up vs. scale-out continue to be debated. Both approaches are valid: scale-out adds multiple, smaller servers running in a distributed computing model, while scale-up adds fewer, more powerful servers that are capable of running larger workloads. It’s worth noting that there are additional, unique advantages that scale-up architectures offer. One big advantage is large memory and compute capacity...
The Internet giants are fully embracing AI. All the services they offer to their customers are aimed at drawing a map of the world with the data they get. The AIs from these companies are used to build disruptive approaches that cannot be used by established enterprises, which are threatened by these disruptions. However, most leaders underestimate the effect this will have on their businesses. In his session at 21st Cloud Expo, Rene Buest, Director Market Research & Technology Evangelism at Ara...
"When we talk about cloud without compromise what we're talking about is that when people think about 'I need the flexibility of the cloud' - it's the ability to create applications and run them in a cloud environment that's far more flexible,” explained Matthew Finnie, CTO of Interoute, in this SYS-CON.tv interview at 20th Cloud Expo, held June 6-8, 2017, at the Javits Center in New York City, NY.
Artificial intelligence, machine learning, neural networks. We’re in the midst of a wave of excitement around AI such as hasn’t been seen for a few decades. But those previous periods of inflated expectations led to troughs of disappointment. Will this time be different? Most likely. Applications of AI such as predictive analytics are already decreasing costs and improving reliability of industrial machinery. Furthermore, the funding and research going into AI now comes from a wide range of com...
Internet of @ThingsExpo, taking place October 31 - November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA, is co-located with 21st Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry players in the world. The Internet of Things (IoT) is the most profound change in personal and enterprise IT since the creation of the Worldwide Web more than 20 years ago. All major researchers estimate there will be tens of billions devic...
SYS-CON Events announced today that MobiDev, a client-oriented software development company, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. MobiDev is a software company that develops and delivers turn-key mobile apps, websites, web services, and complex software systems for startups and enterprises. Since 2009 it has grown from a small group of passionate engineers and business...
SYS-CON Events announced today that GrapeUp, the leading provider of rapid product development at the speed of business, will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Grape Up is a software company, specialized in cloud native application development and professional services related to Cloud Foundry PaaS. With five expert teams that operate in various sectors of the market acr...
SYS-CON Events announced today that Ayehu will exhibit at SYS-CON's 21st International Cloud Expo®, which will take place on October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara California. Ayehu provides IT Process Automation & Orchestration solutions for IT and Security professionals to identify and resolve critical incidents and enable rapid containment, eradication, and recovery from cyber security breaches. Ayehu provides customers greater control over IT infras...
In this presentation, Striim CTO and founder Steve Wilkes will discuss practical strategies for counteracting fraud and cyberattacks by leveraging real-time streaming analytics. In his session at @ThingsExpo, Steve Wilkes, Founder and Chief Technology Officer at Striim, will provide a detailed look into leveraging streaming data management to correlate events in real time, and identify potential breaches across IoT and non-IoT systems throughout the enterprise. Strategies for processing massive ...
SYS-CON Events announced today that Cloud Academy named "Bronze Sponsor" of 21st International Cloud Expo which will take place October 31 - November 2, 2017 at the Santa Clara Convention Center in Santa Clara, CA. Cloud Academy is the industry’s most innovative, vendor-neutral cloud technology training platform. Cloud Academy provides continuous learning solutions for individuals and enterprise teams for Amazon Web Services, Microsoft Azure, Google Cloud Platform, and the most popular cloud com...
In his session at Cloud Expo, Alan Winters, an entertainment executive/TV producer turned serial entrepreneur, presented a success story of an entrepreneur who has both suffered through and benefited from offshore development across multiple businesses: The smart choice, or how to select the right offshore development partner Warning signs, or how to minimize chances of making the wrong choice Collaboration, or how to establish the most effective work processes Budget control, or how to ma...
SYS-CON Events announced today that Enzu will exhibit at SYS-CON's 21st Int\ernational Cloud Expo®, which will take place October 31-November 2, 2017, at the Santa Clara Convention Center in Santa Clara, CA. Enzu’s mission is to be the leading provider of enterprise cloud solutions worldwide. Enzu enables online businesses to use its IT infrastructure to their competitive advantage. By offering a suite of proven hosting and management services, Enzu wants companies to focus on the core of their ...