Welcome!

Apache Authors: Pat Romanski, Liz McMillan, Elizabeth White, Christopher Harrold, Janakiram MSV

Blog Feed Post

Move over Reliability, Resilience has arrived

[This article was originally written as a guest post for Puppet Labs and published at their blog on January 9th, 2014.]

If you haven’t yet noticed that prioritization of non-functional requirements (NFRs) is changing amongst your user base, you will soon. For decades, we have held to the same familiar set of NFRs. Every team had its own definition and particular spin on NFRs, but the usual suspects are accessibility, availability, extensibility, interoperability, maintainability, performance, reliability, scalability, security, and usability.

But new priorities have surfaced, as IT has experienced a sea change over the past few years. Some organizations have even adopted completely new NFRs. The rise of DevOps has coincided with these changes, and the movement’s principles enable IT teams to more readily adapt to rapidly changing requirements.

Your grandfather’s mainframe was very reliable

Historically, IT system designs were praised for reliability. Robust and stable systems could “take a licking and keep on ticking.” As computing became more pervasive, scalability became the watchword. Systems should be able to grow and expand to meet increasing demands.

Scalability as an NFR priority represents just a slight shift from reliability as an NFR. Both operated off the mindset that the original system design was valid. Reliability ensures that the system continues to provide the stated functionality over time, and scalability ensures that you can do so for an increasing demand set.

Roughly 10 years ago, things began to shift as more and more organizations embraced movements like agile or XP, and architectural models like Service Oriented Architecture (SOA). These initiatives promoted adaptation and response to change as desirable system qualities. Next, cloud computing introduced us to the notion of elasticity, further promoting the values of flexibility and responsiveness to change.

A resilient system is a happy system

The state of the art for system design is always evolving, and we see noticeable leaps forward every few years. The current phase of evolution is toward resilient systems.

Legacy system designs relied upon expensive infrastructure with multiple-redundant-hot-swappable-live-backup-standby-continuity-generators (or whatever vendors are peddling lately). In contrast, resilient system designs embrace failure and promote the use of cheap, commodity hardware, coupled with distributed data management, parallel processing, eventual consistency, and self-healing operational nodes.

Some portion of your system is likely to go down at some point, and resilient systems are designed with that expectation. Resilient systems and resilient processes are able to continue operation (albeit at diminished capacity) in the face of failure.

The prioritization of resilience over reliability as an NFR can be seen within the DevOps movement, the development of the Netflix Simian Army, and the rise of NoSQL data management solutions.

DevOps and resiliency

DevOps is a multi-headed beast, more a movement guided by a set of principles than a tangible and well-defined construct. While organizations are free to adopt aspects of DevOps that suit their needs, one common thread is that of resilience. Failure is seen as an opportunity to improve processes and communication, rather than as a threat.

The principles of continuous integration and continuous delivery that are core to most DevOps practices exemplify a resilient mindset. Where the classic waterfall model relies upon detailed front-end design and planning with an all-or-nothing development phase and late-stage testing, DevOps teams are more agile, embracing a “fail early, fail often” model. This approach results in more resilient and adaptable applications.

Netflix Simian Army

Netflix gained world renown when the company broadcast details of its Simian Army work in 2010 and 2011. Through the automated efforts of Chaos Monkey, Chaos Gorilla, and a slew of other similar utilities, failure is simulated in order to develop more resilient processes, tools, and capabilities.

John Ciancutti of Netflix writes, “If we aren’t constantly testing our ability to succeed despite failure, then it isn’t likely to work when it matters most — in the event of an unexpected outage.”

NoSQL

A third illustration of the growing fascination with resilient, self-healing systems is the transformation now going on in the data realm. Data and metadata management have evolved considerably from the relational databases of yore. Modern data management strategies tend to be distributed, fault-tolerant, and in some cases even self-heal by spawning new nodes as needed. Examples include Google FS / Bigtable, in-memory datastores like Hazelcast or SAP’s HANA, and distributed data management solutions like Apache Cassandra.

Miko Matsumura of Hazelcast notes, “Virtualization and scale-out power new ways of thinking about system stability, including a shift away from ‘reliability,’ where giant expensive systems never fail (until they do, catastrophically), and towards ‘resiliency,’ where thousands of inexpensive systems constantly fail—but in ways that don’t materially impact running applications.”

Keeping pace with the cool kids

It’s often said that the only constant is change. The DevOps movement positions organizations to embrace change, rather than fear it. Continuous integration, continuous delivery, and continuous feedback loops between dev teams and ops teams facilitate an enhanced degree of agility and responsiveness.

As business and society evolve, our system design priorities must adapt in parallel. The cool kids will change the game again at some point, but for right now, “change” means designing systems and supporting processes that are responsive and adaptable by prioritizing resilience over reliability.

Read the original blog entry...

More Stories By Kyle Gabhart

Kyle Gabhart is a subject matter expert specializing in strategic planning and tactical delivery of enterprise technology solutions, blending EA, BPM, SOA, Cloud Computing, and other emerging technologies. Kyle currently serves as a director for Web Age Solutions, a premier provider of technology education and mentoring. Since 2001 he has contributed extensively to the IT community as an author, speaker, consultant, and open source contributor.

@ThingsExpo Stories
NHK, Japan Broadcasting, will feature the upcoming @ThingsExpo Silicon Valley in a special 'Internet of Things' and smart technology documentary that will be filmed on the expo floor between November 3 to 5, 2015, in Santa Clara. NHK is the sole public TV network in Japan equivalent to the BBC in the UK and the largest in Asia with many award-winning science and technology programs. Japanese TV is producing a documentary about IoT and Smart technology and will be covering @ThingsExpo Silicon Val...
New competitors, disruptive technologies, and growing expectations are pushing every business to both adopt and deliver new digital services. This ‘Digital Transformation’ demands rapid delivery and continuous iteration of new competitive services via multiple channels, which in turn demands new service delivery techniques – including DevOps. In this power panel at @DevOpsSummit 20th Cloud Expo, moderated by DevOps Conference Co-Chair Andi Mann, panelists will examine how DevOps helps to meet th...
SYS-CON Events announced today that Hitachi Data Systems, a wholly owned subsidiary of Hitachi LTD., will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City. Hitachi Data Systems (HDS) will be featuring the Hitachi Content Platform (HCP) portfolio. This is the industry’s only offering that allows organizations to bring together object storage, file sync and share, cloud storage gateways, and sophisticated search an...
SYS-CON Events announced today that Hitachi, the leading provider the Internet of Things and Digital Transformation, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Hitachi Data Systems, a wholly owned subsidiary of Hitachi, Ltd., offers an integrated portfolio of services and solutions that enable digital transformation through enhanced data management, governance, mobility and analytics. We help globa...
SYS-CON Events announced today that Hitachi, the leading provider the Internet of Things and Digital Transformation, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Hitachi Data Systems, a wholly owned subsidiary of Hitachi, Ltd., offers an integrated portfolio of services and solutions that enable digital transformation through enhanced data management, governance, mobility and analytics. We help globa...
The explosion of new web/cloud/IoT-based applications and the data they generate are transforming our world right before our eyes. In this rush to adopt these new technologies, organizations are often ignoring fundamental questions concerning who owns the data and failing to ask for permission to conduct invasive surveillance of their customers. Organizations that are not transparent about how their systems gather data telemetry without offering shared data ownership risk product rejection, regu...
SYS-CON Events announced today that SoftLayer, an IBM Company, has been named “Gold Sponsor” of SYS-CON's 18th Cloud Expo, which will take place on June 7-9, 2016, at the Javits Center in New York, New York. SoftLayer, an IBM Company, provides cloud infrastructure as a service from a growing number of data centers and network points of presence around the world. SoftLayer’s customers range from Web startups to global enterprises.
With major technology companies and startups seriously embracing IoT strategies, now is the perfect time to attend @ThingsExpo 2016 in New York. Learn what is going on, contribute to the discussions, and ensure that your enterprise is as "IoT-Ready" as it can be! Internet of @ThingsExpo, taking place June 6-8, 2017, at the Javits Center in New York City, New York, is co-located with 20th Cloud Expo and will feature technical sessions from a rock star conference faculty and the leading industry p...
SYS-CON Events announced today that Juniper Networks (NYSE: JNPR), an industry leader in automated, scalable and secure networks, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Juniper Networks challenges the status quo with products, solutions and services that transform the economics of networking. The company co-innovates with customers and partners to deliver automated, scalable and secure network...
Five years ago development was seen as a dead-end career, now it’s anything but – with an explosion in mobile and IoT initiatives increasing the demand for skilled engineers. But apart from having a ready supply of great coders, what constitutes true ‘DevOps Royalty’? It’ll be the ability to craft resilient architectures, supportability, security everywhere across the software lifecycle. In his keynote at @DevOpsSummit at 20th Cloud Expo, Jeffrey Scheaffer, GM and SVP, Continuous Delivery Busine...
Bert Loomis was a visionary. This general session will highlight how Bert Loomis and people like him inspire us to build great things with small inventions. In their general session at 19th Cloud Expo, Harold Hannon, Architect at IBM Bluemix, and Michael O'Neill, Strategic Business Development at Nvidia, discussed the accelerating pace of AI development and how IBM Cloud and NVIDIA are partnering to bring AI capabilities to "every day," on-demand. They also reviewed two "free infrastructure" pr...
SYS-CON Events announced today that T-Mobile will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. As America's Un-carrier, T-Mobile US, Inc., is redefining the way consumers and businesses buy wireless services through leading product and service innovation. The Company's advanced nationwide 4G LTE network delivers outstanding wireless experiences to 67.4 million customers who are unwilling to compromise on ...
SYS-CON Events announced today that Super Micro Computer, Inc., a global leader in compute, storage and networking technologies, will exhibit at SYS-CON's 20th International Cloud Expo®, which will take place on June 6-8, 2017, at the Javits Center in New York City, NY. Supermicro (NASDAQ: SMCI), the leading innovator in high-performance, high-efficiency server technology, is a premier provider of advanced server Building Block Solutions® for Data Center, Cloud Computing, Enterprise IT, Hadoop/...
NHK, Japan Broadcasting, will feature the upcoming @ThingsExpo Silicon Valley in a special 'Internet of Things' and smart technology documentary that will be filmed on the expo floor between November 3 to 5, 2015, in Santa Clara. NHK is the sole public TV network in Japan equivalent to the BBC in the UK and the largest in Asia with many award-winning science and technology programs. Japanese TV is producing a documentary about IoT and Smart technology and will be covering @ThingsExpo Silicon Val...
In his general session at 19th Cloud Expo, Manish Dixit, VP of Product and Engineering at Dice, discussed how Dice leverages data insights and tools to help both tech professionals and recruiters better understand how skills relate to each other and which skills are in high demand using interactive visualizations and salary indicator tools to maximize earning potential. Manish Dixit is VP of Product and Engineering at Dice. As the leader of the Product, Engineering and Data Sciences team at D...
The 20th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held June 6-8, 2017, at the Javits Center in New York City, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Containers, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal ...
The age of Digital Disruption is evolving into the next era – Digital Cohesion, an age in which applications securely self-assemble and deliver predictive services that continuously adapt to user behavior. Information from devices, sensors and applications around us will drive services seamlessly across mobile and fixed devices/infrastructure. This evolution is happening now in software defined services and secure networking. Four key drivers – Performance, Economics, Interoperability and Trust ...
SYS-CON Events announced today that CollabNet, a global leader in enterprise software development, release automation and DevOps solutions, will be a Bronze Sponsor of SYS-CON's 20th International Cloud Expo®, taking place from June 6-8, 2017, at the Javits Center in New York City, NY. CollabNet offers a broad range of solutions with the mission of helping modern organizations deliver quality software at speed. The company’s latest innovation, the DevOps Lifecycle Manager (DLM), supports Value S...
With billions of sensors deployed worldwide, the amount of machine-generated data will soon exceed what our networks can handle. But consumers and businesses will expect seamless experiences and real-time responsiveness. What does this mean for IoT devices and the infrastructure that supports them? More of the data will need to be handled at - or closer to - the devices themselves.
Web Real-Time Communication APIs have quickly revolutionized what browsers are capable of. In addition to video and audio streams, we can now bi-directionally send arbitrary data over WebRTC's PeerConnection Data Channels. With the advent of Progressive Web Apps and new hardware APIs such as WebBluetooh and WebUSB, we can finally enable users to stitch together the Internet of Things directly from their browsers while communicating privately and securely in a decentralized way.