After some discussion of Netflix and the Chaos Monkey on our DevOps blog, I thought I would offer some detail on how Chaos Monkey and the Simian Army work. It's a great case study, posted on April 30th by C. Aaron Cois of the Software Engineering Institute (SEI) at CMU. I did not think to discuss it until it was brought up. Maybe next semester, we'll start with it.
Anyway, Netflix's streaming service is a large distributed system hosted on Amazon Web Services (AWS). Since there are so many components that have to work together to provide reliable video streams to customers across a wide range of devices, Netflix engineers needed to focus heavily on the quality attributes of reliability and robustness for both server- and client-side components. In short, they concluded that the only way to be comfortable handling failure is to constantly practice failing. To achieve the desired level of confidence and quality, in true DevOps style, Netflix engineers set about automating failure.
Basically, you may have noticed that while the software is impressively reliable, occasionally the available streams of videos change. Sometimes, the 'Recommended Picks' stream may not appear, for example. When this happens, it is because the service in AWS that serves the 'Recommended Picks' data is down. However, your Netflix application doesn't crash, it doesn't throw any errors, and it doesn't suffer from any degradation in performance. The Netflix software merely omits the stream or displays an alternate one without hindering the user's experience, thus exhibiting ideal, elegant failure behavior.
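To make the idea of elegant failure concrete, here is a minimal sketch in Python. It is not Netflix's code: the function names, the exception class, and the placeholder titles are all hypothetical, and the "service" is just a function that raises an error when told it is down. The point is only that the home screen is still built when one back-end service fails.

```python
class ServiceUnavailable(Exception):
    """Raised when a (hypothetical) back-end service cannot be reached."""


def fetch_recommended_picks(service_up):
    # Stand-in for a network call to the service behind one stream.
    if not service_up:
        raise ServiceUnavailable("recommendations service is down")
    return ["Title A", "Title B"]


def build_home_screen(recommendations_up):
    """Assemble the streams to display, skipping any whose service is down."""
    streams = {"Popular": ["Title C", "Title D"]}  # assumed to be always available
    try:
        streams["Recommended Picks"] = fetch_recommended_picks(recommendations_up)
    except ServiceUnavailable:
        # Fail quietly: omit the stream instead of crashing or showing an error.
        pass
    return streams


if __name__ == "__main__":
    print(build_home_screen(recommendations_up=True))
    print(build_home_screen(recommendations_up=False))
```

With the service down, the second call simply returns the screen without the 'Recommended Picks' stream, which is the behavior described above.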
To achieve this result, Netflix dramatically altered their engineering process by introducing a tool called Chaos Monkey, the first in a series of tools collectively known as the Netflix Simian Army. Chaos Monkey is basically a script that runs continually in all Netflix environments, causing chaos by randomly shutting down server instances. Thus, while writing code, Netflix developers are constantly operating in an environment of unreliable services and unexpected outages. This chaos not only gives developers a unique opportunity to test their software in unexpected failure conditions, but incentivizes them to build fault-tolerant systems to make their day-to-day job as developers less frustrating.
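For a feel of what "randomly shutting down server instances" means, here is a toy simulation in Python. This is only a sketch under my own assumptions and not the real Chaos Monkey: the instance names, the `kill_probability` parameter, and the dictionary standing in for a server fleet are all made up for illustration, and nothing here talks to AWS.

```python
import random


def chaos_monkey_round(instances, rng, kill_probability=0.2):
    """Shut down each running instance with the given probability.

    `instances` maps an instance name to True (running) or False (stopped).
    Returns the names that were shut down in this round.
    """
    terminated = []
    for name, running in instances.items():
        if running and rng.random() < kill_probability:
            instances[name] = False  # simulate the instance going away
            terminated.append(name)
    return terminated


if __name__ == "__main__":
    # Hypothetical fleet; a real tool would query the cloud provider instead.
    fleet = {"api-1": True, "api-2": True, "recs-1": True, "recs-2": True}
    rng = random.Random()
    for round_number in range(1, 4):
        killed = chaos_monkey_round(fleet, rng)
        print(f"round {round_number}: terminated {killed}")
```

Developers writing code against a fleet like this cannot assume any single instance will be there, which is exactly the pressure the post describes.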
This is DevOps at its finest: altering the development process and using automation to set up a system where the behavioral economics favor producing a desirable level of software quality. Because they build software in this type of environment, Netflix developers design their systems to be modular, testable, and highly resilient against back-end service outages from the start.
Thursday, November 30, 2017
Wednesday, November 15, 2017
Simple example of State Diagram
As with the Sequence Diagram, the same author provides a partial example of a state diagram. I hope this video helps to explain the process without having to go into great detail on all parts.
Sequence Diagram Video
Here is a video from Udacity explaining the System Sequence Diagram in a way that may make more sense for how it works. Feel free to ask any questions.
Sunday, November 5, 2017
Bitcoin Basics
I know we talked about blockchain, but people always ask why bitcoin prices fluctuate so much. I found this link at Coinbase helpful for understanding digital currency basics. You can also ask a question there, if these links are not helpful.
DevOps: What is DORA and Why You Should Care
Here is an introduction to DORA, and Dr. Nicole Forsgren, who was a Ph.D. student in MIS here.
Gene Kim, Jez Humble and Dr. Nicole Forsgren launched a new company called DORA. DORA stands for DevOps Research and Assessment.
DORA and the individuals behind it have been providing a lot of the science and analysis behind the State of DevOps survey and report for a number of years now (I posted the 2016 State of DevOps report on D2L). Here is the 2017 State of DevOps report (with new measures).
But what is DORA really about? What is the business model to generate revenue? According to DORA CEO Dr. Nicole Forsgren, after running the annual State of DevOps Survey for years, they now have something like 25,000 responses. From these, DORA has been able to create baselines and models that show how your organization compares to others who have taken the survey. They can pinpoint where you are lacking or not performing up to par, as well as where you are overperforming. The entire process is built on rock-solid statistical modeling and has already proven itself with several large enterprises.