Chaos Engineering

Systems are becoming more complex as the adoption of microservices and distributed cloud architectures grows. Although a wide range of tools exists for building robust, fault-tolerant systems, this complexity makes failures increasingly difficult to predict, and every failure carries a financial cost. One way to identify failures before they cause a costly outage is chaos engineering: breaking things on purpose, and doing so proactively and repeatedly, to find weaknesses early. The basic steps are to plan an experiment by thinking of a situation that could go wrong, run it against a small area of the system, and then gradually increase its impact until either the scenario has been fully tested or a failure is identified. Expedia Group, for example, practices chaos engineering, using a continuous delivery platform to run experiments; the platform includes a set of demo resources and applications for debugging and testing.
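The experiment loop described above (start small, widen the impact, stop once a failure shows up) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Expedia Group's tooling or any real chaos framework's API: `run_chaos_experiment`, `SimulatedService`, and the "at least 7 of 10 instances" health rule are all hypothetical, and the "what could go wrong" scenario is modeled as a steady-state check that must keep passing while faults are injected.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ExperimentResult:
    blast_radius: float      # fraction of the system affected in this step
    steady_state_held: bool  # did the system still behave normally?


def run_chaos_experiment(
    inject_fault: Callable[[float], None],
    check_steady_state: Callable[[], bool],
    rollback: Callable[[], None],
    radii: List[float],
) -> List[ExperimentResult]:
    """Hypothetical harness: widen the fault's blast radius step by step,
    stopping at the first step where the steady-state check fails."""
    results: List[ExperimentResult] = []
    for radius in radii:
        inject_fault(radius)
        try:
            ok = check_steady_state()
        finally:
            rollback()  # always undo the fault before the next step
        results.append(ExperimentResult(radius, ok))
        if not ok:
            break  # failure identified; stop increasing the impact
    return results


class SimulatedService:
    """Toy stand-in for a real service (assumption): it stays healthy
    while at least `min_healthy` of its instances are up."""

    def __init__(self, instances: int = 10, min_healthy: int = 7) -> None:
        self.instances = instances
        self.min_healthy = min_healthy
        self.down = 0

    def kill_fraction(self, fraction: float) -> None:
        # Take down roughly `fraction` of the instances.
        self.down = round(self.instances * fraction)

    def healthy(self) -> bool:
        return self.instances - self.down >= self.min_healthy

    def restore(self) -> None:
        self.down = 0


if __name__ == "__main__":
    svc = SimulatedService()
    steps = [0.1, 0.2, 0.3, 0.5, 1.0]
    for result in run_chaos_experiment(svc.kill_fraction, svc.healthy, svc.restore, steps):
        print(result)
```

With the toy service, the small steps pass and the run stops at the first radius that breaks the health rule, so the wider (and riskier) steps are never attempted. Always rolling back in a `finally` block reflects the idea of keeping each step's impact contained.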

A good article explaining how these experiments are run at Expedia Group: https://medium.com/expedia-group-tech/chaos-engineering-at-expedia-group-e51a0288ee2