On February 28th, Amazon’s S3 service (cloud storage) had a major outage that took down many parts of the internet for about four hours.
Outages happen; they’re part of doing business online. But Amazon’s was interesting because of what caused it.
From the write-up of the outage:
Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended.
It sounds like an employee didn’t understand how to use the tool and its impact, or there was an honest mistake and tpyo.
The end result was cascading failures that ended up taking down the majority of services and applications online. It even affected Shopify and merchant stores.
As a software developer I think about these things all the time in systems I build.
But when was the last time you thought about these sorts of failures with your business?
Is there a critical process that you or your employees are running every day that relies on entering data exactly right? One where a mistake could cost you time and money?
What checks and safety systems do you have in place for that?
These are questions you need to ask yourself in order to debug your business processes. With the level of software and automation in place now, it’s easy for a single mistake to creep in and cause major disruptions.
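To make the idea of a safety check concrete, here is a minimal Python sketch of one kind of guard: it refuses any request that would remove more than a set share of the total in a single step. The function name, the exception, and the 10% default are all hypothetical, chosen only for illustration. This is not a description of Amazon's tooling.

```python
class UnsafeRequestError(Exception):
    """Raised when a request would remove more than the allowed share."""


def check_removal(requested: int, total: int, max_fraction: float = 0.10) -> int:
    """Return `requested` if it is within the limit, otherwise raise.

    All names and the 10% default are hypothetical, for illustration only.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if requested < 0:
        raise ValueError("requested must not be negative")
    # Reject anything larger than the allowed share of the total.
    if requested > total * max_fraction:
        raise UnsafeRequestError(
            f"Refusing to remove {requested} of {total}: "
            f"more than {max_fraction:.0%} in one step."
        )
    return requested


if __name__ == "__main__":
    print(check_removal(5, 100))  # within the limit, so the value is returned
    try:
        check_removal(50, 100)  # a typo like 50 instead of 5 is stopped here
    except UnsafeRequestError as error:
        print(error)
```

The same pattern works outside of code: a second pair of eyes, a confirmation step, or a hard cap on how much one action can change at once.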
This is why I make Repeat Customer Insights only a reporting app. While it wouldn’t be very hard to have it emailing and contacting your repeat customers automatically on your behalf, that would introduce risk into your store’s customer relationship.
Instead, it’s a reporting app that you use to evaluate your repeat customers, make decisions, and put those decisions into action in other places (e.g. your email campaigns).