What Is the AWS Outage?

You might have heard about the AWS outage, right?

But most of you might not know exactly what caused it or how Amazon fixed it.

This blog answers both questions.

In this article, we discuss what Amazon S3 is, what the AWS outage was, how it happened, and how Amazon resolved it.

What is Amazon S3

Amazon Simple Storage Service (S3) is an object storage service that can store and retrieve any amount of data from sources such as mobile applications, websites, and other devices. With Amazon S3, any developer can store data durably and securely in a service that scales with demand.
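
To make the storage model concrete: S3 keeps each object under a key inside a named bucket, and you retrieve it later with the same bucket and key. The toy class below is a hypothetical, in-memory illustration of that idea only. It is not the S3 API and says nothing about how S3 is built internally; real applications would call S3 through an AWS SDK.

```python
from typing import Dict


class ToyObjectStore:
    """Hypothetical in-memory stand-in for bucket/key object storage (not the S3 API)."""

    def __init__(self) -> None:
        # bucket name -> {object key -> object contents}
        self._buckets: Dict[str, Dict[str, bytes]] = {}

    def create_bucket(self, bucket: str) -> None:
        self._buckets.setdefault(bucket, {})

    def put(self, bucket: str, key: str, body: bytes) -> None:
        # Writing to an existing key replaces the previous object.
        self._buckets[bucket][key] = body

    def get(self, bucket: str, key: str) -> bytes:
        # Raises KeyError if the bucket or key does not exist.
        return self._buckets[bucket][key]


store = ToyObjectStore()
store.create_bucket("example-bucket")
store.put("example-bucket", "reports/2017-02-28.txt", b"hello")
print(store.get("example-bucket", "reports/2017-02-28.txt"))
```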

The AWS outage was a major service interruption in Amazon S3 that left the service, along with many websites and AWS services that depend on it, unavailable for several hours.

Now let's look at how the interruption occurred.

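On the morning of February 28, 2017, the S3 team was debugging an issue that was making the S3 billing system in the US-EAST-1 (Northern Virginia) region run more slowly than expected. Following an established playbook, a team member ran a command meant to remove a small number of servers from one of the S3 subsystems used by the billing process. One of the command's inputs was entered incorrectly, and a much larger set of servers was removed than intended. The outage was centered on US-EAST-1, but because so many websites and applications rely on S3 in that region, its effects were felt around the world.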

You may think this is no big deal: it's just one command, and a typo can be fixed with the backspace key, right?

Not once the command has run. That single mistyped input took a large set of servers out of service, and those servers supported two other S3 subsystems that could not keep running without them.

The damage was hard to undo because it hit two critical subsystems at once. The index subsystem manages the metadata and location information of all S3 objects in the region, and the placement subsystem manages the allocation of storage for new objects and needs a working index subsystem to function. With so many servers gone, neither subsystem could keep operating.
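
Losing these two subsystems stopped S3 entirely because every request type depends on them: reads, listings, and deletes need the index, and writes (PUT) also need placement to allocate space for the new object. The sketch below models that dependency under simplified assumptions; the `REQUIRED` table and the helper functions are hypothetical illustrations, not S3 internals.

```python
from typing import Dict, Set

# Simplified assumption: which subsystems each S3 request type depends on.
# index: object metadata and location; placement: storage allocation for new objects.
REQUIRED: Dict[str, Set[str]] = {
    "GET": {"index"},
    "LIST": {"index"},
    "DELETE": {"index"},
    "PUT": {"index", "placement"},
}


def usable(up: Set[str]) -> Set[str]:
    """Subsystems that can actually work: placement is useless without the index."""
    working = set(up)
    if "index" not in working:
        working.discard("placement")
    return working


def servable(up: Set[str]) -> Dict[str, bool]:
    """Map each request type to whether it can be served with the given subsystems up."""
    working = usable(up)
    return {request: deps <= working for request, deps in REQUIRED.items()}


print(servable({"index", "placement"}))  # normal operation
print(servable({"index"}))               # placement lost
print(servable({"placement"}))           # index lost
print(servable(set()))                   # both lost
```

The last two calls show why losing the index alone is already enough to stop every request type.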

Worse, S3 could not serve requests while the two subsystems were down, so all of the S3 APIs in the region were unavailable.

Restarting both S3 subsystems was the only way to bring the service back, and it took longer than expected.
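
Because the placement subsystem depends on the index subsystem, the restart has a natural order: the index must come back first, which is why GET, LIST, and DELETE recovered before PUT. The sketch below computes such an order with Python's standard `graphlib.TopologicalSorter`; the dependency table and request groupings are simplified assumptions, not AWS's actual recovery procedure.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency graph: subsystem -> subsystems it needs in order to run.
DEPENDS_ON: Dict[str, Set[str]] = {
    "index": set(),
    "placement": {"index"},  # placement relies on a working index
}

# Hypothetical grouping of request types by the subsystems they need.
NEEDS: Dict[str, Set[str]] = {
    "reads and deletes (GET, LIST, DELETE)": {"index"},
    "writes (PUT)": {"index", "placement"},
}


def restart_plan(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Return a restart order in which every dependency comes before its dependents."""
    return list(TopologicalSorter(depends_on).static_order())


plan = restart_plan(DEPENDS_ON)
running: Set[str] = set()
for subsystem in plan:
    running.add(subsystem)  # assume each restart succeeds
    available = [group for group, deps in NEEDS.items() if deps <= running]
    print(f"after restarting {subsystem}: {available}")
```

`TopologicalSorter` also raises `graphlib.CycleError` if the dependencies ever form a loop, which is a useful sanity check for any real restart plan.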

The outage affected not only Amazon S3 customers but also several other AWS services, such as CloudWatch, WorkSpaces, Simple Email Service, Cognito, and DynamoDB, some of which were completely disrupted.
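
One way to see why a storage outage spreads is to treat services as a dependency graph and walk it backwards from the failed component. The sketch below does a breadth-first search over a small graph. The edges (each listed service calling S3, plus two hypothetical customer workloads) are illustrative assumptions, not a description of how these AWS services are actually built.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical dependency edges: service -> services it calls.
# These edges are illustrative assumptions, not AWS's real architecture.
DEPENDS_ON: Dict[str, Set[str]] = {
    "CloudWatch": {"S3"},
    "WorkSpaces": {"S3"},
    "Simple Email Service": {"S3"},
    "Cognito": {"S3"},
    "DynamoDB": {"S3"},
    "example-customer-app": {"Cognito"},  # hypothetical app with no direct S3 use
    "example-batch-job": {"EC2"},         # hypothetical job unrelated to S3
}


def impacted_by(failed: str, depends_on: Dict[str, Set[str]]) -> Set[str]:
    """Return every service that depends on `failed`, directly or transitively."""
    # Invert the graph: dependency -> services that call it.
    callers: Dict[str, List[str]] = {}
    for service, deps in depends_on.items():
        for dep in deps:
            callers.setdefault(dep, []).append(service)

    # Breadth-first search outward from the failed component.
    impacted: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, []):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted


print(sorted(impacted_by("S3", DEPENDS_ON)))
```

Note that `example-customer-app` is reported as impacted even though it never calls S3 directly; indirect dependencies like this are what make a single-service outage feel so widespread.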

How Did Amazon Solve It?

Amazon said that S3's subsystems are designed to keep working even when a significant part of their capacity fails or is removed. It also acknowledged that it had not completely restarted the index or placement subsystem in its larger regions for many years, which helps explain why the restart took so long.

To prevent a repeat, Amazon modified the tool so that it removes capacity more slowly and refuses to remove capacity when doing so would take any subsystem below its minimum required level. It also began auditing its other operational tools to make sure they have similar safety checks.
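
One way to picture those safety checks is a capacity-removal tool that refuses any request that would leave a subsystem below a minimum number of servers, and that removes servers in small, paced batches. The function below is a hypothetical sketch under that assumption; the names, fleet layout, and thresholds are invented for illustration and are not Amazon's actual tooling.

```python
import time
from typing import Dict, List


class CapacityError(Exception):
    """Raised when a removal would leave a subsystem below its minimum capacity."""


def remove_servers(
    fleet: Dict[str, List[str]],
    subsystem: str,
    count: int,
    min_capacity: Dict[str, int],
    batch_size: int = 1,
    pause_seconds: float = 0.0,
) -> List[str]:
    """Hypothetical tool: remove `count` servers from `subsystem` safely.

    Safeguard 1: refuse any request that would drop below the subsystem's minimum.
    Safeguard 2: remove servers in small batches, pausing between batches.
    """
    servers = fleet[subsystem]
    if count < 0 or len(servers) - count < min_capacity[subsystem]:
        raise CapacityError(
            f"refusing to remove {count} of {len(servers)} '{subsystem}' servers "
            f"(minimum is {min_capacity[subsystem]})"
        )
    removed: List[str] = []
    while len(removed) < count:
        batch = min(batch_size, count - len(removed))
        for _ in range(batch):
            removed.append(servers.pop())
        if len(removed) < count:
            time.sleep(pause_seconds)  # slow down so mistakes can be caught early
    return removed


# Hypothetical fleet of 10 servers that must never drop below 8.
fleet = {"billing": [f"billing-{i}" for i in range(10)]}
minimums = {"billing": 8}

removed = remove_servers(fleet, "billing", 2, minimums)  # the intended small removal
print(removed)

blocked = False
try:
    remove_servers(fleet, "billing", 20, minimums)  # a mistyped input
except CapacityError as err:
    blocked = True
    print(err)
```

With the check in place, a mistyped count such as 20 instead of 2 fails before any server is touched, instead of silently removing far more capacity than intended.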

After roughly four hours, S3 was fully working again, and Amazon apologized to its customers for the disruption.

Conclusion

We hope this article answered your questions. S3 recovered the same day, and the safeguards Amazon added are designed to keep a similar mistake from causing the same kind of outage.
