Earlier this month, I got started with Amazon Web Services (AWS), the golden goose of the retail giant of the same name.
Outside of the IT sphere, most people have no clue how massive their favorite online retailer’s cloud business is (hint: it brought in $2.12 billion in operating income in the second quarter of 2019 alone). In fact, it’s so massive that in 2017, a typo made while debugging Amazon’s S3 systems caused significant chunks of the internet to go dark.
AWS has a ton of services to run any organization’s infrastructure — great or small — and it’s all virtually unlimited.
And in most cases, it’s very cheap.
It’s cheap enough to justify offloading a lot of my digital needs onto AWS even though this website generates no revenue. Because I hate ads.
However, AWS is not for the faint of heart. If you struggle with setting up a WordPress site, seek help understanding how AWS works and what it will mean for your wallet before diving in. There is no “Easy” mode in AWS.
These are the most pressing lessons I’ve learned over the last month that might be useful if you’re a beginner (like me) at AWS. Nothing here is groundbreaking, and it’s all well documented by AWS, but who reads documentation, right?
1. S3 objects are not files. And there are no folders.
S3 (Simple Storage Service) is AWS’s solution for bulk storage of objects. A lot of objects.
How many objects? Technically, an unlimited number of objects occupying an unlimited amount of space. It’s enough space that AWS specifies your price goes down after you’ve used 4.5 petabytes!
But let’s get one thing straight: while the AWS Console makes managing these objects look like a traditional file system, it isn’t one.
Normally we think of files and folders in a hierarchical structure, such as:
User/Documents/Spreadsheets/Finances.xlsx
Logically, we see “User”, “Documents”, and “Spreadsheets” as separate, nested directories, with “Finances.xlsx” being the file name.
However, in S3, the full name of the object is actually User/Documents/Spreadsheets/Finances.xlsx. Everything before Finances.xlsx is also part of the object name and is considered the object’s prefix.
Why does this matter?
Well, on your computer, you can rename the folder Spreadsheets to Important and safely assume that all files formerly under Spreadsheets are still there.
Before:
User/Documents/Spreadsheets/Finances.xlsx
User/Documents/Spreadsheets/VacationPlan.docx
User/Documents/Spreadsheets/DishwasherManual.pdf
After:
User/Documents/Important/Finances.xlsx
User/Documents/Important/VacationPlan.docx
User/Documents/Important/DishwasherManual.pdf
In S3, this simply isn’t the case, because directories don’t exist. If you could rename the prefix on Finances.xlsx, it would only affect Finances.xlsx; VacationPlan.docx and DishwasherManual.pdf would still have User/Documents/Spreadsheets in their prefixes.
Therefore, you can’t rename prefixes. If you want to store an object under a different prefix, you must copy it to a new name and delete the old object.
Confusingly, the S3 console does let you create these prefixes and presents them like folders, but in reality it’s creating a zero-byte object named User/Documents/Important/.
To clarify, that zero-byte object is NOT a file. If you don’t have that zero-byte object and you delete only the “files” you see under Spreadsheets, the “folder” ceases to exist.
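To make the copy-and-delete dance concrete, here’s a minimal sketch in Python with boto3, assuming a hypothetical bucket named my-example-bucket. There is no rename API; “moving” everything under Spreadsheets to Important means copying each object to a new key and deleting the original.

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket name

# "Rename" User/Documents/Spreadsheets/ to User/Documents/Important/
# by copying every object to a new key and deleting the original.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket, Prefix="User/Documents/Spreadsheets/"):
    for obj in page.get("Contents", []):
        old_key = obj["Key"]
        new_key = old_key.replace("User/Documents/Spreadsheets/", "User/Documents/Important/", 1)
        s3.copy_object(Bucket=bucket, CopySource={"Bucket": bucket, "Key": old_key}, Key=new_key)
        s3.delete_object(Bucket=bucket, Key=old_key)
```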
2. CloudFront can be very expensive if you get hit with a lot of bandwidth.
Scenario: Let’s say you purchase a dashcam to protect yourself in the event of an accident, with the added benefit of capturing less-than-great drivers. Now let’s also say you upload some amusing clips to S3, serve them through CloudFront, and embed them on your site, then share your site to reddit.
Your bill:
As tempting as it may be to leave behind the snooping nature of Google products, some things are inevitable. Just host your videos on YouTube and save yourself a lot of money.
On an unrelated note, anyone wanna spot me lunch?
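For a rough sense of why the bill stings, here’s a back-of-envelope sketch. The clip size and view count are invented purely for illustration, and the rate is CloudFront’s roughly $0.085/GB first-tier data transfer price for US/Europe at the time of writing (check current pricing for your regions).

```python
# Hypothetical numbers: a 100 MB dashcam clip and 50,000 views from a
# popular Reddit thread, at ~$0.085/GB (first-tier US/Europe rate).
price_per_gb = 0.085
clip_size_gb = 100 / 1024   # ~0.098 GB per view
views = 50_000

transfer_gb = clip_size_gb * views   # ~4,883 GB served
cost = transfer_gb * price_per_gb    # ~$415 in data transfer alone

print(f"{transfer_gb:,.0f} GB transferred -> ${cost:,.2f}")
```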
3. Glacier and Glacier Deep Archive are essentially permanent.
…because that’s how they were designed.
Think very carefully before you set the storage class of any object to Glacier or Glacier Deep Archive (GDA). Once you do, you cannot change anything about that object: its name (including prefixes), encryption method, tags… nothing.
The only way to change the metadata of a Glacier/GDA object is to restore it (which costs money), copy it, and delete the object that’s in Glacier/GDA. If you delete or overwrite said object before it’s been in Glacier for 90 days or Glacier Deep Archive for 180 days, that costs money, too.
Therefore, a last-minute decision to reorganize your prefixes, or to split or consolidate your buckets, will cost you once an object has gone into Glacier/GDA. Think ahead and plan how you’re going to organize your objects.
The only way to cheaply remove a lot of Glacier/GDA objects is to use Lifecycle policies to expire them after the minimum 90/180 days are up.
Reserve Glacier for long-term backups that you probably won’t need for a long time, and GDA for data that you most likely will never need to see again, but really shouldn’t trash.
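If you do need something back out of Glacier/GDA, the first step of the restore-copy-delete dance looks roughly like this with boto3 (the bucket and key names are hypothetical):

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to make a temporary copy of the archived object available for
# 7 days. "Bulk" is the cheapest (and slowest) retrieval tier.
s3.restore_object(
    Bucket="my-example-bucket",             # hypothetical bucket
    Key="backups/2019-08-01-full.tar.gz",   # hypothetical key
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Bulk"},
    },
)

# The restore runs asynchronously; head_object reports its progress.
response = s3.head_object(Bucket="my-example-bucket", Key="backups/2019-08-01-full.tar.gz")
print(response.get("Restore"))  # e.g. 'ongoing-request="true"'
```

Once the restore finishes, you can copy the object to its new name or storage class and delete the Glacier/GDA original.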
4. You should be using Intelligent-Tiering.
S3’s storage classes can be intimidating, but in my opinion there are only three (maybe four) that a beginner should be using: Standard, Intelligent-Tiering, and Glacier/GDA (when appropriate).
Standard is the most expensive storage class, but that sentence alone is a little deceiving. As of this post, Standard is $0.023/GB-month. That’s pretty cheap. Keeping 100GB highly available costs you $2.30/month.
But what if you have more than 100GB? What if you have 100TB? Suddenly you’re staring at a $2,300/month bill for object storage.
Luckily, there’s a Standard-Infrequent Access (Standard-IA) tier for data that’s accessed less… well, frequently… and that’s only $0.0125/GB-month.
But what if you don’t know which objects will be accessed frequently and which won’t? That’s where Intelligent-Tiering comes in: it automatically moves your objects, individually, between Standard and Standard-IA pricing based on how often each one is accessed.
AWS charges a fee for monitoring and tiering your objects, but at $0.0025 per 1,000 objects, that 100TB would have to be spread out over 200,000,000 500KB files before you came close to negating the cost-savings.
For all my larger objects, such as full-resolution images and video files, I use Intelligent-Tiering, and Standard for everything else.
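Picking a storage class is just a flag at upload time. A minimal boto3 sketch, with a made-up bucket and file name:

```python
import boto3

s3 = boto3.client("s3")

# Large media goes straight to Intelligent-Tiering; omit StorageClass
# and the object lands in Standard, the default.
s3.upload_file(
    Filename="dashcam-clip.mp4",    # hypothetical local file
    Bucket="my-example-bucket",     # hypothetical bucket
    Key="videos/dashcam-clip.mp4",
    ExtraArgs={"StorageClass": "INTELLIGENT_TIERING"},
)
```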
5. CloudFront caches S3 objects for 24 hours.
This one burned me. I had uploaded a large PDF to my public bucket and sent the CloudFront link to someone. The file was too big to email so this was the best option for me. They needed some revisions, so I obliged and re-uploaded the PDF to my bucket and told them to try the link again.
Nothing had changed. This repeated five times before I finally figured out what was happening.
CloudFront performs a lot better than a static S3 website for a lot of reasons, but one of them is built-in caching.
The first time my PDF was downloaded, CloudFront reached out to S3 to retrieve the object, stored it in edge cache, then served it. For the next 24 hours, CloudFront would not reach back out to S3.
If you find yourself in this predicament, you can invalidate the edge cache for a single object, for a prefix covering multiple objects, or via a batch operation for many objects.
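Kicking off an invalidation looks roughly like this with boto3; the distribution ID and path are placeholders. Note that invalidation paths start with a leading slash, and beyond the first 1,000 paths each month AWS charges a small per-path fee.

```python
import time
import boto3

cloudfront = boto3.client("cloudfront")

# Tell every edge location to drop its cached copy of the PDF.
cloudfront.create_invalidation(
    DistributionId="E1234567890ABC",  # placeholder distribution ID
    InvalidationBatch={
        "Paths": {"Quantity": 1, "Items": ["/docs/revised-report.pdf"]},  # hypothetical path
        "CallerReference": str(time.time()),  # any unique string per request
    },
)
```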
6. Lifecycle rules only run periodically.
S3 Lifecycles are a great way to automatically downgrade an object’s storage-class to something cheaper, or to automatically delete it after a certain time period.
Let’s say you have a /dbbackups prefix where you keep nightly database dumps. You or your organization determines it’s unlikely you’ll need to go back further than 90 days for something like this. You can apply a lifecycle rule for just that prefix to tell S3 to automatically delete those objects after 90 days. Neat.
For another example, let’s say you have a /training prefix containing a bunch of training videos that occupy a decent amount of space, but if you don’t have many new hires, these videos are accessed infrequently. You can set a rule to downgrade all /training prefixed objects to Intelligent-Tiering 0 days after uploading, essentially meaning immediately.
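Both of those examples boil down to one lifecycle configuration. Here’s a rough boto3 sketch using the prefixes from above (written without the leading slash, as S3 expects) and a made-up bucket name:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                # Nightly database dumps expire after 90 days.
                "ID": "expire-db-backups",
                "Filter": {"Prefix": "dbbackups/"},
                "Status": "Enabled",
                "Expiration": {"Days": 90},
            },
            {
                # Training videos move to Intelligent-Tiering right away.
                "ID": "tier-training-videos",
                "Filter": {"Prefix": "training/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 0, "StorageClass": "INTELLIGENT_TIERING"}],
            },
        ]
    },
)
```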
It’s documented that these lifecycles process at approximately midnight (0000) UTC. However, in my experience, it can happen at virtually any time.
If you apply a rule, wait, and find your objects either aren’t transitioning immediately or aren’t expiring once their time is up, and you’re certain your rules are applied correctly, just keep waiting. Eventually the transition or expiration will happen, and your bill will honor the date it should have occurred.
Amazon pinky-promises on that one.
That’s the gist of what has caused me a little head-scratching or grief over the last month. Do you have any pain-points or lessons learned the hard way with AWS? Let me know below!