Air-gapped backup strategy for smaller Aurora AWS databases

Hiding behind seven proxies

This sponsored article over at The Register is actually a pretty good overview of why Aurora is great if you aren't interested in doing a lot of database maintenance and need something that will scale. I've been using Aurora for a few years now because I'm one of those guys who just isn't super enthusiastic about server maintenance tasks. I do them, but I just can't get excited about them.

My setup includes keeping 30 days of daily snapshots, which is OK for my risk management plans, but occasionally I like to have a fallback that isn't in the cloud. In the unlikely event that somebody hacks into your AWS account and starts wreaking havoc, you need some way of making sure your data can be recovered. If, like most of us, you are trying to keep a lid on your cloud costs, you also need to minimise the amount of data that is transferred out of AWS.

Overkill?

Sure, it's overkill, but I also like to have a copy of the production database available locally (in my case, using MariaDB on Debian), so this strategy kills two birds with one stone. The storage requirements aren't massive for me, so the following approach works pretty well:

Inside AWS

  1. A nano ec2 server running Amazon Linux, with the mysql client tools and the s3fs-fuse file system installed. You could use the aws CLI to copy the dumps into S3 instead, but the s3fs file system just makes things feel more natural and I don't have to allocate a pile of extra storage on the ec2 instance. Having a small, general-purpose ec2 instance in your setup is actually quite handy for other jobs. You can always shut it down when it's not in use, although if you are going to use cron for your backup scheduling, you will need to leave it running.
  2. An S3 bucket for your backups. Mount it to your ec2 instance.
  3. A backup script. Make it as fancy as you want; there are plenty of great examples of rotating backup scripts out there. Mine is rather simpler, as I only run it once a week. I also make sure everything is gzipped so that the costs are minimised when the data is transferred outside of AWS. A minimal version is sketched after this list.
  4. A cron entry that runs your script.
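
For reference, here is a rough sketch of what the ec2 side might look like. The bucket name, mount point, database name and endpoint are all placeholders, and I'm assuming the database credentials live in ~/.my.cnf rather than in the script itself; adapt it to your own setup.

    #!/bin/bash
    # backup-to-s3.sh -- a minimal sketch. Bucket, endpoint, database name and
    # paths are placeholders; DB credentials are assumed to be in ~/.my.cnf.
    set -euo pipefail

    BUCKET=my-aurora-backups                       # placeholder S3 bucket name
    MOUNT=/mnt/s3-backups                          # where s3fs mounts the bucket
    DB=mydatabase                                  # placeholder Aurora database name
    HOST=your-aurora-endpoint.rds.amazonaws.com    # your Aurora cluster endpoint

    # Mount the bucket if it isn't already mounted; with an instance profile
    # attached, s3fs can pick up credentials via -o iam_role=auto.
    mountpoint -q "$MOUNT" || s3fs "$BUCKET" "$MOUNT" -o iam_role=auto

    # Dump straight onto the mounted bucket, gzipped so the eventual transfer
    # out of AWS is as small as possible.
    mysqldump --host="$HOST" --single-transaction --routines --triggers "$DB" \
      | gzip > "$MOUNT/${DB}-$(date +%Y%m%d).sql.gz"

The matching cron entry (item 4) is then a single line, e.g. to run it on Sunday evening:

    # crontab entry: run the backup at 22:00 every Sunday
    0 22 * * 0 /home/ec2-user/backup-to-s3.sh >> /home/ec2-user/backup.log 2>&1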

Outside AWS

  1. An old server with a tape drive. I'm using a retired HP ProLiant ML330 G6 for this. It has an HP LTO Ultrium 2 tape drive (max capacity: 400 GB). Tapes are still available quite cheaply. Why tape? Tape is awesome. SSDs and good old-fashioned mechanical hard drives have potentially much larger capacity, but I'd bet on tape lasting longer. Tapes are also more portable: you can move them around without damaging them, and they have stood the test of time.
  2. A small script that runs your aws s3 sync command, and a place to keep the files.
  3. Another script that dumps everything to your tape. In my case this is very simple: tar cvf /dev/st0 ./backups, which makes a tar(1) archive on the tape. You could (and probably should) compress it, but that's up to you. Both local scripts are sketched after this list.
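
A minimal sketch of the two local pieces, assuming the same placeholder bucket name as before and a ./backups staging directory, with the aws CLI already configured with credentials that can read the bucket:

    #!/bin/bash
    # sync-and-tape.sh -- a minimal sketch covering items 2 and 3 above.
    # Bucket name and staging directory are placeholders.
    set -euo pipefail

    BUCKET=my-aurora-backups     # the same placeholder bucket the ec2 script writes to
    STAGING=./backups            # local staging area for the dumps

    # Pull down anything new from the bucket (only changed files are transferred).
    aws s3 sync "s3://$BUCKET" "$STAGING"

    # Write the staging directory to the tape drive as a plain tar archive.
    tar cvf /dev/st0 "$STAGING"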

General strategy

  1. Set up your nano ec2 instance to regularly back up your database to S3.
  2. Set up a local script on your premises that runs aws s3 sync against your backup S3 bucket.
  3. Write your SQL dumps to tape. I do this as a manual job on Monday morning (the weekly backup is set to run Sunday evening).
  4. Rotate your tapes; there are plenty of sample rotation schemes out there to borrow from. Best practice is to keep them in something fireproof.
  5. Regularly test that the backup tapes work. In my case, I have that local MariaDB instance I can refresh. The refresh strategy is to take a tape, tar xvf it back into the local file system, load the dump into the local database and run your application tests (a sketch of the restore follows this list).
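
A rough sketch of that refresh, using the same placeholder names as above and assuming the local MariaDB credentials are in ~/.my.cnf and the target database already exists:

    #!/bin/bash
    # restore-from-tape.sh -- a minimal sketch. Device, paths and database name
    # are placeholders.
    set -euo pipefail

    RESTORE_DIR=/tmp/tape-restore    # scratch area for the extracted archive
    DB=mydatabase                    # local MariaDB database to refresh

    # Pull the archive off the tape into the scratch directory.
    mkdir -p "$RESTORE_DIR"
    cd "$RESTORE_DIR"
    tar xvf /dev/st0

    # Load the most recent dump into the local MariaDB instance.
    LATEST=$(ls -t backups/*.sql.gz | head -n 1)
    gunzip -c "$LATEST" | mysql "$DB"

    # ...then run your application tests against the refreshed database.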

Things to be aware of

You may have to modify your mysql dump from Aurora to work with MariaDB. Aurora is happy with functions and stored procedures that have statement terminators (usually semicolons) scattered through their bodies, but MariaDB doesn't like them very much when you feed it the dump via the mysql client. When doing a restore you may have to recreate these routines by modifying their create scripts somewhat.
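
One way to recreate such a routine by hand (a sketch only, with a made-up procedure and made-up tables, and not necessarily the only fix) is to switch the client delimiter so the semicolons inside the body don't end the CREATE statement early:

    # The procedure name and tables here are purely illustrative.
    mysql mydatabase <<'SQL'
    DELIMITER $$
    CREATE PROCEDURE example_cleanup()
    BEGIN
      DELETE FROM sessions WHERE expires_at < NOW();
      DELETE FROM audit_log WHERE created_at < NOW() - INTERVAL 90 DAY;
    END$$
    DELIMITER ;
    SQL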

Old servers are awesome

There is no better time to stock up on good quality server parts for your lab. Industrial-quality small business servers from HP or Dell are easy to maintain, have good support, and make it much easier to add things like RAID arrays and tape drives. They often support multiple CPUs and gargantuan amounts of memory, both of which are at rock bottom prices as the cloud migration starts to hit even small businesses. Mini towers like the ML330 also make a perfectly good workstation. Unlike a lot of consumer-grade PCs, they are designed to be worked on, which makes them ideal for a lab.
