Running Backups via Docker and Cron

Published: 2022-09-06

One of the hardest parts of self-hosting your services is the fear of making mistakes. Luckily, there are tools that help you sleep well at night, such as password managers and reproducible configurations via Docker Compose. As the next step, let's set up automated backups.

There are multiple ways to do this. One way would be to set up a scheduled CI job at your favourite Git forge. For example, you could run an FTP server on your server and copy the files from one location to the other from within GitHub Actions. The drawback is that the data flows through the scheduled job and that the job requires access to the data.

Instead, it's probably better to run something on the server itself. The server is, after all, a Turing-complete system with orders of magnitude more compute power than the computer that guided the first moon landing. (Specifically, the moon landing used a 0.043 MHz processor, while a modern one runs above 2000 MHz.) So, the server should be capable enough to run a scheduled backup.

As a backup tool, I'll use rclone here, assuming that the backups aren't too big. If your backups are big, take a look at borg. Here, we'll stick to rclone, which has support for a lot of storage systems, such as Google Drive and Backblaze, built in. As a backup location, I've been going through the list from rclone and would probably advise Microsoft Azure as a backup target. Azure is reasonably easy to set up, they are unlikely to go bankrupt any time soon, and the blob storage has some additional options, such as versioning, to make your backup even more secure. Of course, you can pick your own back end or send the data to a server that you own. I would advise against sending backups to one of your own cloud servers since the price per GB is often quite high and it's good to have your backup data in a completely separate system.
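
As a side note, the versioning option can be switched on from the Azure CLI. This is only a sketch: myaccount1234 and my-resource-group are placeholders, and the exact flags may differ between CLI versions, so check the az documentation if it doesn't work.

# Enable blob versioning for the storage account (placeholder names).
az storage account blob-service-properties update \
    --account-name myaccount1234 \
    --resource-group my-resource-group \
    --enable-versioning true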

Anyway, back to the task at hand. To set up a reproducible system, let's configure the backups with Docker Compose. The idea of the setup is to run a scheduled task which copies all the important data from the server to the remote location. To do so, we combine rclone with cron. Cron is a Unix job scheduler available in most Linux distributions, including the very small Alpine distribution.
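
Concretely, the whole setup consists of a handful of files in one directory. The layout below is only a sketch of how I would arrange them; the directory name backup is an arbitrary choice and log is a file, not a folder:

backup/
├── docker-compose.yml    # the Compose file defined below
├── backup.sh             # the script that runs rclone
├── cronjobs              # the cron schedule, mounted into the container
├── AZURE_KEY.env         # contains the Azure access key
└── log                   # the cron job appends its output to this file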

Let's define a docker-compose.yml:

version: '3'

services:
  backup:
    # This image is Alpine-based.
    image: 'rclone/rclone:1.65'
    container_name: 'backup'
    # Reset the entrypoint; the rclone image sets it to `rclone`.
    entrypoint: ''
    # Thanks to https://stackoverflow.com/a/47960145/5056635.
    command: 'crond -f -d 8'
    logging:
      driver: 'json-file'
      options:
        max-size: '10m'
        max-file: '10'
    volumes:
      - './backup.sh:/backup.sh'
      - './cronjobs:/etc/crontabs/root:ro'
      - './log:/backup/log:rw'
      - '/data:/data:ro'
    env_file:
      - 'AZURE_KEY.env'
    restart: 'unless-stopped'

This defines a backup service which runs the cron jobs defined in ./cronjobs. Place the following cronjobs file in the same directory as docker-compose.yml:

20 3 * * * /backup.sh >> /backup/log 2>&1
# End this file with an empty new line

This runs the /backup.sh script every night at 3:20. For more information about cron jobs, see the Wikipedia page. The >> /backup/log part appends the script's output to a log file, and 2>&1 redirects stderr to stdout, so errors end up in the same log instead of being mailed by cron.
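
If you want to see the effect of that redirection for yourself, here is a tiny example that you can run in any shell (the log path is just an example):

# Both the regular output and the error end up in the same file:
sh -c 'echo "regular output"; echo "an error" >&2' >> /tmp/example.log 2>&1
cat /tmp/example.log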

Finally, put the following in backup.sh:

#!/bin/sh

# Abort on the first error.
set -e

# Remove the copy from a previous run, so that cp doesn't nest the new
# copy inside the old one.
rm -rf /data-copy

# Take a snapshot of the data that should be backed up.
cp -r /data /data-copy

# Store the date of this run, so that we can later verify that the
# backup is recent.
date > /data-copy/last-run.txt

# Upload the snapshot to Azure Blob Storage.
rclone \
    --azureblob-account="myaccount1234" \
    --azureblob-key="$AZURE_KEY" \
    --verbose \
    copy /data-copy :azureblob:/server-data

Here, the azureblob-account is the name of your storage account in Azure and the azureblob-key is one of the "Access keys" that you can find in the Azure portal under the "Security + networking" heading. Rclone can also encrypt data, but I'll skip that here because it makes verification more difficult.
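
The key itself reaches the container through the AZURE_KEY.env file listed under env_file in the Compose file. A minimal version could look like this, where the value is a placeholder for your own access key:

# AZURE_KEY.env; keep this file out of version control.
AZURE_KEY=your-storage-account-access-key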

Next, let's do a test run in the Debugging section and then talk about checking the backups in the Verifying section.

Debugging

To be fair, I hope that the configuration above works for you, but if it doesn't, things are a bit tricky since small typos result in weird errors. So, let's talk about debugging too.

To test your configuration, change the line in cronjobs to:

* * * * * /backup.sh >> /backup/log 2>&1
# End this file with an empty new line

and run

$ docker compose up

Once the Docker container is running, it should show:

Starting backup ... done
Attaching to backup
backup    | crond: crond (busybox 1.35.0) started, log level 8

and after a minute also:

backup    | crond: USER root pid   6 cmd /backup.sh >> /backup/log 2>&1

At that point, you can smash CTRL + C a few times to stop the container again. This log line means that the job runs, so that's good. If it shows something else, then the cron line is probably incorrect. Double-check that you didn't accidentally type four stars (* * * *) instead of five.

Next, you can check the log file at ./log, which is mounted into the container as /backup/log. That should show the problem if there is one.
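
Two commands that can speed up this loop, both assuming the container name backup from the Compose file above:

# Follow the log that the cron job appends to.
tail -f ./log

# Run the backup script by hand instead of waiting for the next minute.
docker exec backup sh /backup.sh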

Verifying

The most important part is to regularly verify the backup. As can be seen in the backup.sh file that we've created, the remote location will contain a last-run.txt file with the date of the most recent backup. Make it a habit to verify that this date is recent. Preferably, use this check to also download the data to your own system. Having a local copy makes it even more unlikely that you'll lose the data. Even a cryptolocker won't get you, as long as you don't copy the locked data to your system. So, before downloading, always check that the systems are working.

Enough about cryptolockers. To download the data locally, put your key in a file that can only be read via sudo cat /your/file. In other words, run chown 0:0 /your/file and chmod 600 /your/file, so that the file is owned by root and its permissions become -rw-------.
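
In commands, with /your/file as a placeholder for the path of the key file:

sudo chown 0:0 /your/file
sudo chmod 600 /your/file
ls -l /your/file    # should print something like: -rw------- 1 root root ...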

Then, create the following script:

#!/usr/bin/env bash

set -e

# Directory that contains the key file and that will receive the data.
DIR="/dir/to/your/config/backup"
AZURE_KEY="$(sudo cat "$DIR/AZURE.secret")"

# Download the remote backup into $DIR/data.
rclone \
    --azureblob-account="myaccount1234" \
    --azureblob-key="$AZURE_KEY" \
    --verbose \
    copy :azureblob:/server-data "$DIR/data"

echo "Downloaded data from $(cat "$DIR/data/last-run.txt")"

You can now run this script to download the data and read the timestamp from last-run.txt. It should show a date that is at most 24 hours old. If the date is older than that, then something is wrong.
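
If you want to automate that check, something along these lines should work; it assumes GNU date and that DIR points to the same directory as in the script above:

# Warn when the downloaded backup is older than 24 hours.
last=$(date -d "$(cat "$DIR/data/last-run.txt")" +%s)
now=$(date +%s)
if [ $(( now - last )) -gt 86400 ]; then
    echo "WARNING: the last backup is older than 24 hours"
fi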

How often you check this is up to your risk tolerance. Also check whether the files are correctly copied by manually comparing the data on the server with the backup. See the post about Server Maintenance for more information.
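
Rclone can help with that comparison: its check subcommand compares the files in a source and a destination. A rough sketch, run on the server, reusing the running container and the placeholder account name from before:

# Compare the data on the server with the uploaded copy. The --one-way flag
# ignores files that only exist in the destination, such as last-run.txt.
docker exec backup sh -c \
    'rclone --azureblob-account="myaccount1234" --azureblob-key="$AZURE_KEY" check --one-way /data :azureblob:/server-data'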

If you managed to get this far, then congratulations! You have reliable backups now.

The text is licensed under CC BY-NC-SA 4.0 and the code under Unlicense.