Container Orchestration and Self-Hosting

Published: 2022-08-26

When talking about containers, software engineers usually mean Docker containers. Docker containers are basically tiny systems with an application built in. They add relatively little CPU and memory overhead. Inside this tiny system, everything is configured so that the application can run: the right files, file permissions, networking configuration, and dependencies. This saves you the hassle of setting those things up yourself. It also avoids hard-to-find bugs caused by one application's configuration affecting another's, or by a package update that fixes one application and breaks another.

One problem that containers do not solve is how to set them up. Containers need to be started, and they need some persistent file storage, or you lose your data when a container is removed and recreated. This is the role of tools such as Docker Compose and container orchestration tools.

A few container orchestration tools are well-known, namely Kubernetes, HashiCorp Nomad, and Docker Swarm. These products are all aimed at large companies with complex setups. They assume that a service needs to run on multiple servers worldwide and that servers should never be offline. For our self-hosting purposes, those assumptions are unnecessary and cause the configurations to be overly complex.

For example, HashiCorp Nomad is often praised for being much simpler than Kubernetes. Still, it has to distinguish between a server, a job, a group, and a task. Each server may run zero or more jobs, each job contains one or more groups, and each group contains one or more tasks. All tasks inside the same group will be put on the same server, but the groups of a job do not necessarily run on the same server. This hierarchy is complicated, and the complication carries over into the configuration. As an example, to specify that some container needs persistent storage, you have to refer to the storage three times. First, at the server level, in the client block of the server config config.hcl, specify that a certain path should be made available for tasks running on the server:

client {
    enabled = true

    host_volume "gitea-data" {
        path = "/var/lib/gitea/data"
        read_only = false
    }
}

Then, in the job configuration file gitea.nomad, make the volume available at the group level and mount it at the task level:

job "gitea" {
    datacenters = ["dc1"]
    type = "service"

    group "git" {
        count = 1

        volume "gitea-data" {
            type = "host"
            read_only = false
            source = "gitea-data"
        }

        task "gitea" {
            volume_mount {
                volume = "gitea-data"
                destination = "/etc/gitea"
                read_only = false
            }
        }
    }
}
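
To make the three-way link concrete, here is a minimal Python sketch that models the hierarchy described above and follows one mount from the task, through the group, to the server. The class names and the resolve_mounts function are hypothetical and only illustrate the idea; they are not part of Nomad or its API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    # volume name -> path inside the container (the volume_mount block)
    volume_mounts: dict[str, str] = field(default_factory=dict)


@dataclass
class Group:
    name: str
    # group volume name -> host volume name (the "source" attribute)
    volumes: dict[str, str] = field(default_factory=dict)
    tasks: list[Task] = field(default_factory=list)


@dataclass
class Job:
    name: str
    groups: list[Group] = field(default_factory=list)


@dataclass
class Client:
    # host volume name -> path on the server (the host_volume block)
    host_volumes: dict[str, str] = field(default_factory=dict)


def resolve_mounts(client: Client, job: Job) -> list[tuple[str, str, str]]:
    """Return (task name, host path, container path) for every mount.

    Raises KeyError when a name is missing at the group or server level.
    """
    resolved = []
    for group in job.groups:
        for task in group.tasks:
            for volume, destination in task.volume_mounts.items():
                source = group.volumes[volume]           # task -> group
                host_path = client.host_volumes[source]  # group -> server
                resolved.append((task.name, host_path, destination))
    return resolved


# The same values as in config.hcl and gitea.nomad above.
client = Client(host_volumes={"gitea-data": "/var/lib/gitea/data"})
job = Job(
    name="gitea",
    groups=[
        Group(
            name="git",
            volumes={"gitea-data": "gitea-data"},
            tasks=[Task(name="gitea", volume_mounts={"gitea-data": "/etc/gitea"})],
        )
    ],
)

mounts = resolve_mounts(client, job)
print(mounts)  # [('gitea', '/var/lib/gitea/data', '/etc/gitea')]
```

In this sketch, renaming the volume in only one of the three places makes the lookup fail with a KeyError. That is the kind of bookkeeping the Nomad configuration above asks you to do by hand.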

Wiring up a single volume in three places is verbose and unwieldy. This doesn't mean that Nomad is bad software; it's great if you have to deal with multiple servers. However, if you just want to self-host a few services on a single server near you, container orchestration tools are too much.

Instead, this site will mainly stick to Docker Compose.

The text is licensed under CC BY-NC-SA 4.0 and the code under Unlicense.