System designers may be required to allow third-party untrusted applications to run on high-impact security category systems. They may want to isolate these applications from all resources on the host besides a limited set, to limit the blast radius if they are compromised. The designer would prefer to run untrusted applications on an entirely separate operating system kernel (i.e., a virtual machine), or better yet, on entirely separate hardware. However, imagine that business requirements mandate that those options are unavailable – instead, these untrusted applications must share a kernel which has access to high-impact information outside of the applications’ purview. What can be done?

Linux containerization controls may help with that. The controls can be combined to contain the application process. But there are a lot of control options, including namespaces, capabilities, cgroups, mandatory access controls, and seccomp. Where to start – what is the best way to secure a container?

Many containerization projects exist, all of which are abstractions over the same set of Linux kernel containerization controls. One containerization solution that some of our clients have used is LXC. They deploy it on resource-constrained devices such as small routers, choosing it for its relatively small installation footprint compared to alternatives such as Docker and for its integration with ecosystems such as OpenWrt and its derivatives. This post compares security guidance from the LXC project against guidance from the Docker project. Docker was chosen for comparison because its popularity makes it more likely that readers are familiar with it and use it.[1]

Two ways the Linux concept of ‘root’ can apply to containers

First, know that by default, LXC and Docker container processes run as the ‘root’ system user. But this statement has nuance[2]:

  1. While a container’s main process may be started as root, that process may itself fork and run the primary container application as a non-root user. For practical purposes, the “container” in this situation refers to the forked processes: if they do not run as root, the container itself is said not to run as root.
  2. Linux namespace controls can be used to create a new ‘user namespace’ for a container process, mapping ‘root’ in the container to an unprivileged user (such as ‘nobody’) on the host. This can be confusing: inside a user namespace, a process may run as a user identified as root (UID 0), and that process may indeed have effective root-level permissions over any accessible resources within its various namespaces. Outside those namespaces, however, the same process can be powerless.
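Both controls can be observed from inside a running process. The sketch below, which assumes a Linux host with /proc mounted and at most one level of user-namespace nesting, reads the process’ effective UID (control 1) and parses /proc/self/uid_map to find which host UID the process’ UID 0 maps to (control 2). Each uid_map line holds the first ID inside the namespace, the corresponding first ID outside it, and the length of the range. The function names are hypothetical, chosen for illustration.

```python
import os


def parse_uid_map(text):
    """Parse uid_map content into (inside_start, outside_start, length) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3:
            entries.append(tuple(int(f) for f in fields))
    return entries


def outside_uid(inside_uid, entries):
    """Translate a namespace UID to the UID one level up, or None if unmapped."""
    for inside_start, outside_start, length in entries:
        if inside_start <= inside_uid < inside_start + length:
            return outside_start + (inside_uid - inside_start)
    return None


def describe_current_process(uid_map_path="/proc/self/uid_map"):
    """Report whether this process runs as root and whether its root is host root.

    Assumption: the process's user namespace is at most one level below the
    initial ("system") user namespace. With deeper nesting, the translation
    would have to be repeated at each level.
    """
    with open(uid_map_path) as f:
        entries = parse_uid_map(f.read())
    return {
        # Control (1): is the process itself running as UID 0?
        "runs_as_root": os.geteuid() == 0,
        # Control (2): does UID 0 in this namespace correspond to host UID 0?
        "container_root_is_host_root": outside_uid(0, entries) == 0,
    }
```

In the system user namespace, uid_map reads as an identity mapping (`0 0 4294967295`), so `container_root_is_host_root` is True; inside a container with a remapped user namespace (for example `0 100000 65536`), UID 0 translates to host UID 100000 and the value is False. The two keys vary independently, which is the point of the list above.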

While controls (1) and (2) above are both container security best practices, they can be implemented independently. Consider:

  • A container that does not have its own user namespace – i.e., where the root user in the container corresponds to root on the system[3] – may still run its subprocesses as non-root users.
  • A container that does have its own user namespace – i.e., where the root user in the container is not root on the system – may also run its subprocesses as non-root users.

Table 1 shows the four possible permutations of the two configurations listed above. All four are possible.

Table 1: Illustrating that applying a user-namespace isolation control for a container is independent from whether a container runs its subprocesses as root

     
Configuration                 | Has own user namespace | Shares system user namespace
Runs subprocesses as root     | Yes, possible          | Yes, possible
Runs subprocesses as non-root | Yes, possible          | Yes, possible

LXC and Docker have different best-practice recommendations for containers

Considering just the two container isolation controls of (1) running the container application as a non-root user and (2) creating a user namespace for containers, LXC and Docker place different emphasis on the importance and priority of each. In fact, the contrast is starker than that: LXC puts one first, while Docker puts the other first.

Docker’s guidance

The Docker project focuses on using a low-privilege (i.e., non-root) user to run a container’s primary application. Only if that is not possible does it recommend using a user namespace.

From https://docs.docker.com/engine/security/userns-remap/:

The best way to prevent privilege-escalation attacks from within a container is to configure your container’s applications to run as unprivileged users. For containers whose processes must run as the root user within the container, you can re-map this user to a less-privileged user on the Docker host. The mapped user is assigned a range of UIDs which function within the namespace as normal UIDs from 0 to 65536, but have no privileges on the host machine itself.

In summary, for Docker, the user namespace is secondary to the process run-as user.
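Docker’s ordering can be expressed as a small check against the JSON that `docker inspect <container>` prints. The sketch below is an illustration, not part of any Docker API. It assumes that the container’s run-as user appears in `Config.User` (an empty value usually meaning the image default, which is root unless the image sets `USER`), that a container started with `--userns=host` shows `"host"` in `HostConfig.UsernsMode`, and that the caller separately knows whether the daemon has `userns-remap` enabled in daemon.json. Function and parameter names are hypothetical.

```python
import json


def runs_as_root(user_field):
    """Best-effort guess at whether a Docker `Config.User` value means root.

    Docker accepts "user", "uid", "user:group" or "uid:gid". A named user
    could still resolve to UID 0 inside the image; this check cannot see that.
    """
    user_part = user_field.split(":", 1)[0]
    return user_part in ("", "0", "root")


def passes_docker_guidance(inspect_entry, daemon_userns_remap):
    """Apply Docker's stated ordering: non-root first, otherwise a user namespace.

    inspect_entry: one element of the list printed by `docker inspect`.
    daemon_userns_remap: True if the daemon's daemon.json sets "userns-remap".
    """
    config = inspect_entry.get("Config", {})
    host_config = inspect_entry.get("HostConfig", {})
    non_root = not runs_as_root(config.get("User", ""))
    # --userns=host opts a single container out of daemon-wide remapping.
    own_userns = daemon_userns_remap and host_config.get("UsernsMode", "") != "host"
    return non_root or own_userns


if __name__ == "__main__":
    sample = json.loads('[{"Config": {"User": "1000:1000"}, "HostConfig": {"UsernsMode": ""}}]')
    print(passes_docker_guidance(sample[0], daemon_userns_remap=False))
```

The sample container runs as UID 1000 with no user namespace, so it passes Docker’s guidance on the strength of control (1) alone; this is the same configuration Warning 2 below is concerned with.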

LXC’s guidance

By contrast, the LXC project focuses entirely on whether a container shares the system’s user namespace. Containers that do are called “privileged”. Under LXC’s definition of privileged versus unprivileged, the user that the container processes run as does not matter.

From https://linuxcontainers.org/lxc/security/:

Privileged Containers

Privileged containers are defined as any container where the container uid 0 is mapped to the host’s uid 0. In such containers, protection of the host and prevention of escape is entirely done through Mandatory Access Control (apparmor, selinux), seccomp filters, dropping of capabilities and namespaces.

[…snipped…]

Unprivileged Containers

Unprivileged containers are safe by design. The container uid 0 is mapped to an unprivileged user outside of the container and only has extra rights on resources that it owns itself.

With such container, the use of SELinux, AppArmor, Seccomp and capabilities isn’t necessary for security. LXC will still use those to add an extra layer of security which may be handy in the event of a kernel security issue but the security model isn’t enforced by them.
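LXC’s definition can be checked against a container’s configuration file, because unprivileged LXC containers are configured with `lxc.idmap` entries such as `lxc.idmap = u 0 100000 65536` (type, first container ID, first host ID, range length). The sketch below assumes the LXC 3.0+ key name `lxc.idmap` (older releases used `lxc.id_map`) and a single configuration file; included files are not followed. Function names are hypothetical.

```python
def parse_lxc_uid_idmaps(config_text):
    """Return (container_start, host_start, count) for each `lxc.idmap = u ...` line."""
    entries = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without a key
        key, value = (part.strip() for part in line.split("=", 1))
        fields = value.split()
        # Only UID mappings ("u") matter for LXC's privileged/unprivileged test.
        if key == "lxc.idmap" and len(fields) == 4 and fields[0] == "u":
            entries.append(tuple(int(f) for f in fields[1:]))
    return entries


def is_lxc_privileged(config_text):
    """True if container UID 0 is host UID 0, per LXC's definition of privileged."""
    entries = parse_lxc_uid_idmaps(config_text)
    if not entries:
        # No UID mapping at all: the container shares the system user namespace.
        return True
    for container_start, host_start, count in entries:
        if container_start <= 0 < container_start + count:
            return host_start == 0
    # A mapping exists but does not cover container UID 0, so it cannot be host root.
    return False
```

Note that nothing in this check looks at which user the container’s processes run as, mirroring LXC’s position that only the user-namespace mapping matters.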

Table 2 compares the two projects’ positions. Of the four possible permutations, three pass the Docker guidance, while only two pass the LXC guidance.

Table 2: Comparing whether containers with different isolation configurations pass the LXC and Docker best-practice guidance.

     
Configuration                 | Has own user namespace | Shares system user namespace
Runs subprocesses as root     | Docker: Yes, LXC: Yes  | Docker: No, LXC: No
Runs subprocesses as non-root | Docker: Yes, LXC: Yes  | Docker: Yes, LXC: No
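The two projects’ rules, as quoted above, reduce to two boolean predicates. The sketch below encodes them and enumerates the four permutations from Table 2; the function names are illustrative only.

```python
from itertools import product


def passes_docker(has_own_userns, runs_as_root):
    # Docker: run as non-root; if that is not possible, use a user namespace.
    return (not runs_as_root) or has_own_userns


def passes_lxc(has_own_userns, runs_as_root):
    # LXC: only the user namespace matters; runs_as_root is deliberately ignored.
    return has_own_userns


if __name__ == "__main__":
    perms = list(product([True, False], repeat=2))
    for own_ns, as_root in perms:
        print(own_ns, as_root, passes_docker(own_ns, as_root), passes_lxc(own_ns, as_root))
    print("Docker passes:", sum(passes_docker(n, r) for n, r in perms))  # 3
    print("LXC passes:", sum(passes_lxc(n, r) for n, r in perms))        # 2
```

The single permutation where the predicates disagree, a non-root container sharing the system user namespace, is the case Warning 2 below describes.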

Now consider three warnings when configuring LXC and Docker containers.

Warning 1: For both projects, the defaults fail their own best-practice guidelines

System designers should be aware that, regardless of recommended best practice, the default configuration for both Docker and LXC is that containers:

  • share the system user namespace[3]. That is, for both, root in the container is root on the host.
  • run their main processes as root.

That is, unless specifically configured and run otherwise, default Docker and LXC containers fail their own project’s best-practice security guidance. Neither project provides unprivileged containers or non-root main processes out of the box.

To be fair, the LXC project is very up-front about the risks inherent in its default of privileged containers. Even the LXC quickstart documentation strongly nudges users towards safer, unprivileged containers. However, the Docker project’s installation instructions make only a passing reference to the general security principle of Docker and root (not specifically to “privileged” vs “unprivileged” containers), as a “note” on its “post-installation steps” page that links to its documentation on “rootless mode”. The same “rootless mode” guidance is linked from a nested page within Docker’s security documentation. No nudge towards secure configuration anywhere near as strong as LXC’s was found in Docker’s quickstart documentation.

Warning 2: An attacker can achieve system-root in a guidance-compliant Docker container

Designers should be aware that if a container runs as a non-root user but does not have its own user namespace, an attacker who compromises the container and escalates their privileges to root within the container context has become the system root user, because root in that container is the same user as root on the host. If the system designer has configured a container assuming that compromising it would not give an attacker system-root, and if no other isolation controls have been applied to the container (an admittedly uncommon situation), then compromise and privilege escalation within the container equate to compromise of the entire system. This is true even if the container complies with the quoted Docker best-practice guidelines.

Warning 3: Containers that pass the recommendations can still be vulnerable in other ways

Designers should also note that many other isolation settings are possible for containers. Even if a container meets the baseline LXC and Docker recommendations discussed above, there are still countless configurations that could enable an attacker to escape a compromised container. A future post will further explore default container isolation settings.

  1. We have seen and are familiar with other containerization solutions, such as Podman, raw use of runc and crun, and homemade solutions, as well as orchestrators such as Kubernetes. Future blog posts may explore these.

  2. There are other Linux kernel isolation controls which can limit the powers of a process running as root, including dropping capabilities; creating namespaces for other resources like net, mounts, and PIDs; mandatory access controls; cgroups; and seccomp filters. But these are not explored in this post, in part because the guidance for these is not as dissonant between the LXC and Docker projects. 

  3. The term “system user namespace” is used here for convenience to refer to the kernel’s highest-level user namespace (the initial user namespace).