Privilege Escalation in Docker
Docker’s architecture is complex, so the configuration of its authentication mechanisms deserves close attention during implementation. Attackers will try to exploit this maze of controls and any misconfigurations to escalate privileges; once in control of a container, they may be able to break out of its isolation, compromise the underlying host, and move laterally through the network.
Escalation via Docker Daemon
The Docker daemon usually runs with root privileges (recent Docker versions also offer a rootless mode, but it is not the default). The daemon does not itself limit how much access to the host a container may be given, so containers can be launched with access to the host operating system’s resources, such as its file system or network stack.
For this reason, giving someone access to the Docker daemon is equivalent to giving them root access to the host operating system.
Suppose an attacker has infiltrated a container that has access to the Docker daemon. They can then abuse legitimate daemon functionality, for example by starting a new privileged container that mounts the host’s root file system, to escape containerization and compromise the host.
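A container commonly obtains access to the daemon when the daemon’s UNIX socket is bind-mounted into it, for example with `-v /var/run/docker.sock:/var/run/docker.sock`. The sketch below flags such mounts in the JSON printed by `docker inspect`. The `Mounts`, `Type`, `Source`, and `Destination` fields come from that output. The function name and the file-name heuristic are assumptions made for illustration.

```python
from typing import Any

# Heuristic: match the conventional socket file name on the host side.
DAEMON_SOCKET_NAME = "docker.sock"


def daemon_socket_mounts(container: dict[str, Any]) -> list[str]:
    """Return the in-container paths where a Docker daemon socket is bind-mounted.

    `container` is one element of the JSON array printed by `docker inspect`.
    A socket that was renamed on the host would not be detected.
    """
    hits = []
    for mount in container.get("Mounts") or []:
        source = mount.get("Source", "")
        # Matches both /var/run/docker.sock and /run/docker.sock.
        if mount.get("Type") == "bind" and source.endswith("/" + DAEMON_SOCKET_NAME):
            hits.append(mount.get("Destination", ""))
    return hits
```

`docker inspect` prints a JSON array, so a caller would pass in one element at a time, such as `json.loads(output)[0]`.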
Docker provides a REST API for interacting with the daemon, known as the Docker Engine API, which can be accessed via a UNIX socket or a TCP port. The Docker CLI uses this API to communicate with the Docker daemon.
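To make the API concrete, here is a minimal sketch that sends a GET request to the Engine API over the default UNIX socket using only the Python standard library. `/version` is an Engine API endpoint. The class and helper names are hypothetical, and this is not the official Docker SDK.

```python
import http.client
import json
import socket

DOCKER_SOCKET = "/var/run/docker.sock"  # default Engine API socket on Linux


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the HTTP Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        try:
            sock.connect(self.socket_path)
        except OSError:
            sock.close()
            raise
        self.sock = sock


def engine_api_get(path: str, socket_path: str = DOCKER_SOCKET):
    """Send GET <path> to the Engine API and return the decoded JSON body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        if resp.status != 200:
            raise RuntimeError(f"Engine API returned HTTP {resp.status}: {body!r}")
        return json.loads(body)
    finally:
        conn.close()
```

For a user who can access the socket, `engine_api_get("/version")` returns the daemon's version information. A user without access would typically get a `PermissionError`. The same few lines could just as easily send container-creation requests, which is why socket access amounts to control of the daemon.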
UNIX socket
The non-networked /var/run/docker.sock UNIX socket is the default method of accessing the Docker Engine API locally. In the default Docker configuration on Linux, only “root” and members of the “docker” group can access the socket, which makes membership in the “docker” group effectively equivalent to root access.
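A quick way to check who can reach the socket is to inspect its ownership and mode. On Linux, connecting to a UNIX stream socket requires write permission on the socket file, so the sketch below treats write access as the deciding factor. The function name and report keys are hypothetical.

```python
import grp
import os
import stat


def docker_socket_exposure(path: str = "/var/run/docker.sock") -> dict:
    """Summarize ownership and permissions of a Docker daemon socket."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"exists": False}
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)  # gid has no entry in the group database
    return {
        "exists": True,
        "is_socket": stat.S_ISSOCK(st.st_mode),
        "owner_uid": st.st_uid,
        "group": group,
        "mode": oct(stat.S_IMODE(st.st_mode)),
        # Any local user may connect if "other" has write permission.
        "world_writable": bool(st.st_mode & stat.S_IWOTH),
        # Whether this process may connect (write permission on the socket).
        "current_process_can_connect": os.access(path, os.W_OK),
    }
```

On a typical default Linux installation, one would expect the group to be `docker`, the mode to be `0o660`, and `world_writable` to be `False`.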
TCP socket
The Docker Engine API can optionally be exposed on the network through a TCP socket. Without TLS client authentication, anyone who can reach the TCP port has full access to the Docker daemon.
It is strongly recommended not to expose the Docker Engine API on the network. Even when the TCP port is not reachable from the network, the daemon remains exposed to server-side request forgery (SSRF), privilege escalation, and container breakout attacks staged from within compromised containers.
If exposing the API on the network is absolutely necessary, it is crucial to protect the endpoint with TLS and client certificates and to ensure that it is reachable only from trusted networks.
By convention, port 2375 is used for unencrypted communication with the daemon and port 2376 for encrypted (TLS) communication.
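These recommendations can be partly checked against a daemon configuration. The sketch below reads a `daemon.json`-style dictionary and flags TCP listeners that lack client-certificate verification (`tlsverify`), use the conventional plaintext port, or bind beyond localhost. It only inspects the `hosts` and `tlsverify` keys and does not validate certificates. The function name and issue labels are hypothetical. A loopback-only listener is not flagged as network-reachable, but as noted above it can still be reached through SSRF from a compromised container.

```python
import ipaddress
from urllib.parse import urlparse


def _is_loopback(hostname) -> bool:
    if not hostname:
        return False  # assumption: an empty host means all interfaces
    if hostname == "localhost":
        return True
    try:
        return ipaddress.ip_address(hostname).is_loopback
    except ValueError:
        return False  # other host names may resolve to a routable address


def audit_daemon_hosts(config: dict) -> list[tuple[str, str]]:
    """Return (host, issue) pairs for risky TCP listeners in a daemon config."""
    findings = []
    verify = config.get("tlsverify") is True
    for host in config.get("hosts", []):
        url = urlparse(host)
        if url.scheme != "tcp":
            continue  # unix:// and fd:// listeners are not exposed on the network
        if not verify:
            findings.append((host, "no-client-cert-verification"))
        if url.port == 2375:
            findings.append((host, "conventional-plaintext-port"))
        if not _is_loopback(url.hostname):
            findings.append((host, "reachable-beyond-localhost"))
    return findings
```

For example, a configuration with `"hosts": ["tcp://0.0.0.0:2375"]` and no `tlsverify` produces all three findings.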
Privileged Containers
Running a container with the --privileged flag effectively disables its isolation. A privileged container has all available capabilities and full access to all of the host’s devices. It also runs with security mechanisms such as AppArmor and seccomp disabled and without the usual cgroup restrictions on device access.
In other words, an attacker in a privileged container can gain “root” access on the underlying host with little effort.
This flag exists primarily to enable niche use cases, such as running Docker in Docker. In practice, this lack of isolation should almost always be avoided in favor of finer-grained control over capabilities using the --cap-add and --cap-drop flags.
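As a minimal sketch of that recommendation, the hypothetical helper below builds a `docker run` argument list. It drops every capability with `--cap-drop=ALL` and adds back only the ones a workload is assumed to need. The image name in the example is a placeholder.

```python
import shlex


def least_privilege_run_args(image: str, needed_caps: list[str]) -> list[str]:
    """Build a `docker run` command that starts from no capabilities and
    re-adds only the listed ones, instead of using --privileged."""
    args = ["docker", "run", "--cap-drop=ALL"]
    for cap in sorted(set(needed_caps)):  # de-duplicate for a stable command
        args.append(f"--cap-add={cap}")
    args.append(image)
    return args


if __name__ == "__main__":
    # Placeholder image for a hypothetical service that binds to port 80.
    cmd = least_privilege_run_args("example/web:latest", ["NET_BIND_SERVICE"])
    print(shlex.join(cmd))
    # prints: docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE example/web:latest
    # To launch it: subprocess.run(cmd, check=True)
```

Starting from an empty set makes every granted capability an explicit, reviewable decision rather than an inherited default.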
Capabilities
Docker uses Linux capabilities as an isolation technique to restrict the privileges of the processes running in a container. By default, Docker starts containers with the restricted set of capabilities shown below.
Capability Key | Capability Description |
---|---|
AUDIT_WRITE | Write records to kernel auditing log. |
CHOWN | Make arbitrary changes to file UIDs and GIDs (see chown(2)). |
DAC_OVERRIDE | Bypass file read, write, and execute permission checks. |
FOWNER | Bypass permission checks on operations that normally require the file system UID of the process to match the UID of the file. |
FSETID | Don’t clear set-user-ID and set-group-ID permission bits when a file is modified. |
KILL | Bypass permission checks for sending signals. |
MKNOD | Create special files using mknod(2). |
NET_BIND_SERVICE | Bind a socket to internet domain privileged ports (port numbers less than 1024). |
NET_RAW | Use RAW and PACKET sockets. |
SETFCAP | Set file capabilities. |
SETGID | Make arbitrary manipulations of process GIDs and supplementary GID list. |
SETPCAP | Modify process capabilities. |
SETUID | Make arbitrary manipulations of process UIDs. |
SYS_CHROOT | Use chroot(2), change root directory. |
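The capabilities a process actually holds can be read from the `CapEff` (effective set) line of `/proc/<pid>/status`, which is a hexadecimal bitmask. The sketch below decodes that mask using the bit numbers from `<linux/capability.h>`. The helper names are hypothetical. For a root process in a container started with Docker's defaults, one would typically see `00000000a80425fb`, which decodes to exactly the 14 capabilities in the table above.

```python
# Capability names indexed by bit number, from <linux/capability.h> (bits 0-40).
CAP_NAMES = [
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
]


def read_effective_mask(status_text: str) -> int:
    """Extract the CapEff bitmask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return int(line.split(":", 1)[1].strip(), 16)
    raise ValueError("no CapEff line found")


def decode_caps(mask: int) -> set[str]:
    """Map each set bit to its capability name; unknown bits become UNKNOWN_<n>."""
    names = set()
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.add(CAP_NAMES[bit] if bit < len(CAP_NAMES) else f"UNKNOWN_{bit}")
        bit += 1
    return names
```

Inside a Linux container, `decode_caps(read_effective_mask(open("/proc/self/status").read()))` lists the current process's effective capabilities.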
Capabilities that are not granted by default can be added, and any default capability can be dropped, by running containers with the --cap-add and --cap-drop options. Privileged containers have all capabilities.
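The interaction of these options can be modeled with plain set arithmetic. The sketch below is a simplified model and not Docker's implementation. Its `ALL` handling follows the common `--cap-drop=ALL` pattern. Letting an add win when a name appears in both lists is an assumption. `ALL_CAPS` lists the capabilities defined by current Linux kernels, which may differ from what a given host supports. The function name is hypothetical.

```python
DEFAULT_CAPS = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "MKNOD",
    "NET_BIND_SERVICE", "NET_RAW", "SETFCAP", "SETGID", "SETPCAP", "SETUID",
    "SYS_CHROOT",
})

# Capabilities defined in <linux/capability.h>, bits 0-40 (kernel-dependent).
ALL_CAPS = frozenset({
    "CHOWN", "DAC_OVERRIDE", "DAC_READ_SEARCH", "FOWNER", "FSETID", "KILL",
    "SETGID", "SETUID", "SETPCAP", "LINUX_IMMUTABLE", "NET_BIND_SERVICE",
    "NET_BROADCAST", "NET_ADMIN", "NET_RAW", "IPC_LOCK", "IPC_OWNER",
    "SYS_MODULE", "SYS_RAWIO", "SYS_CHROOT", "SYS_PTRACE", "SYS_PACCT",
    "SYS_ADMIN", "SYS_BOOT", "SYS_NICE", "SYS_RESOURCE", "SYS_TIME",
    "SYS_TTY_CONFIG", "MKNOD", "LEASE", "AUDIT_WRITE", "AUDIT_CONTROL",
    "SETFCAP", "MAC_OVERRIDE", "MAC_ADMIN", "SYSLOG", "WAKE_ALARM",
    "BLOCK_SUSPEND", "AUDIT_READ", "PERFMON", "BPF", "CHECKPOINT_RESTORE",
})


def _normalize(names) -> set[str]:
    # Accept "net_raw", "NET_RAW", or "CAP_NET_RAW" spellings.
    return {n.upper().removeprefix("CAP_") for n in names}


def effective_caps(cap_add=(), cap_drop=(), privileged=False) -> frozenset[str]:
    """Model the capability set a container would start with."""
    if privileged:
        return ALL_CAPS  # privileged containers get every capability
    add, drop = _normalize(cap_add), _normalize(cap_drop)
    if "ALL" in add:
        return ALL_CAPS - drop
    if "ALL" in drop:
        return frozenset(add)  # start empty, keep only explicit additions
    return frozenset((DEFAULT_CAPS - drop) | add)
```

For example, `effective_caps(cap_drop=["ALL"], cap_add=["NET_BIND_SERVICE"])` yields a set containing only `NET_BIND_SERVICE`.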
Adding capabilities reduces container isolation and may pose security risks. Before adding capabilities, carefully evaluate the potential impact on the underlying environment.
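Part of that evaluation can be automated. The hypothetical helper below classifies each requested `--cap-add` value. Its high-risk list is an illustrative selection based on the capabilities(7) descriptions, not an exhaustive or authoritative one.

```python
DEFAULT_CAPS = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL", "MKNOD",
    "NET_BIND_SERVICE", "NET_RAW", "SETFCAP", "SETGID", "SETPCAP", "SETUID",
    "SYS_CHROOT",
})

# Illustrative, not exhaustive: capabilities whose grant broadens host access.
HIGH_RISK = {
    "SYS_ADMIN": "broad administrative operations, including mounting file systems",
    "SYS_MODULE": "load and unload kernel modules",
    "SYS_PTRACE": "trace and inspect other processes",
    "SYS_RAWIO": "raw I/O port and device access",
    "DAC_READ_SEARCH": "bypass file read and directory search permission checks",
    "NET_ADMIN": "reconfigure network interfaces, routing, and firewall rules",
}


def review_cap_add(requested: list[str]) -> list[tuple[str, str]]:
    """Return (capability, assessment) pairs for requested --cap-add values."""
    report = []
    for raw in requested:
        cap = raw.upper().removeprefix("CAP_")
        if cap == "ALL":
            report.append((cap, "grants every capability"))
        elif cap in DEFAULT_CAPS:
            report.append((cap, "already granted by default; the flag is redundant"))
        elif cap in HIGH_RISK:
            report.append((cap, "high risk: " + HIGH_RISK[cap]))
        else:
            report.append((cap, "not granted by default; review its impact"))
    return report
```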