Docker Security In Production

Overview of the options available for securing Docker in production environments

Delve Labs was present during the GoSec 2016 conference, where our lead infrastructure architect presented an overview of the current options available for securing Docker in production environments.

Text from the slides follows:

Docker builds on Kernel & Host Security

Grsecurity kernel

Randomization++, Bounds checking,
Fork delay, Hardened seccomp BPF

SELinux / AppArmor

Complex execution profiles, {White,Black}-listing

Sysctl settings

fd limit, IP stack, sysrq, buffers, etc.

Unattended-upgrades

And all the typical hardening
& distro compile flags!

Docker Daemon

Limit docker group : docker.sock

Access to socket = root

Authorization plugin API

Docker 1.10+: --authorization-plugin
should help mitigate previous issue soon

docker-machine & TLS

Use --tlsverify (port 2376)

SELinux / AppArmor Profile

apparmor.d/docker + restrictions
limit path, resources, etc.

Export logs outside of host

--log-driver= (syslog, fluentd, …)
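
The daemon options above can be combined on one command line. The following is a minimal sketch, not a recommended configuration: the certificate directory and syslog address are hypothetical placeholders, and it assumes the `dockerd` binary name (Docker 1.12+; earlier releases used `docker daemon`). It only prints the command so it can be reviewed first.

```shell
#!/bin/sh
# Sketch: print a daemon command line with TLS verification and remote logging.
# Dry run only: nothing is started.
CERT_DIR="/etc/docker/tls"                 # hypothetical certificate directory
SYSLOG_ADDR="udp://logs.example.com:514"   # hypothetical remote syslog endpoint

daemon_cmd() {
    echo dockerd \
        -H unix:///var/run/docker.sock \
        -H tcp://0.0.0.0:2376 \
        --tlsverify \
        --tlscacert="$CERT_DIR/ca.pem" \
        --tlscert="$CERT_DIR/server-cert.pem" \
        --tlskey="$CERT_DIR/server-key.pem" \
        --log-driver=syslog \
        --log-opt syslog-address="$SYSLOG_ADDR"
}

daemon_cmd
```

With --tlsverify set, clients must present a certificate signed by the same CA before the TCP socket accepts their requests.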

cgroups hardware resource limits

Mitigate potential DoS attacks

Limit memory, disk, network I/O & CPU share

cgroups only limit resources share, not access

Not blocking access to:
kcore, modprobe, sysrq, mknod, eth0, …

You can define your own initial cgroup

--cgroup-parent to inherit a previous context

Limiting CPU usage

Limit the total or relative amount of CPU time share

--cpu-shares relative weight (== cpu_shares: 100)
--cpu-period CFS (QoS) period
--cpu-quota CFS (QoS) quota

Limit which CPU or RAM node can be used

--cpuset-cpus CPU affinity (== cpuset: 0,1)
--cpuset-mems Memory NUMA node (e.g. 0-3, 0,1)
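
The CPU flags above can be combined on a single `docker run`. A minimal sketch, assuming a hypothetical image name (`example/app`) and illustrative values. It prints the command rather than running it, and shows how quota and period relate.

```shell
#!/bin/sh
# Sketch: print a `docker run` command that caps CPU usage (dry run).
# IMAGE and all numeric values are illustrative assumptions.
IMAGE="example/app"   # hypothetical image
CPU_PERIOD=100000     # CFS period in microseconds (100 ms)
CPU_QUOTA=50000       # CPU time allowed per period, in microseconds

# quota / period is the fraction of one CPU the container may consume
cpu_percent() {
    echo $(( $CPU_QUOTA * 100 / $CPU_PERIOD ))
}

cpu_limited_cmd() {
    echo docker run -d \
        --cpu-shares=512 \
        --cpu-period="$CPU_PERIOD" \
        --cpu-quota="$CPU_QUOTA" \
        --cpuset-cpus=0,1 \
        --cpuset-mems=0 \
        "$IMAGE"
}

echo "hard cap: $(cpu_percent)% of one CPU"
cpu_limited_cmd
```

--cpu-shares only matters when containers compete for CPU, while the quota/period pair is a hard ceiling even on an idle host.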

Limiting memory usage

Limit a container’s memory usage

Limit: --memory=1g (== mem_limit:)
Soft Limit: --memory-reservation

Limit swap usage

Total Limit: --memory-swap (== memswap_limit:)
Swappiness: --memory-swappiness

** GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1" **

Limit container’s kernel memory usage

--kernel-memory limit
Verify the Out Of Memory kernel policy
--oom-kill-disable & --oom-score-adj
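
The memory flags above might be combined as follows. This is a sketch with illustrative values and a hypothetical image name. Setting --memory-swap equal to --memory leaves the container no swap, because --memory-swap is the total of memory plus swap.

```shell
#!/bin/sh
# Sketch: print a `docker run` command with memory limits (dry run).
# IMAGE and all sizes are illustrative assumptions.
IMAGE="example/app"   # hypothetical image
MEM_LIMIT="1g"        # hard limit
MEM_SOFT="768m"       # soft limit, enforced under host memory pressure

memory_limited_cmd() {
    echo docker run -d \
        --memory="$MEM_LIMIT" \
        --memory-reservation="$MEM_SOFT" \
        --memory-swap="$MEM_LIMIT" \
        --memory-swappiness=0 \
        --oom-score-adj=500 \
        "$IMAGE"
}

memory_limited_cmd
```

A positive --oom-score-adj makes the container a more likely target for the kernel OOM killer than host services, which is usually preferable to disabling the OOM killer.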

Device I/O & Filesystems

Put docker on its own partition

/var/lib/docker as a ZFS/BTRFS volume (snapshots, quotas)

Minimum rights

"rwm" options, e.g. --device=/dev/zero:/dev/zero:r

Mount root & volumes as read-only

For volumes: /path:ro,z (z or Z = SELinux label)
For root (/): read_only: true
Use with --shm-size & /dev/shm for pid files, scratch, tmp, etc.
--tmpfs /run:rw,noexec,nodev,nosuid,size=8m
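
A read-only root with small writable tmpfs mounts could look like the sketch below. The image name, host path and sizes are hypothetical. The command is printed, not executed.

```shell
#!/bin/sh
# Sketch: print a `docker run` command with a read-only root filesystem (dry run).
# IMAGE, the host path and all sizes are illustrative assumptions.
IMAGE="example/app"            # hypothetical image
CONFIG_DIR="/srv/app/config"   # hypothetical host directory

readonly_cmd() {
    echo docker run -d \
        --read-only \
        --tmpfs /run:rw,noexec,nodev,nosuid,size=8m \
        --tmpfs /tmp:rw,noexec,nodev,nosuid,size=16m \
        --shm-size=16m \
        -v "$CONFIG_DIR:/etc/app:ro" \
        "$IMAGE"
}

readonly_cmd
```

The tmpfs mounts give the application somewhere to write pid files and scratch data while the image contents stay immutable.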

Limit allocated I/O bandwidth

--device-read-bps, --device-write-bps
--device-read-iops, --device-write-iops
--blkio-weight-device 10 -> 1000
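
The I/O throttles take a device path followed by a rate. A sketch, assuming the container's data lives on /dev/sda (an assumption to adjust per host) and using illustrative rates:

```shell
#!/bin/sh
# Sketch: print a `docker run` command with block I/O limits (dry run).
# IMAGE, DEVICE and all rates are illustrative assumptions.
IMAGE="example/app"   # hypothetical image
DEVICE="/dev/sda"     # assumed backing block device

io_limited_cmd() {
    echo docker run -d \
        --device-read-bps="$DEVICE:10mb" \
        --device-write-bps="$DEVICE:5mb" \
        --device-read-iops="$DEVICE:1000" \
        --device-write-iops="$DEVICE:500" \
        --blkio-weight=300 \
        "$IMAGE"
}

io_limited_cmd
```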

Networking

Create an internal N-Tier architecture

networks: ( docker-compose 1.6+ & version: '2' ) || --net=

Think about inter-container communication

--icc=false + --link= (but deprecated), --ip-forward=

Disable userland-proxy

--userland-proxy=false … saves memory & faster

Use iptables and tc

Limit access and use QoS if necessary.
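
One way to sketch the N-tier idea with user-defined networks: a backend network created with --internal has no route to the outside, and only the web tier is attached to both networks. Network, container and image names below are hypothetical.

```shell
#!/bin/sh
# Sketch: print the commands for a two-tier network layout (dry run).
# All names are illustrative assumptions.
network_cmds() {
    echo docker network create frontend
    echo docker network create --internal backend
    echo docker run -d --name db --net=backend example/db
    echo docker run -d --name web --net=frontend example/web
    echo docker network connect backend web
}

network_cmds
```

In this layout the database is reachable from the web container only, and has no outbound connectivity of its own.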

System resources & ulimits

Set your typical soft & hard limits

Daemon: --default-ulimit nofile=50:100
Container: --ulimit nofile=50:100
compose 1.6+: ulimits: nofile: soft:50 hard:100

Prevent fork bombs: threads / process limits

compose 1.6+: ulimits: nproc: soft:32 hard:64
Docker 1.11+
& Kernel 4.3+: --pids-limit (cgroup support)
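
The per-container limits above, using the same values as the slide, might be passed like this. The image name is hypothetical and the command is only printed.

```shell
#!/bin/sh
# Sketch: print a `docker run` command with ulimits and a PID cap (dry run).
# IMAGE is an illustrative assumption; limits mirror the values listed above.
IMAGE="example/app"   # hypothetical image

ulimit_cmd() {
    echo docker run -d \
        --ulimit nofile=50:100 \
        --ulimit nproc=32:64 \
        --pids-limit=64 \
        "$IMAGE"
}

ulimit_cmd
```

nproc is counted per user ID across the whole host, whereas --pids-limit is enforced per container by the pids cgroup, so the latter is the more reliable defence against fork bombs.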

Think about your restart policy

restart: always? no?

Namespaces

Currently namespaced resources

Audit, cgroups, IPC, mount, NET, PID, Syslog, UID, UTS
--userns-remap=default (new in 1.10+), *but*:
Per daemon, not per container (--userns=host not yet in compose)
Volumes UID/GID also remapped…
Incompatible with IPC/PID/NET NS sharing…
e.g. --net=container:app1, --read-only filesystem…

NOT (yet) Namespaced

The Kernel, LSM, UID (by default), keyring,
ring buffer (dmesg), /proc/{sys}, /sys, /dev/{shm} …

A lot of work & cleanup still required for namespaces

Many holes over the years:
CVE-2010-0006, CVE-2011-2189, CVE-2013-1858, CVE-2013-1956, CVE-2013-4205,
CVE-2014-4014, CVE-2014-5206, CVE-2014-5207, CVE-2014-8989, CVE-2015-8709, (!)

Capabilities

Useful but incomplete security model

Some are very granular: MKNOD
Others give you root: SYS_ADMIN

Use whitelisting: --cap-drop=all

Then --cap-add=SETUID etc., until it runs

RUN setcap cap_mknod /bin/mknod

Use instead of suid binaries

Default Capabilities are inadequate

SETUID, SETGID, MKNOD, …
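
The drop-everything-then-add-back approach can be sketched as a small helper that builds the flags from a list of capabilities. The helper name, image name and the capabilities chosen are illustrative, not a recommended set.

```shell
#!/bin/sh
# Sketch: build a capability whitelist for `docker run` (dry run).
# cap_whitelist_cmd is a hypothetical helper: first argument is the image,
# the remaining arguments are the capabilities to add back.
cap_whitelist_cmd() {
    image=$1
    shift
    flags="--cap-drop=all"
    for cap in "$@"; do
        flags="$flags --cap-add=$cap"
    done
    # $flags is intentionally unquoted so each flag becomes its own word
    echo docker run -d $flags "$image"
}

cap_whitelist_cmd example/app NET_BIND_SERVICE SETUID SETGID
```

Starting from an empty list and adding one capability per failed run tends to end with a much shorter list than the defaults.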

Seccomp (Secure Computing)

Extremely granular filter

BPF filters of syscalls + arguments
Docker default blacklist (whitelist in the future)

Use tools to create profiles

dockersl.im, genSeccomp.sh, etc.
strace -c -f -S name ls 2>&1 >/dev/null | tail -n +3 | head -n -2 | awk '{print $(NF)}'

--security-opt seccomp=/path/profile.json

Disable default Seccomp filtering: --security-opt seccomp=unconfined

Use security_opt: - no-new-privileges

Keeps UID, GID & LSM Labels + can’t gain Capabilities/SUID
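
A list of syscall names, such as the one produced by the strace pipeline above, can be turned into an allow-list profile. The helper below is a hypothetical sketch: it emits only the two fields needed for a minimal profile (a default action that returns an error, plus one allow rule) and omits the architecture list, so it should be treated as a starting point rather than a complete profile.

```shell
#!/bin/sh
# Sketch: read syscall names (one per line) on stdin and print a minimal
# allow-list seccomp profile as JSON. syscalls_to_profile is a hypothetical helper.
syscalls_to_profile() {
    # drop blank lines, quote each name, de-duplicate, join with commas
    names=$(sed -e '/^[[:space:]]*$/d' -e 's/.*/"&"/' | sort -u | paste -sd, -)
    printf '{"defaultAction":"SCMP_ACT_ERRNO","syscalls":[{"names":[%s],"action":"SCMP_ACT_ALLOW"}]}\n' "$names"
}

printf 'read\nwrite\nread\n' | syscalls_to_profile
```

The resulting file can be passed with --security-opt seccomp=/path/profile.json, alongside --security-opt no-new-privileges. A single strace run rarely exercises every code path, so expect to add syscalls after testing.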

Swarm Networking [1.12+]

Swarm init / join

Expose master nodes carefully (hold cluster’s secrets)
Mutually auth. TLS, AES-GCM, 12-hour key rotation (Gossip / Raft)

Use overlay network encryption

docker network create -d overlay -o encrypted mynet
– Keys shared with tasks & services, but not «docker run»
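
Putting the swarm steps together, a sketch of the sequence on a manager node. The advertise address (taken from the documentation range), service name and image are hypothetical, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch: print the commands to create an encrypted overlay network and
# attach a service to it (dry run). All names and addresses are assumptions.
MANAGER_ADDR="192.0.2.10"   # hypothetical manager IP
NET="mynet"
IMAGE="example/web"         # hypothetical image

swarm_cmds() {
    echo docker swarm init --advertise-addr "$MANAGER_ADDR"
    echo docker network create -d overlay --opt encrypted "$NET"
    echo docker service create --name web --network "$NET" "$IMAGE"
}

swarm_cmds
```

The service is created with `docker service create` because, as noted above, the overlay keys are shared with tasks and services only.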

Mutually authenticate your microservices too

Microservices should not rely on overlay encryption:
Authenticate & Encrypt [container ↔ container] communications

«docker-compose bundle» – experimental status

Lacks support for most useful runtime security options, maybe in 1.13+?

Containers Runtime Security

Never use --privileged

Use granular solutions previously described

Run process as a user

Don’t run inside container as root: use nobody
Remove SUID, strip unused files, etc.

Layer as many security features as possible

Not all of them will apply, work, be enabled, etc.

Don’t forget to harden applications!

NGINX configs, exposed services, databases, etc.
