Docker has become the most attractive way to containerize applications. It offers fast, lightweight, secure, isolated (yet connected) containers that run wherever you put them.

That’s perfect, isn’t it? You write a piece of code, put it in a container, and ta-da! It is fast and secure now! Right… Is that so?

Well, a lot of things have been thought out for you, but don’t assume that’s enough. It’s never enough! Let’s go over the simple matters to keep in mind while deploying software with Docker:

TL;DR

  • Keep your image simple
  • Know what to include and exclude in the image
  • Monitor your container

1. Performance

You’ve written your code with well-thought-out optimizations, efficient algorithms, simple data protocols, etc. Good for you! That is what we call Source Code Level Optimization. Then there is another concept, Deployment Level Optimization: the union of Build Level, Compile Level, Assembly Level, and Run Time Level Optimizations; see Program Optimization (Wikipedia) for more. Anyway, let’s get to the point: disasters may occur if you don’t take this subject seriously.

TL;DR

  • Prefer smaller base images
  • Include only the necessary artifacts in the image
  • Don’t wrap your image within countless layers
  • Don’t install unnecessary packages
  • Write your Dockerfile effectively

1.1 Create Lightweight Images

Performance is mostly correlated with being lightweight. And there are dead-simple ways to create lightweight images in Docker.

The base image is, well, the base of your container. It differs from application to application, obviously, but there is almost always a lighter image than the one you’re using. For example, instead of the ubuntu or debian image, you can use the alpine image; instead of python:3.8, you can use python:3.8-alpine. Take a look at the quick comparison below.
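
You can check the difference yourself; exact sizes vary by release, but the Alpine variant is typically an order of magnitude smaller:

$ docker pull python:3.8
$ docker pull python:3.8-alpine
$ docker images python --format "{{.Tag}}: {{.Size}}"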

1.2 Artifacts to Include in Image

COPY . /app

I hope you don’t write this line in your Dockerfile unless you know what you’re doing. You might be copying in irrelevant artifacts like caches, .git/, tests, and README.md. These artifacts may contain secrets or hints about your application or system; and even when they don’t, they enlarge the image size for nothing.

Being explicit about COPY is the better practice. Even better is to be explicit and use just enough COPY instructions at the same time, no more (but possibly fewer). This prevents creating lots of layers, which would harm performance. Layers are covered in the next section.

# COPY <src> <dest>
# COPY ["<src 1>", "<src 2>", ..., "<dest>"]
COPY ["./app", ".env", "/app"]
COPY ./sample_data /data

You should use a .dockerignore file to ignore files that you don’t want to include in your image. Example .dockerignore:

# Git
.git
.gitignore

# Docker
docker-compose.yml
.docker

# Other
**/*.md
LICENSE

1.3 Don’t End Up With a Heavy Onion

Layers in a Docker image can be simply defined as changes on an image. Take a look at these instructions:

  • FROM, creates a layer on top of the layers of the base image you use
  • COPY, adds files from the source of your choice to a destination in the image, in a new layer
  • ADD, similar to COPY, except it can also unpack local tar archives and fetch files from remote URLs
  • RUN, runs commands in a new layer
  • CMD, sets the default command to execute when the container starts; unlike the others, it only adds metadata, not a layer with size

When you use these commands, you stack up layers in your image; RUN, COPY, and ADD are the ones that actually increase its size.

You would use FROM and CMD only once, so they won’t stack layers up. COPY and RUN, however, easily can. Use them with caution.

Instead of doing this:

RUN apt-get update
RUN apt-get -y install git

do this:

RUN apt-get update && apt-get -y install git
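
Combining commands in a single RUN also lets you clean up in the same layer; files deleted by a later instruction still occupy space in the layer that created them. On Debian-based images, a common pattern is:

RUN apt-get update && \
    apt-get -y install git && \
    rm -rf /var/lib/apt/lists/*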

1.4 Don’t Install Unnecessary Packages

If you’re not trying to reinvent the wheel, you must be using some packages in addition to your base image, right? Good. Now, the thing about installing packages is that they sometimes bring extra baggage with them. To avoid installing packages that are unused, or nice to have but not strictly necessary, use the appropriate options of your base image’s package manager and/or leverage multi-stage builds.

RUN apt-get update && \
    apt-get -y install --no-install-recommends git

For detailed information, check out my other blog post about Multi-Stage Builds.
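
As a taste of what multi-stage builds do, here’s a minimal sketch (requirements.txt and app.py are illustrative names): dependencies are installed in a throwaway first stage, and only the resulting packages are copied into the final image, leaving the build tooling behind.

# Stage 1: install dependencies with build tools available
FROM python:3.8-alpine AS builder
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Stage 2: start fresh and copy in only the installed packages
FROM python:3.8-alpine
COPY --from=builder /install /usr/local
COPY ./app /app
CMD ["python", "/app/app.py"]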

1.5 A Good Dockerfile

First, you need to learn Docker’s key concepts to write a good Dockerfile. Take a look at my blog post Docker Key Concepts and Definitions.

If you’re already familiar with the key concepts, you can use a Dockerfile linter like hadolint (not an affiliate link, just a FOSS project), as shown below.
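
hadolint can lint a Dockerfile directly, or run through its official container image:

$ hadolint Dockerfile
$ docker run --rm -i hadolint/hadolint < Dockerfile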

2. Security

Now, let’s imagine together again. You’ve written your code (you seem very productive). You’ve used linters, security checks, all kinds of tests; you’ve kept your software and packages up to date. Everything looks okay; you’re good to go.

BUT, if you don’t think about your deployment security, you’re still as vulnerable as any other application.

TL;DR

  • Minimal Base Images (again)
  • Use COPY instead of ADD if possible
  • You don’t need root privileges; use a non-root user
  • Use benchmarking tools for security
  • Monitor your container

2.1 Prefer Minimal Base Images

A minimal base image means fewer libraries, fewer functions, and fewer capabilities. And that results in a smaller attack surface.

Check out my blog on How to Create Minimal Docker Images to see a comprehensive example.

2.2 Use COPY Instead of ADD

COPY and ADD are basically the same, except that ADD can also unpack local .tar archives and fetch files from remote URLs. So, if you don’t need to add remote files or unpack local archives, don’t use ADD.
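
To illustrate (app.tar.gz is a hypothetical local archive):

# COPY places the archive in the image as-is
COPY app.tar.gz /tmp/

# ADD unpacks a local tar archive into the destination
ADD app.tar.gz /opt/app/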

2.3 Create a Non-root User

Create a non-root user and group with just enough permissions in your container, and use that user to run your program.

FROM python:3.8-alpine

COPY ./app /app

# Create the `app` group, then create `testuser`
# under the group `app`, and give `testuser:app`
# ownership of the `/app` directory.
# (addgroup/adduser are the BusyBox variants on Alpine.)
RUN addgroup -g 1000 app && \
    adduser -D -G app testuser && \
    chown -R testuser:app /app

USER testuser

CMD ["python", "/app/app.py", "start"]

2.4 Use Benchmarking Tools

Alongside the linting tools, you can use benchmarking tools too, like Docker Bench for Security (not an affiliate link, just a FOSS project). You can easily automate your lints and benchmarks to achieve high-quality Docker images.
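
Docker Bench for Security is a shell script that you run on the Docker host; per its README:

$ git clone https://github.com/docker/docker-bench-security.git
$ cd docker-bench-security
$ sudo sh docker-bench-security.sh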

2.5 Monitor Your Container

You’ve gone through every step of the best practices; you have the most secure and performant application in the world. It’s a beautiful feeling, isn’t it, feeling safe?

How can you feel safe!? It’s never enough, remember? You have to think about security and stability from every angle. Therefore, you have to keep an eye on your container after deployment. Watch metrics like System Resource Usage, Network Bandwidth Usage, Error Logs, System Performance, and any other metric you can make use of.

Watching these metrics will give you insights into what to do better and how, where to fix the code (you need to trace back your error logs), what optimizations you should implement, etc. There are plenty of FOSS tools for container monitoring; the built-in Docker commands below are a good starting point.
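
For instance, a quick look needs nothing beyond the Docker CLI:

# Live CPU, memory, network and block I/O usage per container
$ docker stats

# Tail the last 100 log lines and keep following
$ docker logs --follow --tail 100 <container>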

3. Maintainability

TL;DR

  • Use tags when pulling images
  • Pull your own images from your own private Docker Registry
  • Add metadata to Dockerfile

3.1 Use Tags When Pulling Images

Tags specify the version of an image. If you don’t specify a tag, as in FROM python, the Docker daemon will pull the latest tag by default. So when this python:latest image gets updated, your application and/or environment may break, parts of your code may become deprecated, and so on. Specify a precise tag like python:3.8 or python:3.8.7, or even pin the SHA digest: python@sha256:e2d2eff041f001de0db61ce386022a0df957868f29b26c0351e04413fe6059a2.
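
In a Dockerfile, the pinning looks like this (the digest is the same one as above):

# Pinned to a precise version tag
FROM python:3.8.7

# ...or pinned to an immutable digest
FROM python@sha256:e2d2eff041f001de0db61ce386022a0df957868f29b26c0351e04413fe6059a2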

3.2 Your Own Docker Registry

Even if you pull your images from public registries by SHA digest, an image might get deleted or restricted from public usage. Therefore, the best way to make sure your images remain available and unchanged is to keep them in your own private registry. To do this, you have to deploy your own registry, harden it on the registry side, and pull images under security policies like TLS and image signing & verification.
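
As a starting point, the official registry image lets you spin up a private registry locally. This is a bare-bones sketch; a production registry still needs the TLS and signing policies mentioned above:

# Run a local registry on port 5000
$ docker run -d -p 5000:5000 --restart=always --name registry registry:2

# Re-tag an image and push it to your registry
$ docker tag python:3.8-alpine localhost:5000/python:3.8-alpine
$ docker push localhost:5000/python:3.8-alpine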

3.3 Add Metadata to Dockerfile with LABEL

You can embed useful information in your Dockerfile to make it readable (both human- and machine-readable), discoverable, and explicit.

LABEL maintainer="mail@bilalozdemir.me" \
      version="1.2.0" \
      stage="test" \
      license="MIT" \
      keywords="python, api, crud" \
      multi-word-key1="Some data"

And you can use this information later to filter or organize your images.

$ docker images --filter "label=version=1.2.0"

$ docker rmi $(docker images -f "label=stage=test" -q)

You can take a look at Docker’s official documentation for more detailed information about Label Formatting Recommendations and Filtering Images by Label.

TODO

This list of tips is obviously not comprehensive; it’s intended as an introduction anyway. There are still things to be added to this list:

  • Run only one main process per container
  • Specify precise versions of dependencies
  • Use HEALTHCHECK instructions
  • Be explicit in the Dockerfile
  • Verify image signatures
  • Set strict file & directory permissions
  • Remove unused commands
  • Restrict Linux kernel capabilities
  • Limit memory usage
  • Make your container read-only if possible
