Can you really trust your Docker images?
Just pulling a Docker image from Docker Hub is like pulling an arbitrary binary blob from somewhere: without really knowing what’s in it, you execute it and hope for the best!
At least for some images. How can we decide whether to trust a Docker image?
Background: At our customers we use containers heavily, which is why every once in a while we use a Docker image from Docker Hub as a starting point for infrastructure or build automation. I then have to consider: do I trust the Docker image enough to give it access to all the secrets and intellectual property of my customer? I might not be the one making the decision, but I’m definitely the one supplying the information on which it is made.
I’m not trying to explain how to do an elaborate and exhaustive security review of a Docker image, as the scope of such a review varies a lot. What matters is that you as the user have the information you need to gain the level of trust you require. That might not extend to each individual binary file in the image, but it should at least reach a high level that you find reasonable for your business.
No matter how deep you need to go in analyzing a Docker image, the simple answer to gaining the level of trust you need is to make sure you have full traceability of how the image was created. Without this you can’t trust anything. With full traceability, however, you can gain insight to whatever depth fits you.
To trust the content of a Docker image I have three requirements:
I will look into the first requirement, as my level of trust is satisfied by the last two, given that I trust the Docker tools in these matters - but I still want to know for sure what’s inside the image.
Before I run a Docker container, I always look at the Dockerfile behind it - assume for now that I know which one that is.
Here are my thoughts on what I look for, using the example from Praqma/Yocto-build
FROM, because if the Dockerfile uses a base image, I need to start my evaluation there.
ENTRYPOINT is interesting, as this is what the container does. I’m especially concerned with scripts here.
Depending on the level of trust I need to have in the image, I evaluate each and every part of the Dockerfile at the next level of detail. In the above example that means reading and evaluating the scripts, and looking at the downloaded content.
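As a hypothetical illustration (this is not the actual Praqma/Yocto-build Dockerfile; names and the URL are placeholders), the kinds of instructions I evaluate look like this:

```dockerfile
# Hypothetical Dockerfile sketch - each instruction is a point of evaluation.
FROM ubuntu:14.04                                 # evaluation must start at the base image
RUN wget -O /usr/local/bin/build.sh https://example.com/build.sh \
    && chmod +x /usr/local/bin/build.sh           # downloaded content needs review too
ENTRYPOINT ["/usr/local/bin/build.sh"]            # what the container actually does
```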
Let’s now assume that I have evaluated all the content that goes into the Dockerfile to an extent that fits my level of trust. How would I know that these things are actually what goes into the final image?
The build environment becomes important now, and again this leads back to traceability. This time it’s not the content, but the process.
Almost any Docker image uses the FROM statement, and many use the base Linux distro images, e.g. FROM ubuntu:14.04. Let’s say I personally trust the Ubuntu base image on Docker Hub - but there is no guarantee that it actually contains what it says. If I build the Dockerfile on my local machine, my ubuntu:14.04 could be tricked into being anything. So I need to trust the build environment to ensure my base image is what the name says it is.
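One common mitigation is to reference the base image by its immutable content digest instead of a mutable tag, so the build environment cannot substitute different content under the same name. A sketch, where the digest is a placeholder to be taken from the image you actually reviewed:

```shell
# A tag like ubuntu:14.04 can point to different content over time or be
# spoofed in the build environment; a digest identifies the exact content.
# <digest> is a placeholder - substitute the digest of the image you reviewed.
docker pull ubuntu@sha256:<digest>
```

The same form works in the Dockerfile itself: `FROM ubuntu@sha256:<digest>`.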
A similar concern relates to the wget command, which downloads the script that I have now reviewed and decided to trust. When the build runs, DNS could be tampered with, or the script itself might have been replaced. I would need to trust that DNS is correct in the build environment.
In the above example I would therefore also like to verify that the script I evaluated has a checksum matching the downloaded one, so it is exactly what was reviewed.
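A minimal sketch of that checksum verification, with the download simulated by a local file so the example is self-contained (filenames are assumptions, not from the real Dockerfile):

```shell
# Record the checksum of the script at review time, then verify the
# (re)downloaded copy against it - the build fails if content changed.
printf 'echo hello\n' > build.sh        # stand-in for: wget https://.../build.sh
sha256sum build.sh > build.sh.sha256    # checksum recorded when the script was reviewed
sha256sum -c build.sh.sha256            # prints "build.sh: OK" on a match, fails otherwise
```

In a Dockerfile you would bake the reviewed checksum into a `RUN` step right after the `wget`, so a tampered download aborts the build.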
One thing you should require from your Docker images is that they are built using Docker automated builds. This improves the trust in traceability: with automated builds you get traceability between the source of the Dockerfile, the version of the image, and the actual build output. You can easily follow this, as I will show in the following.
When configuring automated builds you ensure that the Full Description is always correct, as it points to the latest readme file from the repo. Moreover, you can trust the repository URL; only the Short Description is open to edits by the authors.
Knowing these are correct feels good for the trust, at least for “latest”.
The next step is to look at the build process, where, as stated earlier, traceability is important.
With automated builds you get the Build Details tab, with a list of automated builds. You typically see several latest tagged builds, but hopefully also specific version builds, like here.
Now click one of the lines in the build details, and you get all the details about how the image was built, as shown briefly in the image below:
Each build’s details show me a lot of information that I can use to improve trust in the image. Among others, I find these important:
Require automated builds for your Docker images, so you can get traceability to the level you need to trust them.
If there are no automated builds, you should build the image yourself.
You could do even more than what I’ve proposed; the level of depth just needs to match your security requirements. There are a few other easy steps related to container security that are worth mentioning, and a quick search will show many other good efforts towards security.
Even though we might assume that we can trust the Docker tools, content can still be changed after my traceability analysis, so I would also look into Content Trust in Docker.
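Docker Content Trust addresses exactly this post-build window: when enabled, the Docker client verifies publisher signatures (via Notary) at pull time instead of trusting whatever the registry serves. A minimal sketch:

```shell
# With content trust enabled, docker pull verifies the tag's signature
# and refuses unsigned or tampered images instead of silently pulling them.
export DOCKER_CONTENT_TRUST=1
docker pull ubuntu:14.04
```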
If you want to evaluate security beyond the specific image, you should definitely look at Docker Bench for Security and the related Docker blog post about understanding Docker security and some best practices.
Container Solutions has a nice Cheat Sheet for Docker Security, which relates to a complete security review of using containers. Definitely worth thinking about. Related to my thoughts above, note their recommendation in “VERIFY IMAGES” about pulling images by digest or building them yourself.