Desacralizing the Linux overlay filesystem in Docker
Overlay filesystems (also called union filesystems) are a fundamental technology in Docker for creating images and containers. They build a single filesystem from the union of several directories: multiple filesystems, which are just directories, are superposed one on top of another. These directories are called layers and the unification process is referred to as a union mount. If two files with the same path exist in two layers, only the file from the topmost of these layers appears in the overlay filesystem.
We will learn how to create an overlay filesystem ourselves and how Docker uses it to build images and run containers.
Creating an overlay filesystem is easy
An overlay filesystem is made from two types of filesystems: one or more lower filesystems, which are immutable (their content is only read and never modified), and one upper filesystem, which receives all the changes made through the overlay filesystem, including file creations, modifications, and deletions.
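The lookup rule can be imitated with plain directories, without mounting anything. The following is only a sketch of the precedence logic, not of how the kernel implements it, and it ignores how deletions are handled. The function name resolve_in_layers and the directory names are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first layer that provides a file.
# Usage: resolve_in_layers FILE UPPER LOWER...
# Layers are searched in precedence order: the upper directory first,
# then the lower directories in the order given.
resolve_in_layers() {
  file=$1
  shift
  for layer in "$@"; do
    if [ -e "$layer/$file" ]; then
      printf '%s\n' "$layer/$file"
      return 0
    fi
  done
  return 1
}

# Small demonstration in a throwaway directory.
demo=$(mktemp -d)
mkdir "$demo/upper" "$demo/lower-a" "$demo/lower-b"
echo "from lower-a" > "$demo/lower-a/shared"
echo "from lower-b" > "$demo/lower-b/shared"
echo "only in lower-b" > "$demo/lower-b/only-b"

# "shared" exists in both lower layers: the first one listed wins.
cat "$(resolve_in_layers shared "$demo/upper" "$demo/lower-a" "$demo/lower-b")"
# "only-b" is found in the last layer because nothing above provides it.
cat "$(resolve_in_layers only-b "$demo/upper" "$demo/lower-a" "$demo/lower-b")"

rm -rf "$demo"
```

A real overlay mount does this resolution inside the kernel. The sketch only shows which copy of a file ends up visible.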
Creating an overlay filesystem is easy once you get your hands on it. All you need is a Linux machine with root or sudo access. A virtual machine will do.
We initialize the layout by creating multiple folders, each of them corresponding to a layer. We also need a mount folder at the place where we want the overlay filesystem to be created and a workdir folder for internal purposes.
mkdir overlay; cd overlay
mkdir \
lower-layer-1 lower-layer-2 lower-layer-3 upper-layer \
mount \
workdir
Let’s also create a file in each of the three lower folders. We leave the upper folder empty.
echo "Content layer 1" > ./lower-layer-1/file-in-layer-1
echo "Content layer 2" > ./lower-layer-2/file-in-layer-2
echo "Content layer 3" > ./lower-layer-3/file-in-layer-3
Two or more directories are required: a list of lower directories and an upper directory. The lower directories of the filesystem are read-only, whereas the upper directory can be used for both reads and writes. The mount command creates the overlay filesystem with the filesystem type option -t set to overlay. In lowerdir, the directories are separated by colons and listed from the topmost layer to the bottom one. The command must be executed as root.
sudo mount -t overlay my-overlay \
-o lowerdir=$HOME/overlay/lower-layer-1:$HOME/overlay/lower-layer-2:$HOME/overlay/lower-layer-3,upperdir=$HOME/overlay/upper-layer,workdir=$HOME/overlay/workdir \
$HOME/overlay/mount
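The option string of the mount command gets long quickly. As a sketch, it can be assembled from the individual directories by a small function. The name overlay_opts is an assumption made for this example, and the script only prints the resulting command so that it can be reviewed before being run with sudo.

```shell
#!/bin/sh
# Hypothetical helper: build the -o option string for an overlay mount.
# Usage: overlay_opts UPPER WORK LOWER...
# The first LOWER is the topmost lower layer, the last one the bottom layer.
overlay_opts() {
  upper=$1
  work=$2
  shift 2
  lower=""
  for dir in "$@"; do
    # Join the lower directories with colons.
    lower="${lower:+$lower:}$dir"
  done
  printf 'lowerdir=%s,upperdir=%s,workdir=%s\n' "$lower" "$upper" "$work"
}

base="$HOME/overlay"
opts=$(overlay_opts "$base/upper-layer" "$base/workdir" \
  "$base/lower-layer-1" "$base/lower-layer-2" "$base/lower-layer-3")

# Print the command instead of running it.
echo "mount -t overlay my-overlay -o $opts $base/mount"
```

Keep in mind that the work directory has to be an empty directory on the same filesystem as the upper directory.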
The df command lists all the filesystems along with some useful information such as the amount of free space and, when executed with the -T flag, the type of filesystem. The -h flag is only here to print the filesystem sizes in a human-readable format.
df -Th | grep overlay
my-overlay overlay 20G 5.5G 14G 29% /home/ubuntu/overlay/mount
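Another way to check the mount, sketched here as an alternative to df, is to read /proc/mounts, where each line lists the source, the mount point and the filesystem type as its first three fields. The function name list_overlay_mounts is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print "SOURCE MOUNTPOINT" for every overlay mount.
# Reads a file in the /proc/mounts format (source, mount point, type, ...).
# Usage: list_overlay_mounts [FILE]   (FILE defaults to /proc/mounts)
list_overlay_mounts() {
  awk '$3 == "overlay" { print $1, $2 }' "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
  list_overlay_mounts
fi
```

On a machine where the overlay from this lab is mounted, the list should include my-overlay and its mount point. The findmnt -t overlay command gives a similar view.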
The overlay filesystem is created and mounted inside the mount folder. It contains the files from all the lower filesystems.
ls -l mount
total 12
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 file-in-layer-1
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 file-in-layer-2
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 file-in-layer-3
cat mount/file-in-layer-3
Content layer 3
Let’s try to create a file in the mount folder:
echo "new content" > mount/new-file
It is written to the upper directory, upper-layer:
tree
.
├── lower-layer-1
│ └── file-in-layer-1
├── lower-layer-2
│ └── file-in-layer-2
├── lower-layer-3
│ └── file-in-layer-3
├── mount
│ ├── new-file
│ ├── file-in-layer-1
│ ├── file-in-layer-2
│ └── file-in-layer-3
├── upper-layer
│ └── new-file
└── workdir
└── work [error opening dir]
7 directories, 8 files
Now, we modify a file, for example file-in-layer-1.
echo 'Add a new line' >> mount/file-in-layer-1
The original file inside lower-layer-1 is not modified. Instead, the file is copied to upper-layer and the change is applied to that copy:
cat lower-layer-1/file-in-layer-1
Content layer 1
cat upper-layer/file-in-layer-1
Content layer 1
Add a new line
cat mount/file-in-layer-1
Content layer 1
Add a new line
Let’s see what happens when we remove a file, for example file-in-layer-2.
rm mount/file-in-layer-2
The original file inside lower-layer-2 is still there. A new file with a special type is created in upper-layer: a character device with device number 0, 0, called a whiteout. This is how the overlay filesystem represents a deleted file.
ls -l lower-layer-2/file-in-layer-2
-rw-rw-r-- 1 ubuntu ubuntu 13 Jun 2 22:38 lower-layer-2/file-in-layer-2
ls -l upper-layer/file-in-layer-2
c--------- 1 root root 0, 0 Jun 2 23:33 upper-layer/file-in-layer-2
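Deleted files can be listed by searching the upper directory for such character devices. The sketch below assumes GNU or BusyBox stat for the -c option, and the function name list_whiteouts is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: list the whiteout entries found under a directory.
# A whiteout is a character device whose device number is 0, 0.
# Usage: list_whiteouts DIR
list_whiteouts() {
  find "$1" -type c | while IFS= read -r path; do
    # stat prints the major and minor device numbers in hexadecimal.
    if [ "$(stat -c '%t:%T' "$path")" = "0:0" ]; then
      printf '%s\n' "$path"
    fi
  done
}

# Run from the overlay folder of the lab while the files still exist.
if [ -d upper-layer ]; then
  list_whiteouts upper-layer
fi
```

In the lab above, this should print upper-layer/file-in-layer-2 once the file has been removed through the mount folder.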
Now that the lab is finished, we can unmount the filesystem and purge our files.
sudo umount $HOME/overlay/mount
ls -l mount/
total 0
cd ..
rm -rf overlay
Overlay in Docker
Docker supports multiple storage drivers to write data to a container’s writable layer. OverlayFS is the recommended one. If you print the information from your local Docker installation, chances are that it reports the overlay2 storage driver.
docker info | grep "Storage Driver"
Storage Driver: overlay2
Docker uses the overlay filesystem to create images as well as to position the container layer on top of the image layers.
When an image is downloaded, its layers are stored inside the /var/lib/docker/overlay2 folder. For example, downloading a three-layer image using docker pull ubuntu creates 3+1 directories. The l directory contains shortened layer identifiers as symbolic links.
docker pull ubuntu
Using default tag: latest
latest: Pulling from library/ubuntu
345e3491a907: Pull complete
57671312ef6f: Pull complete
5e9250ddb7d0: Pull complete
Digest: sha256:adf73ca014822ad8237623d388cedf4d5346aa72c270c5acc01431cc93e18e2d
Status: Downloaded newer image for ubuntu:latest
docker.io/library/ubuntu:latest
In my case, 3 layers are downloaded:
ls -l /var/lib/docker/overlay2/
total 16
drwx------ 4 root root 4096 Jun 3 11:21 289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/
drwx------ 4 root root 4096 Jun 3 11:21 40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/
drwx------ 3 root root 4096 Jun 3 11:21 88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/
drwx------ 2 root root 4096 Jun 3 11:21 l/
ls -l /var/lib/docker/overlay2/l/
total 12
lrwxrwxrwx 1 root root 72 Jun 3 11:21 NSEHV6LZKQIRKICXA2T7T5252D -> ../88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff/
lrwxrwxrwx 1 root root 72 Jun 3 11:21 QPAIOX2SCZPFZIIXB27PFVHUPH -> ../40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff/
lrwxrwxrwx 1 root root 72 Jun 3 11:21 USIDUBYHQEGWIRN4JOSF74ZWIL -> ../289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
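The mapping between the short identifiers and the layer directories can also be printed with a small loop. This is only a sketch: the function name list_layer_links is made up, and reading /var/lib/docker usually requires root.

```shell
#!/bin/sh
# Hypothetical helper: print "SHORT_ID -> TARGET" for each symbolic link
# found in a directory such as /var/lib/docker/overlay2/l.
# Usage: list_layer_links DIR
list_layer_links() {
  for link in "$1"/*; do
    # Skip anything that is not a symbolic link (including an unmatched glob).
    [ -L "$link" ] || continue
    printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
  done
}

list_layer_links /var/lib/docker/overlay2/l
```

As far as I understand from the Docker documentation, these short names exist to keep the option string passed to mount small.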
Those layers are also exposed by inspecting the Docker image. The output of the Docker command is in JSON. We use jq to filter the part of most interest to us.
docker image inspect ubuntu | jq -r '.[0] | {Data: .GraphDriver.Data}'
{
  "Data": {
    "LowerDir": "/var/lib/docker/overlay2/40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff:/var/lib/docker/overlay2/88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff",
    "MergedDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/merged",
    "UpperDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff",
    "WorkDir": "/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/work"
  }
}
When resolving a file, the upper directory takes precedence, followed by the lower directories from left to right. From bottom to top, the layers are therefore stacked in this order:
1: 88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e
2: 40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab
3: 289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b
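The same ordering can be derived mechanically from the LowerDir and UpperDir values. The sketch below uses a made-up function name, stacking_order, and placeholder paths instead of the real layer directories.

```shell
#!/bin/sh
# Hypothetical helper: print layer directories from bottom to top.
# Usage: stacking_order LOWERDIR UPPERDIR
# LOWERDIR is a colon-separated list whose rightmost entry is the bottom layer.
stacking_order() {
  # Print the lower directories in reverse order, one per line.
  printf '%s\n' "$1" | awk -F: '{ for (i = NF; i >= 1; i--) print $i }'
  # The upper directory always sits on top.
  printf '%s\n' "$2"
}

# Placeholder paths standing in for the values reported by docker image inspect.
stacking_order "/layers/b/diff:/layers/a/diff" "/layers/c/diff"
# prints /layers/a/diff, then /layers/b/diff, then /layers/c/diff
```

The real values can be read with docker image inspect --format '{{.GraphDriver.Data.LowerDir}}' ubuntu and the equivalent for UpperDir.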
Starting with the first layer in this list, the bottom one, its content is the Ubuntu base filesystem:
ls /var/lib/docker/overlay2/88826e8f5f21df691dbd998df70d94e1b6b480e489c4dbb5999dcc8a7367159e/diff
bin boot dev etc home lib lib32 lib64 libx32 media mnt opt proc root run sbin srv sys tmp usr var
On top of it, the second layer adds a few directories and files:
tree /var/lib/docker/overlay2/40766b9f546e9826ff353976c167f60cb615f57c01926a607ab48a2df64806ab/diff/
├── etc
│   ├── apt
│   │   └── apt.conf.d
│   │       ├── docker-autoremove-suggests
│   │       ├── docker-clean
│   │       ├── docker-gzip-indexes
│   │       └── docker-no-languages
│   └── dpkg
│       └── dpkg.cfg.d
│           └── docker-apt-speedup
├── usr
│   └── sbin
│       ├── initctl
│       └── policy-rc.d
└── var
    └── lib
        └── dpkg
            ├── diversions
            └── diversions-old
10 directories, 9 files
The third layer adds a single file:
tree /var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
/var/lib/docker/overlay2/289a71f4e07caadc95892ac5b4027606bb93c69d1a23d0e866818cdb1179644b/diff/
└── run
    └── systemd
        └── container
2 directories, 1 file
The instructions creating the layers are defined inside the Dockerfile. The curl command downloads the Dockerfile.
curl https://raw.githubusercontent.com/tianon/docker-brew-ubuntu-core/c5bc8f61f0e0a8aa3780a8dc3a09ae6558693117/focal/Dockerfile
By default, it prints the content to the console.
FROM scratch
ADD ubuntu-focal-core-cloudimg-amd64-root.tar.gz /
RUN set -xe \
    \
    && echo '#!/bin/sh' > /usr/sbin/policy-rc.d \
    && echo 'exit 101' >> /usr/sbin/policy-rc.d \
    && chmod +x /usr/sbin/policy-rc.d \
    \
    && dpkg-divert --local --rename --add /sbin/initctl \
    && cp -a /usr/sbin/policy-rc.d /sbin/initctl \
    && sed -i 's/^exit.*/exit 0/' /sbin/initctl \
    \
    && echo 'force-unsafe-io' > /etc/dpkg/dpkg.cfg.d/docker-apt-speedup \
    \
    && echo 'DPkg::Post-Invoke ;' > /etc/apt/apt.conf.d/docker-clean \
    && echo 'APT::Update::Post-Invoke ;' >> /etc/apt/apt.conf.d/docker-clean \
    && echo 'Dir::Cache::pkgcache ""; Dir::Cache::srcpkgcache "";' >> /etc/apt/apt.conf.d/docker-clean \
    \
    && echo 'Acquire::Languages "none";' > /etc/apt/apt.conf.d/docker-no-languages \
    \
    && echo 'Acquire::GzipIndexes "true"; Acquire::CompressionTypes::Order:: "gz";' > /etc/apt/apt.conf.d/docker-gzip-indexes \
    \
    && echo 'Apt::AutoRemove::SuggestsImportant "false";' > /etc/apt/apt.conf.d/docker-autoremove-suggests
RUN [ -z "$(apt-get indextargets)" ]
RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container
CMD ["/bin/bash"]
The ADD instruction created the first layer. The first RUN instruction created the second layer. The second RUN didn’t create any layer because no file was created. The third RUN instruction created the third layer. The CMD instruction didn’t create any layer because it only records the default command, which is evaluated at runtime when a container is created from the image.
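As a rough sketch of this reasoning, the instructions of a Dockerfile can be sorted into those that may change the filesystem and those that only record metadata. The function name classify_instructions and the labels it prints are made up for this example. The function only looks at the instruction keyword, so it cannot tell whether a given RUN actually wrote any file.

```shell
#!/bin/sh
# Hypothetical helper: label each Dockerfile instruction read from stdin.
classify_instructions() {
  awk '
    # Skip blank lines and comments.
    /^[ \t]*(#|$)/ { next }
    # Skip lines that continue the previous instruction.
    continued { continued = ($0 ~ /\\[ \t]*$/); next }
    {
      keyword = toupper($1)
      if (keyword == "FROM")
        kind = "base image"
      else if (keyword == "ADD" || keyword == "COPY" || keyword == "RUN")
        kind = "may add a filesystem layer"
      else
        kind = "metadata only"
      print keyword ": " kind
      # Remember whether this line ends with a continuation backslash.
      continued = ($0 ~ /\\[ \t]*$/)
    }
  '
}

classify_instructions <<'EOF'
FROM scratch
ADD ubuntu-focal-core-cloudimg-amd64-root.tar.gz /
RUN mkdir -p /run/systemd \
    && echo 'docker' > /run/systemd/container
CMD ["/bin/bash"]
EOF
```

To see what each instruction really produced, docker history ubuntu lists the instructions of an image along with the size they added.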
Conclusion
Once we understand how overlay filesystems work, it is quite easy to see how Docker uses them to turn the instructions of a Dockerfile into layers, with additional caching between each layer. The overlay filesystem is easily combined with a chroot jail to provide an isolated filesystem to the container on top of the immutable filesystems from the image layers. Distributing an image is just a matter of packaging its layers together as tar archives.