Docker: Taming the Beast - Part IV: Docker Volumes
Table of contents
Introduction
Hi again, and welcome to Part IV of Docker: Taming the Beast!
While the beast is not tamed yet, I reckon we have done a pretty good job at it already.
I’m glad to see you here in Part IV, and I hope you have read and liked the first three. In case you haven’t, I suggest you read the articles in order; it will be more convenient for you, as each article builds on the base laid out by the previous one.
Here is Part III if you missed it, and here is the very first part if you’re new here.
Enjoy the read!
What we Have so Far
We are now comfortable enough in Docker that the notions of container and image are clearly set, and we understand the fundamental difference between them. We have seen a fair deal of docker commands to interact with the docker daemon (such as list containers, list images, delete them, etc.).
In Part III—which was a big one!—we have learned about Dockerfiles, which are basically Makefiles, but for Docker, and we have learned almost everything there is to know about the art of writing Dockerfiles, i.e. to create our own images.
In earlier articles, I have insisted that containers instantiated from an image must be ephemeral, which means that it should be ok for the container to be stopped, paused, even destroyed and recreated, at (almost) any time. When I say (almost) any time, it’s because we are not dealing with real-time programs with real-time issues. But the idea is that a container is not (and thus must not be treated as) a virtual machine; on the contrary, it must be treated as (because it is) a process.
We have already seen the commands to stop, start and destroy a container; they are, respectively: docker stop, docker start and docker rm.
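As a quick refresher, here is a minimal sketch (the container name my-container is a hypothetical example):

$> docker stop my-container     # stops the process; the RW layer (and its data) is kept
$> docker start my-container    # starts it again; the data is still there
$> docker rm my-container       # destroys the (stopped) container: its RW layer is gone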
Our Fundamental Problem
We do have a huge problem on our hands right now: if stopping a container merely stops the process from running, thus keeping our data in the container, destroying it, on the other hand, wipes out any data in it. So it sort of clashes with the idea that containers should be ephemeral. We need a way out of this.
In other words: we need a way to make (some) data persistent.
I have already talked & hinted about it in previous articles: this will be solved with Docker Volumes, which is an extremely important aspect of Docker, and the topic of this article.
I have just reminded you of what we have done so far; well, by the end of this article, we will be able to have truly ephemeral containers, with a safe way of keeping important data!
Grab a drink, grab some sugar and let’s do this together!
Reminders on Theory
Don’t flee!
I know that such a title is not very appealing, but stay with me, I just need to remind you a little of how things work together so that I can introduce the Docker Volumes, otherwise you’ll just think it’s a ‘new feature’ and you’ll start using them without understanding them. And you know it’s not how we do things in here.
Don’t worry, it will be a short reminder.
One of the main advantages of docker, what is at the core of the image / container scheme and makes it so performant, is the concept of layers: the fact that they are re-usable, and the unionFS.
We have seen that, roughly speaking, unionFS is a sort of filesystem that is applied on top of a ‘real’ filesystem (like NTFS, ext4, reiserFS, etc.) whose main goal is to provide the content of two or more separate directories, unified at a specific location. UnionFS, through union mounts, makes /path/to/dirA’s and /path/to/dirB’s content available at /path/to/unified/view without the user knowing where each of the files and directories inside initially comes from.
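If you want to see the idea outside of Docker, here is a minimal sketch using overlayfs (one of the union filesystems modern Docker installs rely on; the paths are made up for the example):

$> mkdir -p /tmp/lower /tmp/upper /tmp/work /tmp/unified
$> echo "from dirA" > /tmp/lower/a.txt
$> echo "from dirB" > /tmp/upper/b.txt
$> sudo mount -t overlay overlay -o lowerdir=/tmp/lower,upperdir=/tmp/upper,workdir=/tmp/work /tmp/unified
$> ls /tmp/unified
a.txt  b.txt

Both directories appear merged at a single location, without the user having to know where each file comes from.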
Then we have seen the concept of layers: when you write a Dockerfile, you start with a base image, and each statement in the Dockerfile runs a series of operations on the image, and when it succeeds it creates a new layer, which, chained with the previous image, gives another image.
Similarly, instantiating a container from an image is done very quickly because it only consists in creating a Read-Write Layer which is linked on top of all the layers that compose the image. This is very cool because it makes instantiating containers so fast and so easy that it is virtually free (meaning you can instantiate containers on the fly, issue some operations in it and shut it down).
Also it makes it easy to destroy a container: only the RW layer needs to be removed.
Alas, this is also the source of the problem: how do we delete a container but still keep some important data from it? When we delete the container, we delete the RW layer and thus lose all modifications from the image.
A first idea for a solution would be to simply save the RW layer, but this is a bad idea, for several reasons. First, we can say that the RW layer is the container, so saving it actually means ‘not destroying the container’.
Second, the RW layer, being a layer is a modification that applies on top of another layer (a bit like a git diff that applies to a commit). This means that if we backed up this RW layer, we could only re-apply it to the same image we initially applied it to. This is rather lame because we would not be able to share data between two different images.
And third, since all modifications in a container are stored in the RW layer, saving the RW layer would essentially save all data, and we could not cherry pick and save just the data we want.
So we need to find another solution, and what is done with Docker Volumes is simply to bypass the unionFS. Let’s see this in more detail.
Handle Persistent Data Correctly
Okay, here we are, the real stuff begins now. Let’s see how we can overcome the limitation of the unionFS.
Presentation of Docker Volumes
I have talked about them enough, it’s time to fully introduce them: the solution will come from Docker Volumes.
What are they? Well, strictly speaking, they are directories, stored in a specific, Docker-reserved location on the host (for the curious, this location is /var/lib/docker/volumes).
Now, this is not very helpful, so let’s talk about what they are conceptually (and how you should really think about them). You must think of Docker Volumes as a partition, or an external volume, from the point of view of your containers (“volumes” as in USB stick, external hard drive, etc.).
The idea behind Docker Volumes is quite simple: you mount them at a specific, chosen location in your container, and when you do that, operations on the mountpoint inside the container are actually performed in the corresponding directory under the host’s /var/lib/docker/volumes/.
The previous sentence is, without a doubt, the most important sentence in the whole article. So make sure you read & understood it carefully.
Let’s take a step back for a minute: until now, every operation (say create a file, modify it, rename it, delete it, etc.) was done in the container’s RW layer, the one that disappears when you delete the container. All of these modifications happen entirely in the RW layer.
Docker Volumes, on the other hand, bypass this mechanism. When you mount a Docker Volume in a container, it is really like a traditional UNIX mount: there is no unionFS nor union mounts (or even layers) anymore. Let’s say we mount a volume in our container, at location /code/workspace/projects. What this means is that now, every operation that is done on the container’s /code/workspace/projects is actually done in the directory stored in the special place /var/lib/docker/volumes/xxx. Really. So strictly speaking, /code/workspace/projects is not part of the container anymore (in the sense that it’s not in the RW layer anymore).
The container’s /code/workspace/projects and the host’s /var/lib/docker/volumes/xxx are the same directory, a bit like when you create symbolic links with ln.
This is very powerful, because now, when you delete / destroy the container, its RW layer will be destroyed as we have seen before, but this directory at the specific location (/var/lib/docker/volumes/xxx) will not!
And then, what prevents us from recreating our container and mounting this directory (the host’s /var/lib/docker/volumes/xxx) again at the same location (/code/workspace/projects), or even at another location inside the container? Nothing! And this is exactly what we are going to do!
Congratulations, you have just understood how to make data persistent!
Just to give it another go and make absolutely sure you understand what we are doing, I’ll say it one last time: every modification that one makes inside a container is stored in this container’s RW layer, a bit like git commits which apply diffs from one state to reach another state.
When we use & mount a Docker Volume at a location in the container, we are actually (like, really, there’s no intermediary step) linking the target location inside the container with a directory on the host. Exactly as if we made a Linux symbolic link with ln. The consequence of this is that the target location inside the container now bypasses the unionFS mechanism, and changes (such as adding a file, deleting it, renaming it, etc.) are actually performed by the host.
This is powerful and convenient because it runs at native speed: the directories are linked, so there are no intermediary steps, caching or anything else.
Okay Fine, Got It. How & When do I Create a Docker Volume?
That’s a legitimate question. I’ll answer in detail later, but basically, either you create a Docker Volume first and then mount it when you instantiate a container, or you create your container and the Docker Volume at the same time.
Let’s dive in!
Practice Time
Create Docker Volumes Separately
As I previously hinted, there are usually two ways to create a Docker Volume, and we will first look at making a Docker Volume separately. This will allow us to discover a number of useful commands that deal with Docker Volumes, and then we shall see how to mount the volume in a container.
Quite logically, docker comes with several commands pertaining to volumes; they all take the form docker volume <sub-command>.
Listing Volumes
The first and simplest of all is to list all the volumes present on the host; the command is docker volume ls:
$> docker volume ls ~
DRIVER VOLUME NAME
local 06d1cbdb687c167842a7423970e950617935dec140e8740ab26712b8fabd5001
local 1c6bf0f75ef901439e10e05461937e02902e0861086715185524d23845b06756
local 459129cc5e2a7655d0dc1337f06b54402cdc3fb916ae994bb124990288061f6a
local 792e7d8e333b133a1675b24c0ead99605e62a63ad30fdd107200b5be3c9db356
local 866a904c82ecd37216564105de0a073f409b5b10535554728c15865475c85567
local e16576939ac83f75f1de1eec688cd4413b824ca888b7e77808ddc9826d530d70
local ff7b2a404c14186909c0cfcf613e820fc097aac9dbf3c21555468f1f471e3476
local my-volume
As you can see in my output, some volumes have a cryptographic hash for a name while others have a real, intelligible name: the latter are called Named Volumes (I will use both terms interchangeably from now on).
As hash-named volumes quickly become difficult to keep track of, we will aim at always naming our volumes.
Inspecting Volumes
Another sometimes useful (but not that often) command is docker volume inspect <volume-name>. The inspect subcommand gives more detailed information on one particular Named Volume, as you can see here:
$> docker volume inspect my-volume ~
[
{
"Name": "my-volume",
"Driver": "local",
"Mountpoint": "/var/lib/docker/volumes/my-volume/_data",
"Labels": {},
"Scope": "local"
}
]
Creating Volumes
We were initially interested in creating a Named Volume, so let’s do this now; the command, quite logically, is named docker volume create.
You can type it as-is: docker volume create, and doing so should return a cryptographic hash on the console, like so:
$> docker volume create ~
ffa05ef362bf6126d500490ed5db5b64b0c480cacb320062d7e9e251380c4913
Congratulations, you’ve just created your first Docker Volume! To confirm this, let’s list the Docker Volumes with docker volume ls, and if there are too many, let’s grep the result with the first few characters of the hash you’ve just been returned: docker volume ls | grep <beginning-of-hash>
$> docker volume ls | grep ffa05 ~
local ffa05ef362bf6126d500490ed5db5b64b0c480cacb320062d7e9e251380c4913
And sure enough, we find our new volume.
But I said before that we will try to name our volumes so that things don’t quickly become a mess; the option for this is --name, so let’s do something like this: docker volume create --name MyFirstNamedVolume.
$> docker volume create --name MyFirstNamedVolume
MyFirstNamedVolume
And voilà! You have created your first Docker Named Volume. Usually I say “Named Volume” when the volume is indeed named, and I call it an Anonymous Volume when the volume only has a cryptographic hash for a name.
Fundamentally it doesn’t change anything: they both work exactly the same; it’s just that it’s easier to list, inspect and identify volumes that have a proper name. Besides, taking the good habit of naming your volumes will help when we later see docker-compose (in a later article), so… you should really do it!
Deleting Volumes
We have seen how to list, inspect and create Named Volumes; what remains, logically, is how to delete them. Sure enough, the command is docker volume rm <volume-name>.
Be careful with that command! The goal of the article is to explain how to make data persistent. We showed that mounting a Named Volume in a container allowed us to bypass the unionFS mechanism, so that our files inside the mounted directory are kept outside of the container, safely in a Docker Volume.
This allows us to delete the container (along with any modifications), maybe recreate it and mount the Named Volume again, to find our persistent data again.
This whole scheme works on the fact that the data we want to make persistent is stored inside a Named Volume, so if you delete this Docker Volume, then your data will be gone for sure this time.
The thing to take away from this is that you must be very careful when deleting a Named Volume, because you will lose all data inside. This is where it helps to have a proper name for the volume: when deleting a volume named postgresql_data, I know what I am about to delete and I can cancel if I don’t want to. But if I’m about to delete a volume named <hash>, well, it’s more difficult!
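To illustrate with the names above (treat this as an illustrative transcript: docker simply echoes back the name of the volume it deleted):

$> docker volume rm postgresql_data
postgresql_data
$> docker volume rm ffa05ef362bf6126d500490ed5db5b64b0c480cacb320062d7e9e251380c4913
ffa05ef362bf6126d500490ed5db5b64b0c480cacb320062d7e9e251380c4913

With the first one, I know exactly what I am deleting; with the second one, good luck remembering what was inside.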
I believe we can conclude this part for now, as we have seen all the essential docker volume commands. They are, after all, quite logical and follow both the docker and Linux ways.
Create the Volume at Run Time
In the previous part, we saw how to create a Named Volume, but not how to mount it inside the container.
Let’s see it right now.
There are actually three different syntaxes for mounting a volume (or a directory) inside a container. All of them have a different meaning, so pay attention!
1. Mount a Named Volume
The first syntax is: docker run -v MyFirstNamedVolume:/path/inside/container <image-name>.
As you have probably guessed, first comes the name of the Named Volume (here I chose the one we created in the previous part), then comes the separator (:) and last comes the path inside the container where we want to mount our volume.
This is the directory that will bypass the unionFS (so if you write a file inside this directory, it won’t be stored in the container’s RW layer, but it will be written in the special location on the host; we’ve talked about this).
If the Named Volume doesn’t exist already, i.e. if the name you put before the colon (:) is not the name of an existing Docker Volume (let me remind you that you can check this with docker volume ls), then it will be created, exactly as if you had previously run docker volume create to create it.
So this is how we create a Named Volume at run time (as the title of this section describes!).
2. Mount an Anonymous Volume
The second syntax to mount a Docker Volume is to omit the first part of the command (the name), and thus only provide the path at which we want to mount our volume.
We do it like this: docker run -v /path/inside/container <image-name>.
When you do this, i.e. only provide a path to the -v option and no colon (:), docker will create a Docker Volume for you, but since you did not specify a name, it will just have its cryptographic hash. Then it will mount it at the specified location inside the container.
It works exactly the same; it’s just that, as I said before, it won’t be obvious what this volume corresponds to when you list it with docker volume ls.
Recap
It is very important to understand the three possibilities that are offered to you, so this deserves its own title, paragraph and all.
Here are the three syntaxes recapped and their main, most common use-cases:
- -v volume-name:/path/in/container : used to mount a Named Volume inside the container. Use this when you want to make data persistent, i.e. keep it across container destruction and re-creation. The most common examples are: a database’s data files, log files from a server, a config directory, credentials (private / public key pair)
- -v /path/in/container : used to mount an Anonymous Volume inside the container. Since you did not specify a name, docker will create the volume before mounting it. The volume will not have a name, only a hash, which is hard to remember and identify. Use this for temporary & short-lived containers, or if you have good reasons not to want a Named Volume (for instance in a script, if you don’t use docker-compose). I advise against this form.
- -v /path/in/host:/path/in/container:{ro,rw} : used to mount a host’s directory in the container. Use this when you need to share data between the two, for instance to pass information to the container or take information out of it. Used with the read-only flag (:ro), it is useful to share /etc/localtime to synchronize the time between host and containers, or ~/.ssh/ to share your ssh keys with your container. Or it can be used to share source code in a building or testing environment.
Always make sure you know your use-case before choosing the method.
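To make the three forms easier to compare, here they are side by side in a quick sketch (the images, volume names and paths are illustrative examples of mine, not a prescription):

$> docker run -d -v db-data:/var/lib/postgresql/data postgres      # 1. Named Volume: persistent data
$> docker run -d -v /var/lib/postgresql/data postgres              # 2. Anonymous Volume: same mechanism, hash for a name
$> docker run -d -v /etc/localtime:/etc/localtime:ro nginx         # 3. host directory (bind mount), read-only here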
Difference between Mounting a Host’s Directory and a Volume
This deserves a special mention because when I introduce people to Named Volumes, I always get this question: “what’s the difference?”.
From a technical standpoint, nothing: whether you mount a host’s directory or a named volume, what happens is that a Linux directory on your host is linked from inside the container. And it’s good news, performance-wise. So why do I always frown when I see people mounting host’s directories when they want to make data persistent (like a database)? It’s because it’s dangerous.
There are two fundamental aspects that make mounting a host’s directory for persistent data a bad solution:
- When you mount a host’s directory, you generally have access to it. So it’s very easy to forget about it and accidentally change and/or delete it later. This would affect your container, and you would potentially lose ‘persistent’ data.
- Mounted host’s directories are not Docker Volumes, so they are not tracked by docker volume commands. If you followed my advice from earlier and you name your volumes sensibly, by running docker volume ls you would see names such as db-data, ssh-keys, backup-files, etc. So it’s relatively easy to know what a volume contains. On the other hand, if you mount a host’s directory, you don’t have access to this information, and you have to manually search the directories on your host’s filesystem to know this. Since these directories can be anywhere, you will surely be tempted to create a sort of ‘root’ directory, like ~/docker-mounts/, in which you would place all the mounted host’s directories…
But this is exactly what docker already does for you!
Anyway, it would be pointless to insist more, I strongly encourage you to re-read the three use-cases I described under the ‘Recap’ section, and use Docker Named Volumes as much as you can.
Verifying That it Works
It’s time for practice, don’t you think?!
First of all, let’s create a container and a named volume with it. Let’s mount this volume at /home/data for instance: docker run -it --name test-1 -v test-volume:/home/data ubuntu bash.
This is becoming quite a verbose command, but you should be able to understand every part of it. Now you must have a terminal in your newly-created Ubuntu container.
Let’s fire up a new terminal (don’t exit the one in ubuntu) and check that our docker volume was created with docker volume ls; you should see an entry test-volume:
$> docker volume ls
DRIVER VOLUME NAME
local 06d1cbdb687c167842a7423970e950617935dec140e8740ab26712b8fabd5001
local 1c6bf0f75ef901439e10e05461937e02902e0861086715185524d23845b06756
local 459129cc5e2a7655d0dc1337f06b54402cdc3fb916ae994bb124990288061f6a
local 792e7d8e333b133a1675b24c0ead99605e62a63ad30fdd107200b5be3c9db356
local 866a904c82ecd37216564105de0a073f409b5b10535554728c15865475c85567
local MyFirstNamedVolume
local e16576939ac83f75f1de1eec688cd4413b824ca888b7e77808ddc9826d530d70
local ff7b2a404c14186909c0cfcf613e820fc097aac9dbf3c21555468f1f471e3476
local ffa05ef362bf6126d500490ed5db5b64b0c480cacb320062d7e9e251380c4913
local my-volume
local test-volume
Perfect.
Now let’s go back to the terminal in our container, and let’s check that we have a /home/data/ directory, the mount point of the docker volume: ls -l /home:
root@e39ac1afe0ae:/# ls -l /home/
total 4
drwxr-xr-x 2 root root 4096 Dec 27 07:25 data
Here it is, looks good for now.
If you followed correctly from the beginning, you know (or else, I remind you here) that when you mount a docker volume at a mount point inside a container, this mount point now escapes (or bypasses) the unionFS mechanism, right? So data you put in the mount point is not stored in the container’s RW layer, but in the docker volume instead.
It’s time to verify this by ourselves.
Verifying the Read-Write Layer Conspiracy
Let me introduce a new option to the docker ps command: -s, --size. By appending this option, docker will inform you of the size your container takes. Let’s try it:
$> docker ps --size
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
e39ac1afe0ae ubuntu:latest "bash" 4 minutes ago Up 4 minutes test-1 0 B (virtual 124.8 MB)
As you can see, the output is a little strange: 0 B (virtual 124.8 MB). There are two parts to this.
Take a minute here and try to guess what those mean, you should be able to guess without me explaining.
…
Found it? Let’s check!
The first part (the non-virtual size) is the size of your container’s RW layer. It is the size of all data and modifications you made to your container. You do remember how a container is just comprised of the read-only image’s layers with an additional read-write layer? Well the first size shown here is precisely this RW layer’s size.
And the ‘virtual’ part of it, is simply the size of the image. This is called ‘virtual’ because suppose you have 2 containers instantiated from the same image; suppose the image is 500MB large; suppose the first container’s RW layer is 100MB and the second container’s is 250MB.
What, then, is the total consumed disk space? It’s 100MB (first container’s RW layer) + 250MB (second container’s RW layer) + 500MB (size of the image, counted once, thanks to the layers’ reusability). So the total disk space consumed is 100 + 250 + 500 = 850MB and not 100 + 500 + 250 + 500 = 1,350MB.
This is why docker ps --size will give you this output. Generally, you want to watch out for containers having a ‘big’ (non-virtual) size. Why?
Let me return the question to you, and take a minute to try and understand why (it is fundamental that you understand this): why is it (generally) bad to have a container with a ‘big’ non-virtual size?
…
Seriously, don’t cheat, actually take a minute and think; hint: this has to do with the topic of this article.
…
I really hope you tried, and it’s good if you succeeded.
Now for the answer: the non-virtual container’s size is, as I just said, the size of its RW layer. Which means that if you have a container with a ‘big’ RW layer, there’s a lot of data that is not persistent: data which will be lost when you delete your container!
Depending on your container’s purpose, this might not be a problem: maybe it’s a container that is making some tests on your code and there is a lot of output data (like compilation logs, etc.), maybe this ‘big’ data is temporary data that you don’t care about, etc. In these cases, it’s fine.
But if your container possesses important data, which you should not lose should your container be deleted, make sure to put it in a Named Volume.
Anyway, let’s go back to making sure the RW layer thing was real and not a conspiracy. A bit earlier, you read the size of your container (in this article it was 0 bytes).
Let’s now create a 15MB file in /tmp. Since we did not mount a Named Volume or a host’s directory at /tmp, the file we’re about to create should go into the container’s RW layer.
You can create a garbage file like this: dd if=/dev/zero of=/tmp/garbage-file bs=1M count=15 iflag=fullblock.
Don’t bother trying to cat the file: it’s 15MB of raw bytes, so there is nothing to see, and cat-ing binary files can mess up your terminal. If you did cat it and messed up your terminal, then:
- press <Enter> to clear any remaining command
- then type reset (even if you can’t see what you’re typing, just make sure to type r-e-s-e-t)
- then press <Enter> again, and your terminal should come back to life (this is a useful UNIX tip: whenever your terminal is messed up, you can restore its state by typing reset and pressing <Enter>; make sure to remember it!)
It’s time to look at the size of our container again to see if what I have been telling you all along was a lie or not!
$> docker ps --size
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
e39ac1afe0ae ubuntu:16.04 "bash" 14 minutes ago Up 14 minutes test-1 15.73 MB (virtual 140.5 MB)
And voilà!
We have added a 15MB file and our container grew by 15MB. Quite logical. You’ll notice that if you remove the file (rm /tmp/garbage-file), the container will shrink back. It seems logical, right? Not so fast; let’s consider this Dockerfile:
FROM ubuntu:16.04
RUN dd if=/dev/zero of=/tmp/garbage-file bs=1M count=15 iflag=fullblock
Let’s build it, instantiate a container from it and “log in”:
root@54099e08c610:/# ls -l /tmp/
total 15M
-rw-r--r-- 1 root root 15M Dec 27 17:32 garbage-file
So we have our 15M file; we can verify the container’s size:
$> docker ps --size
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
54099e08c610 test-img "bash" 3 minutes ago Up 3 minutes mad_kare 0 B (virtual 140.5 MB)
So the total (virtual) size is the same as before, when we had manually created the file after the container was created, except that this time the 15MB live in the image’s layers, so the container’s RW layer shows 0 B. So far so good.
The fun comes when we delete it. Do it (rm /tmp/garbage-file), then check the size again:
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
54099e08c610 test-img "bash" 7 minutes ago Up 7 minutes mad_kare 0 B (virtual 140.5 MB)
And yeah, this is the fun part: it’s still the same size, can you explain why?
…
You know what? I think I won’t answer this right now; I’ll let you think about it until the next article. Maybe I can start a sort of tradition, where I give you some homework for the next article.
So, back to our topic.
Verifying the Docker Volume Persistency
We wrote our file in /tmp/ and so it went into the container’s RW layer. Fine. We have been doing this since the beginning; now we want to see Docker Volumes in action!
Let’s do the same thing: create a garbage file, but put it in the Docker Volume this time.
Let’s instantiate a container with a Named Volume mounted on it: docker run -it --name test-volume -v TestVolume:/data ubuntu:16.04 bash
Take a look at the size:
$> docker ps --size
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
dca1b241ab47 ubuntu:16.04 "bash" About a minute ago Up 59 seconds test-volume 0 B (virtual 124.8 MB)
And now, let’s create our 15MB file at the mount point (i.e. in the volume):
root@dca1b241ab47:/# dd if=/dev/zero of=/data/persistent-garbage bs=1M count=15 iflag=fullblock
15+0 records in
15+0 records out
15728640 bytes (16 MB, 15 MiB) copied, 0.0149029 s, 1.1 GB/s
Test #1: confirm that the container’s RW size did not change:
$> docker ps --size
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
dca1b241ab47 ubuntu:16.04 "bash" 4 minutes ago Up 4 minutes test-volume 0 B (virtual 124.8 MB)
Ah ah! So we created a 15MB file and the container did not grow an inch! So that’s perfect: I haven’t lied either (good news, right?!).
Test #2: let’s destroy that container, wiping out its non-persistent data, and see if we can get our persistent-garbage file back!
Exit the shell in your container with CTRL + D, then stop and destroy it: docker stop test-volume && docker rm test-volume. Now it’s gone.
I’m about to show you where your persistent-garbage file is hidden, because I want you to understand how it works under the hood; I don’t want any part of docker to appear as ‘dark magic’ to you. But what I’m about to show you is for educational purposes only. Do not, ever, modify these files directly from outside the containers. This is asking for trouble.
On your host’s system, docker resides under a very nice, warm and sunny place called /var/lib/docker. This is where you will find a lot of absolutely incomprehensible garbage, but you will find some useful information as well :)
In particular, there is a nice directory at /var/lib/docker/volumes/ containing all your Named Volumes. Remember earlier, when I was explaining the difference between sharing a host’s directory and mounting a Named Volume? One of the reasons was that when using Named Volumes, docker stores them in an internal place; well, this is it.
What is cool with /var/lib/docker/volumes/ is that:
- it’s ‘well-hidden’, so there’s much less chance that you accidentally change / delete a file in there
- it’s owned by root, so even if you were to wander around in here, you would not be able to do anything (unless of course you were root, in which case, I mean, natural selection, Darwin, etc. You understand, surely, that for the sake of the species, you must not be permitted to continue your operations :)).
So, in there you should find a directory called TestVolume (the name of the docker volume) and inside it, a directory named _data/. It’s in this _data/ that the container’s data resides. Go ahead, take a look: ls /var/lib/docker/volumes/TestVolume/_data/. Again, don’t change anything; just know and remember that it’s stored here, and that it’s not dark magic.
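On my machine, it looks like this (treat the listing as illustrative; remember the directory is owned by root, hence the sudo):

$> sudo ls /var/lib/docker/volumes/TestVolume/_data/
persistent-garbage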
It’s time to see the real thing in action, we will instantiate another container, and we will mount the Docker Volume so that you can see your persistent data being persistent.
For a change (and to really insist on the fact that Docker Volumes are separate from the container’s RW layer), we will instantiate a container from a different image, like nginx for instance.
docker run -it --name test-volume-2 -v TestVolume:/home/test/other/path nginx bash. Here, we’ve changed the image from which we instantiate the container and the path at which we mount the volume. That’s to be extra-sure.
You should have a shell inside your container now, you can check your data is here:
root@8450afc561d4:/# ls -lh /home/test/other/path/
total 15M
-rw-r--r-- 1 root root 15M Dec 29 19:14 persistent-garbage
And tadaa!
You have just successfully:
- created a Docker Volume
- mounted it inside a container
- put some data on it (bypassing the unionFS mechanism)
- destroyed a container that was using it
- mounted it inside another container, and witnessed your data still being there!
How awesome is that?!
Take a Step Back
I really need you to take a step back here, relax and let it sink in. I’ll ask you to take a break in a couple of minutes, and I really hope you’ll do it.
What we’ve just done might seem unimportant and almost too easy, but we have just dived into an extremely important aspect of docker (seriously), and a lot of people using docker get it wrong. Simply understanding the concept of Docker Volumes and data persistency, and using them correctly, already makes you a better Docker user than a lot of people.
To recap and be a little heavy on the matter, what we just did is ‘pierce a hole’ in the unionFS / layer mechanism to store some of the container’s data elsewhere, in a secured location. This effectively separates the data from its container, enforcing the ephemeral aspect of the container which I keep talking about.
It’s like the first time you learned MVC and you thought “great! now I can change some aspect of my application without changing everything!”.
Well here, it’s a bit like this too. Every time your container should ‘keep’ or ‘archive’ something, use a Docker Volume. If your data is linked to the session and doesn’t need to persist, then don’t use a Docker Volume.
Typical uses of Docker Volumes include:
- database storage
- user preferences
- configuration options
- ssh keys
- certificates
- user’s data (suppose you’ve dockerized a sort of web service that provides users with a storage space where they can upload files)
- logs (if they need to be kept to be audited later)
- etc.
Take some time to actually think about this (this is called software architecture!), because you should not fall into the opposite trend either, making dozens of Docker Volumes per container to keep everything.
Let me remind you that since everything you put in a docker volume bypasses the unionFS mechanism, it won’t be deleted when the container is destroyed, and thus its disk space won’t be freed either!
This is perfect because we’ve just finished a major part, so it’s really time to take a break. Please do it, really, and come back a little later, after your brain had had time to process all that. Then scroll back a couple of screens, and re-read.
This topic is paramount.

Going Further
We have seen quite a few things about Docker Volumes, and by now I hope I managed to imbue you with their importance and usefulness.
But I haven’t shown you everything, there are still a couple of interesting things left to do with the Docker Volumes, I’m offering to cover some of them in the following sections.
Use Cases
We have talked a lot about Docker Volumes and provided some snippets of code to illustrate the different points we were exploring, but it might still be a bit confusing as to when you should use them.
This section explores some common use cases, by all means this is not an exhaustive list and I’m only trying to give you an intuition about them, so you can guess when you need them or not.
Make Data Persistent
I know I have talked about this a lot, but Volumes are primarily used to make data persistent. By that I mean —again, I am repeating myself— that your container should not be this “beast” that has been running for 5 months, and you’re afraid that your docker daemon might crash or your server might restart, because you don’t know what’s inside the container, what you would lose, etc.
Very roughly, you should have “computations”, “processing” and “logic” as applications in your containers, and “data” in volumes. For instance (a concrete run command is sketched right after this list):
- your webserver (apache, nginx, other) should run inside a container, but:
  - the directory it is serving (the /var/www) should be inside a Docker Volume
  - the directory holding the files users uploaded to the server should be in a Docker Volume
  - I’ll even argue that your config files might be in a separate Docker Volume (I admit that this will depend on your setup)
- your database instance (postgres, MySQL, other) should run inside a container, but:
  - the directory holding the database data (/var/lib/postgres for instance) should be inside a Docker Volume
  - same remark as for the webserver for the config files
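To make the database case concrete, here is a minimal sketch (the volume name, container name and image tag are examples of mine, not a prescription):

$> docker volume create --name pg-data
pg-data
$> docker run -d --name server-db -v pg-data:/var/lib/postgresql/data postgres:9.6

The database files now live in the pg-data Named Volume: you can docker stop / docker rm this container and run the same command again, and the data will still be there.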
Where I work, I’ve designed the production stack to be like this. All our server instances are dockerized and the data is made persistent in Docker Volumes. I frequently docker stop and docker rm containers from active clients and everything happens without a problem, because the data (the important part) is stored in Volumes that are mounted by the new containers. The downtime is only a few seconds, the time for the docker run to return! When you reach this setup, you know your stack is sound and will be robust (I haven’t said “perfect”, I’m sure there is a lot to criticize about my stack).
Making Backups
Here, we’re going to see how nice and useful Docker Volumes and the --volumes-from option are for creating backups.
Suppose you have a container, server-db, which runs a database instance (for your webserver, for instance). Let’s suppose the DB is postgresql and that the data directory location is /var/lib/postgresql/data.
Since you want to make this data persistent, you used a Named Volume to host this directory.
Now, you can make a backup with a one-liner that looks like this: docker run --rm --volumes-from=server-db:ro -v ~/db-backups:/backup ubuntu:16.04 tar cvjf /backup/backup-$(date +%m-%d-%y).tar.gz /var/lib/postgresql/data.
This is a big one-liner, but still a one-liner. Let’s examine it:
- docker run --rm : with this we launch a container (instantiated from the ubuntu:16.04 image, as seen further in the command). Since we want this container only for creating the backup, we use --rm so that the container is destroyed when it exits. This way, it doesn’t leave our system with useless, not-running containers (especially useful if you script this command and run it regularly with a cron!).
- --volumes-from=server-db:ro : as demonstrated in the previous section, we mount the same volumes as the container server-db; we add the read-only option so that our script cannot modify any of the data in the volumes (especially useful if this is a production server database!)
- -v ~/db-backups:/backup : here we mount the host’s ~/db-backups/ directory inside the container, at location /backup. This is because in my example, I want to store all the backups I make in my ~/db-backups/; maybe I have another cron job that rsyncs this directory to another server, to prevent data loss in case my computer dies.
- ubuntu:16.04 tar cvjf /backup/backup-$(date +%m-%d-%y).tar.gz /var/lib/postgresql/data : this is a long string, but it’s nothing we haven’t already seen. It simply says that we want to instantiate the container from the ubuntu:16.04 image, and that we want to run a command. Usually we run bash as the command, and we specify the options -it so that we can “log in” the container. Not today: today we don’t specify -it, so it’s not interactive (perfectly suited to be scripted), and the command we specify is tar, which is the Linux tool to create archives.
  We specify cvjf as tar’s first argument:
  - c means “create archive”
  - v is for “verbose” (you can remove it if you don’t log your script’s output)
  - j means to compress the archive with the bzip2 algorithm
  - f is the option to specify the (target, archive) file
Then we give the target archive file: /backup/backup-$(date +%m-%d-%y).tar.gz. As you can see, we place the target archive file in the container’s /backup directory, which is mounted from the host’s ~/db-backups/ (this is the reason why we can delete the container afterward (with the --rm option): the backup will not be inside the container, but on the host). We then include the date in the filename, which is handy when making backups.
The last argument, /var/lib/postgresql/data, is obviously the name of the directory we want to create an archive / backup from.
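Since the point of such a one-liner is to be scripted and run regularly with a cron job, here is a minimal sketch of what that could look like (the script name, paths and schedule are assumptions of mine; adapt them to your setup):

#!/bin/sh
# backup-db.sh: archive server-db's data directory into $HOME/db-backups
docker run --rm --volumes-from=server-db:ro \
    -v "$HOME/db-backups:/backup" \
    ubuntu:16.04 \
    tar cvjf /backup/backup-$(date +%m-%d-%y).tar.gz /var/lib/postgresql/data

And an illustrative crontab entry to run it every night at 3am:

0 3 * * * /home/user/backup-db.sh >> /var/log/db-backup.log 2>&1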
When you need to restore the data later, in another container, say "new-server-db", you can run docker run --rm --volumes-from=new-server-db -v ~/db-backups:/backup:ro -w /var/lib/postgresql/data ubuntu:16.04 tar -xvjf /backup/backup-MM-DD-YY.tar.gz.
The idea is exactly the same, only here we extract a backup. Let’s examine this command:
- docker run --rm : same as before, we use the --rm option so that the container disappears after it’s done
- --volumes-from=new-server-db : so we mount the Volumes from the new container—which we called "new-server-db" here—but this time, we don’t use the :ro flag, as we will be writing data inside the volume.
- -v ~/db-backups:/backup:ro : same as before: since we stored the backup of the other server in our host’s ~/db-backups/ directory, we need to mount it again. But this time, since we will only read from it, it’s safe to use the :ro flag (and since it’s safe to do, we should do it, to be sure our “restore” script cannot mess with our backup files)
- -w /var/lib/postgresql/data : since we are going to extract our backup file in this location, we need to go there first. Two possibilities:
  - we could pass bash -c "cd /var/lib/postgresql/data && tar ..." as a command
  - we can use the -w or --workdir option that docker run provides us. When you specify this option with a path, it becomes the current working directory for the command you specify.
So really, both options are fine, but the latter is better because it makes use of the option that was made precisely for this purpose; it allows you to specify tar as the command and not bash-which-runs-tar. And this will keep working whatever the image (maybe you don’t want to use ubuntu but a smaller image which does not have bash, only sh?).
So basically, thanks to -w /var/lib/postgresql/data, when the command is run, it’s run from the /var/lib/postgresql/data directory. Oh, and another good reason to use -w instead of bash -c "cd ..." is that if the directory doesn’t exist, docker will create it (you would need to run bash -c "mkdir -p /var/lib/postgresql/data && cd /var/lib/postgresql/data && tar ..." if you wanted to be resilient to this, which makes the command even bigger…)
Finally, ubuntu:16.04 tar -xvjf /backup/backup-MM-DD-YY.tar.gz simply means that we want to extract the file; make sure to replace ‘MM-DD-YY’ with the actual date of the backup file you want to restore.
And voilà! It’s simple enough: when tar finishes (returns), the container will exit and, thanks to --rm, it will disappear and leave us with a clean system.
These were only some basic use cases but I’m sure you can find a lot of other ones if you start experimenting with Volumes. I hope by now that I have convinced you of two things:
- Docker Named Volumes are awesome and very useful
- Mounting a Docker Named Volume and mounting a host’s directory are two separate cases. Make sure you re-read the examples above, because in them I use both volumes and host-shared directories. They are not made for the same use cases!
I don’t want to see information made persistent by sharing it with a directory in your home (do not do things like -v ~/persistent_data:/var/lib/postgresql/data; use a Volume!).
Additional Info
We are almost done with Docker Named Volumes!
A few things remain for us to see and understand before we can go play with Volumes in our own images and containers.
What About Dockerfiles?
The most astute of you might have noticed that everything we’ve done so far with Volumes was done at runtime, i.e. with docker run, when we instantiated a container from an image.
If you have read some Dockerfiles on the web or read the documentation, you might have seen a VOLUME Dockerfile statement. What is it for?
Take a few seconds to think about it, to think if it makes sense for you. What would that mean at build time?
…
So, what do you have?
Let’s see if you were on the right tracks.
Normally, we would expect that an image doesn’t care about Volumes. From the point of view of the container that is instantiated from this image, when it writes data to /path/to/data/, it doesn’t care whether this data goes into the container’s RW layer, into a directory mounted from the host, or inside a Named Volume. In fact, I’d even argue that it doesn’t even know.
The decision to mount a directory inside the container (whether it’s a Named Volume or a host’s directory) is made at run time, and given the same image, it can be different from one container to another.
That being said, sometimes, when you design your images cleverly or for specific purposes, you can sort of enforce this. This is where the VOLUME Dockerfile statement comes into play.
Suppose you’re writing an image for a web server which allows users to upload files. Most likely you’ll want to save those files, i.e. make the data persistent. If you’re building the image precisely for this task, you know that the directory /var/www/user-upload/ will have to be persistent and saved into Volumes.
When you want to enforce this, you can use the VOLUME statement. It takes an array of directories, like this: VOLUME ["/data"].
When you instantiate a container from this image, docker will automatically create an Anonymous Volume, which will be mounted on the path you specified (obviously, if you specified several paths, it will create several volumes, each mounted to a path), and this Volume (or Volumes) will be filled with the data present at the mount point.
This has the advantage that even if you don’t specify a Volume to mount at run time (with docker run -v), your data will still be made persistent, but the disadvantage is that it will create an Anonymous Volume, so it’s not easy to find it back for a given container.
Obviously, if you specify a Volume name with the -v option, then the Docker Volume will be named according to what you specified.
An important thing to note is that the location of the VOLUME ["/path/to/mountpoint"] statement in your Dockerfile is meaningful: it will create the Volume and populate it with all the data that is inside the mountpoint at that moment.
Every modification that is made to this path after this statement in the Dockerfile is lost.
It’s time for some examples to clarify this.
First, a basic Dockerfile:
FROM ubuntu:16.04
MAINTAINER nschoe <nschoe@protonmail.com>
RUN mkdir /data
RUN echo "Hello, world!" > /data/file1
VOLUME ["/data"]
This is simple enough: we create a /data directory and write something in a file in /data. Next we call VOLUME ["/data"] in the Dockerfile.
This will have the effect of creating (at run time, obviously) an Anonymous Volume mounted on /data, with some data in it: the file file1 with the string "Hello, world!".
Let’s check this out. First, let’s build the image: docker build -t test-vol:1 .
For now, we have no Docker Volumes:
$> docker volume ls
DRIVER VOLUME NAME
Let’s instantiate a container from the image: docker run --rm -it test-vol:1 bash.
In another terminal window, we can see that a new Docker Volume was created:
$> docker volume ls
DRIVER VOLUME NAME
local d7a4ed94061b0754286c76cda157ba25d6d936898c164e15f2dc4039e672457f
So we can indeed see that an Anonymous Volume was created (so this is good: the VOLUME statement does have the effect of creating a Volume, even though we did not specify the -v option).
Inside the container, we can check that the volume does contain file1:
root@73456d7c3f36:/# ls /data/
file1
root@73456d7c3f36:/# cat /data/file1
Hello, world!
Perfect. Since we used the --rm option, the container’s volume will be deleted when we exit the container (I’ll let you check this by exiting the container and listing the Volumes again).
So we are back with no Volumes on the system. You can check that if you specify a Volume name with -v, like -v MyVolume:/data, then you won’t have an Anonymous Volume but a Named Volume, and that you will need to delete it manually with docker volume rm MyVolume.
When we specify the -v option, we have the same behavior we saw earlier in the article, so I won’t make a new example for this.
Now we will verify that whatever is written to the location specified in the VOLUME statement, after that statement, is discarded:
FROM ubuntu:16.04
MAINTAINER nschoe <nschoe@protonmail.com>
RUN mkdir /data
RUN echo "Hello, world!" > /data/file1
VOLUME ["/data"]
RUN echo "Discarded data" > /data/file2
Let’s build this (docker build -t test-vol:2 .) and instantiate a container from it:
$> docker run --rm -it test-vol:2 bash
root@19310ad44bbb:/# ls /data/
file1
See? No file2. It’s because the Volume is initialized with the data that is present in the image at the moment the VOLUME statement is reached.
So the rule is that you should generally put your VOLUME statements toward the end of the Dockerfile, so that you don’t accidentally miss some data.
Is that all right? I hope so.
Now that’s good and all, but what about mounting a host’s directory with a VOLUME Dockerfile statement?
Actually take a minute to think about it: how would you go about doing this?
…
It’s time to check if you’ve been listening—or rather reading—and thinking. The answer is: you can’t. And that’s logical: remember that Dockerfiles are used to build images. These images are then destined to be stored in a registry and then pulled by clients, which can then instantiate containers from them.
So the bottom line is—again—portability: suppose you specified a path /home/nschoe/data to mount. The other computers on which the image is pulled will not have a /home/nschoe/data.
This is textbook: a classic example of non-portability. So the answer is simple: when using VOLUME inside a Dockerfile, you create a Volume that is mounted at this path. Period.
Locating a Volume
So you have a score of containers that have been running for some time and a score of volumes. Now you need to identify which volume a container uses, how would you go about that?
There is a handy docker inspect command that you can use on a container, and it gives a lot of information.
To filter out some information, we can use Go templates. In particular, there is a “Mounts” section that we can use. We will use the json function to format the output nicely.
For example, on my machine:
$> docker inspect --format "{{json .Mounts}}" <container-name>
[{"Type":"volume","Name":"MyVolume","Source":"/var/lib/docker/volumes/MyVolume/_data","Destination":"/data","Driver":"local","Mode":"z","RW":true,"Propagation":""}]
Here we can see Type: "volume", which means that it’s a Docker Volume and not a host’s directory that has been mounted (otherwise it would be Type: "bind").
The second interesting part is Name: "MyVolume", which gives the name of the Docker Volume. So this is a way we can find out what volume a container mounts.
Then we have Destination: "/data", which tells us the mountpoint inside the container for the volume.
The option RW: true tells us that we did not specify the :ro flag, and that we can write in the mounted volume.
Now let’s see the output for a container which doesn’t mount a Named Volume, but a host’s directory:
$> docker inspect --format "{{json .Mounts}}" <container-name>
[{"Type":"bind","Source":"/home","Destination":"/data","Mode":"ro","RW":false,"Propagation":""}]
Here we see that Type has taken the value "bind" instead of "volume".
Instead of Name, we have an attribute Source, which gives us the host’s directory that is mounted inside the container.
Unfortunately, for a given Volume, there is no direct way to know which containers use it, so you generally have to iterate through all your containers.
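That said, depending on your Docker version, docker ps accepts a volume filter that can do this lookup for you (check that your version supports it):

$> docker ps -a --filter volume=MyVolume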
Keeping a Clean System
We have introduced Docker Volumes, which—in a sentence—are used to make data persistent. With them, we need to introduce a new docker command: docker volume.
We can create a volume by hand with docker volume create and list them with docker volume ls.
As usual, we strive to keep a clean system. Two solutions for this:
- Prior to version 1.12, we can use the same command we used with images: we can list dangling volumes with docker volume ls -qf dangling=true and then feed the result to docker volume rm (a combined one-liner is sketched right after this list)
- Starting with version 1.12, you can simply use docker volume prune, which has several advantages:
  - it prompts for confirmation, so there is less chance to mess it up by mistake
  - when it’s done, it prints the list of volumes it deleted and the space freed, so that’s always nice
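For the pre-1.12 route, the two steps are typically combined into a single line; here is a sketch (double-check before running it, since it deletes volumes):

$> docker volume rm $(docker volume ls -qf dangling=true)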
Warning: Volumes, which are generally used to keep data persistent, tend to take up space (since they store data, we tend to find them holding database data, user-uploaded files, etc.).
So it’s important to run this prune command (or the equivalent prior to version 1.12) regularly to free some disk space.
Conclusion
Ah! So here we meet, finally!
I hope this article about Docker Named Volumes was good enough for you and I hope I explained it clearly enough so that Volumes have no more secrets for you.
Let us recap briefly what we have seen about Volumes, so that it serves as a reminder when you seek something.
Images are basically a set of read-only layers, which correspond to Dockerfile statements and represent individual increments (installing programs, creating directories, files, etc.) from a base image.
When we have an image, instantiating a container from it is as simple as—and as fast as—creating a new layer, with read-write permissions and stacking it on top of the image layers. From here on, everything you do inside a container is recorded in this container’s RW layer. This is great because it makes containers so cheap you can instantiate hundreds of them in the blink of an eye, but the cons are that when you destroy the container, you lose everything (because destroying a container is simply a matter of destroying the container’s RW layer), and thus, this mechanism is ill-fitted to store permanent data, such as the data from log files or from a database.
Then we discovered Docker Volumes, which are a mechanism to bypass the unionFS. When you create a Volume and mount it on a container’s path, every operation that is done at this mountpoint will not be recorded in the container’s RW layer; instead it will be recorded straight on the host’s disk, in a special, docker-specific, root-protected location (/var/lib/docker/volumes/ by default).
The advantage of this is that when you destroy the container, the data inside the volume remains on disk; this is called making data persistent. The downside is that it’s easy to forget about a Volume after the container(s) that use it is (are) deleted, and it can end up taking a lot of space on your hard drive—but this is where the docker volume prune command becomes useful.
Then we saw a basic set of commands to deal with volumes (list them, mount them, delete them, delete all unused volumes, etc.).
We saw that docker provides a very easy way for a container to mount all the volumes that another container uses. This is particularly useful when the other container uses either a lot of volumes or Anonymous Volumes. It’s also very useful when you want temporary containers popping up, performing management actions (such as backups, pruning, etc.) and then popping out.
Last but not least, we saw that we could also mount a host’s directory inside a container, to share data directly from the host to the container. We insisted that creating and mounting a Volume and sharing / mounting a host’s directory inside the container is not the same and they correspond to two different use-cases.
I invite you to re-read the section about this if it’s not absolutely clear, but the TL;DR is that:
- Mounting a Docker Volume is meant to store data persistently (database’s data, log files, etc.)
- Mounting a host’s directory is meant to share data between the host and the container (share some development code, share /etc/localtime, etc.)
Obviously these are only two very common use-cases, but you can find others. This very distinction is the core of this article, really. It’s the thing you need to understand, because when talking about or dealing with Docker Volumes on the IRC #docker channel, it’s the remark I make most often: “you’re not using Volumes correctly”. Now I have written a (hopefully clear) article about them, so I hope it will help clear up the confusion.
I think that’s all for now. As usual, any remark is welcome at , whether it’s a thank-you message, positive or negative feedback, a request to clarify some unclear notion, etc. Do not hesitate.
In the next article (Part V), we will see “the last” part of Docker before we can make really complex setups and become real Docker hackers: it will be about Docker Networks. Along with Docker Volumes, Docker Networks are—in my opinion—the greatest feature that Docker brings to the table. In my own stacks, I use them extensively and they allow you to ramp up your stacks; it’s almost indecent how cool they are.
But as often with Docker, they are a very misunderstood feature, so I want to make the article extra-clear, with lots of examples, etc. I will work as hard as I can not to take too long to write the article, but it’s most likely to be a big one too (but I think you’re accustomed by now, and you seem to like it, don’t you?!).
So this is a good-bye, and see you in Part V!
Part V is available here.
January 28, 2017