Blog

The History of Containers

You've probably heard things like:

  • "It's like a lightweight VM, but not really a VM."
  • "They are just processes running on your OS."
  • "Containers do not virtualize the hardware, but they virtualize the OS."

I heard these quotes from colleagues, all meant to help me understand containers. But honestly? They didn't. They barely made sense.

For me, containers always felt like this new technology that everybody talked about but that I never really understood. Sure, I downloaded Docker a few times, played around with starting containers, but I had no idea what I was actually doing.

At work, we don't use containers directly, so I never came into contact with them professionally either. But recently, while studying for the AZ-204 Developer Associate exam, I noticed a section on containers.

That felt like the perfect opportunity to finally dive in and answer the question I had been quietly asking for years: what really is a container?

And I did that in the only way I know how: by diving back into the time machine and exploring the origins of this technology. What problem was it solving? What technologies came before it? And wow, are we in for an amazing, unexpected deep dive.

Join me as we travel from the earliest experiments in multiprogramming, through virtual machines, Linux kernel features, LXC, Docker, and up to the modern container ecosystem. By the end of this journey, containers won't feel like magic anymore; you'll see exactly how they work, and why they're such an elegant solution to a very old problem.




#1. Setting the stage (1950s)

To understand containers, we first need to go back over 70 years. In the late 1950s, computers were enormous, expensive, and extremely limited by today's standards. A single machine could cost hundreds of thousands of dollars (equivalent to several million today), and only a handful of people had access to one at a time.

The Second World War had ended little more than a decade earlier, and it had given a tremendous boost to automated computing for military purposes. Machines like the Colossus in the UK and ENIAC in the US were developed to break codes, calculate artillery trajectories, and process large amounts of military data. After the war, this technology began to transition into scientific research and commercial computing.

Some examples of computers from that era illustrate the scale and limitations:

  • UNIVAC I (1951): The first commercially available computer in the United States, used for census and scientific calculations. It had about 1,000 words of memory and weighed about 29,000 pounds.
  • IBM 701 (1952): Designed for scientific applications, it had a memory of 2,048 36-bit words and could perform about 16,000 additions per second.
  • IBM 650 (1954): The first mass-produced computer, widely used in universities and businesses, with 2,000 words of magnetic drum memory and very slow I/O compared to CPU speed.

In these systems, CPU time was precious, while peripherals such as printers, tape drives, and disk storage were extremely slow. When a program needed to wait for input or output, the CPU often sat idle, wasting the expensive hardware.

Multiprogramming emerged in this context. It allowed multiple programs to reside in memory simultaneously. When one program became blocked waiting for I/O, the CPU could switch context and execute another program instead, dramatically improving overall efficiency and throughput.

The goal was clear: make the most of scarce hardware resources while keeping multiple processes safely isolated from each other.

## The Kernel

While multiprogramming improved hardware utilization, it also significantly increased the complexity of system software. Multiple processes now executed simultaneously and in an interleaved manner, interacting with shared hardware resources and, indirectly, with each other.

This increased complexity also raised the stakes. Any misbehaving program had the potential to disrupt other programs running on the same machine, or even compromise the entire system.

In response to these challenges, systems programmers came up with the concept of the kernel as a way to provide isolation and control. The kernel was designed as a small, privileged program with unrestricted access to all hardware resources and running processes. Its job was to manage potentially disruptive operations like memory and storage allocation, process scheduling, and interrupt handling, while keeping the rest of the system safe from accidental or malicious interference.

By introducing the kernel, these systems programmers effectively minimized complexity for application programmers. Instead of manually coordinating parallel execution and hardware access, application programmers could focus primarily on the internal logic of their own programs, relying on the kernel to safely manage shared system resources.




#2. Early Virtual Machines: 1960s through 1990s

Containers are often what gets all the attention today, but they arrived much later. To understand why they exist and how they work, we first need to look at the decades-long evolution of virtualization. By this point, systems already supported multiprogramming, allowing a computer to run multiple processes concurrently under a single operating system kernel.

Virtual machines grew as a continuation of what operating system kernels had already made possible. Initially, this work focused on extending existing kernel mechanisms, particularly memory protection, to create stronger isolation boundaries between execution environments. These improvements made shared systems more reliable, but they still assumed a relatively trusted environment. As computers transitioned into shared resources used by multiple users, teams, or even entire departments running widely different programs, that assumption no longer held.

At the same time, operating system kernels were still evolving rapidly, and kernel crashes or bugs were not uncommon. When all workloads shared the same kernel, a single fault could bring down the entire system.

In this new multi-user and multi-tenant context, the challenge shifted from simple process isolation to fault containment and administrative separation. Engineers needed stronger guarantees that failures in one workload would not affect others.

Virtual machines emerged as a response to these challenges by introducing a much harder isolation boundary. Each virtual environment ran its own kernel, behaving as if it were executing on dedicated hardware. If a kernel crashed, only that virtual machine was affected, making it possible to safely share expensive hardware across users and workloads.

By enabling multiple kernels to run simultaneously on a single physical machine, virtual machines provided a level of isolation, security, and flexibility that could not be achieved with a single shared kernel alone. Workloads could be managed, configured, and secured independently, even though they ultimately ran on the same underlying hardware.

From inside a virtual machine, a workload had access only to the resources assigned to it and experienced the illusion of running on its own dedicated hardware. Because this illusion was created below the operating system, virtual machines did not require any special support from the kernel running inside them.

This model of virtualization is now commonly referred to as hardware-level compute virtualization.




#3. The Rise of Containers (2000s)

By the early 2000s, computing looked very different than it had just a decade earlier. Three major operating systems had emerged as dominant platforms. On the PC and enterprise side, Windows 2000 and Windows XP provided stable, widely used platforms for businesses and home users alike. Linux distributions had gained a foothold on servers and in data centers, offering a flexible open-source alternative for enterprise workloads. And on the desktop, Mac OS X was just around the corner, bringing a modern Unix-based system to consumer and professional users.

And there was a new kid on the block: the World Wide Web. First made publicly available in 1991, the web spent the 1990s slowly finding its footing. By the early 2000s, however, it was booming. Businesses were no longer experimenting with a handful of static web pages; they were building always-on web applications that users around the world depended on every day.

This explosion of web-driven services placed new demands on infrastructure. Companies needed to deploy applications quickly, scale them up and down in response to traffic, move workloads between machines with minimal downtime, and make efficient use of expensive hardware. Reliability and isolation were no longer just technical concerns; they were business-critical requirements.

To meet these challenges, massive data centers began emerging, hosting thousands of servers designed to serve multiple users, teams, or even entirely separate customers on the same physical hardware. Ensuring that these environments remained reliable, secure, and cost-effective set the stage for new approaches to efficiently managing compute resources at scale. While companies like VMware continued to heavily invest in optimizing virtual machines, a new approach was starting to emerge: lighter, faster, more flexible environments that would eventually become known as containers.

## Kernel-space & User-space

Before we can fully grasp this new wave of virtualization, there's one crucial concept we need to introduce: the separation of kernel-space and user-space, an architectural principle implemented by all modern operating systems.

In this model, the kernel runs in a privileged mode, with unrestricted access to CPU, memory, and other hardware resources. User-space, on the other hand, is restricted: programs there cannot directly touch memory used by other processes or interfere with hardware operations. All access and interaction with the hardware must go through the kernel-space, which acts as a gatekeeper.

These boundaries are enforced not just by software, but by the hardware itself. Modern CPUs include privilege levels (sometimes called "rings") and memory management units, which ensure that instructions executed in user-space cannot compromise the kernel or other processes.

## The Main Idea

Virtual machines had become the workhorses of isolation: they were robust, secure, and reliable. But they were also heavy. Each workload carried an entire operating system with it, consuming disk space, memory, and time just to boot. So a few engineers began asking a different question: instead of virtualizing hardware and duplicating entire kernels, could we rely on the capabilities of a single, modern kernel to isolate workloads directly? If virtual machines offered maximum isolation at maximum cost, perhaps a slightly weaker boundary, enforced by the kernel itself, would be "good enough" while dramatically improving speed and efficiency.

That is the main idea behind containers. At their core, containers are isolated environments that each run in their own user-space on top of a shared kernel. Conceptually, this is often described as "virtualizing user-space": instead of virtualizing hardware, each container receives its own isolated user space, with its own file system, network interface, process IDs, and other resources, while the underlying kernel continues to provide security, scheduling, and resource control.

Early experiments with this idea explored how far this type of isolation could go. FreeBSD introduced jails, Solaris developed zones, and Linux saw early projects like VServer and Virtuozzo, which patched the kernel to improve isolation.

This model is commonly referred to as OS-level compute virtualization. But, in my opinion, that term is somewhat misleading. Traditionally, virtualization means creating a software-based version of something physical, as virtual machines do by emulating hardware and giving each workload the illusion of dedicated physical resources. However, an operating system is already a software construct. What containers are "virtualizing" is not hardware, but software itself.

In that sense, containers are less about virtualization and more about isolation. The term likely evolved by association with virtual machines, where "virtualization" had already become synonymous with isolation. For clarity, I will personally refer to this model as OS-level isolation.

## Inside a Container

This is the main idea in action: a container is a self-contained environment that runs on a shared kernel but sees only its own operating system resources. These resources include: a file system, process IDs, users, network interfaces, routing tables, hostname, etc.

The file system is perhaps the most intuitive example. What we experience as files and folders is already an abstraction created by the operating system. The directory tree we navigate is not the physical disk itself, but a structured view constructed by the OS. Virtual machines never had to worry about recreating that illusion. Each VM boots its own kernel, and that kernel naturally constructs its own independent file system on top of its virtual disk.

Containers operate at a higher layer. This means the kernel must create multiple independent file system views within the same operating system. Each container must believe it has its own root directory, its own installed programs, and its own configuration files. It is, in a sense, an illusion layered on top of an illusion.

Providing this level of isolation requires significant architectural features built directly into the kernel's source code. While other operating systems, such as FreeBSD and Solaris, explored similar concepts, Linux emerged as the primary breeding ground for these innovations. Its open-source nature, highly active development community, and widespread adoption in server environments made Linux an ideal platform for experimenting with and eventually standardizing kernel-level isolation mechanisms.




#4. Early Kernel Isolation Features

If we want true OS-level isolation enforced by a single shared kernel, the natural question becomes: what features must the kernel provide to make this possible?

First, we need separate views of the system itself. Each isolated environment must have its own file system hierarchy, its own process ID space, its own networking stack, and its own hostname. In other words, the kernel must be able to create multiple independent perspectives of core operating system resources.

Second, we need control over resource consumption. If multiple isolated workloads share one kernel, there must be mechanisms to limit and account for CPU time, memory usage, and I/O activity so that one environment cannot starve the others.

Third, we need fine-grained security controls. Traditional Unix systems distinguish primarily between normal users and the all-powerful root user. For safe isolation, that model is too coarse. The kernel must allow privileges to be broken into smaller units so that processes can operate with the principle of least privilege rather than unrestricted authority.

Interestingly, Linux already contained many of these building blocks before the idea of containers emerged. Capabilities, introduced in the late 1990s, split the root user into discrete privileges. chroot provided early file system isolation. Later, namespaces enabled separate views of processes, networking, and mounts, while control groups (cgroups) made it possible to limit and account for CPU and memory usage. Seccomp added the ability to restrict which syscalls a process could execute.

None of these mechanisms were originally designed as part of a unified container system. Early container implementations simply combined and extended these features into a cohesive model of isolation. To appreciate how containers emerged from this foundation, we need to examine these technologies more closely.

## chroot (1979)

The chroot command allows you to change the apparent root directory for a process, effectively isolating it from the rest of the file system.

Example:

mkdir -p /tmp/myroot
sudo chroot /tmp/myroot /bin/bash

However, if you run the above snippet as is, you will see an error like:

chroot: failed to run command '/bin/bash': No such file or directory

This happens because chroot does not automatically populate the new root with any programs or libraries. Even the shell itself is just a user-space program, so if you want to interactively navigate your new root, you must copy in a shell binary.

So we adjust our code snippet:

mkdir -p /tmp/myroot/bin
cp /bin/bash /tmp/myroot/bin/
sudo chroot /tmp/myroot /bin/bash

However, we will soon find that copying only /bin/bash is still not enough. Most binaries depend on shared libraries. Similarly, common commands like ls, mkdir, cp, cat, and others are not included automatically; they must be present inside the new root as separate binaries.
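The shared libraries a binary needs can be discovered with ldd. Here is a minimal sketch that copies bash together with its libraries into the new root (exact library paths vary by distribution):

```shell
# Build a minimal chroot environment containing bash and its libraries.
ROOT=/tmp/myroot
mkdir -p "$ROOT/bin"
cp /bin/bash "$ROOT/bin/"

# ldd lists the shared libraries bash is linked against; copy each one
# into the same path inside the new root.
for lib in $(ldd /bin/bash | grep -o '/[^ ]*'); do
    mkdir -p "$ROOT$(dirname "$lib")"
    cp "$lib" "$ROOT$(dirname "$lib")/"
done

# With bash and its libraries in place, the chroot can start a shell:
#   sudo chroot "$ROOT" /bin/bash
```

The same dance would have to be repeated for every utility you want inside the new root.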

chroot therefore provides only a minimal building block. It changes the illusion of the file system, but it does not construct a usable environment for us. Everything inside the new root must be assembled manually.

Man page: https://www.man7.org/linux/man-pages/man2/chroot.2.html

## Capabilities (1990s)

Capabilities divide what was once unrestricted root authority into narrowly scoped kernel privileges. Instead of granting full administrative access, a process can receive only the specific privilege it requires.

For example, binding to privileged ports below 1024 normally requires root:

python3 -m http.server 80
# Permission denied

Rather than running the entire program as root, we can grant only the capability needed to bind low-numbered ports:

sudo setcap cap_net_bind_service=+ep /usr/bin/python3

The +ep flags refer to two important capability sets. The permitted set defines which capabilities a process is allowed to use. The effective set determines which of those capabilities are currently active.

By adding the capability to both sets, the program gains only the ability to bind privileged ports and nothing more. If a capability exists in the permitted set but not in the effective set, the process would need to explicitly enable it at runtime using syscalls.
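You can verify the result with getcap, which (like setcap) ships with the libcap utilities; availability and the exact python3 path depend on your system:

```shell
# Show the file capabilities attached to the binary. After the setcap
# command above, this prints the granted capability set.
getcap /usr/bin/python3

# The server can now bind port 80 as a normal user:
#   python3 -m http.server 80
```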

Note: Using chroot requires the CAP_SYS_CHROOT capability.

Man page: https://man7.org/linux/man-pages/man7/capabilities.7.html

## Namespaces (2000s)

A namespace is a kernel feature that isolates what a process can see. More specifically, a namespace partitions a global system resource: one that would normally be shared across the entire operating system.

These global resources include:

  • The list of running processes
  • The system hostname
  • Mounted file systems
  • Network interfaces and routing tables
  • User and group ID mappings
  • Inter-process communication mechanisms

At system boot, Linux creates one default instance of each namespace type (i.e. for each global resource). This is called the initial namespace. All processes start inside this shared environment and therefore see the same processes, the same network interfaces, the same mount points, and the same hostname.
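You can actually see a process's namespace memberships under /proc. In the initial namespaces, every process links to the same namespace instances (the inode numbers will differ per system):

```shell
# Each symlink identifies the namespace instance this process belongs to.
ls -l /proc/$$/ns
# lrwxrwxrwx ... mnt -> 'mnt:[...]'
# lrwxrwxrwx ... net -> 'net:[...]'
# lrwxrwxrwx ... pid -> 'pid:[...]'
# ...
```

Two processes are in the same namespace exactly when these symlinks point to the same inode.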

To understand what this means in practice, let's first look at the process list in a normal shell (the initial PID namespace):

ps aux

Example output:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 168108  5408 ?        Ss   10:00   0:02 /sbin/init
root       232  0.0  0.1 398524  8220 ?        Ssl  10:01   0:01 /usr/bin/dbus-daemon
roy       1534  0.5  0.3 899212 12324 pts/0    S+   10:15   0:15 bash
...

Here, we see processes from across the entire system: system services, background daemons, and other user sessions.

Now let's create a new PID namespace with the unshare command:

sudo unshare --fork --pid --mount-proc bash

Inside this new shell, run:

ps aux

Example output:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 10480  1864 pts/0    S+   10:35   0:00 bash
root         2  0.0  0.1  9876  1592 pts/0    R+   10:35   0:00 ps aux

Notice the difference: the shell now sees only the processes created inside this namespace. From its perspective, it appears to be PID 1, even though on the host system it has a completely different process ID.

We can perform a similar experiment with networking. First, check your normal network interfaces:

ip a

You will typically see multiple interfaces such as eth0, wlan0, and lo.

Now create a new network namespace:

sudo unshare --net bash
ip a

Inside this namespace, you will usually see only the loopback interface (lo). All other network interfaces are invisible. From inside, it appears as if the system has no external network connectivity.
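Hostname isolation works the same way through a UTS namespace. Combined with a user namespace, you can even try this without root on many distributions (assuming unprivileged user namespaces are enabled):

```shell
# Rename the host inside a new UTS namespace; the change is invisible
# to the rest of the system.
unshare --user --uts --map-root-user sh -c 'hostname nsdemo; hostname'
# prints: nsdemo

# Back outside, the real hostname is unchanged:
hostname
```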

Linux supports several namespace types, each isolating a different category of global resources. These namespace types were not introduced all at once. Instead, they were added gradually over time as new isolation needs emerged:

  • Mount namespaces (2002): isolate the file system mount table (what is mounted where).
  • UTS namespaces (2006): isolate hostname and domain name.
  • IPC namespaces (2006): isolate shared memory and message queues.
  • PID namespaces (2008): isolate process ID number spaces.
  • Network namespaces (2009): isolate network devices and routing.
  • User namespaces (2013): isolate user and group ID mappings.

Individually, each namespace isolates only one aspect of the system. Combined, they create the illusion of a separate operating system environment, even though everything still runs on the same shared kernel.

Man page: https://man7.org/linux/man-pages/man7/namespaces.7.html

## Seccomp (2005)

Another important kernel feature used in container environments is seccomp, short for “secure computing.” Seccomp is a Linux kernel facility that lets a process restrict which system calls it is permitted to make.

System calls (syscalls) are the interface between user-space programs and the kernel. They let a process request services from the kernel, such as reading a file, creating a socket, forking a new process, or mounting a file system.
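You can watch this interface in action with strace (assuming it is installed), which records every syscall a process makes:

```shell
# Summarize the syscalls made by a simple directory listing.
strace -c -o /tmp/syscall-summary.txt ls / > /dev/null
cat /tmp/syscall-summary.txt
# shows a table of calls such as openat, read, write, mmap, close, ...
```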

For example, container runtimes apply a default seccomp profile that blocks dozens of syscalls that are considered unnecessary or dangerous for most workloads. This means that even if an application or attacker were to try to use one of these blocked syscalls, the kernel would deny the call or terminate the process, reducing the potential for privilege escalation or escape from the container.

Man page: https://man7.org/linux/man-pages/man2/seccomp.2.html

## Control Groups (2006)

Around 2006, engineers at Google began work on a kernel mechanism originally called “process containers.” Later renamed control groups (or cgroups), this feature was merged into the mainline Linux kernel in early 2008.

Cgroups allow the kernel to limit, account for, and isolate resource usage across groups of processes. Rather than focusing on what processes see (like namespaces), cgroups focus on how much they can consume: CPU time, memory, disk throughput, and more.

Think of a cgroup as a group of processes that share a set of resource constraints. Once processes are placed into a cgroup, the kernel enforces whatever limits you define. These limits help prevent a runaway process from consuming all of a system's CPU or memory, and they provide predictable performance in multi-tenant environments.

Cgroups are typically exposed via a pseudo-file system mounted under /sys/fs/cgroup. You can create your own cgroup, assign limits, and place processes into it. For example, on a system with cgroup mounted at /sys/fs/cgroup:

sudo mkdir /sys/fs/cgroup/webapp

# Limit to 50% of one CPU core (50 ms of CPU time per 100 ms period)
echo "50000 100000" | sudo tee /sys/fs/cgroup/webapp/cpu.max
# Limit memory to 100 MB
echo "100M" | sudo tee /sys/fs/cgroup/webapp/memory.max

# Add a running process (PID 12345) to the cgroup
echo 12345 | sudo tee /sys/fs/cgroup/webapp/cgroup.procs

In this example, processes in the webapp cgroup are constrained to use at most half a CPU core's worth of time and no more than 100 MB of memory.
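The same pseudo-files also report live usage, so you can check what the group is actually consuming (paths assume the cgroup v2 hierarchy used above, with the memory controller enabled):

```shell
# Current memory consumption of all processes in the group, in bytes
cat /sys/fs/cgroup/webapp/memory.current

# Cumulative CPU usage statistics (usage_usec, user_usec, system_usec)
cat /sys/fs/cgroup/webapp/cpu.stat
```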

Man page: https://man7.org/linux/man-pages/man7/cgroups.7.html




#5. LXC: Linux Containers (2008)

At this point, we've covered all the kernel mechanisms that make OS-level isolation possible: namespaces to control what a process can see, cgroups to limit what it can use, capabilities to pare down privileges, and seccomp to reduce the attack surface. While each of these interfaces has its place, using them all directly by hand would be fairly complex.

In 2008, a tool called LXC, short for Linux Containers, was introduced to make creating and running containers practical. LXC was the first complete container runtime built entirely on top of the vanilla Linux kernel; it did not require any custom patches.

What LXC provided was a convenient way to combine existing kernel features. Instead of manually invoking these features for each process, LXC offered simple command-line tools and configuration files that orchestrated them for you. In modern terminology, software that brings together kernel isolation primitives like this is generally called a container runtime.

## File Systems and Templates

One of the practical challenges LXC helps with is the container's file system. As you saw earlier, chroot, namespaces and cgroups establish the isolation mechanisms, but they do not populate a container with programs or libraries. A freshly created container begins with an essentially empty root file system.

To understand why that matters, it helps to look at what a normal Linux file system actually contains. On a typical host system, the root directory / is the top of a complete hierarchy of directories defined by the Filesystem Hierarchy Standard (FHS). It usually looks roughly like this:


# Typical Linux Root file system

/
├── bin        Essential command binaries
│   ├── ls
│   ├── cp
│   └── bash
├── boot       Bootloader and kernel files
├── dev        Device files representing hardware
├── etc        System-wide configuration files
├── home       User home directories
│   ├── roy/
│   └── alice/
├── lib        Shared libraries used by binaries in /bin and /sbin
│   └── libc.so.6
├── media      Mount points for removable media (USB, CD-ROM)
├── mnt        Temporary mount points
├── opt        Optional third-party software
├── root       Home directory of the root user
├── sbin       System administration binaries
├── srv        Data served by system services
├── sys        Virtual file system for kernel and device information
├── tmp        Temporary files
├── usr        Secondary hierarchy (more binaries & libs)
└── var        Variable data such as log files

Every command you use (ls, cp, even the bash shell itself) is simply a binary stored somewhere in this directory tree. Those binaries depend on shared libraries located under directories such as /lib or /usr/lib. Without those files, the commands cannot execute.

Under LXC, each container's file system is not a magical construct; it is simply a directory inside the host's file system. By default, containers live under:


/
└── var/
    └── lib/
        └── lxc/
            ├── container1/
            │   └── rootfs/
            └── container2/
                └── rootfs/

From the host's perspective, rootfs is just another directory. From inside the container, however, that directory becomes /.

If rootfs is empty, then the container sees an empty system. No /bin, no /etc, no /lib. And therefore no shell, no utilities, and no working userland.

To make a container usable, its root file system must be populated with a complete Linux distribution layout similar to the structure shown earlier.

To automate this process, LXC introduced the concept of templates.

Templates are simply shell scripts stored under /usr/share/lxc/templates on the host:


/usr/share/lxc/templates/
├── lxc-alpine
├── lxc-centos
├── lxc-debian
├── lxc-fedora
├── lxc-ubuntu
└── lxc-download

Each template knows how to download or bootstrap a specific Linux distribution, extract its base file system into the container's rootfs, configure essential files, and prepare the container for its first launch.

A container can be created using a template with a command like:

lxc-create -n container1 -t ubuntu

When this command runs, LXC executes the corresponding template script. The script downloads a minimal Ubuntu root file system, extracts it into /var/lib/lxc/container1/rootfs, and generates a configuration file for the container.

After creation, the container directory typically looks something like this:


/var/lib/lxc/container1/
├── config       Config file defining namespace settings, cgroup limits, etc.
└── rootfs/      Populated file system root
    ├── bin/
    ├── boot/
    ├── dev/
    ├── ...
    └── var/

Once populated, the container can boot into this environment and behave much like a standalone Linux system.

## What an LXC Template Is - and Is Not

A word of caution: although template names resemble full Linux distributions, an LXC container is not a complete Linux distribution.

A traditional Linux distribution consists of two major parts:

  • The Linux kernel
  • The userland (libraries, utilities, configuration, package manager)

An LXC container provides only a minimal userland: a root file system containing binaries, shared libraries, and configuration files.

For example, if the host runs a Debian kernel and you create a container using an Ubuntu template, processes inside the container will still make syscalls to the host's Debian kernel.

This works because Linux maintains a stable syscall Application Binary Interface (ABI). The ABI defines how compiled user-space programs interact with the kernel at the binary level.

As long as the kernel continues to support the expected syscalls, user-space programs can run across different kernel versions and even across different distributions.
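A quick way to see this sharing in practice is uname, which reports the running kernel (the container name below is hypothetical):

```shell
# On the host:
uname -r

# Inside an LXC container on the same machine, e.g.:
#   sudo lxc-attach -n container1 -- uname -r
# the same kernel version is reported, even if the container's userland
# comes from a different distribution than the host's.
```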

### Important Caveats

Despite this compatibility, there are important limitations.

If a program inside a container attempts to use a newer syscall that does not exist in the host kernel, that call will fail.

Kernel configuration also matters. If the host kernel was compiled without certain features (for example, specific networking modules or file system drivers), software inside the container may not function as expected.

Standard libraries such as libc often detect missing kernel features and fall back to older mechanisms when possible. However, this behavior is not guaranteed for all software.

Some applications bypass standard library wrappers and invoke syscalls directly. In such cases, compatibility issues are more likely to surface.

Finally, CPU architecture must match. User-space binaries compiled for one architecture (for example, x86_64) can only run on a kernel built for that same architecture.




#6. Docker and LXC (2013)

In the early 2010s, the computing landscape shifted dramatically once again. A decade earlier, the rise of the World Wide Web and large web services in the early 2000s had already changed expectations for always-on applications, user scale, and infrastructure demand. Those pressures drove the growth of massive data centers and a focus on efficiency at scale.

But a new model was emerging: The Cloud.

Cloud providers such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure were redefining how infrastructure was consumed. Instead of buying hardware or manually provisioning virtual machines, companies could rent elastic compute and storage on demand.

Infrastructure became programmable, globally distributed, and available by API. At the same time, Platform as a Service (PaaS) offerings promised developers freedom from babysitting servers, operating systems, and deployments.

Developers were told to “push your code” and let the platform handle the rest.

One of the companies operating in the PaaS space was dotCloud. But building such a platform was far from simple. dotCloud, like many emerging cloud providers, had to scale applications up and down in response to traffic, move workloads between machines with minimal downtime, and make efficient use of increasingly expensive hardware.

Virtual machines helped, but they were heavy. Booting a VM could take minutes. Each VM required its own kernel. Resource overhead accumulated quickly at scale. The cloud providers needed something lighter and faster.

To solve this, dotCloud developed an internal tool that packaged applications together with their dependencies using this cool new idea called containers. As we now know, containers can start in seconds rather than minutes, consume fewer resources than full virtual machines, and share the host kernel instead of requiring one per instance.

What began as an internal infrastructure component soon proved powerful beyond its original purpose. In 2013, dotCloud released it as the open-source project we all know today: Docker.

Other cloud providers had built their own container-like technologies internally. Google, for example, had created a large-scale cluster management system called Borg. However, Borg was tightly coupled to Google's internal infrastructure and was never intended to be open sourced.

While Borg itself remained internal, it later served as a major conceptual foundation for Kubernetes. That story, however, is one for another time.

## Video

If you are curious, the original 2013 PyCon talk by Solomon Hykes offers a fascinating look at Docker's first public appearance.

## What made Docker special

In its earliest versions, Docker relied directly on LXC for container execution. So, from a purely technical standpoint, Docker did not invent containers, despite what you may have heard.

The real innovation was not the isolation mechanism itself, but how it was packaged, distributed, and made accessible to developers.

Docker's power was built around two key ideas: images and layering.

### Images

Although LXC provided the technical foundation for containerization, it operated at a relatively low level. Setting up a container for a specific application still required significant manual work.

With LXC, you could begin from a template that created a basic root file system. But you still had to install your application, resolve dependencies, configure environment variables, and prepare networking yourself.

This process could be automated using provisioning scripts. However, that approach introduced two persistent problems: portability and maintainability.

Moving a container to another machine often meant rerunning the same provisioning steps from scratch. Small environmental differences could lead to subtle failures. Scripts had to be written, updated, debugged, and versioned for every application.

The dotCloud team recognized that virtual machines had already solved this problem years earlier with a simple but powerful idea: the image.

A VM image captures the state of an entire virtual disk at a specific point in time. You provision once, save the disk into an image file, and can then launch identical machines repeatedly without repeating the setup process.

dotCloud brought this concept to containers through the container image.

Instead of replaying installation steps every time, a container image represents a fully prepared file system snapshot. This includes binaries, libraries, configuration files, and your application's code and executables.

Docker was introduced as a program that takes such an image, instantiates a container from it, and immediately runs the application; no additional provisioning required.

But Docker did not simply copy the virtual machine image model. It refined it to be lightweight, composable, and efficiently shareable by breaking images into reusable layers.

### Layers

At first glance, packaging a container's file system for distribution appears trivial. Archive formats such as tar can package entire directory trees into a single file.

The tar format dates back to the 1970s and early UNIX systems. It was originally designed to bundle files for storage on magnetic tape while preserving directory structure, permissions, and timestamps.

Although tar itself does not compress data, it is commonly paired with tools such as gzip or bzip2, making it practical for storage and distribution.
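Python's standard tarfile module makes this behavior easy to see. The minimal sketch below (using a throwaway temporary directory) packs a tiny directory tree into a gzip-compressed tar archive and restores it elsewhere with its structure intact:

```python
import tarfile
import tempfile
from pathlib import Path

work = Path(tempfile.mkdtemp())

# Build a tiny "root file system" to archive.
(work / "rootfs" / "etc").mkdir(parents=True)
(work / "rootfs" / "etc" / "hostname").write_text("my-container\n")

# tar preserves directory structure, permissions, and timestamps;
# pairing it with gzip ("w:gz") adds compression.
archive = work / "rootfs.tar.gz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(work / "rootfs", arcname="rootfs")

# Extracting reproduces the same tree elsewhere.
dest = work / "extracted"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest)

print((dest / "rootfs" / "etc" / "hostname").read_text())  # my-container
```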

So why not simply provision a container, archive its root file system into a .tar archive, distribute it, and call it a day?

And yes, that approach would function, but the developers saw an opportunity to drastically reduce storage footprint and network overhead. Most container images share a substantial common base. Thousands of applications may start from the same Ubuntu or Alpine root file system. If each image were stored as a single monolithic archive, that shared base would be duplicated repeatedly, wasting disk space and network bandwidth.

Updates would be just as inefficient. Changing a single library would require rebuilding and redistributing the entire archive. It would be like version-controlling source code by copying the entire project directory every time one file changes. It works, but it is profoundly wasteful.

Instead of treating an image as a single immutable blob, Docker represents it as a stack of smaller immutable layers. Each layer captures a specific file system change. The bottom layer typically contains a base OS file system. Subsequent layers record incremental modifications, such as installed packages, configuration updates, or application source code.

Because these layers are immutable, they can be safely reused across multiple images. A common base layer, for example, only needs to be stored once, even if thousands of images build upon it.

For example, an image for a Python app might consist of:

  1. A base layer providing core system utilities and a package manager
  2. A layer installing a Python runtime
  3. A layer copying in requirements.txt
  4. A layer installing application dependencies based on requirements.txt
  5. A final layer copying in the application source code

All of this sounds elegant in theory: immutable layers stacked on top of each other. But how are these separate layers actually combined into a single, coherent file system at runtime?

Bringing multiple independent file system layers together into something usable is no small feat. To understand how this works, we need to introduce another important Linux kernel primitive: the union file system.

### Union File System

A union file system is a storage mechanism that allows multiple file system layers to be combined into a single, unified view, while keeping those layers physically separate on disk.

In other words, several directories can be stacked on top of each other and presented to applications as if they were one coherent file system.

Linux has seen several union file system implementations over time, including UnionFS, AUFS, and OverlayFS.

UnionFS was one of the earliest practical implementations, developed in the mid-2000s. It was distributed as an external kernel module and had to be compiled and loaded separately.

However, its complexity and maintenance overhead prevented it from being merged into the mainline Linux kernel.

A later and more successful approach was OverlayFS. OverlayFS offered a simpler design and better performance, and was eventually merged into the Linux kernel in version 3.18, becoming the standard in-kernel union file system.

#### OverlayFS

To understand how OverlayFS works, imagine a simple layer structure: a read-only lower directory holding the files of a base image, and an upper directory holding a handful of added or modified files.

OverlayFS mounts these directories together and presents applications with a merged view of the layers, as if they were a single directory.

If a file exists in multiple layers, the version from the highest layer in the stack always takes precedence.

How can a higher layer delete a file that still physically exists in a lower layer?

OverlayFS solves this using a special marker called a whiteout. When a file is deleted in the upper layer, a whiteout entry is created to indicate that the corresponding file in a lower layer should be hidden. The file still exists on disk in the lower layer, but it disappears entirely from the merged view presented to applications.

If all image layers are immutable, how can a container modify its own file system at runtime?

The answer lies in an additional writable top layer. OverlayFS always places a writable layer on top of all the immutable layers. Any file that is created or modified while the container is running is written exclusively to this upper layer.

This mechanism is known as copy-on-write. A file from a lower layer is only copied into the writable layer when it is modified.
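The merge, whiteout, and copy-on-write rules can be captured in a few lines of Python. The sketch below is not OverlayFS itself, just a toy model of its lookup logic: layers are dicts mapping path to content, a deletion is recorded as a whiteout marker, and writes only ever touch the top layer.

```python
WHITEOUT = object()  # marker: "this file is deleted in this layer"

class Overlay:
    def __init__(self, lower_layers):
        self.layers = list(lower_layers)  # bottom .. top, read-only
        self.upper = {}                   # writable top layer

    def read(self, path):
        # Search from the top layer down; the highest match wins.
        for layer in [self.upper, *reversed(self.layers)]:
            if path in layer:
                if layer[path] is WHITEOUT:
                    raise FileNotFoundError(path)
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, content):
        # Copy-on-write: changes land only in the writable upper layer.
        self.upper[path] = content

    def delete(self, path):
        # A whiteout hides the file without touching lower layers.
        self.upper[path] = WHITEOUT

base = {"/etc/hosts": "127.0.0.1 localhost", "/bin/ls": "<binary>"}
app  = {"/app/main.py": "print('hi')"}

fs = Overlay([base, app])
fs.write("/etc/hosts", "127.0.0.1 myapp")  # shadows the base version
fs.delete("/bin/ls")                       # whiteout entry

print(fs.read("/etc/hosts"))    # 127.0.0.1 myapp  (upper layer wins)
print(fs.read("/app/main.py"))  # print('hi')      (found in a lower layer)
print(base["/bin/ls"])          # <binary>         (lower layer is untouched)
# fs.read("/bin/ls") would now raise FileNotFoundError
```

Discarding `fs.upper` and creating a fresh `Overlay` restores the original image state, which is exactly what happens when a container is restarted.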

### Wrapping it up

At this point, it is important to address a common source of confusion.

Unlike a virtual machine image, there is no single monolithic "image file" that represents a container image.

Instead, a container image is best understood as a data structure that describes a stack of file system layers, each typically stored as a separate .tar archive. Alongside these layers, the image includes a manifest file that defines layer order and a config file that specifies runtime configuration such as the entry point, environment variables, and networking settings.

In that sense, exporting an image might appear straightforward: it consists of its layer archives plus its metadata.

However, the key insight is that layers are not owned exclusively by a single image. This means an “image” is not a standalone blob of data, but rather a lightweight reference to a set of shared building blocks.

What makes this model powerful is also what makes it initially difficult to grasp: the same physical data on disk can simultaneously belong to many different images.
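This "image as a reference to shared layers" model is easy to sketch. In the toy example below, layers live in a content-addressed store keyed by the SHA-256 of their bytes, and each image is just an ordered list of digests plus a config; two images that share base layers store that data exactly once. (Real manifests and configs contain far more detail than this.)

```python
import hashlib

store = {}  # content-addressed layer store: digest -> layer bytes

def add_layer(data: bytes) -> str:
    digest = "sha256:" + hashlib.sha256(data).hexdigest()
    store[digest] = data  # storing the same layer twice is a no-op: same key
    return digest

base   = add_layer(b"<ubuntu root file system>")
python = add_layer(b"<python runtime>")
deps_a = add_layer(b"<app A dependencies>")
deps_b = add_layer(b"<app B dependencies>")

# An "image" is just an ordered list of layer digests plus configuration.
image_a = {"layers": [base, python, deps_a], "config": {"Cmd": ["python", "a.py"]}}
image_b = {"layers": [base, python, deps_b], "config": {"Cmd": ["python", "b.py"]}}

# Both images reference the same base and python layers...
assert image_a["layers"][:2] == image_b["layers"][:2]
# ...but the store holds each layer's data only once:
print(len(store))  # 4 layers on disk, despite 6 layer references
```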

## Docker - The Container Engine

Armed with an understanding of images and layering, we can now clearly define what Docker actually is.

Docker is what we now call a container engine.

A container engine is a program that takes a container image as input and produces a running container as output. In many ways, it is analogous to a hypervisor, but for OS-level isolation rather than hardware isolation.

Internally, the container engine performs several key steps to bring a container to life. First, it creates and assigns all the necessary Linux namespaces, cgroups, and capabilities to provide process, file system, user, and network isolation. Next, it initializes a union file system. The read-only layers from the container image are mounted one by one as lower layers, and finally, an empty writable layer is placed on top.

Any changes made to the file system while the container is running are written only to this writable upper layer. This ensures that the underlying image layers remain immutable. When the container is stopped, its writable layer can be discarded. Restarting the container creates a fresh empty writable layer, preserving the original state of the image.

The immutability of lower layers also enables reuse. Multiple containers can be started from the same image, each with its own independent writable layer, while sharing the same underlying read-only layers.

## Image Builders and Dockerfiles

So far, we have talked extensively about how container images are used and stored. The remaining question is how these images are actually created.

At first glance, manually creating and managing all of these file system layers seems like a lot of work.

To simplify this process, Docker provides an Image Builder, which is configured using Dockerfiles.

A Dockerfile is a text-based document where each line defines an instruction for building an image. These instructions can copy files, run commands, define environment variables, and specify how a container should start.

As an example, the following Dockerfile produces a ready-to-run Python application.


FROM python:3.13

WORKDIR /usr/local/app

# Install app dependencies
COPY requirements.txt ./
RUN pip install -r requirements.txt

# Copy in the source code
COPY src ./src

EXPOSE 5000

# Setup app user so the container does not run as root
RUN useradd -m app
USER app

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "5000"]

In general, each instruction in a Dockerfile results in a new file system layer, with a few exceptions for instructions that only define build-time or startup behavior.

### Step by step explanation

  • FROM python:3.13 defines the base image. It pulls an existing image and provides all of its layers as the foundation for the new image. While it does not technically create a new layer itself, it establishes the starting point for all subsequent layers.
  • WORKDIR sets the working directory for all subsequent RUN, COPY, ADD, CMD, and ENTRYPOINT instructions.
  • COPY requirements.txt ./ creates a new layer by copying the requirements.txt file from the host.
  • RUN pip install -r requirements.txt creates another new layer that contains all installed Python dependencies.
  • COPY src ./src adds yet another layer, copying the application source code into the image.
  • EXPOSE 5000 is a startup instruction. It informs Docker that the container intends to listen on port 5000 at runtime. Importantly, EXPOSE does not actually publish the port. It serves primarily as documentation. To publish a port, you must explicitly do so when running the container, for example using the -p flag with docker run to map a port, or the -P flag to publish all exposed ports to dynamically assigned host ports.
  • RUN useradd -m app creates a new layer that adds a non-root user to the container's file system.
  • USER app defines the default user and group that will be used for subsequent instructions during the build, as well as at runtime.
  • Finally, CMD specifies the default command that will be executed when a container is started from this image.

Another commonly used instruction is ENV, which sets environment variables that persist for the remainder of the build stage and are also available at runtime. If an environment variable is only needed during the build process, the ARG instruction should be used instead.




#7. Image Registries and Docker Inc. (2013)

Unlike virtual machine images, which are often distributed as single large files, container images are typically distributed through a container registry.

A registry is a network service that stores container images and makes them available for download.

When you run docker pull, Docker first contacts the registry to retrieve the image manifest.

The manifest describes the image in terms of the cryptographic hashes of its layers.

Docker then checks which layers already exist locally and downloads only the missing ones from the registry.

This design makes distributing container images far more efficient than copying large, monolithic virtual machine images.
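The pull flow boils down to a simple loop: fetch the manifest, compare its layer digests against what is already on disk, and download only the gaps. A toy version, with in-memory dicts standing in for the registry and the local layer store:

```python
# A pretend registry: layer digest -> layer bytes.
registry_layers = {
    "sha256:aaa": b"<base os>",
    "sha256:bbb": b"<python runtime>",
    "sha256:ccc": b"<app code>",
}
manifest = {"layers": ["sha256:aaa", "sha256:bbb", "sha256:ccc"]}

# The base layer is already cached locally from a previous pull.
local_store = {"sha256:aaa": b"<base os>"}

def pull(manifest, registry, local):
    downloaded = []
    for digest in manifest["layers"]:
        if digest not in local:          # only fetch missing layers
            local[digest] = registry[digest]
            downloaded.append(digest)
    return downloaded

print(pull(manifest, registry_layers, local_store))
# ['sha256:bbb', 'sha256:ccc'] — the cached base layer is skipped
```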

In May 2013, Docker released version 0.4 along with the Docker Index, a public registry for publishing and discovering container images.

In October 2013, the company changed its name to Docker Inc.

The following year, the company restructured its business to focus entirely on Docker and its growing ecosystem.




#8. libcontainer and the Road to Standardization (2014)

By 2014, containers had gained significant traction, and development around Linux Containers, or LXC, suddenly accelerated.

After years of relatively slow progress, LXC began releasing new versions rapidly. In just a few months, there were more LXC releases than in the previous several years combined.

While this renewed activity was good for the ecosystem, it created a serious problem for Docker. Frequent changes in LXC broke Docker multiple times, highlighting the need for a more stable foundation.

Rather than choosing or maintaining a specific fork of LXC, the Docker team decided to take a different approach altogether.

## Why Docker Moved Away from LXC

Docker's reliance on LXC limited its ability to innovate.

LXC introduced its own abstractions, configuration formats, and design decisions that did not always align with Docker's goals. Docker wanted direct, programmatic access to Linux kernel features such as namespaces, cgroups, Linux capabilities, and seccomp.

By bypassing LXC entirely, Docker could reduce external dependencies, increase portability, and gain fine-grained control over container behavior.

The result was libcontainer, a pure Go library designed to interact directly with the Linux kernel.

libcontainer replaced LXC as Docker's runtime and was embedded directly into Docker itself.

Unlike LXC, libcontainer was not a standalone tool and provided no CLI. It was an open-source software library intended to be consumed by Docker or any other Go-based project.

#9. OCI: Open Container Initiative (2015)

One of the most important outcomes of libcontainer was a shift in mindset toward standardization and collaboration.

This shift led to the formation of the Open Container Initiative, or OCI, in 2015.

OCI is an open governance body created specifically to define open, vendor-neutral standards for container formats and runtimes.

The two primary initiators were Docker and CoreOS.

CoreOS, a company focused on a Linux operating system optimized for running containers, had previously criticized Docker and introduced its own container runtime called rkt.

OCI can be seen as Docker's response to these criticisms, aimed at avoiding fragmentation and establishing common ground across the industry.

Major vendors quickly joined the initiative, including AWS, Cisco, Google, IBM, Microsoft, Red Hat, and others, lending immediate credibility to the effort.

## OCI Specifications

Today, OCI maintains three primary specifications:

  • Runtime Specification (2015)
  • Image Specification (2016)
  • Distribution Specification (2018)

At a high level, an OCI-compliant container engine downloads an OCI image, unpacks it into an OCI file system bundle, and then executes that bundle using an OCI-compliant runtime.

## The OCI Runtime Specification

The runtime specification defines how a container and its configuration should be represented on a local file system.

Its goal is portability: any compliant runtime should be able to start a container from the same on-disk representation.

This representation is called an OCI bundle.

A bundle is simply a directory containing:

  • A config.json file describing configuration such as environment variables, namespaces, mounts, and capabilities
  • A root file system directory referenced by root.path in config.json

The runtime specification does not define how images are unpacked into a bundle. It only specifies that if the files are organized in this way, any compliant runtime must be able to run the container.
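The on-disk shape of a bundle is simple enough to generate by hand. The sketch below writes a directory containing a rootfs/ and a stripped-down config.json; note that a real, spec-compliant config.json requires considerably more than this illustrative subset of fields.

```python
import json
import tempfile
from pathlib import Path

bundle = Path(tempfile.mkdtemp(prefix="oci-bundle-"))

# 1. The root file system the container will see.
(bundle / "rootfs" / "bin").mkdir(parents=True)

# 2. A minimal, illustrative config.json (a real one has many more fields).
config = {
    "ociVersion": "1.0.2",
    "process": {"cwd": "/", "args": ["/bin/sh"]},
    "root": {"path": "rootfs"},
}
(bundle / "config.json").write_text(json.dumps(config, indent=2))

print(sorted(p.name for p in bundle.iterdir()))  # ['config.json', 'rootfs']
# A compliant runtime could now start it, e.g.: runc run --bundle <dir> <id>
```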

## runc: The Reference Runtime

To provide a concrete implementation of the OCI runtime specification, Docker wrapped libcontainer in a command-line tool called runc.

runc was submitted as the reference implementation for the OCI runtime specification.

Unlike libcontainer, runc is a standalone CLI tool capable of running OCI bundles directly.

Today, runc is widely used as the low-level container runtime beneath Docker, Kubernetes, Podman, and many other container platforms.

## containerd and the Decomposition of Docker

While runc standardized the low-level act of starting and stopping containers, many higher-level features remained tightly coupled to Docker itself.

These included pulling and pushing images, managing local image storage, tracking container lifecycles, and orchestrating calls to runc.

This tight coupling became problematic as systems like Kubernetes emerged.

Kubernetes originally relied on Docker as its container runtime, but Docker's API was never designed as a stable third-party integration point. Frequent breaking changes caused ongoing compatibility issues.

To address this, Docker began decomposing its large, monolithic daemon, dockerd.

One of the first major results of this effort was containerd.

containerd started as a dedicated daemon focused on container lifecycle management, essentially supervising containers and invoking runc when needed.

Over time, containerd was expanded to include core image functionality such as pulling and pushing images, unpacking layers, and managing local image storage.

Around early 2017, containerd was positioned as a vendor-neutral project with a stable API that orchestration systems like Kubernetes could rely on.

In March 2017, containerd was donated to the Cloud Native Computing Foundation (CNCF), a sister organization to OCI.

## What Docker Still Provides

With so much functionality standardized or delegated to containerd and runc, one might wonder what Docker still offers.

Docker remains responsible for its developer-focused build system based on Dockerfiles.

It also provides higher-level abstractions such as networking models, volume and storage management, and tooling for application composition and deployment.

Additionally, Docker delivers a rich ecosystem, including Docker Compose, Docker Hub, and Docker Swarm.

Together, these components position Docker not as the container runtime itself, but as a powerful developer experience built on top of standardized, vendor-neutral container infrastructure.
