Diving Deeper into Docker: Understanding Layers and Optimizing Build Time

Did you know that Docker maintains its own build cache? 🤔

Yes, it does, and it uses layers to do so! In this article, we’ll explore how Docker efficiently manages caching through its layered architecture.

But that’s not all! We’ll also dive into two other critical aspects of Docker:

Volumes: For handling persistent data across containers
Networks: To enable seamless communication between containers

So, let’s get started! 🚀

What Are Layers?

To understand layers in Docker, think of each instruction in a Dockerfile as a separate layer.

# Layer 1: Base image
FROM node:18-alpine

# Layer 2: Set working directory
WORKDIR /usr/src/app

# Layer 3: Copy Files
COPY . .

# Layer 4: Install dependencies
RUN npm install

# Layer 5: Expose Port Number
EXPOSE 3000

# Layer 6: Start the app
CMD ["node", "index.js"]

Each command creates a new layer. Take a look at this visual representation:

From the first line (FROM node:18-alpine, Layer 1) to the last line (CMD ["node", "index.js"], Layer 6), Docker builds the image step by step.

How Layers Work

Caching: When you run docker build, Docker caches each layer. This means if nothing changes in a layer, Docker reuses it instead of rebuilding it.
Invalidation: If you change an instruction in the Dockerfile, or the files that a COPY instruction brings in, Docker rebuilds that layer and all subsequent layers, starting from the modified line.
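To make the invalidation rule concrete, here is a tiny Python sketch. It is a toy model under a simplifying assumption, not Docker's real implementation: each layer's cache key is a hash of the previous layer's key plus the instruction text. The names layer_keys and cache_status are made up for this illustration.

```python
import hashlib

def layer_keys(instructions):
    """Return one cache key per instruction; each key also covers every earlier instruction."""
    keys = []
    parent = ""
    for instruction in instructions:
        parent = hashlib.sha256((parent + "\n" + instruction).encode()).hexdigest()
        keys.append(parent)
    return keys

def cache_status(previous, current):
    """Compare a new build against the previous one, layer by layer."""
    old = layer_keys(previous)
    new = layer_keys(current)
    return [
        "CACHED" if i < len(old) and old[i] == key else "REBUILT"
        for i, key in enumerate(new)
    ]

original = [
    "FROM node:18-alpine",
    "WORKDIR /usr/src/app",
    "COPY . .",
    "RUN npm install",
    "EXPOSE 3000",
    'CMD ["node", "index.js"]',
]

# Hypothetical edit: change only the EXPOSE line (Layer 5).
edited = original[:4] + ["EXPOSE 8080"] + original[5:]

result = cache_status(original, edited)
for line, status in zip(edited, result):
    print(f"{status:8} {line}")
```

In this model, Layers 1 to 4 stay cached, while Layer 5 and the CMD layer after it are rebuilt, even though the CMD line itself never changed.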

Let’s look at an example that makes layers easier to understand.

First Example

First Image:

Second Image:

In the First Image: The build process took 7.0 seconds because there were no cached layers. The fourth layer (RUN npm install) alone took 3.9 seconds, making it the most time-consuming step.
In the Second Image: The same process took only 1.4 seconds to build the image. This time, the most time-consuming step (RUN npm install) took 0 seconds because it was cached.

Why Did This Happen?

Docker had to build the first image from scratch. Since there were no cached layers, Docker had to run every command in the Dockerfile, including installing dependencies in the fourth layer (RUN npm install).

In the second image, Docker reused cached layers.

There were no changes in the Dockerfile or the project files.
As a result, Docker skipped re-executing the commands and used the cached results instead.
This is why the overall build took 1.4 seconds, and the fourth layer (RUN npm install) took 0 seconds.

Key Insight from the Second Image:

CACHED appears before every cached layer, indicating that Docker is reusing the results. This caching mechanism saves significant time and resources during subsequent builds.

Second Example

Now that we’ve seen how caching works when there are no changes, let’s explore what happens when we make a change.

In this example, we’ll modify the index.js file and observe how Docker handles the layers during the rebuild.

First Image:

Second Image:

In the First Image: The build process took 6.2 seconds because there were no cached layers except for the second layer. The npm install step alone took 2.9 seconds.
In the Second Image: After building the first image, some layers were cached. However, a small change to index.js caused Docker to rebuild most of the image, taking 6.1 seconds, of which the npm install step took 2.7 seconds.

This raises two questions: if npm install was cached, why did it still take 2.7 seconds? And why are there no cached layers visible after the second layer?

Why Did This Happen?

When building the first image, there were no cached layers, so the entire image was built from scratch.

For the second image, the project code had changed since the previous build. For a COPY instruction, Docker compares a checksum of the copied files against the one recorded for the cached layer. Because index.js had changed, the checksums no longer matched and the COPY . . layer (Layer 3) was invalidated. As a result, all layers after this one were rebuilt.

Key Insight from the Second Image:

If a layer's cache is invalidated and the layer is rebuilt, all subsequent layers are rebuilt without the cache as well. This is why you won’t see "CACHED" before any step from Layer 3 onward.
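Here is a small Python sketch of what just happened. Again, it is a toy model and not Docker's actual algorithm: we assume a layer's key is a hash of the previous key, the instruction text, and the contents of any files that instruction copies. The helper names and the file contents are invented for the illustration.

```python
import hashlib

def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()

def layer_keys(steps, files):
    """steps is a list of (instruction, copied_file_names) pairs.

    Each key covers the parent key, the instruction text, and the
    contents of the files that the instruction copies (if any).
    """
    keys, parent = [], ""
    for instruction, copied in steps:
        contents = "".join(f"{name}={files[name]};" for name in sorted(copied))
        parent = digest(parent + instruction + contents)
        keys.append(parent)
    return keys

# The original Dockerfile from this article.
steps = [
    ("FROM node:18-alpine", []),
    ("WORKDIR /usr/src/app", []),
    ("COPY . .", ["index.js", "package.json"]),
    ("RUN npm install", []),
    ("EXPOSE 3000", []),
    ('CMD ["node", "index.js"]', []),
]

# Hypothetical project contents before and after editing index.js.
before = {"index.js": "console.log('v1')", "package.json": '{"name": "app"}'}
after = {**before, "index.js": "console.log('v2')"}

old, new = layer_keys(steps, before), layer_keys(steps, after)
statuses = ["CACHED" if a == b else "REBUILT" for a, b in zip(old, new)]
for (instruction, _), status in zip(steps, statuses):
    print(f"{status:8} {instruction}")
```

Only the first two layers come out as CACHED. RUN npm install is rebuilt even though package.json is untouched, simply because it sits after the invalidated COPY . . layer.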

The Flaw in Our Process

I have a question for you: Can you spot a simple flaw in our Dockerfile that could save us a lot of build time? 🤔

Here’s the issue: As shown in the second image above, every time we make a change to our project, the RUN npm install command gets executed, even when it’s not necessary.

There are two scenarios to consider:

When we change project files only, Docker rebuilds the layers and re-runs RUN npm install, even though the dependencies haven’t changed. This is wasted work.
When we add or update a dependency, running npm install is essential. This is the only case in which it should run.

To optimize this, we need to fix our Dockerfile to avoid redundant execution of npm install and save build time. Let’s solve this! 🚀

Solution

To solve this issue, we just need to make a simple change in our Dockerfile. Here’s how it should be updated:

# Layer 1: Base image
FROM node:18-alpine

# Layer 2: Set working directory
WORKDIR /usr/src/app

# Layer 3: Copy only the files whose names start with "package"
# (package.json and package-lock.json)
COPY package* .

# Layer 4: Install dependencies
RUN npm install

# Layer 5: Copy the rest of the project files
COPY . .

# Layer 6: Expose port 3000
EXPOSE 3000

# Layer 7: Start the app
CMD ["node", "index.js"]

In our solution, we’ve modified the Dockerfile to first copy only the files whose names start with package (package.json and package-lock.json), and to copy the rest of the project after installing dependencies. The RUN npm install layer now depends only on those package files.

This ensures that npm install runs only when package.json or package-lock.json changes, and not when other files in the project change.

Now let’s look at the result ✨

That’s the result after building the image again, following a change in a file other than package.json. As you can see, Docker did not execute the npm install command because none of the layers before it had changed. Docker’s layer caching mechanism reuses unchanged layers, saving time and resources.
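If you want to play with the idea without running Docker, here is a rough Python sketch of the optimized Dockerfile. It is a simplified model and not how Docker really computes its cache: we assume each layer's key hashes the previous key, the instruction text, and the contents of the files it copies. The function names and file contents are made up for the example.

```python
import hashlib

def digest(text):
    return hashlib.sha256(text.encode()).hexdigest()

def layer_keys(steps, files):
    """Key each layer on its parent key, its instruction, and the files it copies."""
    keys, parent = [], ""
    for instruction, copied in steps:
        contents = "".join(f"{name}={files[name]};" for name in sorted(copied))
        parent = digest(parent + instruction + contents)
        keys.append(parent)
    return keys

def rebuild_report(steps, before, after):
    """Return CACHED or REBUILT for each layer when going from `before` to `after`."""
    old, new = layer_keys(steps, before), layer_keys(steps, after)
    return ["CACHED" if a == b else "REBUILT" for a, b in zip(old, new)]

package_files = ["package.json", "package-lock.json"]

# The optimized Dockerfile: package files first, the rest of the project later.
optimized = [
    ("FROM node:18-alpine", []),
    ("WORKDIR /usr/src/app", []),
    ("COPY package* .", package_files),
    ("RUN npm install", []),
    ("COPY . .", ["index.js"] + package_files),
    ("EXPOSE 3000", []),
    ('CMD ["node", "index.js"]', []),
]

# Hypothetical project contents.
project = {
    "index.js": "console.log('v1')",
    "package.json": '{"dependencies": {}}',
    "package-lock.json": "{}",
}

# Scenario 1: only application code changes.
code_change = rebuild_report(optimized, project, {**project, "index.js": "console.log('v2')"})
# Scenario 2: a dependency is added, so package.json changes.
dep_change = rebuild_report(
    optimized, project, {**project, "package.json": '{"dependencies": {"express": "*"}}'}
)

for (instruction, _), code, dep in zip(optimized, code_change, dep_change):
    print(f"{code:8} {dep:8} {instruction}")
```

The first column is the code-only change: RUN npm install stays CACHED and only the layers from COPY . . onward are rebuilt. The second column is the dependency change, where everything from COPY package* . onward is rebuilt, which is exactly when we want npm install to run.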

Outro

That’s it, everyone! We’ve explored how the layer mechanism works in Docker, with clear examples, and how it helps solve the issue of time-consuming rebuilds. I hope you found this blog helpful and that it will assist you in your DevOps journey.

Thank you for reading, and happy coding! 👨‍💻🚀