Chapter 12: Troubleshooting and Debugging Docker

Introduction

As you delve deeper into Docker, building more complex applications and services, you’ll inevitably encounter situations where things don’t work as expected. Containers might fail to start, services might not communicate, or performance could be suboptimal. This is where the crucial skills of troubleshooting and debugging come into play.

This chapter will equip you with the essential tools, commands, and strategies to diagnose and resolve common Docker-related issues. Understanding how to effectively debug your Dockerized applications will save you countless hours and significantly improve your development workflow.

Main Explanation

Effective troubleshooting in Docker involves a systematic approach, starting with observation and moving towards deeper inspection. Docker provides a rich set of commands and features designed to give you visibility into the state and behavior of your containers, images, networks, and volumes.

Common Docker Issues

Before diving into tools, let’s identify some frequently encountered problems:

Container not starting or exiting immediately: This is often due to an incorrect entrypoint/command, missing dependencies, or a misconfigured application inside the container.
Port conflicts: A host port is already in use by another process or container, so a new container cannot bind to it and fails to start.
Networking issues: Containers unable to communicate with each other, with the host, or with external services. DNS resolution problems are also common.
Image build failures: Errors during the docker build process, often related to incorrect Dockerfile instructions, missing files, or network issues during package downloads.
Volume mounting problems: Data not persisting, incorrect permissions, or volumes not appearing inside the container as expected.
Resource exhaustion: Containers consuming too much CPU, memory, or disk I/O, leading to performance degradation or crashes.
Permissions errors: Processes inside containers lacking the necessary permissions to read/write files or execute commands.
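Of the issues above, port conflicts are the easiest to check before you start a container. The Python sketch below tries to bind a TCP port on the host, which is roughly what happens when a published port is claimed. is_port_free is a hypothetical helper. It only notices ports held by a listening socket, so treat a True result as a hint rather than a guarantee.

```python
import socket


def is_port_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can bind to (host, port) right now.

    Hypothetical pre-flight check: it notices ports held by a listening
    socket (another server, or Docker's userland proxy), but not ports
    claimed only through firewall/NAT rules.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Address already in use, or no permission (ports < 1024 as non-root)
            return False
    return True
```

For example, if is_port_free(5000) returns False, another process already holds the port. Pick a different host port (docker run -p 5001:5000 ...) or stop that process.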

Essential Debugging Tools and Commands

Docker offers a suite of commands to help you peek into almost every aspect of your Docker environment.

docker logs [CONTAINER_ID_OR_NAME]: This is your first line of defense. It displays the standard output (stdout) and standard error (stderr) logs of a container.
- -f or --follow: Follow log output.
- --tail [NUMBER]: Show the last N lines.
- --since [TIMESTAMP]: Show logs since a timestamp (e.g., 2024-01-15T10:00:00) or a relative duration (e.g., 10m).
docker ps -a: Lists all containers, including those that have exited. This is vital for seeing if a container started and then stopped immediately.
- -q or --quiet: Only display container IDs.
- -f or --filter: Filter output based on conditions (e.g., status=exited).
docker inspect [OBJECT_ID_OR_NAME]: Provides detailed low-level information about Docker objects (containers, images, networks, volumes). This JSON output contains configuration details, network settings, mount points, and more.
- --format '{{ .State.ExitCode }}': Extract specific information using Go templates.
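docker exec -it [CONTAINER_ID_OR_NAME] [COMMAND]: Executes a command inside a running container. The -i flag keeps STDIN open and -t allocates a pseudo-terminal. You need both for interactive sessions such as a shell.
- docker exec -it my-web-app bash: Open a bash shell inside the container. Use sh on minimal images, such as Alpine-based ones, that do not include bash.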
docker events: Streams real-time events from the Docker daemon. Useful for observing when containers start, stop, die, or when images are pulled/pushed.
docker stats [CONTAINER_ID_OR_NAME]: Displays a live stream of resource usage statistics (CPU, memory, network I/O, block I/O) for running containers. Add --no-stream to print a single snapshot and exit.
docker system df: Shows Docker disk usage, detailing how much space is consumed by images, containers, local volumes, and build cache.
docker system prune: Cleans up unused Docker objects: stopped containers, unused networks, dangling images, and dangling build cache. Be cautious. Every stopped container is deleted, along with any data in its writable layer.
- -a or --all: Remove all unused images, not just dangling ones.
- --volumes: Also remove unused local volumes.
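If you check the same fields repeatedly, or want to script a post-mortem, parsing the JSON from docker inspect is more reliable than scraping text. The Python sketch below shows one way to do it. container_state and inspect_container are hypothetical helpers. The field names (Name, State.Status, State.ExitCode, State.Error, State.OOMKilled) are the ones docker inspect reports for containers. For a one-off check, docker inspect --format '{{json .State}}' [CONTAINER] prints the same State object directly.

```python
import json
import subprocess


def container_state(inspect_json: str) -> dict:
    """Extract post-mortem fields from `docker inspect` output.

    `docker inspect` prints a JSON array with one object per inspected
    object; this sketch looks only at the first (a container).
    """
    info = json.loads(inspect_json)[0]
    state = info["State"]
    return {
        "name": info["Name"].lstrip("/"),  # container names start with "/"
        "status": state["Status"],         # e.g. "running", "exited"
        "exit_code": state["ExitCode"],
        "error": state["Error"],
        "oom_killed": state["OOMKilled"],  # True if the OOM killer ended it
    }


def inspect_container(name: str) -> dict:
    """Run `docker inspect` on one container (needs the docker CLI on PATH)."""
    out = subprocess.run(
        ["docker", "inspect", name], capture_output=True, text=True, check=True
    ).stdout
    return container_state(out)
```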

Strategies for Troubleshooting

Check Logs First (docker logs): Always start here. Most application errors and startup failures are written to stdout/stderr. Logs remain available after a container exits, as long as the container has not been removed (for example, because it was started with --rm).
Inspect Container State (docker ps -a, docker inspect): If logs are unhelpful or missing, use docker ps -a to confirm the container’s status. If it exited, docker inspect reveals State.ExitCode and State.Error. It also shows State.OOMKilled, which is true when the out-of-memory killer terminated the process. Check the Config section for environment variables, HostConfig for port bindings, and Mounts for volume mounts.
Access Inside Container (docker exec): If a container is running but misbehaving, use docker exec -it [container] bash (or sh if bash isn’t available) to get an interactive shell. From there, you can:
- Check file paths and permissions.
- Run application commands manually.
- Inspect network configuration (ip a, ping). Slim images often lack these tools, so you may need to install them first.
- Check installed packages.
Network Diagnostics: If containers can’t communicate:
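- Use docker network inspect [network_name] to see connected containers and their IP addresses. docker network ls lists the available networks.
- Use docker exec to ping another container’s IP or service name. Name resolution between containers only works on user-defined networks, such as the one Docker Compose creates. It does not work on the default bridge network.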
- Check firewall rules on the host.
Rebuild and Rerun: For issues during image builds, break your Dockerfile into smaller steps and build incrementally to find where the failure occurs. For persistent runtime issues, rebuild with docker build --no-cache to rule out stale cached layers, then remove the old container and run a fresh one.
Resource Monitoring (docker stats): If an application is slow or crashing under load, docker stats can quickly show if it’s hitting CPU or memory limits. You might need to adjust resource limits in your docker run command or docker-compose.yml.
Check Docker Daemon Status: Ensure the Docker daemon itself is running correctly. On Linux with systemd, systemctl status docker shows its state and journalctl -u docker shows its logs.
Consult Documentation and Community: When you encounter obscure errors, the official Docker documentation, Stack Overflow, and Docker forums are invaluable resources.
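The resource-monitoring step above is easy to script. docker stats --no-stream --format '{{json .}}' prints one JSON object per running container. The Python sketch below flags containers above a CPU or memory threshold. heavy_containers and its default thresholds are hypothetical. The field names (Name, CPUPerc, MemPerc) are those printed by recent Docker versions, so compare them with your own output first. CPU percentages can exceed 100% on multi-core hosts.

```python
import json


def parse_percent(value: str) -> float:
    """Convert a docker stats percentage such as '85.30%' to 85.3 ('--' -> 0.0)."""
    value = value.strip().rstrip("%")
    return float(value) if value not in ("", "--") else 0.0


def heavy_containers(stats_lines, cpu_limit=80.0, mem_limit=80.0):
    """Return (name, cpu%, mem%) for containers above either threshold.

    `stats_lines` is the output of
    `docker stats --no-stream --format '{{json .}}'`, one JSON object per line.
    """
    flagged = []
    for line in stats_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = json.loads(line)
        cpu = parse_percent(row["CPUPerc"])
        mem = parse_percent(row["MemPerc"])
        if cpu > cpu_limit or mem > mem_limit:
            flagged.append((row["Name"], cpu, mem))
    return flagged
```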

Examples

Let’s walk through a few common debugging scenarios.

Example 1: Container Exiting Immediately

Imagine you have a Dockerfile for a simple Python Flask app:

# Dockerfile
FROM python:3.9-slim-buster
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]

And app.py:

# app.py
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello():
    return "Hello, Docker!"

if __name__ == '__main__':
    # Bug: 'rnu' is a typo for 'run'
    app.rnu(host='0.0.0.0', port=5000)

You build and run it:

docker build -t my-flask-app .
docker run -p 5000:5000 my-flask-app

The container starts and immediately stops.

Check docker ps -a:

docker ps -a
CONTAINER ID   IMAGE           COMMAND             CREATED          STATUS                      PORTS     NAMES
a1b2c3d4e5f6   my-flask-app    "python app.py"     5 seconds ago    Exited (1) 4 seconds ago              vigilant_darwin

Status Exited (1) means the container’s main process ended with a non-zero exit code, which signals an error.

Check docker logs:

docker logs a1b2c3d4e5f6
Traceback (most recent call last):
  File "/app/app.py", line 11, in <module>
    app.rnu(host='0.0.0.0', port=5000)
AttributeError: 'Flask' object has no attribute 'rnu'

The logs point straight to an AttributeError on line 11 of app.py: the method is spelled rnu instead of run. Correcting the typo, rebuilding the image, and running it again fixes the issue.
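Exit code 1 is what Python returns for an uncaught exception, but other exit codes carry more specific meanings. The Python sketch below turns a code from docker ps -a or docker inspect into a rough hint. explain_exit_code is a hypothetical helper. Its mapping follows Docker's documented exit codes for docker run (125, 126, 127) plus the common 128 + signal-number convention, so treat the output as a starting point, not a diagnosis.

```python
import signal


def explain_exit_code(code: int) -> str:
    """Heuristic interpretation of a container exit code (not exhaustive)."""
    if code == 0:
        return "exited normally"
    if code == 125:
        return "docker run itself failed (daemon or option error)"
    if code == 126:
        return "command found but could not be invoked (e.g. not executable)"
    if code == 127:
        return "command not found"
    if code > 128:
        # Shell convention: 128 + N means the process was killed by signal N
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        hint = ""
        if name == "SIGKILL":
            hint = " (e.g. the OOM killer, docker kill, or docker stop after its timeout)"
        return f"killed by {name}{hint}"
    return "application error: check docker logs"
```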

Example 2: Network Connectivity Issue

You have a docker-compose.yml with two services, web and db:

version: '3.8'
services:
  web:
    image: my-web-app:latest
    ports:
      - "80:80"
    environment:
      DB_HOST: db
      DB_PORT: 5432
  db:
    image: postgres:13
    environment:
      POSTGRES_DB: mydb
      POSTGRES_USER: user
      POSTGRES_PASSWORD: password

The web service can’t connect to the db service.

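Check the web service’s logs (docker compose logs web with Compose v2, or docker logs myproject_web_1): Look for connection errors. Suppose you see something like could not connect to server: Connection refused. This means the web container reached the db host, but nothing was accepting connections on port 5432 at that moment. Either PostgreSQL was not listening yet or it was listening on a different port.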

Check docker ps:

docker ps
CONTAINER ID   IMAGE           COMMAND                  CREATED         STATUS         PORTS                  NAMES
f1g2h3i4j5k6   my-web-app      "nginx -g 'daemon of…"   2 minutes ago   Up 2 minutes   0.0.0.0:80->80/tcp     myproject_web_1
l7m8n9o0p1q2   postgres:13     "docker-entrypoint.s…"   2 minutes ago   Up 2 minutes   5432/tcp               myproject_db_1

Both containers are Up.

Access web container and test connectivity:

docker exec -it myproject_web_1 bash
ping -c 3 db # Ping the service name three times (install iputils-ping first if ping is missing)
# PING db (172.20.0.2) 56(84) bytes of data.
# 64 bytes from db.myproject_default (172.20.0.2): icmp_seq=1 ttl=64 time=0.076 ms
# ... (ping successful)

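If ping works, DNS resolution and basic network connectivity are fine. Ping uses ICMP, though, so it says nothing about whether anything is listening on port 5432. The issue is likely at the application level: db is not accepting connections yet, the port is wrong, or the web app is misconfigured.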
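To test the port itself, try a TCP connection and look at how it fails. The Python sketch below maps the three common failure modes to the diagnoses used in this example. probe_tcp is a hypothetical helper, and running it inside the web container assumes Python is available in that image.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.gaierror:
        return "dns-failure"  # the name did not resolve
    except ConnectionRefusedError:
        return "refused"      # host reachable, nothing listening on that port
    except socket.timeout:
        return "timeout"      # packets dropped: firewall, wrong network, host down
    except OSError as exc:
        return f"error: {exc}"
```

Called as probe_tcp("db", 5432) from inside the web container, "refused" matches the Connection refused message above. "dns-failure" would instead point to the service name or network configuration.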

Access db container and check listening ports:
```
docker exec -it myproject_db_1 bash
# netstat comes from the net-tools package (apt-get update && apt-get install -y net-tools);
# alternatively, ss -tlnp (from iproute2) gives similar output
netstat -tulnp | grep 5432
# tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN      1/postgres
```
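This confirms PostgreSQL is now listening on port 5432 inside the container, so the problem lies on the web side. The error was “Connection refused” rather than an authentication failure, which makes timing the most likely cause. On first start, the postgres image initializes its database before it accepts TCP connections, and web may have tried to connect during that window. Nothing in this Compose file makes web wait for db. Even a depends_on entry only controls start order unless it is combined with a health check. If you see an authentication error instead, check the credentials. Note that web is given DB_HOST and DB_PORT here but no user name or password.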
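One application-side remedy is to retry until the database accepts connections. The Python sketch below polls a TCP port with a deadline. wait_for_port and its defaults are hypothetical. An open port does not prove PostgreSQL is ready to authenticate clients, so a real app should still retry its first query. A Compose health check (for example, one that runs pg_isready) is the declarative alternative.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused, unresolved, or timed out: the service is not ready yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

In the web app, you might call wait_for_port(os.environ["DB_HOST"], int(os.environ["DB_PORT"])) before opening the database connection, using the variables the Compose file already sets.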

Example 3: Debugging a Failing Build

Suppose your Dockerfile fails at a RUN instruction:

FROM ubuntu:22.04
WORKDIR /app
COPY . .
# Fails: there is no package with this name
RUN apt-get update && apt-get install -y non-existent-package
CMD ["bash"]

docker build -t my-broken-image .

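The build stops at the RUN instruction. With the legacy builder, the output ends with an error like:

...
E: Unable to locate package non-existent-package
The command '/bin/sh -c apt-get update && apt-get install -y non-existent-package' returned a non-zero code: 100

BuildKit, the default builder in current Docker releases, prints the same apt error. It then reports: ERROR: failed to solve: process "/bin/sh -c apt-get update && apt-get install -y non-existent-package" did not complete successfully: exit code: 100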

To debug this:

Identify the failing layer: The build output clearly states the command that failed.
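Reproduce the state just before the failure: The layers for the steps that succeeded are cached, so rebuilding without the failing line is fast. Comment out the failing line, build an image from the remaining instructions, and start an interactive container from it. The legacy builder printed an intermediate image ID after each step that you could docker run directly. BuildKit does not expose intermediate images, but this approach works with either builder.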
- Modify Dockerfile:
```
FROM ubuntu:22.04
WORKDIR /app
COPY . .
# Failing step, temporarily commented out for debugging:
# RUN apt-get update && apt-get install -y non-existent-package
CMD ["bash"]
```
- Build: docker build -t debug-build .
- Run and inspect: docker run -it debug-build bash. You are now in a container whose filesystem matches the state just before the problematic RUN instruction. Run apt-get update, then try the install by hand or look up the correct package name with apt-cache search to understand why it failed.

Mini Challenge

You have a Dockerized Node.js application. When you run it, you get a “connection refused” error when trying to access it via http://localhost:3000.

Here are the Dockerfile and app.js:

Dockerfile:

FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3000
CMD ["node", "app.js"]

app.js:

const express = require('express');
const app = express();
const port = 3000;

app.get('/', (req, res) => {
  res.send('Hello from Node.js!');
});

app.listen(port, () => {
  console.log(`App listening at http://localhost:${port}`);
});

You build and run it with:

docker build -t node-app .
docker run node-app

Your Task:

Identify why you can’t access the app on http://localhost:3000.
Provide the docker command(s) needed to fix this issue and make the app accessible.
Explain your reasoning.

Summary

Troubleshooting and debugging are indispensable skills for anyone working with Docker. By systematically using commands like docker logs, docker ps -a, docker inspect, and docker exec, you can gain deep insights into the state and behavior of your Dockerized applications. Remember to:

Start with logs: They often reveal the root cause directly.
Inspect thoroughly: Understand container configuration and state.
Go inside the container: Use docker exec for interactive diagnosis.
Monitor resources: Spot performance bottlenecks and memory pressure with docker stats.
Clean up regularly: Use docker system prune to manage disk space.

Mastering these techniques will significantly boost your productivity and confidence in managing Docker environments.