Container Image Testing Design

This document describes the testing strategy for ODH base container images.

Goals

  1. Validate built images - Ensure images meet quality standards before merge
  2. Fast feedback - Tests run on every PR as pre-merge checks
  3. Local reproducibility - Developers can run the same tests locally
  4. No special hardware - Tests run on standard CI runners (no GPU required)

Test Architecture

┌─────────────────────────────────────────────────────────────┐
│                      GitHub Actions                         │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ Lint Job    │  │ Build Python│  │ Build CUDA          │  │
│  │ (hadolint)  │  │ + Test      │  │ + Test (no GPU)     │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    pytest test suite                        │
│  ┌─────────────────┐  ┌─────────────────────────────────┐   │
│  │ conftest.py     │  │ test_python_image.py            │   │
│  │ - fixtures      │  │ test_cuda_image.py              │   │
│  │ - podman helper │  └─────────────────────────────────┘   │
│  └─────────────────┘                                        │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│              Container Runtime (podman)                     │
│  ┌─────────────────┐       ┌─────────────────┐              │
│  │ Python Image    │       │ CUDA Image      │              │
│  │ (under test)    │       │ (under test)    │              │
│  └─────────────────┘       └─────────────────┘              │
└─────────────────────────────────────────────────────────────┘

Test Categories

All tests below are blocking - PRs cannot merge if any fail.

1. Smoke Tests

Basic sanity checks that the image starts and core tools work.

| Test | Command | Expected |
| --- | --- | --- |
| Python version | python --version | Python 3.12.x |
| pip available | pip --version | Exit 0 |
| uv available | uv --version | Exit 0 |

2. User & Permission Tests

Verify OpenShift compatibility (non-root user, correct permissions).

| Test | Check | Expected |
| --- | --- | --- |
| User ID | id -u | 1001 |
| Group ID | id -g | 0 (root group) |
| Not running as root | whoami | Not root |
| Workdir writable | touch /opt/app-root/src/test | Exit 0 |

3. Configuration Tests

Verify package index configuration files exist and are valid.

| Test | Check | Expected |
| --- | --- | --- |
| pip.conf exists | /etc/pip.conf | File exists |
| pip.conf valid | /etc/pip.conf | Contains [global] |
| uv.toml exists | /etc/uv/uv.toml | File exists |
| UV_CONFIG_FILE set | printenv UV_CONFIG_FILE | /etc/uv/uv.toml |

4. Image Metadata Tests

Verify Dockerfile directives are set correctly via podman inspect.

| Directive | Check | Expected |
| --- | --- | --- |
| WORKDIR | .Config.WorkingDir | /opt/app-root/src |
| USER | .Config.User | 1001 |

5. Environment Variable Tests

Verify expected environment variables are set.

| Variable | Expected Value |
| --- | --- |
| HOME | /opt/app-root/src |
| PATH | Contains /opt/app-root/bin |
| PYTHONDONTWRITEBYTECODE | 1 |
| PYTHONUNBUFFERED | 1 |
| PIP_NO_CACHE_DIR | 1 |
| UV_SYSTEM_PYTHON | 1 |

6. OCI Label Tests

Verify required OCI and OpenShift labels are present.

| Label | Expected |
| --- | --- |
| name | Image name set |
| version | Version string set |
| io.k8s.display-name | Kubernetes display name |
| org.opencontainers.image.source | GitHub URL |
| com.opendatahub.accelerator | cpu or cuda |
| com.opendatahub.python | 3.12 |
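
A hedged sketch of how these label checks might be written in test_common.py, assuming the parameterized container fixture and the get_labels() helper shown later in this document:

# Sketch of common label tests for test_common.py; assumes the container
# fixture and the ContainerRunner.get_labels() helper defined below.

REQUIRED_LABELS = [
    "name",
    "version",
    "io.k8s.display-name",
    "org.opencontainers.image.source",
    "com.opendatahub.accelerator",
    "com.opendatahub.python",
]


def test_required_labels_present(container):
    labels = container.get_labels()
    missing = [label for label in REQUIRED_LABELS if not labels.get(label)]
    assert not missing, f"Missing or empty labels: {missing}"


def test_accelerator_label_value(container):
    assert container.get_labels().get("com.opendatahub.accelerator") in ("cpu", "cuda")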

7. File System Structure Tests

Verify expected directories and files exist.

| Path | Type | Expected |
| --- | --- | --- |
| /opt/app-root/src | Directory | Exists, is WORKDIR |
| /etc/pip.conf | File | Exists |
| /etc/uv/uv.toml | File | Exists |
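
The pip.conf and uv.toml checks appear in test_common.py below; the directory check could use the dir_exists() helper from conftest.py, for example (a sketch):

def test_app_root_src_dir_exists(container):
    assert container.dir_exists("/opt/app-root/src")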

8. Security Tests

Basic security posture checks.

| Test | Check | Expected |
| --- | --- | --- |
| User is non-root | Container starts as UID 1001 | Not root |
| Sensitive files protected | cat /etc/shadow | Permission denied |

9. CUDA-Specific Tests

Additional tests for CUDA image (no GPU required).

| Test | Check | Expected |
| --- | --- | --- |
| CUDA_VERSION | Environment variable | 12.8.x |
| NVIDIA_VISIBLE_DEVICES | Environment variable | all |
| nvcc exists | which nvcc | /usr/local/cuda/bin/nvcc |
| CUDA in PATH | printenv PATH | Contains /usr/local/cuda/bin |
| CUDA toolkit dir | /usr/local/cuda | Directory exists |

10. CUDA Library Tests (no GPU required)

Verify CUDA shared libraries are present.

| Library | Check |
| --- | --- |
| libcudart | ldconfig -p \| grep libcudart |
| libcublas | ldconfig -p \| grep libcublas |
| libcudnn | ldconfig -p \| grep libcudnn |

11. CUDA Label Tests

Verify CUDA-specific labels.

| Label | Expected |
| --- | --- |
| com.nvidia.cuda.version | CUDA version string |
| com.opendatahub.accelerator | cuda |

Note: nvidia-smi requires GPU hardware and is skipped in CI.

Test Implementation

Directory Structure

tests/
├── conftest.py              # Shared fixtures and helpers
├── test_common.py           # Tests that apply to BOTH images
├── test_python_image.py     # Python-specific tests (labels)
└── test_cuda_image.py       # CUDA-specific tests

Test Dependencies

Create requirements-test.txt:

pytest>=8.0.0

Fixtures (conftest.py)

The test runner uses a session-scoped container for efficiency. Instead of starting a new container for each test (~30 container startups), we start one container per image and use podman exec to run commands. This reduces test time significantly.

import os
import subprocess
import json
import shlex
import pytest


class ContainerRunner:
    """Efficient container runner using session-scoped container with exec.

    Starts a single container per test session and uses 'podman exec' to run
    commands. This avoids the overhead of starting a new container for each test.
    """

    def __init__(self, image: str):
        self.image = image
        self.container_id = None

    def start(self):
        """Start container in background with sleep infinity."""
        result = subprocess.run(
            ["podman", "run", "-d", "--rm", self.image, "sleep", "infinity"],
            capture_output=True,
            text=True,
            timeout=60,
        )
        if result.returncode != 0:
            raise RuntimeError(f"Failed to start container: {result.stderr}")
        self.container_id = result.stdout.strip()

    def stop(self):
        """Stop and remove container."""
        if self.container_id:
            subprocess.run(
                ["podman", "stop", "-t", "1", self.container_id],
                capture_output=True,
                timeout=30,
            )
            self.container_id = None

    def run(self, command: str, timeout: int = 30) -> subprocess.CompletedProcess:
        """Execute command in running container using podman exec."""
        if not self.container_id:
            raise RuntimeError("Container not started. Call start() first.")
        return subprocess.run(
            ["podman", "exec", self.container_id, "bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,
        )

    def get_env(self, var: str) -> str:
        """Get an environment variable value safely."""
        if not var.replace("_", "").isalnum():
            raise ValueError(f"Invalid environment variable name: {var}")
        result = self.run(f"printenv {var}")
        return result.stdout.strip() if result.returncode == 0 else ""

    def file_exists(self, path: str) -> bool:
        """Check if a file exists."""
        result = self.run(f"test -f {shlex.quote(path)}")
        return result.returncode == 0

    def dir_exists(self, path: str) -> bool:
        """Check if a directory exists."""
        result = self.run(f"test -d {shlex.quote(path)}")
        return result.returncode == 0

    def get_labels(self) -> dict:
        """Get image labels using podman inspect."""
        result = subprocess.run(
            ["podman", "inspect", "--format", "{{json .Config.Labels}}", self.image],
            capture_output=True,
            text=True,
            timeout=30,
        )
        if result.returncode == 0:
            return json.loads(result.stdout)
        return {}

    def get_config(self, key: str) -> str:
        """Get image config value using podman inspect."""
        result = subprocess.run(
            ["podman", "inspect", "--format", f"{{{{json .Config.{key}}}}}", self.image],
            capture_output=True,
            text=True,
            timeout=30,
        )
        if result.returncode == 0:
            return json.loads(result.stdout)
        return None


@pytest.fixture(scope="session")
def python_image():
    """Image name for Python base image."""
    return os.environ.get(
        "PYTHON_IMAGE",
        "localhost/odh-midstream-python-base:3.12-ubi9"
    )


@pytest.fixture(scope="session")
def cuda_image():
    """Image name for CUDA base image."""
    return os.environ.get(
        "CUDA_IMAGE",
        "localhost/odh-midstream-cuda-base:12.8-py312"
    )


@pytest.fixture(scope="session")
def python_container(python_image):
    """Session-scoped container runner for Python image.

    Container starts once at session start and stops at session end.
    All tests share the same running container.
    """
    runner = ContainerRunner(python_image)
    runner.start()
    yield runner
    runner.stop()


@pytest.fixture(scope="session")
def cuda_container(cuda_image):
    """Session-scoped container runner for CUDA image.

    Container starts once at session start and stops at session end.
    All tests share the same running container.
    """
    runner = ContainerRunner(cuda_image)
    runner.start()
    yield runner
    runner.stop()

Performance comparison:

| Approach | Runtime (~30 tests) | Container starts |
| --- | --- | --- |
| New container per test | ~60-90 seconds | 30 |
| Session container + exec | ~5-10 seconds | 1 |

Note: Because all tests share the same running container, avoid tests that modify global state. All current tests are read-only (checking environment variables, file existence, and podman inspect output), so this is safe.
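
If a future test does need to mutate container state, it can opt out of the shared container with a function-scoped fixture. A minimal sketch that could be added to conftest.py (the fixture name is illustrative, not part of the current design):

@pytest.fixture
def isolated_python_container(python_image):
    """Fresh container per test, for hypothetical tests that modify state."""
    runner = ContainerRunner(python_image)
    runner.start()
    yield runner
    runner.stop()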

Example Tests (test_common.py)

import pytest


@pytest.fixture(params=["python_container", "cuda_container"])
def container(request):
    """Parameterize to run same tests against both images."""
    return request.getfixturevalue(request.param)


# --- Smoke Tests ---

def test_python_version(container):
    result = container.run("python --version")
    assert result.returncode == 0
    assert "Python 3.12" in result.stdout


def test_pip_available(container):
    result = container.run("pip --version")
    assert result.returncode == 0


def test_uv_available(container):
    result = container.run("uv --version")
    assert result.returncode == 0


# --- User & Permission Tests ---

def test_user_id(container):
    result = container.run("id -u")
    assert result.returncode == 0
    assert result.stdout.strip() == "1001"


def test_group_id(container):
    result = container.run("id -g")
    assert result.returncode == 0
    assert result.stdout.strip() == "0"


def test_not_root(container):
    result = container.run("whoami")
    assert result.returncode == 0
    assert result.stdout.strip() != "root"


def test_workdir_writable(container):
    result = container.run("touch /opt/app-root/src/test && rm /opt/app-root/src/test")
    assert result.returncode == 0


# --- Configuration Tests ---

def test_pip_conf_exists(container):
    assert container.file_exists("/etc/pip.conf")


def test_pip_conf_valid(container):
    result = container.run("cat /etc/pip.conf")
    assert "[global]" in result.stdout


def test_uv_toml_exists(container):
    assert container.file_exists("/etc/uv/uv.toml")


def test_uv_config_file_env(container):
    assert container.get_env("UV_CONFIG_FILE") == "/etc/uv/uv.toml"


# --- Image Metadata Tests ---

def test_workdir(container):
    assert container.get_config("WorkingDir") == "/opt/app-root/src"


def test_user(container):
    assert container.get_config("User") == "1001"


# --- Environment Variable Tests ---

def test_home(container):
    assert container.get_env("HOME") == "/opt/app-root/src"


def test_path_contains_app_root(container):
    assert "/opt/app-root/bin" in container.get_env("PATH")


def test_pythondontwritebytecode(container):
    assert container.get_env("PYTHONDONTWRITEBYTECODE") == "1"


def test_pythonunbuffered(container):
    assert container.get_env("PYTHONUNBUFFERED") == "1"


def test_pip_no_cache_dir(container):
    assert container.get_env("PIP_NO_CACHE_DIR") == "1"


def test_uv_system_python(container):
    assert container.get_env("UV_SYSTEM_PYTHON") == "1"


# --- Security Tests ---

def test_shadow_not_readable(container):
    result = container.run("cat /etc/shadow")
    assert result.returncode != 0
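
Example Tests (test_python_image.py)

The directory structure above reserves test_python_image.py for Python-specific label tests. A hedged sketch, mirroring the CUDA label tests below and assuming the session-scoped python_container fixture from conftest.py:

# --- Python Image Label Tests ---

def test_accelerator_label(python_container):
    assert python_container.get_labels().get("com.opendatahub.accelerator") == "cpu"


def test_python_version_label(python_container):
    assert python_container.get_labels().get("com.opendatahub.python") == "3.12"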

Example Tests (test_cuda_image.py)

# --- CUDA Environment Tests ---

def test_cuda_version(cuda_container):
    assert cuda_container.get_env("CUDA_VERSION").startswith("12.8")


def test_nvidia_visible_devices(cuda_container):
    assert cuda_container.get_env("NVIDIA_VISIBLE_DEVICES") == "all"


def test_cuda_in_path(cuda_container):
    assert "/usr/local/cuda/bin" in cuda_container.get_env("PATH")


# --- CUDA Toolkit Tests ---

def test_nvcc_exists(cuda_container):
    result = cuda_container.run("which nvcc")
    assert result.returncode == 0
    assert "/usr/local/cuda" in result.stdout


def test_cuda_dir_exists(cuda_container):
    assert cuda_container.dir_exists("/usr/local/cuda")


# --- CUDA Library Tests ---

def test_libcudart_present(cuda_container):
    result = cuda_container.run("ldconfig -p | grep libcudart")
    assert result.returncode == 0


def test_libcublas_present(cuda_container):
    result = cuda_container.run("ldconfig -p | grep libcublas")
    assert result.returncode == 0


def test_libcudnn_present(cuda_container):
    result = cuda_container.run("ldconfig -p | grep libcudnn")
    assert result.returncode == 0


# --- CUDA Label Tests ---

def test_cuda_version_label(cuda_container):
    assert "com.nvidia.cuda.version" in cuda_container.get_labels()


def test_accelerator_label(cuda_container):
    assert cuda_container.get_labels().get("com.opendatahub.accelerator") == "cuda"

Running Tests

Local Development

# Build image first
./scripts/build.sh python

# Install test dependencies
pip install -r requirements-test.txt

# Run tests for Python image
pytest tests/test_common.py tests/test_python_image.py -v

# Run tests for CUDA image
./scripts/build.sh cuda
pytest tests/test_common.py tests/test_cuda_image.py -v

Environment Variables

| Variable | Description | Default |
| --- | --- | --- |
| PYTHON_IMAGE | Python image to test | localhost/odh-midstream-python-base:3.12-ubi9 |
| CUDA_IMAGE | CUDA image to test | localhost/odh-midstream-cuda-base:12.8-py312 |

CI Integration

GitHub Actions Workflow

on:
  pull_request:

jobs:
  build-python:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build Python image
        run: ./scripts/build.sh python
      - name: Install test dependencies
        run: pip install -r requirements-test.txt
      - name: Run tests
        run: pytest tests/test_common.py tests/test_python_image.py -v

  build-cuda:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build CUDA image
        run: ./scripts/build.sh cuda
      - name: Install test dependencies
        run: pip install -r requirements-test.txt
      - name: Run tests
        run: pytest tests/test_common.py tests/test_cuda_image.py -v

Future: GPU Testing

When self-hosted GPU runners are available:

@pytest.mark.gpu
def test_nvidia_smi(cuda_container):
    result = cuda_container.run("nvidia-smi")
    assert result.returncode == 0
    assert "CUDA Version" in result.stdout