LalatenduMohanty/test_mode_design_document.md

## test_mode_design_document.md

      
            
          
    --test-mode Design Document

Table of Contents


Overview
Problem Statement
Design Goals
Architecture
Implementation Details
API Design
Error Handling
Performance Considerations
Security Considerations
Testing Strategy
Real-World Example
Conclusion
Limitations and Future Work

Overview

--test-mode enables resilient bootstrap by continuing processing after package failures, attempting to download pre-built wheels as fallback, and reporting all failures at completion.
⚠️ Important: Test mode is only supported with bootstrap (serial mode), not bootstrap-parallel. This ensures comprehensive failure detection including wheel compilation failures. Most packages will use cached wheels, so serial mode performance is acceptable for testing scenarios.
Problem Statement

Currently, bootstrap stops at the first package failure, so the remaining problematic packages are not discovered until that failure is fixed and the command is re-run.
Failure Types Detected

With bootstrap --test-mode, all failure types are detected:

Wheel compilation failures (C extensions, Rust, missing build tools, etc.)
Source preparation failures (download, unpack, patch)
Dependency resolution failures
Build system requirement failures

Design Goals


Complete Discovery: Identify all failures in a single run
Efficient Processing: Each package is processed once, so the work is O(n) in the number of packages
Clear Reporting: Actionable failure information
Compatibility: Maintain existing workflow behavior
Simplicity: Serial mode only; most packages come from cached wheels, so performance is acceptable

Architecture

High-Level Design


      graph TD
    A[Bootstrap Command] --> B{Test Mode?}
    B -->|Yes| C[Test Mode Bootstrap]
    B -->|No| D[Normal Bootstrap]
    
    C --> E[Process Package]
    E --> F{Build Success?}
    F -->|Yes| G[Continue to Next]
    F -->|No| H[Attempt Pre-built Download]
    H --> I{Download Success?}
    I -->|Yes| G
    I -->|No| J[Add to Failed Set]
    J --> G
    
    G --> K{More Packages?}
    K -->|Yes| E
    K -->|No| L[Generate Report]
    L --> M[Exit with Status]
    
    D --> N[Process Package]
    N --> O{Build Success?}
    O -->|Yes| P[Continue to Next]
    O -->|No| Q[Fail Fast]
    
    P --> R{More Packages?}
    R -->|Yes| N
    R -->|No| S[Complete]

    

  
Core Components


CLI Interface: --test-mode flag with error handling and reporting
Bootstrapper Class: Test mode state, failure tracking, direct pre-built fallback (package settings are not modified)
Build Pipeline: Individual package processing with pre-built fallback strategy

Implementation Details

Key Data Structures

@dataclasses.dataclass
class BuildResult:
    wheel_filename: pathlib.Path | None = None
    sdist_filename: pathlib.Path | None = None
    unpack_dir: pathlib.Path | None = None
    source_url_type: str = "unknown"
    sdist_root_dir: pathlib.Path | None = None
    build_env: build_environment.BuildEnvironment | None = None
    failed: bool = False
    
    # Context fields for error tracking and reporting
    req: Requirement | None = None
    resolved_version: Version | None = None
    exception: Exception | None = None
    exception_type: str | None = None  # Serializable: exception.__class__.__name__
    exception_message: str | None = None  # Serializable: str(exception)

# In Bootstrapper.__init__():
self.failed_builds: list[BuildResult] = []  # Replaces old self.failed_packages: set[str]
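The algorithm in the next section calls `BuildResult.failure()`, which the listing above does not show. A minimal sketch of what such a constructor might look like, assuming it only marks the result as failed and copies the exception's class name and message into the serializable fields. The trimmed `BuildResult` below is a stand-in: it omits the path fields and uses plain strings for `req` and `resolved_version` so that it runs on its own, and the sample error text is invented for illustration.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass
class BuildResult:
    """Trimmed stand-in for the design's BuildResult (path fields omitted)."""

    failed: bool = False
    req: str | None = None  # assumption: the real field is a Requirement
    resolved_version: str | None = None  # assumption: the real field is a Version
    exception: Exception | None = None
    exception_type: str | None = None
    exception_message: str | None = None

    @classmethod
    def failure(cls, req: str, resolved_version: str, exception: Exception) -> BuildResult:
        """Create a failed result, keeping the exception and its string forms."""
        return cls(
            failed=True,
            req=req,
            resolved_version=resolved_version,
            exception=exception,
            exception_type=exception.__class__.__name__,
            exception_message=str(exception),
        )


# Illustrative values only.
result = BuildResult.failure("lxml", "4.9.0", RuntimeError("libxml2 headers not found"))
print(result.exception_type, "-", result.exception_message)
```

Keeping the string forms next to the exception object means a report can be written without having to serialize the exception itself.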
Core Algorithm

Single-Pass Processing: Each package is processed once, with immediate failure handling and a pre-built fallback.
def _build_package(self, req, resolved_version, pbi, build_sdist_only) -> BuildResult:
    """Build or download package - handles test mode failures gracefully."""
    try:
        # Attempt normal build process
        return self._build_wheel_and_sdist(req, resolved_version, pbi, build_sdist_only)
    except Exception as build_error:
        if not self.test_mode:
            raise  # Re-raise in normal mode (fail-fast behavior)
        
        # Test mode: try pre-built fallback
        logger.warning("test mode: build failed for %s==%s, attempting fallback to pre-built", 
                      req.name, resolved_version, exc_info=True)
        
        try:
            # Directly resolve and download pre-built wheel
            wheel_url, _ = self._resolve_prebuilt_with_history(
                req=req, req_type=RequirementType.TOP_LEVEL
            )
            wheel_filename, unpack_dir = self._download_prebuilt(
                req=req,
                req_type=RequirementType.TOP_LEVEL,
                resolved_version=resolved_version,
                wheel_url=wheel_url,
            )
            logger.info("test mode: successfully handled %s as pre-built after build failure", req.name)
            return BuildResult(
                wheel_filename=wheel_filename,
                unpack_dir=unpack_dir,
                source_url_type=str(SourceType.PREBUILT),
            )
        except Exception as prebuilt_error:
            # Even pre-built fallback failed - track failure with full context
            logger.error("test mode: failed to handle %s as pre-built: %s", 
                        req.name, prebuilt_error, exc_info=True)
            
            # Create failure result with captured exception details
            result = BuildResult.failure(
                req=req, 
                resolved_version=resolved_version, 
                exception=build_error  # Original build error, not prebuilt fallback error
            )
            self.failed_builds.append(result)
            return result
How it works:

Single Attempt: Each package gets exactly one build attempt
Immediate Failure Handling: Exception caught immediately, no retry loops
Test Mode Check: Only applies fallback logic in test mode
Pre-built Fallback: Directly resolve and download pre-built wheel from wheel servers
Failure Tracking: Capture full context (requirement, version, exception) for detailed reporting
Continue Processing: Return result (success or failure) and continue to next package
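The steps above can be exercised without fromager using stand-in callables. This is a sketch of the control flow only: `run_test_mode`, `fake_build`, and `fake_download` are hypothetical names, and the real method works on `Requirement` and `BuildResult` objects rather than strings.

```python
from __future__ import annotations

from typing import Callable


def run_test_mode(
    packages: list[str],
    build: Callable[[str], str],
    download_prebuilt: Callable[[str], str],
) -> tuple[dict[str, str], list[tuple[str, Exception]]]:
    """Process each package once; fall back to a pre-built wheel on build failure."""
    wheels: dict[str, str] = {}
    failed: list[tuple[str, Exception]] = []
    for name in packages:
        try:
            wheels[name] = build(name)
        except Exception as build_error:
            try:
                wheels[name] = download_prebuilt(name)
            except Exception:
                # Record the original build error, as the design does.
                failed.append((name, build_error))
    return wheels, failed


def fake_build(name: str) -> str:
    """Stand-in build step: two packages fail to compile."""
    if name in {"lxml", "cryptography"}:
        raise RuntimeError(f"cannot compile {name}")
    return f"{name}-built.whl"


def fake_download(name: str) -> str:
    """Stand-in fallback: no pre-built wheel exists for lxml."""
    if name == "lxml":
        raise LookupError("no matching wheel")
    return f"{name}-prebuilt.whl"


wheels, failed = run_test_mode(["pbr", "cryptography", "lxml"], fake_build, fake_download)
print(wheels)
print([(name, type(err).__name__) for name, err in failed])
```

In this run only `lxml` ends up in the failure list, because `cryptography` is rescued by the fallback; the loop never revisits a package.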

Pre-built Fallback Mechanism: When a build fails in test mode, the system directly attempts to resolve and download a pre-built wheel from configured wheel servers.
# In Bootstrapper class
def _resolve_prebuilt_with_history(
    self,
    req: Requirement,
    req_type: RequirementType,
) -> tuple[str, Version]:
    """Resolve pre-built wheel URL, checking previous bootstrap graph first."""
    # Check previous bootstrap graph for cached resolution
    cached_resolution = self._resolve_from_graph(
        req=req,
        req_type=req_type,
        pre_built=True,
    )
    
    if cached_resolution and not req.url:
        wheel_url, resolved_version = cached_resolution
        logger.debug(f"resolved from previous bootstrap to {resolved_version}")
    else:
        # Resolve from wheel servers (PyPI, cache server, etc.)
        servers = wheels.get_wheel_server_urls(
            self.ctx, req, cache_wheel_server_url=resolver.PYPI_SERVER_URL
        )
        wheel_url, resolved_version = wheels.resolve_prebuilt_wheel(
            ctx=self.ctx, req=req, wheel_server_urls=servers, req_type=req_type
        )
    return (wheel_url, resolved_version)

def _download_prebuilt(
    self,
    req: Requirement,
    req_type: RequirementType,
    resolved_version: Version,
    wheel_url: str,
) -> tuple[pathlib.Path, pathlib.Path]:
    """Download pre-built wheel and unpack metadata."""
    logger.info(f"{req_type} requirement {req} uses a pre-built wheel")
    
    wheel_filename = wheels.download_wheel(req, wheel_url, self.ctx.wheels_prebuilt)
    unpack_dir = self._create_unpack_dir(req, resolved_version)
    # Update the wheel mirror so pre-built wheels are indexed
    # and available to subsequent builds that need them as dependencies
    server.update_wheel_mirror(self.ctx)
    return (wheel_filename, unpack_dir)
How it works:

Direct Resolution: When build fails, directly resolve pre-built wheel URL from wheel servers
History Check: First checks previous bootstrap graph for cached resolutions
Server Fallback: Falls back to resolving from configured wheel servers (PyPI, cache server, etc.)
Download and Index: Downloads wheel and updates wheel mirror for subsequent builds
No State Mutation: Does not modify package settings or mark packages as pre-built
Simple and Efficient: Direct approach without runtime state management
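The history-first lookup can be reduced to a few lines. The sketch below assumes the previous bootstrap graph can be treated as a mapping from package name to `(wheel_url, version)`, which is a simplification of `_resolve_from_graph`; `resolve_prebuilt`, `fake_query`, and the `example.invalid` URLs are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Resolution = Tuple[str, str]  # (wheel_url, resolved_version)


def resolve_prebuilt(
    name: str,
    previous_graph: Dict[str, Resolution],
    query_servers: Callable[[str], Resolution],
    pinned_url: Optional[str] = None,
) -> Resolution:
    """Prefer the previous run's resolution unless the requirement pins a URL."""
    cached = previous_graph.get(name)
    if cached is not None and pinned_url is None:
        return cached
    return query_servers(name)


calls: List[str] = []


def fake_query(name: str) -> Resolution:
    """Stand-in for resolving against the configured wheel servers."""
    calls.append(name)
    return (f"https://wheels.example.invalid/{name}-1.0-py3-none-any.whl", "1.0")


graph = {"pbr": ("https://wheels.example.invalid/pbr-6.0.0-py3-none-any.whl", "6.0.0")}
print(resolve_prebuilt("pbr", graph, fake_query))        # served from the graph
print(resolve_prebuilt("stevedore", graph, fake_query))  # falls through to the servers
print(calls)
```

The `pinned_url` check mirrors the `not req.url` condition in the listing: a requirement that names an explicit URL skips the cached resolution.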

API Design

CLI Interface

Command Line Options

@click.option(
    "--test-mode",
    "test_mode",
    is_flag=True,
    default=False,
    help="Test mode: on build failure, try a pre-built wheel and continue; report all failures at the end",
)
Usage Examples

# Basic test mode usage
fromager bootstrap --test-mode package1 package2 package3

# Test mode with requirements file
fromager bootstrap --test-mode -r requirements.txt

# Test mode with requirements and constraints files
fromager -c constraints.txt bootstrap --test-mode -r requirements.txt

# Test mode discovers all build failures in serial mode
# Most packages will use cached wheels, so performance is acceptable
fromager bootstrap --test-mode -r large-requirements.txt
Programmatic Interface

# Bootstrapper initialization
Bootstrapper(ctx: WorkContext, test_mode: bool = False)

# Package build method with full type annotations
def _build_package(
    self,
    req: Requirement,
    resolved_version: Version,
    pbi: PackageBuildInfo,
    build_sdist_only: bool
) -> BuildResult
Error Handling

Recovery Strategy


Build Failure: Attempt to download pre-built wheel from configured servers
Fallback Failure: Return BuildResult.failure(), continue processing
Dependency Failure: Skip dependencies for failed packages
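One reading of the third rule is that the traversal simply does not descend into a failed package, since its dependencies cannot be read without a wheel. The sketch below illustrates that reading with a hypothetical `bootstrap_order` helper and a made-up dependency map; the depth-first order is an assumption and may not match the real traversal.

```python
from __future__ import annotations


def bootstrap_order(
    top_level: list[str],
    dependencies: dict[str, list[str]],
    broken: set[str],
) -> tuple[list[str], list[str]]:
    """Visit packages depth-first without descending into failed ones."""
    processed: list[str] = []
    failed: list[str] = []
    seen: set[str] = set()
    stack = list(reversed(top_level))
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if name in broken:
            failed.append(name)  # dependencies of a failed package are skipped
            continue
        processed.append(name)
        stack.extend(reversed(dependencies.get(name, [])))
    return processed, failed


deps = {"app": ["lxml", "requests"], "lxml": ["cython"], "requests": ["urllib3"]}
processed, failed = bootstrap_order(["app"], deps, broken={"lxml"})
print(processed)
print(failed)
```

Here `cython` is never visited because it is reachable only through the failed `lxml`, while the rest of the tree is still processed.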

Performance Considerations

Time Complexity


Single-Pass Processing: O(n) where n = number of packages
Each package processed exactly once
No retry loops or backtracking
Efficient failure handling with immediate fallback

Serial Mode Performance


Cached Wheels: Most packages use cached wheels (fast downloads)
Build-Required Packages: Only packages requiring compilation are slow
Acceptable for Testing: Complete failure discovery justifies serial processing
Not for Production: For production builds, use regular bootstrap mode

Memory Usage


Failed Builds List: O(f) where f = number of failures
Typical Case: f << n (failures much less than total packages)
Lightweight Objects: BuildResult contains metadata only, no source code
Minimal Overhead: Memory usage proportional to failure count, not total packages

Comparison with Parallel Mode


Serial Mode: Detects all failure types (source prep and wheel builds)
Parallel Mode: Would detect only source prep failures, estimated at ~20% of the total
Trade-off: Slower, but failure discovery is complete

Security Considerations

Exception Information Disclosure


Exception messages may contain sensitive path information
Logs should be reviewed before sharing publicly
Consider sanitizing paths in exception messages for public reports
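One possible way to sanitize messages before publishing a report is to replace known sensitive path prefixes with placeholders. The helper name, the prefixes, and the placeholder strings below are all hypothetical; nothing like this is part of the design above.

```python
from __future__ import annotations


def sanitize_message(message: str, roots: dict[str, str]) -> str:
    """Replace each sensitive path prefix with its placeholder, longest prefix first."""
    for prefix in sorted(roots, key=len, reverse=True):
        message = message.replace(prefix, roots[prefix])
    return message


# Hypothetical prefixes; a real run would use the actual home and work directories.
roots = {"/home/builder": "<HOME>", "/home/builder/work-dir": "<WORK>"}
raw = "gcc failed in /home/builder/work-dir/lxml-4.9.0/src; see /home/builder/.cache/log.txt"
clean = sanitize_message(raw, roots)
print(clean)
```

Sorting by length keeps a nested directory such as the work directory from being rewritten as a path under the home placeholder.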

Pre-built Package Fallback


Pre-built fallback directly downloads wheels from configured servers
Does not modify package settings or configuration files
Only attempts download when build fails in test mode
No persistent state changes, scoped to current bootstrap run only

Wheel Download Security


Pre-built fallback downloads from configured wheel servers
Uses existing authentication and verification mechanisms
No new security vectors introduced
Follows same security policies as normal bootstrap

Testing Strategy

Core Tests

Tests in tests/test_bootstrap_test_mode.py:


test_test_mode_tracks_complete_failures

Verifies test mode tracks failures with full context (req, version, exception)
Tests pre-built fallback mechanism when build fails
Validates failed_builds list population when both build and fallback fail


test_normal_mode_still_fails_fast

Ensures test_mode=False preserves fail-fast behavior
Regression test for existing functionality


test_build_result_captures_exception_context

Tests BuildResult.failure() enhancement
Verifies exception_type and exception_message fields


Real-World Example

Scenario: Testing a Large Requirements File

# Start test mode bootstrap
$ fromager bootstrap --test-mode -r requirements.txt

INFO stevedore: building wheel...
INFO pbr: building wheel...
WARNING test mode: build failed for cryptography==41.0.0, attempting fallback to pre-built
INFO test mode: successfully handled cryptography as pre-built after build failure
WARNING test mode: build failed for lxml==4.9.0, attempting fallback to pre-built
ERROR test mode: failed to handle lxml as pre-built: No matching wheel found
WARNING test mode: build failed for pillow==10.0.0, attempting fallback to pre-built
ERROR test mode: failed to handle pillow as pre-built: RuntimeError: zlib not found

...

ERROR test mode: the following packages failed to build:
ERROR   - lxml==4.9.0
ERROR     Error: ValueError: No matching wheel found
ERROR   - pillow==10.0.0
ERROR     Error: RuntimeError: zlib development headers not found

ERROR
ERROR test mode: failure breakdown by type:
ERROR   RuntimeError: 1 package(s)
ERROR   ValueError: 1 package(s)

ERROR test mode: 2 package(s) failed to build
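A summary in the style of the log above could be assembled from the recorded failures roughly as follows. This is a sketch, not the actual reporting code: `format_report` is a hypothetical helper, the rows reuse two of the failures from the log, and the non-zero exit status is an assumption based on the "Exit with Status" step in the architecture diagram.

```python
from __future__ import annotations

from collections import Counter


def format_report(failures: list[tuple[str, str, str]]) -> list[str]:
    """Render (package, exception_type, exception_message) rows as report lines."""
    lines = ["test mode: the following packages failed to build:"]
    for package, exc_type, exc_message in failures:
        lines.append(f"  - {package}")
        lines.append(f"    Error: {exc_type}: {exc_message}")
    lines.append("test mode: failure breakdown by type:")
    by_type = Counter(exc_type for _, exc_type, _ in failures)
    for exc_type, count in sorted(by_type.items()):
        lines.append(f"  {exc_type}: {count} package(s)")
    lines.append(f"test mode: {len(failures)} package(s) failed to build")
    return lines


failures = [
    ("lxml==4.9.0", "ValueError", "No matching wheel found"),
    ("pillow==10.0.0", "RuntimeError", "zlib development headers not found"),
]
report = format_report(failures)
print("\n".join(report))
exit_code = 1 if failures else 0  # assumption: any failure yields a non-zero status
```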
Actionable Insights

This output helps you:


Install missing build dependencies
# From the error messages, we know we need:
sudo apt-get install libssl-dev libxml2-dev libxslt1-dev zlib1g-dev


Verify wheel availability

cryptography==41.0.0: Pre-built wheel available (fallback succeeded)
lxml==4.9.0: No wheel available, must build from source
pillow==10.0.0: Fallback failed; a source build may succeed after installing zlib headers


Complete failure list in one run

No need to fix one issue and re-run repeatedly
All 3 build problems are discovered in a single bootstrap attempt
Save time by batching fixes (install all missing dependencies at once)


Conclusion

The --test-mode feature enables resilient bootstrap processing with comprehensive failure discovery. Key design decisions:

Single-pass processing for O(n) efficiency
Direct pre-built wheel download for immediate fallback without state mutation
Comprehensive error reporting for actionable information
Compatibility with existing workflows
Serial mode only - simplifies implementation while catching all failure types

Scope: Test mode is only supported with bootstrap (serial mode):

Detects all failure types: wheel compilation, source preparation, and dependency resolution
Serial mode performance is acceptable since most packages use cached wheels
Simpler implementation without the complexity of parallel build coordination

Rationale for Serial Only: Adding test mode to bootstrap-parallel would:

Only catch ~20% of failures (source prep), missing ~80% (wheel builds)
Add significant complexity to parallel build phase
Give false confidence about test coverage

This design enables complete failure discovery in a single run while keeping the implementation maintainable.
Limitations and Future Work

Current Limitations


Serial Mode Only

Not supported in bootstrap-parallel
Rationale: Parallel mode would only catch ~20% of failures (source prep phase)
Wheel compilation failures (~80% of issues) would be missed in parallel mode
Serial mode provides complete failure visibility


Fixed Retry Strategy

Each package gets one build attempt, then at most one pre-built fallback attempt
No configurable retry policies
Single pre-built fallback attempt per package


Human-Readable Output Only

Logs are formatted for human consumption
No machine-readable output format (e.g., JSON)
Integration with CI/CD tools requires log parsing


No Resume Capability

Cannot resume from last successful package
Must reprocess all packages on subsequent runs
No checkpoint/state persistence between runs


Future Enhancements


Machine-Readable Output
fromager bootstrap --test-mode --json-output failures.json -r requirements.txt

Export failure details in JSON format for tooling integration
Enable automated analysis and reporting
Support CI/CD pipeline integration


Configurable Retry Policies
# settings.yaml
test_mode:
  max_retries: 3
  retry_delay: 5
  fallback_strategies:
    - pre_built_wheel
    - cached_wheel
    - skip


Resume from Checkpoint
fromager bootstrap --test-mode --resume-from checkpoint.json -r requirements.txt

Save progress to checkpoint file
Resume from last successful package
Useful for large requirement files with long build times


Enhanced Failure Classification

Categorize failures by root cause (missing build tools, dependency issues, etc.)
Suggest specific remediation steps
Group related failures for batch fixing


CI/CD Integration
# GitHub Actions annotations
fromager bootstrap --test-mode --github-actions -r requirements.txt

# GitLab CI format
fromager bootstrap --test-mode --gitlab-ci -r requirements.txt

Native integration with GitHub Actions workflow commands
GitLab CI test reports
Better visibility in CI/CD environments
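For the Machine-Readable Output idea listed above, the serializable `exception_type` and `exception_message` fields already carry most of what a JSON export would need. A minimal sketch, assuming a flat record per failed package; `FailureRecord`, `dump_failures`, and the payload layout are hypothetical, and the `--json-output` flag itself remains a proposal.

```python
import dataclasses
import json
from typing import List, Optional


@dataclasses.dataclass
class FailureRecord:
    """Hypothetical serializable view of one failed BuildResult."""

    package: str
    version: str
    exception_type: Optional[str]
    exception_message: Optional[str]


def dump_failures(records: List[FailureRecord]) -> str:
    """Serialize failure records to a JSON document with a total count."""
    payload = {
        "count": len(records),
        "failed": [dataclasses.asdict(record) for record in records],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


text = dump_failures(
    [FailureRecord("lxml", "4.9.0", "ValueError", "No matching wheel found")]
)
print(text)
```

The exception object itself is left out of the record because exceptions are not JSON-serializable in general.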