
Foundation Models Integration Design for Darts

Issue: #2933: Foundation Models Integration
Last Updated: November 5, 2025

Problem Statement

Time series foundation models (TSFMs) like TimesFM, Chronos 2, Moirai, and TimeGPT each provide different APIs. Users wanting to compare models or switch between them must reimplement evaluation pipelines for each model. This creates friction and limits experimentation.

As @hrzn noted: "Darts could be a great neutral ground for multiple TSFMs under a unified API."

Goals

  1. Unified API: Single interface for zero-shot forecasting, covariates, and probabilistic predictions across all foundation models
  2. Leverage existing infrastructure: Historical forecasts, backtesting, ensembles work automatically
  3. Zero breaking changes: New functionality, no impact on existing Darts models
  4. Minimal CI/CD overhead: Tests don't download multi-GB models
  5. Extensible: Easy to add new foundation models (Moirai, TimeGPT, Lag-Llama)

Non-Goals

  1. Training foundation models from scratch (pre-trained only)
  2. Supporting non-forecasting foundation models (classification, anomaly detection)
  3. Replacing existing Darts models (additive, not replacement)
  4. Abstracting away all model differences (expose capabilities where appropriate)

Design Decisions

1. Extend GlobalForecastingModel (Not ForecastingModel)

Decision: Foundation models inherit from GlobalForecastingModel.

Rationale:

Foundation models are semantically "global" models:

  • Trained on massive diverse datasets spanning domains
  • Transfer patterns learned from billions of time points to new series
  • Pre-trained on many series, not fitted to a single local series

Benefits:

  • historical_forecasts(), backtest(), residuals() work immediately (see the usage sketch after this list)
  • Ensemble support via RegressionEnsembleModel
  • Metrics and evaluation framework integration
  • Consistent with PyTorch model hierarchy (requirement)
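
Because the wrapper inherits from GlobalForecastingModel, these utilities apply unchanged. A minimal usage sketch, assuming the ChronosModel wrapper proposed later in this document (import path and arguments are not final):

from darts.datasets import AirPassengersDataset
from darts.metrics import mae

# from darts.models import ChronosModel  # proposed location; see Open Questions

series = AirPassengersDataset().load()

model = ChronosModel()  # weights download lazily on first use
model.fit(series)       # zero-shot path: validation + metadata only

# Zero-shot backtesting: retrain=False skips re-fitting at every step.
hist = model.historical_forecasts(
    series, start=0.8, forecast_horizon=12, retrain=False
)
print(mae(series, hist))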

Alternatives Considered:

  • Standalone hierarchy outside ForecastingModel - rejected because loses Darts infrastructure
  • Extend ForecastingModel directly - semantically incorrect (foundation models aren't local)

2. Optional fit() for Zero-Shot Models

Decision: fit() is optional but enables unified Darts workflows.

Rationale:

Zero-shot foundation models don't require training, but fit() serves three purposes:

  1. Validation: Check input dimensions, covariate compatibility
  2. Metadata tracking: Store series properties for consistency checks
  3. API consistency: Enables historical_forecasts(), backtest() patterns

When lora_config is provided, fit() performs actual fine-tuning via PEFT/LoRA.

Community Concern Addressed: @dennisbader's concern about fit() semantics is resolved via a dual-path implementation (sketched after this list):

  • Zero-shot path: validate + track metadata (no training)
  • Fine-tuning path: apply PEFT adapters + train
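
A minimal sketch of the routing, assuming a hypothetical _validate_and_track() helper; _train_with_peft() is the hook described under PEFT/LoRA below:

def fit(self, series, past_covariates=None, future_covariates=None):
    # Zero-shot path: validate inputs and record series metadata; no training happens.
    self._validate_and_track(series, past_covariates, future_covariates)

    # Fine-tuning path: taken only when a LoRA config was supplied at construction.
    if self.lora_config is not None:
        self._train_with_peft(series, past_covariates, future_covariates)

    return self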

Alternatives Considered:

  • Require fit() always - rejected because violates zero-shot semantics
  • Skip fit() entirely - rejected because breaks Darts API patterns
  • Separate ZeroShotModel base class - rejected as over-engineering

3. Lazy Loading with Device Management

Decision: Models download on first property access, not at initialization.

Rationale:

Foundation model weights range from roughly 120 MB to 2 GB. Downloading them during __init__() would:

  • Block CI/CD pipelines
  • Prevent model instantiation without network
  • Force users to wait even when using mocked tests

Lazy loading via @property delays download until actual use.

Implementation:

@property
def model(self):
    # Download and load the pre-trained weights only on first access, so that
    # __init__() stays offline and tests can mock this path entirely.
    if not self._is_loaded:
        self._model = self._load_pretrained_model()
        self._is_loaded = True
    return self._model

Device Management: Automatic detection in CUDA > MPS > CPU order, with manual override support.
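
A minimal sketch of that detection order, assuming a PyTorch backend; the helper name _resolve_device is illustrative:

import torch

def _resolve_device(requested=None):
    """Return the requested device, or auto-detect in CUDA > MPS > CPU order."""
    if requested is not None:
        return requested  # manual override, e.g. "cuda:1"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"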

Community Concern Addressed: @dennisbader's CI/CD concern is resolved: tests use mocks and never download model weights, as sketched below.
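
A minimal sketch of that mocking pattern, assuming pytest-style tests and the ChronosModel wrapper proposed below; the patch target is illustrative:

from unittest.mock import patch

def test_init_does_not_download_weights():
    # Constructing the wrapper must stay offline; lazy loading defers the
    # download to the first access of the model property.
    with patch.object(ChronosModel, "_load_pretrained_model") as load_mock:
        ChronosModel()
        load_mock.assert_not_called()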

4. Minimal Base Class (Surgical Approach)

Decision: FoundationForecastingModel provides essential infrastructure only.

Rationale:

The initial implementation included registry patterns, validation helpers, and complex abstractions. Following maintainer feedback, these were removed in favor of:

  • Minimal base class (lazy loading, device management, PEFT hooks)
  • Model-specific implementations self-contained
  • Direct code over abstraction layers

Benefits:

  • Easier to understand and maintain
  • Less code to review
  • Clearer extension path for new models
  • Follows Darts principle: "better write clear code than fancy code"

What Base Class Provides:

  • Lazy loading infrastructure
  • Device detection and management
  • PEFT/LoRA integration hooks
  • Test patterns (mocking, integration marks)

What Models Implement:

  • _load_pretrained_model() - model-specific loading
  • predict() - forecasting logic
  • Optional: covariate handling, multivariate support

Community Concern Addressed: @daidahao's request for a unified base class is satisfied via the Template Method pattern, without over-abstraction.

5. Native Quantile Support

Decision: Use Darts quantile naming convention q{value:.3f} (e.g., q0.100, q0.500, q0.900).

Rationale:

Both models use native quantile heads that produce direct multi-step quantile forecasts in a single forward pass:

  • TimesFM 2.5: 10 fixed quantile levels [0.0, 0.1, 0.2, ..., 0.9]
  • Chronos 2: 21 fixed quantile levels [0.01, 0.05, 0.1, 0.2, ..., 0.9, 0.95, 0.99]

Chronos 2's richer quantile grid (21 vs 10) includes extreme quantiles (0.01, 0.99) for improved coverage of rare events, enhancing applicability to anomaly detection and risk-aware forecasting.

Both models expose quantiles via the standard Darts API (usage sketched after this list):

  • Use quantile_timeseries() to access specific quantiles
  • Follow established naming convention from issue #2933
  • Fixed quantile sets determined by model architecture
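
An illustrative usage sketch; the model name and argument set are assumptions, while quantile_timeseries() and the q{value:.3f} naming come from the design above:

from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load()

model = ChronosModel()            # proposed wrapper
model.fit(series)                 # zero-shot path: validation + metadata only
forecast = model.predict(n=12, series=series)

# Extract a single quantile as its own TimeSeries; per the convention above,
# the 90th percentile maps to the q0.900 component.
p90 = forecast.quantile_timeseries(0.9)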

Benefits:

  • Efficient: Single forward pass for all quantiles
  • Deterministic: Same inputs produce same quantile forecasts
  • Standard Darts interface for consuming quantiles
  • Chronos 2's extreme quantiles enable tail risk analysis

Alternatives Considered:

  • Quantile registry system - rejected as over-engineering after maintainer feedback
  • Force uniform quantile sets across models - rejected because loses model-specific strengths

6. PEFT/LoRA for Fine-Tuning

Decision: Support parameter-efficient fine-tuning via PEFT library for models that allow it.

Rationale:

Fine-tuning foundation models on domain-specific data improves accuracy, but full fine-tuning is:

  • Computationally expensive
  • Memory intensive
  • Risks catastrophic forgetting

PEFT/LoRA fine-tunes <1% of parameters while preserving pre-trained knowledge.

Implementation:

  • peft_utils.py: Adapter management, trainable parameter tracking
  • Dual-path fit(): Routes to _train_with_peft() when lora_config is provided
  • Models opt-in by implementing PEFT integration

Current Support:

  • Chronos 2: Full PEFT support
  • TimesFM: No upstream fine-tuning API

Future: New models inherit PEFT infrastructure automatically if upstream supports it.
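
Illustrative only: the lora_config keys below mirror common peft.LoraConfig arguments and are assumptions about how the wrapper forwards settings upstream.

from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load()

lora_config = {
    "r": 8,             # low-rank adapter dimension
    "lora_alpha": 16,   # scaling factor
    "lora_dropout": 0.05,
}

model = ChronosModel(lora_config=lora_config)
model.fit(series)  # lora_config present, so fit() routes to _train_with_peft()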

Architecture

Class Hierarchy

GlobalForecastingModel (existing)
    └── FoundationForecastingModel (new base class)
            ├── TimesFMModel
            └── ChronosModel

Base Class Responsibilities

from abc import abstractmethod
from typing import Dict, Optional

# GlobalForecastingModel is the existing Darts base class shown in the hierarchy above.
from darts.models.forecasting.forecasting_model import GlobalForecastingModel


class FoundationForecastingModel(GlobalForecastingModel):
    """Base class for foundation models with lazy loading and device management."""

    def __init__(self, device: Optional[str] = None, lora_config: Optional[Dict] = None):
        """Initialize without downloading model weights."""

    @property
    def model(self):
        """Lazily load the pre-trained model on first access."""

    @abstractmethod
    def _load_pretrained_model(self):
        """Subclasses implement model-specific loading."""

    def fit(self, series, **kwargs):
        """Dual-path: validation only (zero-shot) or fine-tuning (PEFT)."""

    @abstractmethod
    def predict(self, n, series=None, **kwargs):
        """Subclasses implement forecasting logic."""

Model Capabilities Matrix

Capability             TimesFM 2.5            Chronos 2          Design Support
Univariate             ✅                     ✅                 Both
Multivariate                                                     Per-model
Past Covariates                                                  Per-model
Future Covariates                                                Per-model
Quantile Forecasting   ✅ (10 levels)         ✅ (21 levels)     Native quantile heads
Fine-tuning            No upstream API        ✅ (PEFT)          Infrastructure ready

Extension Path

Adding a new foundation model (e.g., Moirai):

  1. Subclass FoundationForecastingModel
  2. Implement _load_pretrained_model() - model-specific loading logic
  3. Implement predict() - forecasting logic
  4. Optional: Override fit() if model supports fine-tuning
  5. Optional: Handle covariates if model supports them

Infrastructure (lazy loading, device management, PEFT hooks, test patterns) works automatically.
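
A skeletal sketch of that path; MoiraiModel is a placeholder, not a working integration, and the upstream loading call is left unimplemented:

class MoiraiModel(FoundationForecastingModel):
    """Hypothetical wrapper around an upstream Moirai checkpoint."""

    def _load_pretrained_model(self):
        # Model-specific loading, e.g. pulling a checkpoint from the Hugging Face Hub.
        raise NotImplementedError("load the upstream Moirai checkpoint here")

    def predict(self, n, series=None, **kwargs):
        # Map Darts TimeSeries to the upstream API, run inference on self.model
        # (lazily loaded by the base class), and wrap the quantile outputs back
        # into a TimeSeries.
        raise NotImplementedError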

Trade-offs

Simplicity vs. Abstraction

Chose: Simplicity

  • Removed registry patterns and validation helpers
  • More direct code in model implementations
  • Easier to understand, review, and extend

Trade-off: Some code duplication between models (acceptable for clarity)

API Consistency vs. Model Differences

Chose: Expose differences where meaningful

  • Not all models support covariates (reflect in API)
  • Not all models support fine-tuning (optional feature)
  • Different fixed quantile sets (both valid)

Trade-off: Users must understand model capabilities (documentation addresses this)

Zero-Shot Semantics vs. Darts Patterns

Chose: Dual-path fit()

  • Optional for zero-shot (validation only)
  • Required for fine-tuning (actual training)

Trade-off: fit() semantics differ from traditional models (documentation clarifies)

Open Questions

  1. Repository structure: Is darts/models/forecasting/foundation/ the right location?
  2. Dependencies: Should we have darts[foundation] or keep separate darts[timesfm]/darts[chronos]?
  3. Documentation placement: Where should design docs live long-term?
