
Foundation Models Integration Design for Darts

Issue: #2933: Foundation Models Integration
Last Updated: November 5, 2025

Problem Statement

Time series foundation models (TSFMs) like TimesFM, Chronos 2, Moirai, and TimeGPT each provide different APIs. Users wanting to compare models or switch between them must reimplement evaluation pipelines for each model. This creates friction and limits experimentation.

As @hrzn noted: "Darts could be a great neutral ground for multiple TSFMs under a unified API."

Goals

  1. Unified API: Single interface for zero-shot forecasting, covariates, and probabilistic predictions across all foundation models
  2. Leverage existing infrastructure: Historical forecasts, backtesting, ensembles work automatically
  3. Zero breaking changes: New functionality, no impact on existing Darts models
  4. Minimal CI/CD overhead: Tests don't download multi-GB models
  5. Extensible: Easy to add new foundation models (Moirai, TimeGPT, Lag-Llama)

Non-Goals

  1. Training foundation models from scratch (pre-trained only)
  2. Supporting non-forecasting foundation models (classification, anomaly detection)
  3. Replacing existing Darts models (additive, not replacement)
  4. Abstracting away all model differences (expose capabilities where appropriate)

Design Decisions

1. Extend GlobalForecastingModel (Not ForecastingModel)

Decision: Foundation models inherit from GlobalForecastingModel.

Rationale:

Foundation models are semantically "global" models:

  • Trained on massive diverse datasets spanning domains
  • Transfer patterns learned from billions of time points to new series
  • Pre-trained on many series, not fitted to a single local series

Benefits:

  • historical_forecasts(), backtest(), residuals() work immediately (see the usage sketch after this list)
  • Ensemble support via RegressionEnsembleModel
  • Metrics and evaluation framework integration
  • Consistent with PyTorch model hierarchy (requirement)
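
Because the wrapper inherits from GlobalForecastingModel, these utilities apply unchanged. A minimal usage sketch, assuming the ChronosModel wrapper proposed later in this document (import path and arguments are not final):

from darts.datasets import AirPassengersDataset
from darts.metrics import mae

# from darts.models import ChronosModel  # proposed location; see Open Questions

series = AirPassengersDataset().load()

model = ChronosModel()  # weights download lazily on first use
model.fit(series)       # zero-shot path: validation + metadata only

# Zero-shot backtesting: retrain=False skips re-fitting at every step.
hist = model.historical_forecasts(
    series, start=0.8, forecast_horizon=12, retrain=False
)
print(mae(series, hist))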

Alternatives Considered:

  • Standalone hierarchy outside ForecastingModel - rejected because loses Darts infrastructure
  • Extend ForecastingModel directly - semantically incorrect (foundation models aren't local)

2. Optional fit() for Zero-Shot Models

Decision: fit() is optional but enables unified Darts workflows.

Rationale:

Zero-shot foundation models don't require training, but fit() serves three purposes:

  1. Validation: Check input dimensions, covariate compatibility
  2. Metadata tracking: Store series properties for consistency checks
  3. API consistency: Enables historical_forecasts(), backtest() patterns

When lora_config is provided, fit() performs actual fine-tuning via PEFT/LoRA.

Community Concern Addressed: @dennisbader's concern about fit() semantics is resolved via a dual-path implementation (sketched after this list):

  • Zero-shot path: validate + track metadata (no training)
  • Fine-tuning path: apply PEFT adapters + train
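
A minimal sketch of the routing, assuming a hypothetical _validate_and_track() helper; _train_with_peft() is the hook described under PEFT/LoRA below:

def fit(self, series, past_covariates=None, future_covariates=None):
    # Zero-shot path: validate inputs and record series metadata; no training happens.
    self._validate_and_track(series, past_covariates, future_covariates)

    # Fine-tuning path: taken only when a LoRA config was supplied at construction.
    if self.lora_config is not None:
        self._train_with_peft(series, past_covariates, future_covariates)

    return self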

Alternatives Considered:

  • Require fit() always - rejected because violates zero-shot semantics
  • Skip fit() entirely - rejected because breaks Darts API patterns
  • Separate ZeroShotModel base class - rejected as over-engineering

3. Lazy Loading with Device Management

Decision: Models download on first property access, not at initialization.

Rationale:

Foundation model weights range from roughly 120 MB to 2 GB. Downloading them during __init__() would:

  • Block CI/CD pipelines
  • Prevent model instantiation without network
  • Force users to wait even when using mocked tests

Lazy loading via @property delays download until actual use.

Implementation:

@property
def model(self):
    # Download and load the pre-trained weights only on first access, so that
    # __init__() stays offline and tests can mock this path entirely.
    if not self._is_loaded:
        self._model = self._load_pretrained_model()
        self._is_loaded = True
    return self._model

Device Management: Automatic detection in CUDA > MPS > CPU order, with manual override support.
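
A minimal sketch of that detection order, assuming a PyTorch backend; the helper name _resolve_device is illustrative:

import torch

def _resolve_device(requested=None):
    """Return the requested device, or auto-detect in CUDA > MPS > CPU order."""
    if requested is not None:
        return requested  # manual override, e.g. "cuda:1"
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"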

Community Concern Addressed: @dennisbader's CI/CD concern is resolved: tests use mocks and never download model weights, as sketched below.
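
A minimal sketch of that mocking pattern, assuming pytest-style tests and the ChronosModel wrapper proposed below; the patch target is illustrative:

from unittest.mock import patch

def test_init_does_not_download_weights():
    # Constructing the wrapper must stay offline; lazy loading defers the
    # download to the first access of the model property.
    with patch.object(ChronosModel, "_load_pretrained_model") as load_mock:
        ChronosModel()
        load_mock.assert_not_called()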

4. Minimal Base Class (Surgical Approach)

Decision: FoundationForecastingModel provides essential infrastructure only.

Rationale:

The initial implementation included registry patterns, validation helpers, and complex abstractions. Following maintainer feedback, these were removed in favor of:

  • Minimal base class (lazy loading, device management, PEFT hooks)
  • Model-specific implementations self-contained
  • Direct code over abstraction layers

Benefits:

  • Easier to understand and maintain
  • Less code to review
  • Clearer extension path for new models
  • Follows Darts principle: "better write clear code than fancy code"

What Base Class Provides:

  • Lazy loading infrastructure
  • Device detection and management
  • PEFT/LoRA integration hooks
  • Test patterns (mocking, integration marks)

What Models Implement:

  • _load_pretrained_model() - model-specific loading
  • predict() - forecasting logic
  • Optional: covariate handling, multivariate support

Community Concern Addressed: @daidahao's request for a unified base class is satisfied via the Template Method pattern, without over-abstraction.

5. Native Quantile Support

Decision: Use Darts quantile naming convention q{value:.3f} (e.g., q0.100, q0.500, q0.900).

Rationale:

Both models use native quantile heads that produce direct multi-step quantile forecasts in a single forward pass:

  • TimesFM 2.5: 10 fixed quantile levels [0.0, 0.1, 0.2, ..., 0.9]
  • Chronos 2: 21 fixed quantile levels [0.01, 0.05, 0.1, 0.2, ..., 0.9, 0.95, 0.99]

Chronos 2's richer quantile grid (21 vs 10) includes extreme quantiles (0.01, 0.99) for improved coverage of rare events, enhancing applicability to anomaly detection and risk-aware forecasting.

Both models expose quantiles via the standard Darts API (usage sketched after this list):

  • Use quantile_timeseries() to access specific quantiles
  • Follow established naming convention from issue #2933
  • Fixed quantile sets determined by model architecture
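
An illustrative usage sketch; the model name and argument set are assumptions, while quantile_timeseries() and the q{value:.3f} naming come from the design above:

from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load()

model = ChronosModel()            # proposed wrapper
model.fit(series)                 # zero-shot path: validation + metadata only
forecast = model.predict(n=12, series=series)

# Extract a single quantile as its own TimeSeries; per the convention above,
# the 90th percentile maps to the q0.900 component.
p90 = forecast.quantile_timeseries(0.9)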

Benefits:

  • Efficient: Single forward pass for all quantiles
  • Deterministic: Same inputs produce same quantile forecasts
  • Standard Darts interface for consuming quantiles
  • Chronos 2's extreme quantiles enable tail risk analysis

Alternatives Considered:

  • Quantile registry system - rejected as over-engineering after maintainer feedback
  • Force uniform quantile sets across models - rejected because loses model-specific strengths

6. PEFT/LoRA for Fine-Tuning

Decision: Support parameter-efficient fine-tuning via PEFT library for models that allow it.

Rationale:

Fine-tuning foundation models on domain-specific data improves accuracy, but full fine-tuning is:

  • Computationally expensive
  • Memory intensive
  • Risks catastrophic forgetting

PEFT/LoRA fine-tunes <1% of parameters while preserving pre-trained knowledge.

Implementation:

  • peft_utils.py: Adapter management, trainable parameter tracking
  • Dual-path fit(): Routes to _train_with_peft() when lora_config is provided
  • Models opt-in by implementing PEFT integration

Current Support:

  • Chronos 2: Full PEFT support
  • TimesFM: No upstream fine-tuning API

Future: New models inherit PEFT infrastructure automatically if upstream supports it.
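
Illustrative only: the lora_config keys below mirror common peft.LoraConfig arguments and are assumptions about how the wrapper forwards settings upstream.

from darts.datasets import AirPassengersDataset

series = AirPassengersDataset().load()

lora_config = {
    "r": 8,             # low-rank adapter dimension
    "lora_alpha": 16,   # scaling factor
    "lora_dropout": 0.05,
}

model = ChronosModel(lora_config=lora_config)
model.fit(series)  # lora_config present, so fit() routes to _train_with_peft()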

Architecture

Class Hierarchy

GlobalForecastingModel (existing)
    └── FoundationForecastingModel (new base class)
            ├── TimesFMModel
            └── ChronosModel

Base Class Responsibilities

from abc import abstractmethod
from typing import Dict, Optional

# GlobalForecastingModel is the existing Darts base class shown in the hierarchy above.
from darts.models.forecasting.forecasting_model import GlobalForecastingModel


class FoundationForecastingModel(GlobalForecastingModel):
    """Base class for foundation models with lazy loading and device management."""

    def __init__(self, device: Optional[str] = None, lora_config: Optional[Dict] = None):
        """Initialize without downloading model weights."""

    @property
    def model(self):
        """Lazily load the pre-trained model on first access."""

    @abstractmethod
    def _load_pretrained_model(self):
        """Subclasses implement model-specific loading."""

    def fit(self, series, **kwargs):
        """Dual-path: validation only (zero-shot) or fine-tuning (PEFT)."""

    @abstractmethod
    def predict(self, n, series=None, **kwargs):
        """Subclasses implement forecasting logic."""

Model Capabilities Matrix

Capability             TimesFM 2.5            Chronos 2          Design Support
Univariate             ✅                     ✅                 Both
Multivariate                                                     Per-model
Past Covariates                                                  Per-model
Future Covariates                                                Per-model
Quantile Forecasting   ✅ (10 levels)         ✅ (21 levels)     Native quantile heads
Fine-tuning            No upstream API        ✅ (PEFT)          Infrastructure ready

Extension Path

Adding a new foundation model (e.g., Moirai):

  1. Subclass FoundationForecastingModel
  2. Implement _load_pretrained_model() - model-specific loading logic
  3. Implement predict() - forecasting logic
  4. Optional: Override fit() if model supports fine-tuning
  5. Optional: Handle covariates if model supports them

Infrastructure (lazy loading, device management, PEFT hooks, test patterns) works automatically.
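
A skeletal sketch of that path; MoiraiModel is a placeholder, not a working integration, and the upstream loading call is left unimplemented:

class MoiraiModel(FoundationForecastingModel):
    """Hypothetical wrapper around an upstream Moirai checkpoint."""

    def _load_pretrained_model(self):
        # Model-specific loading, e.g. pulling a checkpoint from the Hugging Face Hub.
        raise NotImplementedError("load the upstream Moirai checkpoint here")

    def predict(self, n, series=None, **kwargs):
        # Map Darts TimeSeries to the upstream API, run inference on self.model
        # (lazily loaded by the base class), and wrap the quantile outputs back
        # into a TimeSeries.
        raise NotImplementedError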

Trade-offs

Simplicity vs. Abstraction

Chose: Simplicity

  • Removed registry patterns and validation helpers
  • More direct code in model implementations
  • Easier to understand, review, and extend

Trade-off: Some code duplication between models (acceptable for clarity)

API Consistency vs. Model Differences

Chose: Expose differences where meaningful

  • Not all models support covariates (reflect in API)
  • Not all models support fine-tuning (optional feature)
  • Different fixed quantile sets (both valid)

Trade-off: Users must understand model capabilities (documentation addresses this)

Zero-Shot Semantics vs. Darts Patterns

Chose: Dual-path fit()

  • Optional for zero-shot (validation only)
  • Required for fine-tuning (actual training)

Trade-off: fit() semantics differ from traditional models (documentation clarifies)

Open Questions

  1. Repository structure: Is darts/models/forecasting/foundation/ the right location?
  2. Dependencies: Should we have darts[foundation] or keep separate darts[timesfm]/darts[chronos]?
  3. Documentation placement: Where should design docs live long-term?
