Issue #2933: Foundation Models Integration (Last Updated: November 5, 2025)
Time series foundation models (TSFMs) like TimesFM, Chronos 2, Moirai, and TimeGPT each provide different APIs. Users wanting to compare models or switch between them must reimplement evaluation pipelines for each model. This creates friction and limits experimentation.
As @hrzn noted: "Darts could be a great neutral ground for multiple TSFMs under a unified API."
Goals:
- Unified API: Single interface for zero-shot forecasting, covariates, and probabilistic predictions across all foundation models
- Leverage existing infrastructure: Historical forecasts, backtesting, ensembles work automatically
- Zero breaking changes: New functionality, no impact on existing Darts models
- Minimal CI/CD overhead: Tests don't download multi-GB models
- Extensible: Easy to add new foundation models (Moirai, TimeGPT, Lag-Llama)
Non-Goals:
- Training foundation models from scratch (pre-trained only)
- Supporting non-forecasting foundation models (classification, anomaly detection)
- Replacing existing Darts models (additive, not replacement)
- Abstracting away all model differences (expose capabilities where appropriate)
Decision: Foundation models inherit from GlobalForecastingModel.
Rationale:
Foundation models are semantically "global" models:
- Trained on massive diverse datasets spanning domains
- Transfer patterns learned from billions of time points to new series
- Pre-trained on many series, not fitted to a single local series
Benefits:
- `historical_forecasts()`, `backtest()`, `residuals()` work immediately
- Ensemble support via `RegressionEnsembleModel`
- Metrics and evaluation framework integration
- Consistent with PyTorch model hierarchy (requirement)
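As a hedged illustration of what this inheritance buys, a zero-shot foundation model should slot directly into the existing evaluation workflow. The import path and `TimesFMModel` constructor below are assumptions, not the final API; `historical_forecasts()` and `mape()` are existing Darts functionality:

```python
from darts.datasets import AirPassengersDataset
from darts.metrics import mape
from darts.models import TimesFMModel  # hypothetical import path

series = AirPassengersDataset().load()

model = TimesFMModel()  # no weights downloaded yet (lazy loading)
model.fit(series)       # zero-shot path: validation and metadata tracking only

# Existing Darts infrastructure is reused unchanged.
backtest = model.historical_forecasts(
    series, start=0.8, forecast_horizon=12, retrain=False
)
print(f"MAPE: {mape(series, backtest):.2f}")
```

Because the model is zero-shot, `retrain=False` avoids re-running `fit()` at every backtest step.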
Alternatives Considered:
- Standalone hierarchy outside `ForecastingModel` - rejected because it loses Darts infrastructure
- Extend `ForecastingModel` directly - rejected as semantically incorrect (foundation models aren't local)
Decision: fit() is optional but enables unified Darts workflows.
Rationale:
Zero-shot foundation models don't require training, but `fit()` serves three purposes:
- Validation: Check input dimensions and covariate compatibility
- Metadata tracking: Store series properties for consistency checks
- API consistency: Enables the `historical_forecasts()` / `backtest()` patterns
When lora_config is provided, fit() performs actual fine-tuning via PEFT/LoRA.
Community Concern Addressed: dennisbader's concern about `fit()` semantics is resolved via a dual-path implementation (sketched after this list):
- Zero-shot path: validate + track metadata (no training)
- Fine-tuning path: apply PEFT adapters + train
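A minimal sketch of how the dual-path `fit()` could look. Apart from `fit()` and `_train_with_peft()` (named in the implementation notes below), helper names such as `_verify_inputs` are illustrative, not the final API:

```python
def fit(self, series, past_covariates=None, future_covariates=None):
    """Validate inputs (zero-shot) or fine-tune with LoRA adapters (when lora_config is set)."""
    # Shared step for both paths: validation and metadata tracking.
    self._verify_inputs(series, past_covariates, future_covariates)  # illustrative helper
    self._fit_called = True

    if self.lora_config is not None:
        # Fine-tuning path: wrap the backbone with PEFT adapters and train them.
        self._train_with_peft(series, past_covariates, future_covariates)
    # Zero-shot path: nothing else to do; the pre-trained weights are used as-is.
    return self
```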
Alternatives Considered:
- Require fit() always - rejected because violates zero-shot semantics
- Skip fit() entirely - rejected because breaks Darts API patterns
- Separate `ZeroShotModel` base class - rejected as over-engineering
Decision: Models download on first property access, not at initialization.
Rationale:
Foundation model weights are 120MB-2GB. Downloading during __init__() would:
- Block CI/CD pipelines
- Prevent model instantiation without network
- Force users to wait even when using mocked tests
Lazy loading via @property delays download until actual use.
Implementation:
```python
@property
def model(self):
    if not self._is_loaded:
        self._model = self._load_pretrained_model()
        self._is_loaded = True
    return self._model
```

Device Management: Automatic detection (CUDA > MPS > CPU), with manual override support.
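A hedged sketch of that device-resolution order; the helper name `_resolve_device` is an illustration, not the final API:

```python
from typing import Optional

import torch


def _resolve_device(requested: Optional[str] = None) -> str:
    """Pick CUDA, then MPS, then CPU, unless the user passes an explicit device."""
    if requested is not None:
        return requested  # manual override always wins
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```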
Community Concern Addressed: dennisbader's CI/CD concern resolved - tests use mocks, never download models.
Decision: FoundationForecastingModel provides essential infrastructure only.
Rationale:
Initial implementation had registry patterns, validation helpers, and complex abstractions. Following maintainer feedback, these were removed in favor of:
- Minimal base class (lazy loading, device management, PEFT hooks)
- Model-specific implementations self-contained
- Direct code over abstraction layers
Benefits:
- Easier to understand and maintain
- Less code to review
- Clearer extension path for new models
- Follows Darts principle: "better write clear code than fancy code"
What Base Class Provides:
- Lazy loading infrastructure
- Device detection and management
- PEFT/LoRA integration hooks
- Test patterns (mocking, integration marks)
What Models Implement:
- `_load_pretrained_model()` - model-specific loading
- `predict()` - forecasting logic
- Optional: covariate handling, multivariate support
Community Concern Addressed: daidahao's unified base class request satisfied via Template Method pattern without over-abstraction.
Decision: Use Darts quantile naming convention q{value:.3f} (e.g., q0.100, q0.500, q0.900).
Rationale:
Both models use native quantile heads that produce direct multi-step quantile forecasts in a single forward pass:
- TimesFM 2.5: 10 fixed quantile levels [0.0, 0.1, 0.2, ..., 0.9]
- Chronos 2: 21 fixed quantile levels [0.01, 0.05, 0.1, 0.2, ..., 0.9, 0.95, 0.99]
Chronos 2's richer quantile grid (21 vs 10) includes extreme quantiles (0.01, 0.99) for improved coverage of rare events, enhancing applicability to anomaly detection and risk-aware forecasting.
Both models expose quantiles via standard Darts API:
- Use `quantile_timeseries()` to access specific quantiles
- Follow established naming convention from issue #2933
- Fixed quantile sets determined by model architecture
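A hedged usage sketch: `ChronosModel`, its import path, and the `predict()` signature are assumptions, while `quantile_timeseries()` and the `q{value:.3f}` component naming are the existing Darts conventions referenced above:

```python
from darts.datasets import AirPassengersDataset
from darts.models import ChronosModel  # hypothetical import path

series = AirPassengersDataset().load()

model = ChronosModel()
model.fit(series)                              # zero-shot: validation only
forecast = model.predict(n=12, series=series)

# Individual quantiles are retrieved via the standard Darts API; component
# names follow the q{value:.3f} convention (q0.100, q0.500, q0.900, ...).
median = forecast.quantile_timeseries(0.5)
lower = forecast.quantile_timeseries(0.1)
upper = forecast.quantile_timeseries(0.9)
```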
Benefits:
- Efficient: Single forward pass for all quantiles
- Deterministic: Same inputs produce same quantile forecasts
- Standard Darts interface for consuming quantiles
- Chronos 2's extreme quantiles enable tail risk analysis
Alternatives Considered:
- Quantile registry system - rejected as over-engineering after maintainer feedback
- Force uniform quantile sets across models - rejected because loses model-specific strengths
Decision: Support parameter-efficient fine-tuning via PEFT library for models that allow it.
Rationale:
Fine-tuning foundation models on domain-specific data improves accuracy, but full fine-tuning is:
- Computationally expensive
- Memory intensive
- Risks catastrophic forgetting
PEFT/LoRA fine-tunes <1% of parameters while preserving pre-trained knowledge.
Implementation:
- `peft_utils.py`: Adapter management, trainable parameter tracking
- Dual-path `fit()`: Routes to `_train_with_peft()` when `lora_config` is provided
- Models opt in by implementing PEFT integration
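A hedged usage sketch of the fine-tuning path; the exact `lora_config` schema is an assumption, shown here with common LoRA hyperparameters as used by the PEFT library:

```python
from darts.datasets import AirPassengersDataset
from darts.models import ChronosModel  # hypothetical import path

series = AirPassengersDataset().load()

# Passing lora_config switches fit() from the zero-shot path to the fine-tuning path.
model = ChronosModel(
    lora_config={
        "r": 8,             # adapter rank
        "lora_alpha": 16,   # scaling factor
        "lora_dropout": 0.05,
    }
)
model.fit(series)                              # trains LoRA adapters; backbone stays frozen
forecast = model.predict(n=24, series=series)
```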
Current Support:
- Chronos 2: Full PEFT support
- TimesFM: No upstream fine-tuning API
Future: New models inherit PEFT infrastructure automatically if upstream supports it.
```
GlobalForecastingModel (existing)
└── FoundationForecastingModel (new base class)
    ├── TimesFMModel
    └── ChronosModel
```
```python
from abc import abstractmethod
from typing import Dict, Optional

from darts.models.forecasting.forecasting_model import GlobalForecastingModel


class FoundationForecastingModel(GlobalForecastingModel):
    """Base class for foundation models with lazy loading and device management."""

    def __init__(self, device: Optional[str] = None, lora_config: Optional[Dict] = None):
        """Initialize without downloading model weights."""

    @property
    def model(self):
        """Lazy load pre-trained model on first access."""

    @abstractmethod
    def _load_pretrained_model(self):
        """Subclasses implement model-specific loading."""

    def fit(self, series, **kwargs):
        """Dual-path: validation (zero-shot) or fine-tuning (PEFT)."""

    @abstractmethod
    def predict(self, n, series, **kwargs):
        """Subclasses implement forecasting logic."""
```

| Capability | TimesFM 2.5 | Chronos 2 | Design Support |
|---|---|---|---|
| Univariate | ✅ | ✅ | Both |
| Multivariate | ❌ | ✅ | Per-model |
| Past Covariates | ❌ | ✅ | Per-model |
| Future Covariates | ❌ | ✅ | Per-model |
| Quantile Forecasting | ✅ (10 levels) | ✅ (21 levels) | Native quantile heads |
| Fine-tuning | ❌ | ✅ (PEFT) | Infrastructure ready |
Adding a new foundation model (e.g., Moirai):
- Subclass `FoundationForecastingModel`
- Implement `_load_pretrained_model()` - model-specific loading logic
- Implement `predict()` - forecasting logic
- Optional: Override `fit()` if the model supports fine-tuning
- Optional: Handle covariates if the model supports them
Infrastructure (lazy loading, device management, PEFT hooks, test patterns) works automatically.
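A hedged skeleton of those steps; `MoiraiModel` is hypothetical and the method bodies are placeholders rather than working loading or forecasting code:

```python
class MoiraiModel(FoundationForecastingModel):
    """Hypothetical subclass illustrating the extension steps above."""

    def _load_pretrained_model(self):
        # Model-specific loading: fetch the pre-trained Moirai checkpoint via its
        # upstream package and return the backbone object.
        ...

    def predict(self, n, series=None, past_covariates=None, **kwargs):
        backbone = self.model  # first access triggers the lazy download
        # Convert `series` into the backbone's input format, run a forward pass
        # for `n` steps, and wrap the output as a Darts TimeSeries.
        ...
```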
Chose: Simplicity
- Removed registry patterns and validation helpers
- More direct code in model implementations
- Easier to understand, review, and extend
Trade-off: Some code duplication between models (acceptable for clarity)
Chose: Expose differences where meaningful
- Not all models support covariates (reflect in API)
- Not all models support fine-tuning (optional feature)
- Different quantile generation methods (both valid)
Trade-off: Users must understand model capabilities (documentation addresses this)
Chose: Dual-path fit()
- Optional for zero-shot (validation only)
- Required for fine-tuning (actual training)
Trade-off: fit() semantics differ from traditional models (documentation clarifies)
- Repository structure: Is `darts/models/forecasting/foundation/` the right location?
- Dependencies: Should we have `darts[foundation]` or keep separate `darts[timesfm]` / `darts[chronos]`?
- Documentation placement: Where should design docs live long-term?
- Issue #2933 - Community discussion
- Darts Contributing Guidelines - Design principles
- TimesFM Paper - Google's foundation model
- Chronos 2 Paper - Amazon's foundation model
- PEFT Library - Parameter-efficient fine-tuning