@dannon
Created February 24, 2026 13:20
Triage: galaxyproject/galaxy #21867 - Migrate ssh file source plugin to fsspec

Issue #21867: Migrate ssh file source plugin to fsspec

  • Repository: galaxyproject/galaxy
  • State: OPEN
  • Author: davelopez (David Lopez)
  • Created: 2026-02-17
  • Labels: area/backend
  • Comments: 0
  • Reactions: None
  • Project: Galaxy Dev - weeklies (Triage/Discuss)

Issue Body

Summary

Migrate the Galaxy SSH (SFTP) file source plugin from the deprecated fs backend to fsspec, using fsspec.implementations.sftp.SFTPFileSystem.

Relevant documentation:

Context

This issue is part of a coordinated series of fsspec migration issues (#21865-#21869), all filed by davelopez on 2026-02-17:

| Issue | Plugin | Target fsspec Implementation | State |
|---|---|---|---|
| #21865 | Dropbox | dropboxdrivefs.DropboxDriveFileSystem | Open |
| #21866 | FTP | fsspec.implementations.ftp.FTPFileSystem | Open |
| #21867 | SSH (SFTP) | fsspec.implementations.sftp.SFTPFileSystem | Open |
| #21868 | Google Drive | gdrive_fsspec.GoogleDriveFileSystem | Open |
| #21869 | WebDAV | webdav4.fsspec.WebdavFileSystem | Open |

Related Active PRs

  • PR #21646 (OPEN): "Improvements for ssh file sources" by bernt-matthias - Modifies the existing SSH plugin (still on PyFilesystem2), adds template support, makes user mandatory, disables keepalive messages. This PR will need to be coordinated with the fsspec migration.

Code Research: Issue #21867 - Migrate SSH file source plugin to fsspec

Current SSH Plugin Implementation

File: lib/galaxy/files/sources/ssh.py (72 lines)

The current SSH file source plugin is a minimal implementation that extends PyFilesystem2FilesSource:

  • Import: Uses fs.sshfs.sshfs.SSHFS from the fs.sshfs package (PyFilesystem2 ecosystem)
  • Configuration classes:
    • SshFileSourceTemplateConfiguration - template config with TemplateExpansion support
    • SshFileSourceConfiguration - resolved config
    • Fields: host, user, passwd, pkey, timeout (10), port (22), compress (False), config_path (~/.ssh/config), path
  • Plugin class: SshFilesSource(PyFilesystem2FilesSource)
    • plugin_type = "ssh"
    • required_module = SSHFS
    • required_package = "fs.sshfs"
    • Only overrides _open_fs(): creates an SSHFS handle with config params, optionally opens a subdirectory via handle.opendir(config.path)

The Old Base Class: PyFilesystem2FilesSource

File: lib/galaxy/files/sources/_pyfilesystem2.py (141 lines)

Provides the shared infrastructure that all PyFilesystem2 plugins use:

  • _list() -- directory listing with pagination, search, recursive walk
  • _realize_to() -- download files
  • _write_from() -- upload files
  • Uses fs.base.FS context manager pattern (with self._open_fs() as h:)

Still using PyFilesystem2 (12 modules: 11 plugins plus the _pyfilesystem2 base): azure, posix, rspace, ssh, webdav, ftp, googledrive, onedata, anvil, basespace, dropbox

The New Base Class: FsspecFilesSource

File: lib/galaxy/files/sources/_fsspec.py (324 lines)

The fsspec base class provides equivalent functionality with additional features:

  • supports_pagination = True, supports_search = True, supports_sorting = False
  • Cache options: use_listings_cache, listings_expiry_time, max_paths
  • _list() -- with recursive walk, glob-based search, pagination
  • _realize_to() -- uses fs.get_file()
  • _write_from() -- uses fs.put_file()
  • Hook methods for subclasses: _adapt_entry_path(), _to_filesystem_path(), _extract_timestamp(), _get_file_hashes()
  • _open_fs() is abstract and takes (context, cache_options) -- note the extra cache_options parameter vs PyFilesystem2

Already using fsspec (8 modules: 7 plugins plus the _fsspec base): s3fs, azureflat, googlecloudstorage, huggingface, ascp, temp, memory

Key Differences: _open_fs() Signature

Old (PyFilesystem2):

def _open_fs(self, context: FilesSourceRuntimeContext[TResolvedConfig]) -> FS:

New (fsspec):

def _open_fs(self, context: FilesSourceRuntimeContext[FsspecResolvedConfigurationType], cache_options: CacheOptionsDictType) -> AbstractFileSystem:

The fsspec version adds a cache_options dict parameter.

Target fsspec Implementation: SFTPFileSystem

The target is fsspec.implementations.sftp.SFTPFileSystem, which is a built-in fsspec implementation. Key characteristics:

  • Built into fsspec itself (no extra package needed beyond fsspec + paramiko)
  • Uses paramiko under the hood
  • Both fsspec and paramiko are already core Galaxy dependencies
  • Constructor takes host plus keyword arguments that are forwarded to paramiko's SSHClient.connect(): port, username, password, pkey, key_filename, timeout, etc.
  • Important parameter name change: user -> username, passwd -> password (paramiko convention)

Dependencies

Current:

  • fs.sshfs -- listed in lib/galaxy/dependencies/conditional-requirements.txt as fs.sshfs # type: ssh
  • paramiko -- core dependency (already in pyproject.toml)

After migration:

  • fsspec -- core dependency (already in pyproject.toml: fsspec>=2025.7.0)
  • paramiko -- core dependency (already in pyproject.toml)
  • fs.sshfs -- can be removed from conditional-requirements.txt

Files That Would Need Changes

  1. lib/galaxy/files/sources/ssh.py -- Primary file: rewrite to extend FsspecFilesSource instead of PyFilesystem2FilesSource
  2. lib/galaxy/dependencies/conditional-requirements.txt -- Remove fs.sshfs # type: ssh line
  3. lib/galaxy/dependencies/pinned-requirements.txt -- Remove fs.sshfs pinned version
  4. packages/app/setup.cfg -- May need update if ssh extra is listed there

Existing fsspec Migration Patterns to Follow

Best reference: S3 migration (PR #20794) -- 4 files, clean migration:

  1. Change base class from PyFilesystem2FilesSource to FsspecFilesSource
  2. Change config base classes from BaseFileSourceConfiguration to FsspecBaseFileSourceConfiguration
  3. Update _open_fs() to accept cache_options parameter and pass it through
  4. Import from _fsspec instead of _pyfilesystem2
  5. Update required_module and required_package
  6. Update dependency files

Simpler reference for this specific case: Since fsspec.implementations.sftp.SFTPFileSystem is built into fsspec (not a separate package), the migration is especially clean. No new conditional dependency is needed.

Coordination Concern: PR #21646

PR #21646 ("Improvements for ssh file sources") is currently open and modifies the same ssh.py file. It:

  • Makes user mandatory
  • Disables keepalive messages
  • Adds template support (lib/galaxy/files/templates/examples/ssh.yml, template models)
  • Adds test files

This PR should ideally be merged first (or the improvements incorporated into the fsspec migration). The functional changes (mandatory user, keepalive disable, template support) are orthogonal to the backend swap and should carry over.

No Existing Tests

There are no existing automated tests specifically for the SSH file source plugin. PR #21646 adds basic test scaffolding (test/unit/files/test_ssh.py).

Demand Research: Issue #21867 - Migrate SSH file source plugin to fsspec

Reactions and Engagement

  • Thumbs up / reactions: None (0 reactions of any kind)
  • Comments: 0
  • Watchers/subscribers: Not tracked beyond default

Community Demand Signals

Direct Demand

The issue itself has zero community engagement (no reactions, no comments). It was filed as part of a systematic batch of 5 migration issues (#21865-#21869) by davelopez, a core Galaxy developer. This represents an internal engineering initiative rather than a community-requested feature.

Related Work

  • PR #21646 (bernt-matthias, open since 2026-01-22): "Improvements for ssh file sources" -- This is active work on the SSH file source plugin, demonstrating that at least two core developers are working on or thinking about the SSH file source. The PR makes improvements to the existing PyFilesystem2-based implementation, including:
    • Disabling keepalive messages (was causing issues in production)
    • Making user mandatory instead of defaulting to the current system user
    • Adding template support for SSH file sources
    • This PR touches the same file that would be rewritten for the fsspec migration

Sibling Issues (fsspec migration series)

All five sibling issues have identical engagement patterns: 0 reactions, 0 comments. They were all filed on the same day (2026-02-17) as a coordinated effort.

| Issue | Plugin | Reactions | Comments |
|---|---|---|---|
| #21865 | Dropbox | 0 | 0 |
| #21866 | FTP | 0 | 0 |
| #21867 | SSH | 0 | 0 |
| #21868 | Google Drive | 0 | 0 |
| #21869 | WebDAV | 0 | 0 |

Historical Context: Successful fsspec Migrations

Previous fsspec migrations have been merged without controversy:

  • PR #20698 (2025-08-20): Original fsspec base implementation -- merged
  • PR #20794 (2025-08-26): S3 file source to fsspec -- merged (4 files changed)
  • PR #21590 (2026-01-19): Google Cloud Storage to fsspec -- merged (8 files changed)

These were all driven by the same developer (davelopez) and went through smoothly, suggesting the pattern is well-established.

Demand Assessment

User demand: LOW -- No external community requests; this is purely an internal engineering modernization effort.

Developer demand: MEDIUM -- Two core developers are actively working on the SSH file source plugin. The fsspec migration is part of a systematic effort to deprecate the PyFilesystem2 backend across all file source plugins.

Strategic demand: HIGH -- Removing dependence on the deprecated PyFilesystem2 library reduces maintenance burden and technical debt. The SSH plugin is one of 11 plugins still on the old stack, while 7 already use fsspec.

Importance Assessment: Issue #21867 - Migrate SSH file source plugin to fsspec

User Demand: LOW

  • Zero reactions, zero comments on the issue
  • No community requests for this specific change
  • This is an internal engineering initiative, not user-driven
  • SSH file source works fine on the current PyFilesystem2 backend from a user perspective

Strategic Value: HIGH

  • Part of a systematic deprecation effort: This is one of 5 coordinated issues (#21865-#21869) to migrate PyFilesystem2 plugins to fsspec. 7 plugins already use fsspec. Completing this series would bring the total to 12 of 18 plugins on fsspec (not counting the two base modules).
  • Dependency reduction: Removes the fs.sshfs conditional dependency. Since fsspec and paramiko are already core dependencies, no new packages are needed -- this is a net reduction in dependency count.
  • Unified architecture: Having all file source plugins on the same base class (FsspecFilesSource) simplifies maintenance, testing, and documentation. The fsspec base class is more feature-rich (cache options, better pagination, search with glob patterns).
  • PyFilesystem2 ecosystem health: The PyFilesystem2 project (fs) and its SSH extension (fs.sshfs) are less actively maintained than fsspec. Moving away reduces risk of being stuck on an unmaintained dependency.
  • Consistency with prior decisions: The fsspec base class was specifically designed and merged (PR #20698, Aug 2025) to replace PyFilesystem2. This migration is the natural continuation of that architectural decision.

Effort Estimate: SMALL

  • Lines of code: The current SSH plugin is only 72 lines. The migration is primarily a base class swap with minor parameter renaming.
  • Pattern is well-established: S3 (PR #20794, 4 files) and GCS (PR #21590, 8 files) migrations have been completed successfully and serve as direct templates.
  • No new dependencies: Unlike some migrations (Dropbox, Google Drive), the SSH/SFTP implementation is built into fsspec itself.
  • Estimated files to change: 2-4 files (ssh.py, conditional-requirements.txt, possibly pinned-requirements.txt and setup.cfg)
  • Estimated effort: 2-4 hours of implementation for an experienced contributor familiar with the codebase (4-6 hours including research and testing)

Risk Assessment

Low Risks

  • Breaking existing configurations: The plugin_type remains "ssh", so existing Galaxy configurations should continue to work without changes. Configuration field names (host, user, passwd, etc.) should be preserved at the Galaxy config layer even though the underlying fsspec API uses different names (username, password).
  • Behavioral differences: fsspec's SFTPFileSystem may have slightly different behavior than fs.sshfs in edge cases (symlinks, permissions, error handling). Manual testing against a real SSH server is advisable.

Medium Risks

  • Coordination with PR #21646: The open PR by bernt-matthias modifies the same file and adds template support. Merging order matters -- ideally PR #21646 goes first, then the fsspec migration incorporates those improvements.
  • SSH key handling: The current implementation supports pkey (private key) and config_path (~/.ssh/config). Need to verify that fsspec's SFTPFileSystem supports equivalent functionality. fsspec uses paramiko directly, which supports key-based auth, but the exact parameter mapping may differ.
  • config_path support: The current plugin supports config_path for SSH config file parsing. fsspec's SFTPFileSystem may not have a direct equivalent -- paramiko can read SSH config, but it may require explicit handling.
  • compress option: The current plugin supports a compress flag. Need to verify fsspec/paramiko support for this.

Mitigation

  • All risks are manageable through careful parameter mapping and manual testing
  • The migration can preserve the Galaxy-level configuration schema while adapting to fsspec's paramiko-based API internally

Recommendation: PRIORITIZE NOW

Rationale:

  1. This is small, well-understood work with a proven pattern (S3, GCS migrations as templates)
  2. Strategic value is high -- it advances the systematic deprecation of PyFilesystem2
  3. The dependency removal (fs.sshfs) is a net simplification
  4. With PR #21646 already in flight, there is active momentum on the SSH file source
  5. Doing all 5 fsspec migrations (#21865-#21869) together would be most efficient since the pattern is the same
  6. No new dependencies needed -- this is the cleanest of the 5 migrations

Suggested approach:

  • Merge PR #21646 first (SSH improvements on current stack)
  • Then do the fsspec migration, incorporating the improvements from PR #21646
  • Alternatively, coordinate with bernt-matthias to do the fsspec migration in PR #21646 itself, combining both efforts

Implementation Plan: Issue #21867 - Migrate SSH file source plugin to fsspec

Recommended Approach

Rewrite lib/galaxy/files/sources/ssh.py to extend FsspecFilesSource instead of PyFilesystem2FilesSource, using fsspec.implementations.sftp.SFTPFileSystem as the underlying filesystem. Follow the established pattern from the S3 migration (PR #20794).

Prerequisites

  1. Merge or coordinate with PR #21646 ("Improvements for ssh file sources" by bernt-matthias). This PR adds template support, makes user mandatory, and disables keepalive. Either:
    • Merge PR #21646 first, then do the fsspec migration on top, OR
    • Incorporate PR #21646 improvements into the fsspec migration PR

Affected Files

Primary Changes

1. lib/galaxy/files/sources/ssh.py -- Complete rewrite (~70-90 lines)

Key changes:

  • Import SFTPFileSystem from fsspec.implementations.sftp instead of SSHFS from fs.sshfs
  • Config classes change base from BaseFileSourceConfiguration to FsspecBaseFileSourceConfiguration (and template equivalent)
  • Plugin class changes base from PyFilesystem2FilesSource to FsspecFilesSource
  • _open_fs() gets new signature with cache_options parameter
  • Map Galaxy config fields to fsspec/paramiko fields (user->username, passwd->password)
  • Handle root path via _to_filesystem_path() and _adapt_entry_path() overrides instead of opendir()

Key implementation details for _open_fs():

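  • Create SFTPFileSystem(host=config.host, port=config.port, username=config.user, password=config.passwd). SFTPFileSystem forwards its extra keyword arguments to paramiko's SSHClient.connect(), so verify that the cache_options keys can be passed as **cache_options without being rejected there
  • Handle pkey (private key) -- paramiko's connect() accepts a key object (pkey) or a key file path (key_filename), so a string value may need converting to a key object
  • Handle config_path -- may need to use paramiko.SSHConfig to parse the SSH config file
  • Handle compress -- paramiko's connect() accepts a compress flag; confirm it can be passed through SFTPFileSystem's keyword arguments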
  • Handle path (root directory) via _to_filesystem_path() override
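
For the config_path item above, one possible shape is to look up the host in the SSH config file and merge the result into the connection arguments. The sketch below works on a plain mapping so that it runs without paramiko. `merge_ssh_config` is a hypothetical helper, not Galaxy code; the key names (`hostname`, `user`, `port`, `identityfile`) follow what `paramiko.SSHConfig.lookup()` returns, and the precedence rule (explicit Galaxy settings win) is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Mapping


def merge_ssh_config(
    host: str,
    explicit: Mapping[str, Any],
    host_entry: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge an ssh_config host entry with explicit settings (explicit wins)."""
    merged: Dict[str, Any] = {"host": host_entry.get("hostname", host)}
    if "user" in host_entry:
        merged["username"] = host_entry["user"]
    if "port" in host_entry:
        # ssh_config values are strings; the connection expects an integer port.
        merged["port"] = int(host_entry["port"])
    if host_entry.get("identityfile"):
        merged["key_filename"] = list(host_entry["identityfile"])
    # Assumption: values set explicitly in Galaxy override the config file.
    merged.update({key: value for key, value in explicit.items() if value is not None})
    return merged


# A plain dict standing in for the result of an SSH config lookup.
entry = {
    "hostname": "10.0.0.5",
    "user": "galaxy",
    "port": "2222",
    "identityfile": ["/home/galaxy/.ssh/id_ed25519"],
}
merged = merge_ssh_config("cluster", {"username": "alice", "password": None}, entry)
print(merged)
```

With paramiko installed, the entry would presumably come from `paramiko.SSHConfig.from_path(path).lookup(host)`. Because Galaxy's port field defaults to 22, a real implementation would also need a way to tell a default from an explicitly configured value before applying this precedence.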

2. lib/galaxy/dependencies/conditional-requirements.txt

  • Remove: fs.sshfs # type: ssh
  • No new line needed (fsspec and paramiko are core deps)

3. lib/galaxy/dependencies/pinned-requirements.txt

  • Remove fs.sshfs pinned entry (if present)

4. packages/app/setup.cfg

  • Update ssh extras_require if it references fs.sshfs

Parameter Mapping

| Galaxy Config | Current (fs.sshfs) | Target (fsspec SFTP) |
|---|---|---|
| host | host | host |
| port | port | port |
| user | user | username |
| passwd | passwd | password |
| pkey | pkey | (paramiko key handling) |
| timeout | timeout | timeout |
| compress | compress | (paramiko Transport option) |
| config_path | config_path | (manual SSH config parsing) |
| path | opendir(path) | _to_filesystem_path() |
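
The straightforward rows of this mapping can be sketched as a small translation function. This is a minimal illustration, not Galaxy code: `GalaxySshConfig` and `to_connect_kwargs` are hypothetical names, the defaults are the ones listed for the current plugin, and the assumption is that the resulting keyword arguments end up in paramiko's `SSHClient.connect()`. The pkey, config_path, and path fields are left out because they need separate handling.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class GalaxySshConfig:
    """Hypothetical stand-in for the resolved Galaxy SSH configuration."""

    host: str
    user: Optional[str] = None
    passwd: Optional[str] = None
    port: int = 22
    timeout: int = 10
    compress: bool = False


def to_connect_kwargs(config: GalaxySshConfig) -> Dict[str, Any]:
    """Translate Galaxy field names into paramiko-style keyword arguments."""
    kwargs: Dict[str, Any] = {
        "port": config.port,
        "timeout": config.timeout,
        "compress": config.compress,
    }
    # Galaxy keeps `user`/`passwd`; paramiko expects `username`/`password`.
    if config.user is not None:
        kwargs["username"] = config.user
    if config.passwd is not None:
        kwargs["password"] = config.passwd
    return kwargs


kwargs = to_connect_kwargs(
    GalaxySshConfig(host="sftp.example.org", user="alice", passwd="s3cret")
)
print(kwargs)
```

Keeping the translation in one function leaves the Galaxy-level field names untouched, which is what lets existing configurations keep working after the backend swap.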

Implementation Steps

Step 1: Parameter Mapping Research

Verify the exact parameter mapping and test against fsspec's SFTPFileSystem documentation.

Step 2: Implement the Migration

  1. Update imports (drop fs.sshfs, add fsspec.implementations.sftp)
  2. Change config base classes to FsspecBaseFileSource*Configuration
  3. Change plugin base class to FsspecFilesSource
  4. Rewrite _open_fs() with new signature and parameter mapping
  5. Add _to_filesystem_path() and _adapt_entry_path() overrides for root path
  6. Handle pkey and config_path edge cases

Step 3: Handle Root Path

The current implementation uses handle.opendir(config.path) to scope to a subdirectory. With fsspec:

  • Override _to_filesystem_path() to prepend the configured root path
  • Override _adapt_entry_path() to strip the root path from entries
  • Same pattern used by S3 with bucket paths
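
The two overrides can be sketched as plain path arithmetic. `RootedPaths` is a hypothetical helper that mirrors the hook names above without depending on Galaxy or fsspec; it assumes the configured root is an absolute remote path. The traversal guard is an added assumption, included because opendir() confined access to the subdirectory and a prefix-based replacement does not do so on its own.

```python
import posixpath


class RootedPaths:
    """Hypothetical helper mirroring the two hook methods named above."""

    def __init__(self, root: str) -> None:
        # Assumes an absolute root; "", "/", and "/srv/data/" are all accepted.
        self.root = "/" + root.strip("/")

    def to_filesystem_path(self, path: str) -> str:
        """Map a root-relative path onto the remote filesystem."""
        candidate = posixpath.normpath(posixpath.join(self.root, path.lstrip("/")))
        prefix = self.root.rstrip("/") + "/"
        if candidate != self.root and not candidate.startswith(prefix):
            raise ValueError(f"path escapes configured root: {path!r}")
        return candidate

    def adapt_entry_path(self, filesystem_path: str) -> str:
        """Strip the root from a remote path so listings stay root-relative."""
        if self.root == "/":
            return filesystem_path
        relative = posixpath.relpath(filesystem_path, self.root)
        return "/" if relative == "." else "/" + relative


paths = RootedPaths("/srv/data/")
print(paths.to_filesystem_path("/sub/file.txt"))
print(paths.adapt_entry_path("/srv/data/sub/file.txt"))
```

The two methods are inverses for any path under the root, so entries returned by a listing can be fed back into later download or upload calls unchanged.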

Step 4: Update Dependencies

  • Remove fs.sshfs from conditional-requirements.txt
  • Update pinned-requirements.txt
  • Update packages/app/setup.cfg if needed

Testing Strategy

Unit Tests

  • Add tests in test/unit/files/test_ssh.py (coordinate with PR #21646 which creates this file)
  • Mock the SFTPFileSystem to test configuration parsing, root path handling, error handling
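
A minimal sketch of the mocking approach, using only the standard library. `open_sftp` is a hypothetical stand-in for the plugin's filesystem construction, and the filesystem class is injected as an argument so the example runs without fsspec or a server.

```python
from unittest import mock


def open_sftp(config: dict, fs_class):
    """Hypothetical code under test: builds the filesystem from Galaxy-style fields."""
    return fs_class(
        host=config["host"],
        port=config.get("port", 22),
        username=config["user"],
        password=config.get("passwd"),
    )


def test_field_names_are_translated():
    fake_fs_class = mock.MagicMock(name="SFTPFileSystem")
    config = {"host": "sftp.example.org", "user": "alice", "passwd": "pw"}
    result = open_sftp(config, fake_fs_class)
    # The Galaxy names `user`/`passwd` must arrive as `username`/`password`.
    fake_fs_class.assert_called_once_with(
        host="sftp.example.org", port=22, username="alice", password="pw"
    )
    assert result is fake_fs_class.return_value


def test_connection_errors_propagate():
    failing_fs_class = mock.MagicMock(side_effect=OSError("connection refused"))
    try:
        open_sftp({"host": "sftp.example.org", "user": "alice"}, failing_fs_class)
    except OSError as exc:
        assert "connection refused" in str(exc)
    else:
        raise AssertionError("expected OSError")


test_field_names_are_translated()
test_connection_errors_propagate()
```

In the real test module the same assertions would more likely be made by patching the SFTPFileSystem name where ssh.py imports it (for example with `mock.patch`); the exact patch target depends on how the migrated module is written.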

Integration Tests

  • Manual testing against a real SSH/SFTP server (Docker-based SFTP server recommended)
  • Test: list, download, upload, password auth, key auth, root path scoping, error cases
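
The manual checks can be partly scripted as a round-trip smoke test. The sketch below relies only on the fsspec-style methods `put_file`, `get_file`, `ls`, and `rm`, so in principle it could be pointed at a real SFTPFileSystem connected to a test server. `smoke_check` and `LocalStandIn` are hypothetical names; the stand-in is a directory-backed fake that lets the example run without a server.

```python
import os
import shutil
import tempfile


def smoke_check(fs, remote_dir: str) -> None:
    """Upload, list, download, and delete one small file through `fs`."""
    payload = b"galaxy sftp smoke test\n"
    remote_path = remote_dir.rstrip("/") + "/galaxy_smoke_test.txt"
    with tempfile.TemporaryDirectory() as tmp:
        local_in = os.path.join(tmp, "in.txt")
        local_out = os.path.join(tmp, "out.txt")
        with open(local_in, "wb") as handle:
            handle.write(payload)
        fs.put_file(local_in, remote_path)
        try:
            listed = fs.ls(remote_dir, detail=False)
            if not any(name.endswith("galaxy_smoke_test.txt") for name in listed):
                raise RuntimeError("uploaded file missing from listing")
            fs.get_file(remote_path, local_out)
            with open(local_out, "rb") as handle:
                if handle.read() != payload:
                    raise RuntimeError("downloaded content differs from upload")
        finally:
            # Always clean up the remote test file.
            fs.rm(remote_path)


class LocalStandIn:
    """Directory-backed fake exposing the four methods the check relies on."""

    def put_file(self, lpath, rpath):
        shutil.copyfile(lpath, rpath)

    def get_file(self, rpath, lpath):
        shutil.copyfile(rpath, lpath)

    def ls(self, path, detail=False):
        return [os.path.join(path, name) for name in os.listdir(path)]

    def rm(self, path):
        os.remove(path)


with tempfile.TemporaryDirectory() as remote:
    smoke_check(LocalStandIn(), remote)
    leftover = os.listdir(remote)
print(leftover)
```

Running the same function once with password authentication and once with key authentication would cover two of the listed cases with one script.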

Regression Testing

  • Verify existing Galaxy SSH file source configurations still work unchanged
  • plugin_type "ssh" remains the same -- Galaxy config files need no updates

Estimated Timeline

  • Research/prep: 1 hour
  • Implementation: 2-3 hours
  • Testing: 1-2 hours
  • Total: 4-6 hours for an experienced contributor

Risk Mitigation

  • Keep Galaxy-level configuration schema identical to avoid breaking existing setups
  • Test pkey and config_path handling carefully as these are the most complex parameter mappings
  • Consider keeping fs.sshfs as a fallback temporarily (probably not needed)

Summary: Issue #21867 - Migrate SSH file source plugin to fsspec

Top-Line Summary

Issue #21867 requests migrating Galaxy's SSH (SFTP) file source plugin from the deprecated PyFilesystem2-based fs.sshfs backend to fsspec.implementations.sftp.SFTPFileSystem. This is part of a coordinated series of 5 migration issues (#21865-#21869) filed by davelopez to systematically move all remaining PyFilesystem2 file source plugins to the fsspec framework. The migration is straightforward: the existing SSH plugin is only 72 lines, the target SFTPFileSystem is built into fsspec (no new dependencies needed since both fsspec and paramiko are already core Galaxy dependencies), and the pattern has been proven by successful S3 and GCS migrations. The recommended approach is to coordinate with the existing PR #21646 (SSH improvements by bernt-matthias), then rewrite the plugin to extend FsspecFilesSource with appropriate parameter mapping from Galaxy's config schema to paramiko's API conventions.

Importance Assessment Summary

| Dimension | Rating | Notes |
|---|---|---|
| User demand | LOW | No community requests; internal engineering initiative |
| Strategic value | HIGH | Part of systematic PyFilesystem2 deprecation; net dependency reduction |
| Effort estimate | SMALL | ~4-6 hours; 2-4 files; proven pattern from S3/GCS migrations |
| Risk | LOW-MEDIUM | Main risks: parameter mapping for pkey/config_path; coordination with PR #21646 |
| Recommendation | PRIORITIZE NOW | Small effort, high strategic value, proven pattern |

Key Questions for Group Discussion

  1. Coordination with PR #21646: bernt-matthias has an open PR improving the SSH file source (template support, mandatory user, keepalive fixes). Should that merge first and then the fsspec migration follow, or should the two be combined into a single PR?

  2. Batch migration strategy: All 5 fsspec migration issues (#21865-#21869) follow the same pattern. Should they be done as a single batch PR, individual PRs, or assigned to different contributors? Doing them together would be most efficient.

  3. SSH config file support: The current plugin supports config_path for reading ~/.ssh/config. fsspec's SFTPFileSystem doesn't directly expose this. Is this feature actively used? If so, it needs explicit paramiko.SSHConfig integration. If not, it could be deprecated.

  4. Private key handling: The current pkey parameter accepts a string. Need to clarify: is this a file path or key content? The fsspec/paramiko API may need different handling depending on the answer.

  5. PyFilesystem2 retirement timeline: After this batch of 5, there will still be 6 plugins on PyFilesystem2 (posix, azure, rspace, onedata, anvil, basespace). Is there a target date for fully retiring PyFilesystem2?
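
On the private key question (item 4), if both forms turn out to be in use, one option is to distinguish them by content. The sketch below is a heuristic under that assumption: `classify_pkey` is a hypothetical helper that treats a value beginning with a PEM private-key header as key material and anything else as a file path.

```python
import os
from typing import Dict


def classify_pkey(pkey: str) -> Dict[str, str]:
    """Guess whether a pkey string holds PEM key material or a file path."""
    text = pkey.strip()
    # PEM-encoded private keys start with a "-----BEGIN ... PRIVATE KEY-----" line.
    first_line = text.splitlines()[0] if text else ""
    if first_line.startswith("-----BEGIN") and first_line.endswith("PRIVATE KEY-----"):
        return {"kind": "content", "value": text}
    # Anything else is treated as a path; "~" is expanded for convenience.
    return {"kind": "path", "value": os.path.expanduser(text)}


pem = "-----BEGIN OPENSSH PRIVATE KEY-----\nAAAA\n-----END OPENSSH PRIVATE KEY-----\n"
print(classify_pkey(pem)["kind"])
print(classify_pkey("/home/alice/.ssh/id_rsa"))
```

A path could then be passed to paramiko as key_filename, while key material would first need loading into a key object (paramiko key classes offer `from_private_key()` for file-like objects) and passing as pkey. Whether Galaxy should accept both forms is exactly the open question.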

Concerns

Scope Creep

  • The migration should be a pure backend swap. Resist the temptation to add new features (new auth methods, connection pooling, etc.) in the same PR.
  • PR #21646's improvements (template support, mandatory user) are orthogonal and should be handled separately.

Breaking Changes

  • Risk is low if the Galaxy-level configuration schema is preserved unchanged. The plugin_type remains "ssh", so existing file_sources_conf.yml entries continue to work.
  • The main risk area is pkey and config_path parameters, which may map differently to paramiko's API. These need careful testing.

Maintenance Burden

  • Net reduction in maintenance burden: removes the fs.sshfs dependency and consolidates onto the actively-maintained fsspec framework.
  • The fsspec base class (_fsspec.py) is well-designed with clear extension points, making future SSH-specific enhancements easier.
  • No ongoing additional maintenance cost expected.