@dannon
Created February 24, 2026 13:20
Triage: galaxyproject/galaxy #21868 - Migrate googledrive file source plugin to fsspec

Issue #21868: Migrate googledrive file source plugin to fsspec

  • URL: galaxyproject/galaxy#21868
  • State: OPEN
  • Author: davelopez (David Lopez)
  • Labels: area/backend
  • Project: Galaxy Dev - weeklies (Triage/Discuss)
  • Created: 2026-02-17T16:42:28Z
  • Updated: 2026-02-17T16:44:34Z
  • Comments: 0
  • Reactions: None
  • Assignees: None
  • Milestone: None

Description

Summary Migrate the Galaxy Google Drive file source plugin from the deprecated fs backend to fsspec, utilizing the gdrive_fsspec.GoogleDriveFileSystem implementation.

Relevant documentation:

Context

This is part of a coordinated series of fsspec migration issues filed by davelopez on the same day:

| Issue | Title | State |
|---|---|---|
| #21865 | Migrate dropbox file source plugin to fsspec | OPEN |
| #21866 | Migrate ftp file source plugin to fsspec | OPEN |
| #21867 | Migrate ssh file source plugin to fsspec | OPEN |
| #21868 | Migrate googledrive file source plugin to fsspec | OPEN |
| #21869 | Migrate webdav file source plugin to fsspec | OPEN |

All five issues follow the same pattern: replace the deprecated PyFilesystem2 (fs) backend with an fsspec-compatible implementation.

Prior Art

The fsspec base class (FsspecFilesSource) was added in PR #20698 (merged 2025-08-20). Subsequent migrations already completed:

  • PR #20794: S3 file source adapted to fsspec
  • PR #20799: Fix fsspec fs path handling
  • PR #21590: Google Cloud Storage migrated from fs-gcsfs to gcsfs (fsspec) -- merged 2026-01-19

Code Research: Issue #21868

Current Implementation

The Google Drive file source is implemented in:

lib/galaxy/files/sources/googledrive.py (58 lines)

from fs.googledrivefs.googledrivefs import GoogleDriveFS
from google.oauth2.credentials import Credentials
from ._pyfilesystem2 import PyFilesystem2FilesSource

class GoogleDriveFilesSource(PyFilesystem2FilesSource[...]):
    plugin_type = "googledrive"
    required_module = GoogleDriveFS
    required_package = "fs.googledrivefs"

    def _open_fs(self, context):
        credentials = Credentials(token=context.config.access_token)
        handle = GoogleDriveFS(credentials)
        return handle

Key characteristics:

  • Inherits from PyFilesystem2FilesSource (the deprecated base class in _pyfilesystem2.py)
  • Uses fs.googledrivefs.googledrivefs.GoogleDriveFS as the filesystem backend
  • Authentication via OAuth2 access token only (via google.oauth2.credentials.Credentials)
  • Configuration accepts access_token with alias choices: oauth2_access_token, accessToken, access_token
  • Very simple plugin -- only 58 lines total
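
The alias behaviour can be illustrated without pydantic. The following is a minimal sketch, not Galaxy's code: `resolve_access_token` and `ACCESS_TOKEN_ALIASES` are hypothetical names, and checking the keys in the order listed above is this sketch's choice rather than a documented guarantee.

```python
# Hypothetical helper for illustration; the real plugin delegates this to pydantic aliases.
ACCESS_TOKEN_ALIASES = ("oauth2_access_token", "accessToken", "access_token")


def resolve_access_token(raw_config: dict) -> str:
    """Return the token stored under the first alias present in the config."""
    for key in ACCESS_TOKEN_ALIASES:
        if key in raw_config:
            return raw_config[key]
    raise KeyError(f"none of {ACCESS_TOKEN_ALIASES} found in configuration")


# Any of the three spellings resolves to the same value.
token = resolve_access_token({"accessToken": "example-token"})
```

Any migration should keep all three spellings working, since existing configurations may use any of them.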

Deprecated Base Class: PyFilesystem2FilesSource

lib/galaxy/files/sources/_pyfilesystem2.py (141 lines)

This is the base class being replaced. It uses the fs (PyFilesystem2) library and provides:

  • _list() with pagination via filterdir()
  • _realize_to() via fs.download()
  • _write_from() via fs.upload() with makedirs()
  • Search via filterdir() with wildcard patterns
  • Context manager pattern for filesystem handles

Target Base Class: FsspecFilesSource

lib/galaxy/files/sources/_fsspec.py (324 lines)

The modern replacement base class. Key differences from PyFilesystem2:

  1. _open_fs() signature adds cache_options: CacheOptionsDictType parameter
  2. Configuration classes inherit from FsspecBaseFileSourceTemplateConfiguration / FsspecBaseFileSourceConfiguration (which include cache options: use_listings_cache, listings_expiry_time, max_paths)
  3. Provides: pagination, search via glob, recursive listing with MAX_ITEMS_LIMIT (1000), timestamp extraction, hash extraction hooks
  4. No context manager pattern (fsspec doesn't use with for filesystem handles)
  5. Path adaptation hooks: _adapt_entry_path() and _to_filesystem_path()
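
Differences 1, 2, and 4 can be sketched together as follows. This is a simplified illustration under stated assumptions, not the code in `_fsspec.py`: `CacheSettings`, `RecordingFileSystem`, `build_cache_options`, and `open_fs` are hypothetical stand-ins, and the default values are invented for the example. Only the three cache option names come from the list above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CacheSettings:
    """Hypothetical stand-in for the cache fields on the fsspec base configuration."""

    use_listings_cache: bool = False
    listings_expiry_time: Optional[float] = None
    max_paths: Optional[int] = None


class RecordingFileSystem:
    """Hypothetical filesystem stub that only records its constructor arguments."""

    def __init__(self, token: str, **kwargs: Any) -> None:
        self.token = token
        self.kwargs = kwargs


def build_cache_options(settings: CacheSettings) -> Dict[str, Any]:
    """Collect the cache fields into the dict that is handed to _open_fs."""
    return {
        "use_listings_cache": settings.use_listings_cache,
        "listings_expiry_time": settings.listings_expiry_time,
        "max_paths": settings.max_paths,
    }


def open_fs(access_token: str, cache_options: Dict[str, Any]) -> RecordingFileSystem:
    """Return the handle directly; there is no context manager to enter or exit."""
    return RecordingFileSystem(token=access_token, **cache_options)


fs = open_fs("example-token", build_cache_options(CacheSettings(use_listings_cache=True)))
```

The point of the extra parameter is that a plugin does not interpret the cache settings itself; it only forwards them to the filesystem constructor.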

Already-Migrated fsspec Sources (Reference Patterns)

Sources currently using FsspecFilesSource:

| Source | Library | Complexity | Auth Pattern |
|---|---|---|---|
| s3fs.py | s3fs | Moderate (bucket paths) | Key/Secret |
| googlecloudstorage.py | gcsfs | Moderate (bucket + OAuth) | OAuth2/ServiceAccount/Anon |
| huggingface.py | huggingface_hub | Moderate (repo listing) | Token |
| azureflat.py | adlfs | Moderate (container paths) | Connection string/SAS |
| ascp_fsspec.py | custom | High (custom fsspec impl) | SSH key |
| temp.py / memory.py | fsspec built-in | Simple (testing only) | None |

The Google Cloud Storage source (googlecloudstorage.py) is the closest reference since it also handles Google OAuth2 credentials. It demonstrates passing OAuth tokens as a dict:

token = {
    "access_token": config.token,
    "refresh_token": config.refresh_token,
    "client_id": config.client_id,
    "client_secret": config.client_secret,
    "token_uri": config.token_uri,
}
fs = GCSFileSystem(project=config.project, token=token, **cache_options)

Target Library: gdrive_fsspec

From github.com/fsspec/gdrive-fsspec:

GoogleDriveFileSystem supports:

  • Service account credentials: GoogleDriveFileSystem(creds=sa_creds, token="service_account")
  • OAuth user credentials: GoogleDriveFileSystem(token="browser") or token="cache"
  • Anonymous access: GoogleDriveFileSystem(token="anon")
  • Standard fsspec interface (ls, get_file, put_file, walk, glob, etc.)

The key question for this migration is how to pass the existing OAuth2 access token to gdrive_fsspec. The library may accept a token dict (like gcsfs) or may need a Google credentials object.

Files That Must Be Modified

Primary changes:

  1. lib/galaxy/files/sources/googledrive.py -- Complete rewrite: swap base class, imports, config classes, and _open_fs implementation.

  2. lib/galaxy/dependencies/conditional-requirements.txt (line 28) -- Replace fs.googledrivefs with gdrive_fsspec.

  3. lib/galaxy/dependencies/__init__.py (lines 272-273) -- Rename check_fs_googledrivefs to check_gdrive_fsspec.

Secondary changes:

  1. packages/files/setup.cfg -- Consider adding gdrive_fsspec to test extras.

  2. pyproject.toml -- Add gdrive_fsspec if it should be a default dependency.

  3. test/unit/files/test_googledrive.py -- Update test if config or import changes require it.

  4. test/unit/files/googledrive_file_sources_conf.yml -- Update config keys if necessary.

Potentially affected:

  1. lib/galaxy/files/templates/examples/production_google_drive.yml -- Verify template still works.

  2. lib/galaxy/files/templates/models.py -- OAuth2 configuration model for googledrive; review for compatibility.

  3. client/src/api/schema/schema.ts -- Auto-generated; will update if API schema changes.

Sources Still on PyFilesystem2

Covered by sibling issues (#21865-21869):

  • dropbox.py, ftp.py, ssh.py, webdav.py

Not covered by any current issue:

  • azure.py, posix.py, rspace.py, onedata.py, anvil.py, basespace.py

Demand Research: Issue #21868

Engagement Metrics

  • Reactions on issue: Zero (no thumbs-up or other reactions)
  • Comments: Zero
  • Assignees: None
  • Milestone: None

Related Issues

This is part of a batch of 5 issues filed simultaneously by davelopez on 2026-02-17:

| Issue | Title | Reactions | Comments |
|---|---|---|---|
| #21865 | Migrate dropbox file source plugin to fsspec | 0 | 0 |
| #21866 | Migrate ftp file source plugin to fsspec | 0 | 0 |
| #21867 | Migrate ssh file source plugin to fsspec | 0 | 0 |
| #21868 | Migrate googledrive file source plugin to fsspec | 0 | 0 |
| #21869 | Migrate webdav file source plugin to fsspec | 0 | 0 |

The broader fsspec migration effort traces back to issue #20415 (referenced in PR #20698).

Duplicate / Related Issues

No duplicate issues found. No community help forum threads requesting this specific migration.

Community Sentiment

This is entirely developer-driven technical debt work. There are no user reports of problems with the current Google Drive implementation that would be solved by this migration. The motivation is architectural: moving away from a deprecated library ecosystem.

Indirect Demand Signals

  • Production usage: Google Drive is available as a configured file source template (production_google_drive.yml) with full OAuth2 integration, suggesting it is actively used in production Galaxy instances.
  • Library health: The PyFilesystem2 (fs) ecosystem is deprecated/less maintained compared to fsspec. The fs.googledrivefs package has limited maintenance activity. The gdrive_fsspec package is under the official fsspec GitHub organization, indicating institutional backing and better long-term support.
  • Ecosystem trend: fsspec has become the de facto standard for filesystem abstraction in the Python data ecosystem (used by pandas, dask, xarray, etc.).

Demand Assessment

LOW direct user demand, HIGH indirect/strategic demand.

This is maintenance-driven debt reduction rather than a user-requested feature. However, the risk of staying on deprecated libraries grows over time: eventual incompatibilities, security issues, or Python version support gaps could force an urgent migration later. Migrating proactively while the effort is small is the prudent approach.

Importance Assessment: Issue #21868

User Demand: LOW

  • Zero reactions, zero comments on the issue
  • No community discussion threads requesting this migration
  • Developer-driven technical debt initiative, not user-reported
  • However, Google Drive is used in production (has OAuth2 template configuration in production_google_drive.yml)

Strategic Value: HIGH

  • Part of a planned migration of all file sources from deprecated PyFilesystem2 to fsspec (#21865-#21869)
  • The fs (PyFilesystem2) ecosystem is deprecated and less maintained than fsspec
  • fsspec is the modern standard for filesystem abstraction in the Python ecosystem (used by pandas, dask, xarray, etc.)
  • Staying on deprecated libraries increases maintenance risk and security exposure over time
  • Completing this migration (along with the other 4 sibling issues) unblocks eventual removal of the _pyfilesystem2.py base class and the fs dependency
  • The gdrive_fsspec package is under the official fsspec GitHub organization, indicating better institutional support
  • Consistent architecture: having all file sources on the same base class simplifies maintenance, testing, and feature development (e.g., cache options, pagination improvements apply to all sources at once)

Effort Estimate: SMALL

  • Current implementation is only 58 lines
  • The migration pattern is well-established with 3+ sources already migrated successfully
  • PR #21590 (GCS migration, the closest analog) was +135/-171 lines, touching 8 files
  • The FsspecFilesSource base class handles all the heavy lifting
  • Main work: swap base class, swap imports, adjust _open_fs to use gdrive_fsspec, update dependency declarations
  • Estimated: 100-150 additions, 50-70 deletions, 6-8 files

Risk Assessment: LOW-MEDIUM

Risks:

  1. OAuth2 credential mapping (MEDIUM risk): The current implementation uses google.oauth2.credentials.Credentials(token=access_token) passed to GoogleDriveFS(credentials). The new gdrive_fsspec library uses a different authentication model (token parameter, creds dict). This mapping needs careful implementation and testing to ensure backward compatibility.

  2. Configuration compatibility (LOW risk): The YAML configuration format uses token, refresh_token, token_uri, client_id, client_secret fields. These must continue to work. Since the configuration models are separate from the filesystem implementation, the risk is manageable.

  3. Library maturity (LOW risk): gdrive_fsspec is relatively new but is part of the official fsspec organization. Its API stability and completeness for Galaxy's use cases (ls, get_file, put_file, walk, glob) should be verified.

  4. Testing gap (LOW risk): The existing test requires live Google Drive credentials (GALAXY_TEST_GOOGLE_DRIVE_ACCESS_TOKEN), making CI verification difficult. However, the fsspec base class itself is well-tested, and the plugin surface area is small.

Mitigations:

  • PR #21590 (GCS migration) provides a proven pattern for Google OAuth2 credential handling in fsspec
  • The fsspec base class is well-tested with memory and temp filesystem backends
  • The scope is small enough that manual testing by a developer with Google Drive access is feasible
  • Can be reverted independently if issues arise

Recommendation: PRIORITIZE NOW

Rationale:

This is low-effort, low-risk work that contributes to an important strategic goal (eliminating the PyFilesystem2 dependency). It follows an established pattern with clear prior art. The five fsspec migration issues (#21865-21869) should ideally be worked as a batch since they all follow the same pattern, and completing the set enables removing the deprecated _pyfilesystem2.py base class entirely.

The Google Drive migration specifically is one of the simpler ones in the batch given:

  • The existing GCS migration as a direct reference for Google OAuth2 handling
  • The small size of the current implementation (58 lines)
  • The straightforward authentication model (single access token)

This is a good candidate for a newer contributor familiar with the codebase patterns, or for batch implementation by the original issue author (davelopez) who designed and implemented the fsspec base class.

Implementation Plan: Issue #21868

Migrate googledrive file source plugin to fsspec

Recommended Approach

Follow the exact pattern established by PR #21590 (GCS migration). The implementation is straightforward: swap the base class from PyFilesystem2FilesSource to FsspecFilesSource, replace the fs.googledrivefs import with gdrive_fsspec, and update the _open_fs method to use the new library's authentication model.

Step-by-Step Plan

Step 1: Update dependency declarations

lib/galaxy/dependencies/conditional-requirements.txt (line 28):

-fs.googledrivefs # type: googledrive
+gdrive_fsspec # type: googledrive

lib/galaxy/dependencies/__init__.py (around line 272):

-    def check_fs_googledrivefs(self):
+    def check_gdrive_fsspec(self):
         return "googledrive" in self.file_sources
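
The rename matters if the dependency checker looks up its `check_*` method from the requirement name. The sketch below assumes that convention (replacing `-` and `.` with `_`), which is consistent with `fs.googledrivefs` mapping to `check_fs_googledrivefs` but should be confirmed against `lib/galaxy/dependencies/__init__.py`. `check_method_name` and `FakeConditionalDependencies` are hypothetical names used only for illustration.

```python
from typing import List


def check_method_name(requirement: str) -> str:
    """Derive the checker method name from a requirement name (assumed convention)."""
    return "check_" + requirement.replace("-", "_").replace(".", "_")


class FakeConditionalDependencies:
    """Hypothetical stand-in for Galaxy's conditional dependency checker."""

    def __init__(self, file_sources: List[str]) -> None:
        self.file_sources = file_sources

    def check_gdrive_fsspec(self) -> bool:
        # Install gdrive_fsspec only when a googledrive file source is configured.
        return "googledrive" in self.file_sources

    def check(self, requirement: str) -> bool:
        # Dispatch by name: a stale method name would raise AttributeError here.
        return getattr(self, check_method_name(requirement))()


deps = FakeConditionalDependencies(["googledrive"])
needs_install = deps.check("gdrive_fsspec")
```

Under this assumption, changing the requirement line without renaming the method would leave the new package without a matching check.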

Step 2: Rewrite the plugin source

lib/galaxy/files/sources/googledrive.py -- full rewrite:

try:
    from gdrive_fsspec import GoogleDriveFileSystem
except ImportError:
    GoogleDriveFileSystem = None

from typing import (
    Annotated,
    Union,
)

from pydantic import (
    AliasChoices,
    Field,
)

from galaxy.files.models import FilesSourceRuntimeContext
from galaxy.files.sources._fsspec import (
    CacheOptionsDictType,
    FsspecBaseFileSourceConfiguration,
    FsspecBaseFileSourceTemplateConfiguration,
    FsspecFilesSource,
)
from galaxy.util.config_templates import TemplateExpansion

AccessTokenField = Field(
    ...,
    validation_alias=AliasChoices("oauth2_access_token", "accessToken", "access_token"),
)


class GoogleDriveFileSourceTemplateConfiguration(FsspecBaseFileSourceTemplateConfiguration):
    access_token: Annotated[Union[str, TemplateExpansion], AccessTokenField]


class GoogleDriveFilesSourceConfiguration(FsspecBaseFileSourceConfiguration):
    access_token: Annotated[str, AccessTokenField]


class GoogleDriveFilesSource(
    FsspecFilesSource[GoogleDriveFileSourceTemplateConfiguration, GoogleDriveFilesSourceConfiguration]
):
    plugin_type = "googledrive"
    required_module = GoogleDriveFileSystem
    required_package = "gdrive_fsspec"

    template_config_class = GoogleDriveFileSourceTemplateConfiguration
    resolved_config_class = GoogleDriveFilesSourceConfiguration

    def _open_fs(
        self,
        context: FilesSourceRuntimeContext[GoogleDriveFilesSourceConfiguration],
        cache_options: CacheOptionsDictType,
    ):
        if GoogleDriveFileSystem is None:
            raise self.required_package_exception
        # Pass the OAuth2 access token to gdrive_fsspec.
        # NOTE: The exact token format needs verification against the
        # gdrive_fsspec API. It may accept a raw token string, a dict,
        # or require a google credentials object.
        fs = GoogleDriveFileSystem(
            token=context.config.access_token,
            **cache_options,
        )
        return fs


__all__ = ("GoogleDriveFilesSource",)

Key implementation note: The exact way to pass OAuth2 credentials to gdrive_fsspec.GoogleDriveFileSystem needs verification. Options to investigate:

  1. Raw access token string: token=access_token
  2. Token dict (like gcsfs): token={"access_token": ..., "refresh_token": ..., ...}
  3. Google credentials object: creds=Credentials(token=...)

The GCS implementation (googlecloudstorage.py) uses a dict approach. The gdrive_fsspec library documentation suggests it uses token and creds parameters.
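
One quick way to narrow down these options is to inspect the constructor's declared keyword parameters before reading the implementation. The following is a sketch using only the standard library; `StandInFileSystem` is a hypothetical class whose signature mirrors only the `token` and `creds` parameters mentioned above, and it is not the real `GoogleDriveFileSystem` signature.

```python
import inspect
from typing import Any, Callable, Set


def accepted_keywords(factory: Callable[..., Any]) -> Set[str]:
    """Return the names of the explicitly declared keyword-capable parameters."""
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    parameters = inspect.signature(factory).parameters.values()
    return {p.name for p in parameters if p.kind in keyword_kinds}


class StandInFileSystem:
    """Hypothetical stand-in; replace with the real class when it is installed."""

    def __init__(self, token: str = "browser", creds: Any = None, **kwargs: Any) -> None:
        self.token = token
        self.creds = creds
        self.kwargs = kwargs


declared = accepted_keywords(StandInFileSystem)
```

Running the same helper against the installed `GoogleDriveFileSystem` shows which parameters exist, but not which value types `token` accepts; that still requires reading the library source or its documentation.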

Step 3: Update test configuration

test/unit/files/googledrive_file_sources_conf.yml: Review field names. Current format:

- type: googledrive
  id: test1
  doc: Test access to a Google drive.
  token: ${user.preferences['googledrive|access_token']}
  refresh_token: ${user.preferences['googledrive|refresh_token']}
  token_uri: "https://www.googleapis.com/oauth2/v4/token"
  client_id: ${user.preferences['googledrive|client_id']}
  client_secret: ${user.preferences['googledrive|client_secret']}

This may need updates to match the new configuration model.

test/unit/files/test_googledrive.py: Minimal changes expected. The test structure should remain the same; only import paths or config field names might change.

Step 4: Update package configuration

packages/files/setup.cfg: Consider adding gdrive_fsspec to test extras:

[options.extras_require]
test =
    pytest
    gcsfs
    gdrive_fsspec
    s3fs>=2023.1.0

pyproject.toml: Add gdrive_fsspec alongside other fsspec implementations if it should be available by default.

Step 5: Verify template compatibility

lib/galaxy/files/templates/examples/production_google_drive.yml: Verify the template still works with the new implementation. The configuration keys oauth2_client_id and oauth2_client_secret are handled by the template system before reaching the plugin.

lib/galaxy/files/templates/models.py: The GoogleDriveFileSourceConfiguration and GoogleDriveFileSourceTemplateConfiguration models may need updates if the resolved config class interface changes.

Testing Strategy

  1. Automated unit tests: The existing test (test_googledrive.py) requires live credentials. Consider adding a mock-based test using the BaseFileSourceTestSuite pattern established in PR #20698 with a mocked GoogleDriveFileSystem.

  2. Integration testing: Run test_googledrive.py with valid Google Drive credentials:

    GALAXY_TEST_GOOGLE_DRIVE_ACCESS_TOKEN=... GALAXY_TEST_GOOGLE_DRIVE_REFRESH_TOKEN=... pytest test/unit/files/test_googledrive.py

  3. Manual testing checklist:

    • Configure Google Drive file source via template in a running Galaxy instance
    • Browse files and folders in Google Drive
    • Download a file from Google Drive to Galaxy
    • Upload a file from Galaxy to Google Drive (if writable enabled)
    • Verify OAuth2 flow works correctly end-to-end
    • Test with expired/invalid token (verify proper error message)
  4. Regression: Ensure the production_google_drive.yml template configuration continues to work without changes.
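
The mock-based test suggested in item 1 could look roughly like the sketch below. It uses only `unittest.mock` and does not import Galaxy or `gdrive_fsspec`: `open_fs` is a hypothetical stand-in for the plugin's `_open_fs` logic, and passing the token as `token=` is the unverified assumption from the implementation plan.

```python
from typing import Any, Callable, Dict
from unittest import mock


def open_fs(fs_class: Callable[..., Any], access_token: str, cache_options: Dict[str, Any]) -> Any:
    """Hypothetical stand-in for the plugin's _open_fs body."""
    return fs_class(token=access_token, **cache_options)


def test_open_fs_forwards_token_and_cache_options() -> None:
    # The mock replaces GoogleDriveFileSystem, so no credentials or network are needed.
    fake_fs_class = mock.MagicMock(name="GoogleDriveFileSystem")

    fs = open_fs(fake_fs_class, "example-token", {"use_listings_cache": False})

    # The constructor must receive the token plus every cache option, exactly once.
    fake_fs_class.assert_called_once_with(token="example-token", use_listings_cache=False)
    assert fs is fake_fs_class.return_value


test_open_fs_forwards_token_and_cache_options()
```

A test like this only verifies the wiring between configuration and constructor. It cannot show that Google accepts the credentials, so the integration and manual checks remain necessary.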

Estimated Scope

Based on PR #21590 (GCS migration):

  • Lines changed: ~100-150 additions, ~50-70 deletions
  • Files touched: 6-8 files
  • Time estimate: 2-4 hours for implementation + testing (assuming access to Google Drive credentials)

Dependencies

  • gdrive_fsspec package must be installable from PyPI
  • No blocking Galaxy PRs
  • Can be done independently or in parallel with other fsspec migrations (#21865-21869)

Triage Summary: Issue #21868

Top-Line Summary

Issue #21868 requests migrating Galaxy's Google Drive file source plugin from the deprecated PyFilesystem2 (fs.googledrivefs) backend to the modern fsspec framework (gdrive_fsspec). This is one of five coordinated migration issues (#21865-#21869) filed by davelopez to eliminate Galaxy's dependency on the deprecated PyFilesystem2 ecosystem. The recommended approach follows the established pattern from PR #21590 (GCS migration): swap the base class from PyFilesystem2FilesSource to FsspecFilesSource, replace the fs.googledrivefs import with gdrive_fsspec, and update the _open_fs method to use the new library's authentication model. This is small, well-scoped work with clear prior art.

Importance Assessment Summary

| Dimension | Rating |
|---|---|
| User demand | LOW (zero reactions/comments, developer-driven initiative) |
| Strategic value | HIGH (part of systematic PyFilesystem2 deprecation, fsspec is the modern standard) |
| Effort | SMALL (58-line plugin, established migration pattern, ~100-150 lines changed) |
| Risk | LOW-MEDIUM (OAuth2 credential mapping needs verification, integration tests require live credentials) |
| Recommendation | PRIORITIZE NOW -- low effort, high strategic value, proven pattern |

Key Questions for Group Discussion

  1. Should all five fsspec migration issues (#21865-21869) be worked as a single coordinated batch, or can they be picked up independently? Working them together enables removing _pyfilesystem2.py sooner.
  2. Who has access to Google Drive credentials for manual testing? The CI test requires GALAXY_TEST_GOOGLE_DRIVE_ACCESS_TOKEN and GALAXY_TEST_GOOGLE_DRIVE_REFRESH_TOKEN.
  3. Is there a plan for the remaining PyFilesystem2-based sources not covered in this batch (azure, posix, rspace, onedata, anvil, basespace)?
  4. Should we invest in mock-based unit tests for these cloud file sources, or is the integration test pattern sufficient?

Concerns

  • Scope creep: Keep each migration as a standalone PR. Do not combine multiple file source migrations into one PR, even though they follow the same pattern.
  • Breaking changes: The OAuth2 credential mapping between fs.googledrivefs and gdrive_fsspec must preserve backward compatibility for existing configured Galaxy instances. The YAML configuration format must not change without a documented migration path.
  • Maintenance burden: Minimal ongoing burden. The fsspec base class handles all generic file source operations; the plugin only implements _open_fs. Moving to gdrive_fsspec (under the official fsspec GitHub org) should actually reduce maintenance burden compared to fs.googledrivefs.
  • Library maturity: gdrive_fsspec is relatively new. Its API stability and completeness for Galaxy's use cases (ls, download, upload, walk, glob) should be verified before merging.