@mvdbeek
Created November 24, 2025 15:18
Aspera ascp File Source Implementation Plan

Overview

This document outlines the implementation plan for creating a Galaxy file source plugin that uses Aspera ascp for high-speed file downloads. The implementation will be a configured plugin (not stock) with download-only functionality, using a custom fsspec filesystem.

Requirements Summary

  • Plugin Type: Configured plugin only (requires explicit configuration)
  • Features: Download-only (no upload or browsing)
  • Authentication: Static configuration with embedded SSH key as string
  • Implementation: Custom fsspec filesystem with subprocess calls to ascp

Architecture

Component Overview

┌─────────────────────────────────────────────────┐
│ Galaxy File Source Plugin (ascp.py)              │
│  - Configuration models                          │
│  - Plugin registration                           │
│  - URL matching                                  │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│ Custom fsspec Filesystem (ascp_fsspec.py)       │
│  - AbstractFileSystem implementation             │
│  - Temp key file management                      │
│  - ascp subprocess execution                     │
└─────────────────────────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│ ascp Command Line Tool                          │
│  ascp -T -l 300m -P 33001 -i <key> user@host:.. │
└─────────────────────────────────────────────────┘

Implementation Details

1. Custom fsspec Filesystem

File: lib/galaxy/files/sources/ascp_fsspec.py

Class: AscpFileSystem(AbstractFileSystem)

Purpose: Wraps ascp command-line tool as an fsspec filesystem.

Initialization Parameters:

  • ascp_path (str): Path to ascp binary (default: "ascp")
  • ssh_key (str): SSH private key content as string
  • user (str): Username for ascp connection (e.g., "era-fasp")
  • host (str): Hostname (e.g., "fasp.sra.ebi.ac.uk")
  • rate_limit (str): Transfer rate limit (default: "300m")
  • port (int): SSH port (default: 33001)
  • disable_encryption (bool): Use -T flag for maximum speed (default: True)
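
The parameters above could be grouped into one small settings object. The following is only a sketch using the standard library: `AscpSettings`, its validation rules, and the rate-limit pattern are hypothetical and are not part of Galaxy, fsspec, or the ascp documentation.

```python
import re
from dataclasses import dataclass

# Assumed shape for rate limits such as "300m"; not taken from the ascp docs.
_RATE_LIMIT = re.compile(r"^\d+[kKmMgG]?$")


@dataclass(frozen=True)
class AscpSettings:
    """Hypothetical container mirroring the initialization parameters."""

    ssh_key: str
    user: str
    host: str
    ascp_path: str = "ascp"
    rate_limit: str = "300m"
    port: int = 33001
    disable_encryption: bool = True

    def __post_init__(self):
        # Fail early on obviously unusable values.
        if not self.ssh_key.strip():
            raise ValueError("ssh_key must not be empty")
        if not 1 <= self.port <= 65535:
            raise ValueError(f"port out of range: {self.port}")
        if not _RATE_LIMIT.match(self.rate_limit):
            raise ValueError(f"unexpected rate_limit: {self.rate_limit!r}")


settings = AscpSettings(
    ssh_key="(key content)",
    user="era-fasp",
    host="fasp.sra.ebi.ac.uk",
)
```

Validating once at construction time keeps the transfer method free of defensive checks.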

Key Methods:

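  1. get_file(rpath, lpath, **kwargs) (the download hook on fsspec's synchronous AbstractFileSystem; _get_file is the coroutine variant used by AsyncFileSystem)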

    • Downloads a file from remote path to local path
    • Creates secure temporary file for SSH key:
      with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.key') as key_file:
          key_file.write(self.ssh_key)
          key_file_path = key_file.name
      os.chmod(key_file_path, 0o600)  # Required by ascp
    • Builds ascp command:
      ascp -T -l 300m -P 33001 -i <temp_key_file> user@host:rpath lpath
    • Executes subprocess with error handling
    • Cleans up temporary key file
  2. _parse_url(url) (helper method)

    • Parses ascp://user@host:port/path URLs
    • Extracts components for command construction
  3. Error Handling:

    • Check ascp binary existence
    • Parse stderr for authentication failures
    • Handle network errors
    • Raise MessageException with user-friendly messages
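
The URL parsing helper described above might look roughly like this. It is a sketch built on `urllib.parse.urlsplit`; the `AscpUrl` type, the function name `parse_ascp_url`, and the choice to strip the leading slash from the path are assumptions made for illustration.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AscpUrl(NamedTuple):
    """Hypothetical result type for the parsed URL components."""

    user: Optional[str]
    host: Optional[str]
    port: Optional[int]
    path: str


def parse_ascp_url(url: str) -> AscpUrl:
    """Split an ascp://user@host:port/path URL into its parts."""
    parts = urlsplit(url)
    if parts.scheme not in ("ascp", "fasp"):
        raise ValueError(f"not an ascp/fasp URL: {url!r}")
    # Assumption: drop the leading slash so the path matches the
    # user@host:path form used on the ascp command line.
    return AscpUrl(parts.username, parts.hostname, parts.port, parts.path.lstrip("/"))


parsed = parse_ascp_url(
    "ascp://era-fasp@fasp.sra.ebi.ac.uk:33001/vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz"
)
```

Components missing from the URL come back as `None`, so the caller can fall back to the configured user, host, and port.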

Properties:

  • protocol = "ascp", or protocol = ("ascp", "fasp") to register both schemes (fsspec accepts a tuple of protocol names)

2. Plugin Class

File: lib/galaxy/files/sources/ascp.py

Configuration Models

Template Configuration:

class AscpFilesSourceTemplateConfiguration(FsspecBaseFileSourceTemplateConfiguration):
    """Configuration with template expansion support"""
    ascp_path: Union[str, TemplateExpansion] = "ascp"
    ssh_key: Union[str, TemplateExpansion]  # Required
    user: Union[str, TemplateExpansion]
    host: Union[str, TemplateExpansion]
    rate_limit: Union[str, TemplateExpansion] = "300m"
    port: Union[int, TemplateExpansion] = 33001
    disable_encryption: Union[bool, TemplateExpansion] = True

Resolved Configuration:

class AscpFilesSourceConfiguration(FsspecBaseFileSourceConfiguration):
    """Runtime configuration with actual values"""
    ascp_path: str = "ascp"
    ssh_key: str  # Required field
    user: str
    host: str
    rate_limit: str = "300m"
    port: int = 33001
    disable_encryption: bool = True

Plugin Class: AscpFilesSource

class AscpFilesSource(FsspecFilesSource[
    AscpFilesSourceTemplateConfiguration,
    AscpFilesSourceConfiguration
]):
    plugin_type = "ascp"
    required_module = AscpFileSystem
    required_package = None  # Custom implementation

    def _open_fs(self, context: FilesSourceRuntimeContext, cache_options):
        """Instantiate the custom ascp fsspec filesystem"""
        config = context.config
        return AscpFileSystem(
            ascp_path=config.ascp_path,
            ssh_key=config.ssh_key,
            user=config.user,
            host=config.host,
            rate_limit=config.rate_limit,
            port=config.port,
            disable_encryption=config.disable_encryption,
            **cache_options,
        )

    def score_url_match(self, url: str) -> int:
        """Match ascp:// and fasp:// URLs"""
        if url.startswith("ascp://") or url.startswith("fasp://"):
            return len("ascp://")
        return 0

3. Documentation

File: lib/galaxy/files/sources/ascp_README.md

Contents:

  • Overview of Aspera ascp file source
  • Requirements:
    • ascp binary installation
    • SSH private key for authentication
  • Configuration guide with examples
  • Usage examples
  • Troubleshooting common issues
  • Security considerations

4. Example Configuration

File: lib/galaxy/files/sources/ascp_example_config.yml

# Aspera ascp File Source Configuration Example

# Example 1: EBI SRA Downloads
- type: ascp
  id: ebi_aspera
  label: "EBI Aspera Downloads"
  doc: "High-speed downloads from EBI SRA using Aspera"

  # Aspera configuration
  ascp_path: "/usr/local/bin/ascp"
  user: "era-fasp"
  host: "fasp.sra.ebi.ac.uk"
  port: 33001
  rate_limit: "300m"
  disable_encryption: true

  # SSH private key embedded as string
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEAz6scc2q19eXLfYNLcmBMjWtNoFRTVATvxbNXZJmMhHFL04TP
    ... (key content) ...
    -----END RSA PRIVATE KEY-----

  # Standard Galaxy file source options
  writable: false
  browsable: false

# Example 2: Custom Aspera Endpoint
- type: ascp
  id: custom_aspera
  label: "Custom Aspera Server"
  doc: "Downloads from custom Aspera endpoint"

  ascp_path: "ascp"  # Uses PATH
  user: "myuser"
  host: "aspera.example.com"
  port: 33001
  rate_limit: "500m"
  disable_encryption: false  # Enable encryption

  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    ... (custom key) ...
    -----END RSA PRIVATE KEY-----

  writable: false
  browsable: false

5. Unit Tests

File: test/unit/files/test_ascp.py

Test Cases:

  1. test_plugin_loading()

    • Verify plugin can be instantiated
    • Test with minimal configuration
  2. test_configuration_parsing()

    • Test all configuration fields
    • Verify defaults are applied correctly
    • Test template expansion if applicable
  3. test_url_matching()

    • Test score_url_match() for ascp:// URLs
    • Test for fasp:// URLs
    • Test non-matching URLs return 0
  4. test_temp_key_file_creation()

    • Mock file operations
    • Verify temp file has correct permissions (0600)
    • Verify content is written correctly
  5. test_ascp_command_construction()

    • Mock subprocess.run()
    • Capture command arguments
    • Verify correct flags: -T, -l, -P, -i
    • Verify user@host:path format
  6. test_download_success()

    • Mock successful subprocess execution
    • Verify file is "downloaded" (mocked)
    • Verify temp key cleanup
  7. test_download_failure()

    • Mock subprocess failure
    • Verify appropriate exception is raised
    • Verify error message is user-friendly
  8. test_missing_ascp_binary()

    • Mock missing ascp executable
    • Verify the FileNotFoundError from subprocess is surfaced as a clear, user-facing error
  9. test_key_cleanup_on_error()

    • Simulate error during transfer
    • Verify temp key file is still removed
  10. test_authentication_failure()

    • Mock ascp authentication error
    • Verify error handling
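
The success, failure, and missing-binary cases above can be exercised without a real ascp binary by patching `subprocess.run`. The sketch below uses only `unittest.mock`; `run_ascp` and `AscpTransferError` are hypothetical stand-ins (the plan itself calls for Galaxy's MessageException), so treat this as an outline of the test shape rather than the final test module.

```python
import subprocess
from unittest import mock


class AscpTransferError(Exception):
    """Hypothetical stand-in for Galaxy's MessageException."""


def run_ascp(cmd):
    """Run ascp and translate low-level failures into one exception type."""
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except FileNotFoundError as e:
        raise AscpTransferError(f"ascp binary not found: {cmd[0]}") from e
    except subprocess.CalledProcessError as e:
        raise AscpTransferError(f"ascp exited with status {e.returncode}") from e


def _expect_error(side_effect, fragment):
    # Patch subprocess.run so no real process is started.
    with mock.patch("subprocess.run", side_effect=side_effect):
        try:
            run_ascp(["ascp", "src", "dst"])
        except AscpTransferError as e:
            assert fragment in str(e)
        else:
            raise AssertionError("expected AscpTransferError")


def test_download_success():
    done = subprocess.CompletedProcess(args=["ascp"], returncode=0, stdout="", stderr="")
    with mock.patch("subprocess.run", return_value=done) as fake_run:
        run_ascp(["ascp", "src", "dst"])
    fake_run.assert_called_once_with(
        ["ascp", "src", "dst"], capture_output=True, text=True, check=True
    )


def test_download_failure():
    failure = subprocess.CalledProcessError(returncode=1, cmd=["ascp"], stderr="boom")
    _expect_error(failure, "status 1")


def test_missing_ascp_binary():
    _expect_error(FileNotFoundError("ascp"), "not found")
```

Because `run_ascp` looks up `subprocess.run` at call time, patching the attribute on the `subprocess` module is enough to intercept it.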

Test Configuration File: test/unit/files/ascp_file_sources_conf.yml

- type: ascp
  id: test_ascp
  label: "Test Aspera"
  ascp_path: "/usr/bin/ascp"
  user: "test-user"
  host: "test.example.com"
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    (test key content)
    -----END RSA PRIVATE KEY-----
  writable: false
  browsable: false

Security Considerations

SSH Key Handling

  1. Temporary File Security:

    • Create temp files with restrictive permissions (0600)
    • Use tempfile.mkstemp (or tempfile.NamedTemporaryFile with delete=False) so the key file persists until ascp has read it
    • Delete temp files in finally block for guaranteed cleanup
  2. Key Storage:

    • Keys stored in configuration files should have appropriate file permissions
    • Consider using encrypted configuration or secrets management
    • Never log or print SSH key content
  3. Command Execution:

    • Use subprocess.run() with explicit arguments (not shell=True)
    • Capture output to prevent credential leakage
    • Sanitize error messages before showing to users
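
One way to sanitize error output before it reaches a user is to redact the temporary key path and cap the message length. This is a sketch only: `sanitize_ascp_error`, the placeholder text, and the 500-character limit are hypothetical choices, not behaviour defined by Galaxy or ascp.

```python
def sanitize_ascp_error(stderr: str, key_path: str, max_length: int = 500) -> str:
    """Redact the temp key path and cap the length of ascp's stderr."""
    cleaned = stderr
    if key_path:
        # Hide where the key was written on disk.
        cleaned = cleaned.replace(key_path, "<key file>")
    cleaned = cleaned.strip()
    if len(cleaned) > max_length:
        cleaned = cleaned[:max_length] + "..."
    return cleaned


message = sanitize_ascp_error("cannot read /tmp/tmpabc.key\n", "/tmp/tmpabc.key")
```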

Implementation:

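import os
import subprocess
import tempfile

from galaxy.exceptions import MessageException

def get_file(self, rpath, lpath, **kwargs):
    # fsspec's synchronous AbstractFileSystem dispatches downloads to get_file();
    # _get_file is the coroutine hook on AsyncFileSystem.
    # mkstemp creates the file readable and writable only by the current user.
    key_fd, key_path = tempfile.mkstemp(suffix='.key', text=True)
    try:
        # Write key with proper permissions
        with os.fdopen(key_fd, 'w') as key_file:
            key_file.write(self.ssh_key)
        os.chmod(key_path, 0o600)  # Explicit; required by ascp

        # Build command as an argument list (never via a shell)
        cmd = [self.ascp_path, "-i", key_path]
        if self.disable_encryption:
            cmd.append("-T")
        cmd += [
            "-l", self.rate_limit,
            "-P", str(self.port),
            f"{self.user}@{self.host}:{rpath}",
            lpath,
        ]

        # Execute; output is captured rather than echoed
        try:
            subprocess.run(cmd, capture_output=True, text=True, check=True)
        except FileNotFoundError:
            raise MessageException(f"ascp binary not found: {self.ascp_path}")
        except subprocess.CalledProcessError as e:
            raise MessageException(f"ascp transfer failed with exit code {e.returncode}")
    finally:
        # Always clean up the key file, even if the transfer failed
        try:
            os.unlink(key_path)
        except OSError:
            pass  # Best effort cleanup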
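
The key-file lifecycle could also be factored into a context manager so that cleanup cannot be forgotten by callers. A minimal sketch, assuming a POSIX system where `tempfile.mkstemp` creates the file with mode 0600; the name `temporary_key_file` is hypothetical.

```python
import contextlib
import os
import stat
import tempfile


@contextlib.contextmanager
def temporary_key_file(ssh_key: str):
    """Yield the path of a private temp file holding ssh_key; always delete it."""
    fd, path = tempfile.mkstemp(suffix=".key", text=True)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(ssh_key)
        yield path
    finally:
        # Runs on normal exit and when the body raises.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(path)


with temporary_key_file("(key content)\n") as key_path:
    # Inspect the permission bits while the file still exists.
    mode = stat.S_IMODE(os.stat(key_path).st_mode)
```

With this helper the transfer method reduces to building the command and running it inside the `with` block.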

Usage Example

Configuration in file_sources_conf.yml:

- type: ascp
  id: ebi_sra
  label: "EBI SRA Aspera"
  doc: "Download FASTQ files from EBI SRA using Aspera"
  user: "era-fasp"
  host: "fasp.sra.ebi.ac.uk"
  port: 33001
  rate_limit: "300m"
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEAz6scc2q19eXLfYNLcmBMjWtNoFRTVATvxbNXZJmMhHFL04TP
    rlojfBFH/3NO/Nvjg0d7vMkzU5Pq9LHlvK+9CmhJXzLzlFdWxXVVqwxLLvJGEZvD
    ... (rest of key) ...
    -----END RSA PRIVATE KEY-----

Using in Galaxy:

  1. Via URL in tool/workflow:

    ascp://vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz
    
  2. API Usage:

    # Galaxy will match the URL to the configured ascp plugin
    # and use the credentials from configuration
  3. Download Command Generated:

    ascp -T -l 300m -P 33001 \
      -i /tmp/tmpXXXXXX.key \
      era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz \
      /path/to/galaxy/dataset.dat
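
The generated command above can be reproduced with a small pure function, which also makes the command construction easy to unit test. This is a sketch; `build_ascp_command` and its argument order are hypothetical, and only the flags already shown in this plan (-T, -l, -P, -i) are used.

```python
from typing import List


def build_ascp_command(
    ascp_path: str,
    key_path: str,
    user: str,
    host: str,
    rpath: str,
    lpath: str,
    rate_limit: str = "300m",
    port: int = 33001,
    disable_encryption: bool = True,
) -> List[str]:
    """Return the ascp argument list for a single download."""
    cmd = [ascp_path]
    if disable_encryption:
        cmd.append("-T")
    cmd += ["-l", rate_limit, "-P", str(port), "-i", key_path]
    cmd += [f"{user}@{host}:{rpath}", lpath]
    return cmd


cmd = build_ascp_command(
    "ascp",
    "/tmp/tmpXXXXXX.key",
    "era-fasp",
    "fasp.sra.ebi.ac.uk",
    "vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz",
    "/path/to/galaxy/dataset.dat",
)
```

Returning a list rather than a string means the result can be passed straight to `subprocess.run` without a shell.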

File Structure Summary

lib/galaxy/files/sources/
├── ascp_fsspec.py              # ~150 lines - Custom fsspec filesystem
├── ascp.py                      # ~100 lines - Plugin class
├── ascp_README.md               # Documentation
└── ascp_example_config.yml      # Configuration examples

test/unit/files/
├── test_ascp.py                 # ~200 lines - Unit tests
└── ascp_file_sources_conf.yml   # Test configuration

Implementation Checklist

  • Create ascp_fsspec.py with AscpFileSystem class
  • Create ascp.py with plugin class and configuration models
  • Create ascp_README.md documentation
  • Create ascp_example_config.yml with examples
  • Create test_ascp.py with comprehensive unit tests
  • Create ascp_file_sources_conf.yml for testing
  • Test plugin loading and configuration
  • Test URL matching
  • Test download functionality (with mocks)
  • Test error handling
  • Test temp file security and cleanup
  • Integration test with actual ascp binary (manual)
  • Documentation review
  • Code review

Future Enhancements (Out of Scope)

The following features are not part of this initial implementation but could be added later:

  1. Upload Support: Implement put_file() for bidirectional transfers
  2. Directory Browsing: Implement ls() using SSH/SFTP
  3. Stock Plugin: Make it load automatically for fasp:// URLs
  4. Progress Monitoring: Add callback support for transfer progress
  5. Multiple Authentication: Support user preferences and environment variables
  6. Endpoint Discovery: Auto-detect Aspera endpoints from URLs
  7. Bandwidth Policies: Per-user or per-group rate limiting
  8. Resume Support: Use ascp's resume capabilities for interrupted transfers

References

  • Galaxy File Sources Documentation: lib/galaxy/files/sources/__init__.py
  • Fsspec Documentation: https://filesystem-spec.readthedocs.io/
  • Aspera ascp Documentation: IBM Aspera CLI documentation
  • Similar Implementation: GridFTP plugin (if available)
  • S3FS Implementation: lib/galaxy/files/sources/s3fs.py

Notes

  • This is a configured plugin only, not a stock plugin
  • No modifications needed to Galaxy core files
  • Plugin will be available after Galaxy restart
  • Requires ascp binary to be installed and accessible
  • SSH key must be valid for the target endpoint