This document outlines the implementation plan for creating a Galaxy file source plugin that uses Aspera ascp for high-speed file downloads. The implementation will be a configured plugin (not stock) with download-only functionality, using a custom fsspec filesystem.
- Plugin Type: Configured plugin only (requires explicit configuration)
- Features: Download-only (no upload or browsing)
- Authentication: Static configuration with embedded SSH key as string
- Implementation: Custom fsspec filesystem with subprocess calls to ascp
┌─────────────────────────────────────────────────┐
│  Galaxy File Source Plugin (ascp.py)            │
│  - Configuration models                         │
│  - Plugin registration                          │
│  - URL matching                                 │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│  Custom fsspec Filesystem (ascp_fsspec.py)      │
│  - AbstractFileSystem implementation            │
│  - Temp key file management                     │
│  - ascp subprocess execution                    │
└───────────────────┬─────────────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────────────┐
│  ascp Command Line Tool                         │
│  ascp -T -l 300m -P 33001 -i <key> user@host:.. │
└─────────────────────────────────────────────────┘
File: lib/galaxy/files/sources/ascp_fsspec.py
Purpose: Wraps ascp command-line tool as an fsspec filesystem.
Initialization Parameters:
- ascp_path (str): Path to ascp binary (default: "ascp")
- ssh_key (str): SSH private key content as string
- user (str): Username for ascp connection (e.g., "era-fasp")
- host (str): Hostname (e.g., "fasp.sra.ebi.ac.uk")
- rate_limit (str): Transfer rate limit (default: "300m")
- port (int): SSH port (default: 33001)
- disable_encryption (bool): Use -T flag for maximum speed (default: True)
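The initialization parameters and their defaults can be captured in one small value object. The following is a minimal sketch, not the planned filesystem class: `AscpSettings` and its validation rules are hypothetical names introduced here for illustration, while the field names and defaults are taken from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AscpSettings:
    """Hypothetical container mirroring the initialization parameters."""

    ssh_key: str  # private key content, not a path
    user: str  # e.g. "era-fasp"
    host: str  # e.g. "fasp.sra.ebi.ac.uk"
    ascp_path: str = "ascp"
    rate_limit: str = "300m"
    port: int = 33001
    disable_encryption: bool = True

    def __post_init__(self) -> None:
        # Assumed checks: fail early on values that would otherwise only
        # surface later as an ascp error.
        if not self.ssh_key.strip():
            raise ValueError("ssh_key must not be empty")
        if not 0 < self.port < 65536:
            raise ValueError(f"port out of range: {self.port}")
```

Only `ssh_key`, `user`, and `host` have no default, which matches the required fields in the configuration models.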
Key Methods:
- _get_file(rpath, lpath, **kwargs)
  - Downloads a file from remote path to local path
  - Creates secure temporary file for SSH key:
        with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.key') as key_file:
            key_file.write(self.ssh_key)
            key_file_path = key_file.name
        os.chmod(key_file_path, 0o600)  # Required by ascp
  - Builds ascp command:
        ascp -T -l 300m -P 33001 -i <temp_key_file> user@host:rpath lpath
  - Executes subprocess with error handling
  - Cleans up temporary key file
- _parse_url(url) (helper method)
  - Parses ascp://user@host:port/path URLs
  - Extracts components for command construction
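The URL parsing helper described above could look roughly like this. It is a sketch under assumptions: `AscpUrl` and `parse_ascp_url` are illustrative names, and stripping the leading slash from the path is a guess based on the relative remote paths used elsewhere in this plan. Only the standard library `urllib.parse.urlsplit` is used.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class AscpUrl(NamedTuple):
    """Hypothetical result type for the parsed URL components."""

    user: Optional[str]
    host: Optional[str]
    port: Optional[int]
    path: str


def parse_ascp_url(url: str) -> AscpUrl:
    """Split an ascp://user@host:port/path URL into its components."""
    parts = urlsplit(url)
    if parts.scheme not in ("ascp", "fasp"):
        raise ValueError(f"Unsupported scheme: {parts.scheme!r}")
    # Assumption: the remote path is passed to ascp without a leading slash.
    return AscpUrl(parts.username, parts.hostname, parts.port, parts.path.lstrip("/"))
```

Components missing from the URL come back as `None`, so the caller can fall back to the configured user, host, and port.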
Error Handling:
- Check ascp binary existence
- Parse stderr for authentication failures
- Handle network errors
- Raise MessageException with user-friendly messages
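The error-handling steps above can be sketched as follows. To keep the example runnable outside Galaxy, `AscpError` stands in for `MessageException`; `run_ascp` and `classify_failure` are hypothetical helper names. The stderr substrings are illustrative guesses, not documented ascp output, and would need checking against real failures.

```python
import shutil
import subprocess
from typing import List


class AscpError(Exception):
    """Stand-in for Galaxy's MessageException so the sketch runs on its own."""


def classify_failure(stderr: str) -> str:
    """Map raw stderr to a user-facing message."""
    text = stderr.lower()
    # NOTE: these substrings are assumptions, not documented ascp messages.
    if "authentic" in text or "permission denied" in text:
        return "Authentication with the Aspera server failed; check the configured SSH key and user."
    if "timed out" in text or "connect" in text or "unreachable" in text:
        return "Could not reach the Aspera server; check host, port, and network access."
    return "The ascp transfer failed."


def run_ascp(cmd: List[str]) -> None:
    """Run an ascp command list and translate failures into AscpError."""
    # Check that the binary exists before attempting the transfer.
    if shutil.which(cmd[0]) is None:
        raise AscpError(f"ascp binary not found: {cmd[0]}")
    try:
        subprocess.run(cmd, capture_output=True, text=True, check=True)
    except subprocess.CalledProcessError as exc:
        # Raw stderr is classified, never forwarded verbatim to the user.
        raise AscpError(classify_failure(exc.stderr or "")) from exc
    except OSError as exc:
        raise AscpError(f"Could not start ascp: {exc.strerror}") from exc
```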
Properties:
- protocol = "ascp" (or support both "ascp" and "fasp")
File: lib/galaxy/files/sources/ascp.py
Template Configuration:
class AscpFilesSourceTemplateConfiguration(FsspecBaseFileSourceTemplateConfiguration):
    """Configuration with template expansion support"""
    ascp_path: Union[str, TemplateExpansion] = "ascp"
    ssh_key: Union[str, TemplateExpansion]  # Required
    user: Union[str, TemplateExpansion]
    host: Union[str, TemplateExpansion]
    rate_limit: Union[str, TemplateExpansion] = "300m"
    port: Union[int, TemplateExpansion] = 33001
    disable_encryption: Union[bool, TemplateExpansion] = True
Resolved Configuration:
class AscpFilesSourceConfiguration(FsspecBaseFileSourceConfiguration):
    """Runtime configuration with actual values"""
    ascp_path: str = "ascp"
    ssh_key: str  # Required field
    user: str
    host: str
    rate_limit: str = "300m"
    port: int = 33001
    disable_encryption: bool = True

class AscpFilesSource(FsspecFilesSource[
    AscpFilesSourceTemplateConfiguration,
    AscpFilesSourceConfiguration
]):
    plugin_type = "ascp"
    required_module = AscpFileSystem
    required_package = None  # Custom implementation

    def _open_fs(self, context: FilesSourceRuntimeContext, cache_options):
        """Instantiate the custom ascp fsspec filesystem"""
        config = context.config
        return AscpFileSystem(
            ascp_path=config.ascp_path,
            ssh_key=config.ssh_key,
            user=config.user,
            host=config.host,
            rate_limit=config.rate_limit,
            port=config.port,
            disable_encryption=config.disable_encryption,
            **cache_options,
        )

    def score_url_match(self, url: str) -> int:
        """Match ascp:// and fasp:// URLs"""
        # Both schemes have the same prefix length, so one score covers both.
        if url.startswith("ascp://") or url.startswith("fasp://"):
            return len("ascp://")
        return 0
File: lib/galaxy/files/sources/ascp_README.md
Contents:
- Overview of Aspera ascp file source
- Requirements:
- ascp binary installation
- SSH private key for authentication
- Configuration guide with examples
- Usage examples
- Troubleshooting common issues
- Security considerations
File: lib/galaxy/files/sources/ascp_example_config.yml
# Aspera ascp File Source Configuration Example
# Example 1: EBI SRA Downloads
- type: ascp
  id: ebi_aspera
  label: "EBI Aspera Downloads"
  doc: "High-speed downloads from EBI SRA using Aspera"
  # Aspera configuration
  ascp_path: "/usr/local/bin/ascp"
  user: "era-fasp"
  host: "fasp.sra.ebi.ac.uk"
  port: 33001
  rate_limit: "300m"
  disable_encryption: true
  # SSH private key embedded as string
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEAz6scc2q19eXLfYNLcmBMjWtNoFRTVATvxbNXZJmMhHFL04TP
    ... (key content) ...
    -----END RSA PRIVATE KEY-----
  # Standard Galaxy file source options
  writable: false
  browsable: false
# Example 2: Custom Aspera Endpoint
- type: ascp
  id: custom_aspera
  label: "Custom Aspera Server"
  doc: "Downloads from custom Aspera endpoint"
  ascp_path: "ascp"  # Uses PATH
  user: "myuser"
  host: "aspera.example.com"
  port: 33001
  rate_limit: "500m"
  disable_encryption: false  # Enable encryption
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    ... (custom key) ...
    -----END RSA PRIVATE KEY-----
  writable: false
  browsable: false
File: test/unit/files/test_ascp.py
Test Cases:
- test_plugin_loading()
  - Verify plugin can be instantiated
  - Test with minimal configuration
- test_configuration_parsing()
  - Test all configuration fields
  - Verify defaults are applied correctly
  - Test template expansion if applicable
- test_url_matching()
  - Test score_url_match() for ascp:// URLs
  - Test for fasp:// URLs
  - Test non-matching URLs return 0
- test_temp_key_file_creation()
  - Mock file operations
  - Verify temp file has correct permissions (0600)
  - Verify content is written correctly
- test_ascp_command_construction()
  - Mock subprocess.run()
  - Capture command arguments
  - Verify correct flags: -T, -l, -P, -i
  - Verify user@host:path format
- test_download_success()
  - Mock successful subprocess execution
  - Verify file is "downloaded" (mocked)
  - Verify temp key cleanup
- test_download_failure()
  - Mock subprocess failure
  - Verify appropriate exception is raised
  - Verify error message is user-friendly
- test_missing_ascp_binary()
  - Mock missing ascp executable
  - Verify FileNotFoundError or similar
- test_key_cleanup_on_error()
  - Simulate error during transfer
  - Verify temp key file is still removed
- test_authentication_failure()
  - Mock ascp authentication error
  - Verify error handling
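As an illustration of the mocking approach, test_key_cleanup_on_error() might be shaped like the sketch below. `fetch_with_temp_key` is a simplified, hypothetical stand-in for the planned `_get_file()` so the test runs without Galaxy or ascp, and the 0600 permission check assumes a POSIX system.

```python
import os
import stat
import subprocess
import tempfile
from unittest import mock


def fetch_with_temp_key(ssh_key, cmd_prefix, remote, local):
    """Simplified stand-in for the planned _get_file(), for testing only."""
    fd, key_path = tempfile.mkstemp(suffix=".key")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(ssh_key)
        subprocess.run([*cmd_prefix, "-i", key_path, remote, local], check=True)
    finally:
        os.unlink(key_path)  # runs on success and on failure


def test_key_cleanup_on_error():
    seen = {}

    def fake_run(cmd, **kwargs):
        # Inspect the key file while it still exists, then simulate a failure.
        key_path = cmd[cmd.index("-i") + 1]
        seen["path"] = key_path
        seen["mode"] = stat.S_IMODE(os.stat(key_path).st_mode)
        with open(key_path) as handle:
            seen["content"] = handle.read()
        raise subprocess.CalledProcessError(1, cmd)

    with mock.patch("subprocess.run", side_effect=fake_run):
        try:
            fetch_with_temp_key("KEY", ["ascp"], "user@host:file", "out.dat")
        except subprocess.CalledProcessError:
            pass
        else:
            raise AssertionError("expected the mocked transfer to fail")

    assert seen["mode"] == 0o600  # owner read/write only (POSIX)
    assert seen["content"] == "KEY"
    assert not os.path.exists(seen["path"])  # removed despite the error
```

Because `subprocess.run` is patched, no ascp binary is needed, and the same fake can be reused for the permission and content checks in test_temp_key_file_creation().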
Test Configuration File: test/unit/files/ascp_file_sources_conf.yml
- type: ascp
  id: test_ascp
  label: "Test Aspera"
  ascp_path: "/usr/bin/ascp"
  user: "test-user"
  host: "test.example.com"
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    (test key content)
    -----END RSA PRIVATE KEY-----
  writable: false
  browsable: false

- Temporary File Security:
  - Create temp files with restrictive permissions (0600)
  - Use tempfile.NamedTemporaryFile for automatic cleanup
  - Delete temp files in finally block for guaranteed cleanup
- Key Storage:
  - Keys stored in configuration files should have appropriate file permissions
  - Consider using encrypted configuration or secrets management
  - Never log or print SSH key content
- Command Execution:
  - Use subprocess.run() with explicit arguments (not shell=True)
  - Capture output to prevent credential leakage
  - Sanitize error messages before showing to users
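One way to approach the "sanitize error messages" point is to redact anything key-related from captured output before it reaches a user or a log. This is a rough sketch only: `sanitize_message` is a hypothetical helper, and the redaction rules (PEM blocks and the temporary key path) are assumptions about what could leak.

```python
import re

# Matches PEM private key blocks such as "-----BEGIN RSA PRIVATE KEY----- ...".
_PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
    re.DOTALL,
)


def sanitize_message(message: str, key_path: str = "") -> str:
    """Remove key material and the temp key path from an error message."""
    cleaned = _PEM_BLOCK.sub("[REDACTED KEY]", message)
    if key_path:
        cleaned = cleaned.replace(key_path, "[key file]")
    return cleaned
```

A call such as `sanitize_message(exc.stderr, key_path)` would sit between capturing ascp's output and raising the user-facing exception.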
import os
import subprocess
import tempfile

def _get_file(self, rpath, lpath, **kwargs):
    # Create secure temp file
    key_fd, key_path = tempfile.mkstemp(suffix='.key', text=True)
    try:
        # Write key with proper permissions
        os.chmod(key_path, 0o600)
        with os.fdopen(key_fd, 'w') as key_file:
            key_file.write(self.ssh_key)
        # Build command
        cmd = [
            self.ascp_path,
            "-i", key_path,
            "-T" if self.disable_encryption else "",
            "-l", self.rate_limit,
            "-P", str(self.port),
            f"{self.user}@{self.host}:{rpath}",
            lpath,
        ]
        cmd = [arg for arg in cmd if arg]  # Remove empty strings
        # Execute; check=True raises CalledProcessError on a non-zero exit
        subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            check=True,
        )
    finally:
        # Always cleanup
        try:
            os.unlink(key_path)
        except OSError:
            pass  # Best effort cleanup

- type: ascp
  id: ebi_sra
  label: "EBI SRA Aspera"
  doc: "Download FASTQ files from EBI SRA using Aspera"
  user: "era-fasp"
  host: "fasp.sra.ebi.ac.uk"
  port: 33001
  rate_limit: "300m"
  ssh_key: |
    -----BEGIN RSA PRIVATE KEY-----
    MIIEowIBAAKCAQEAz6scc2q19eXLfYNLcmBMjWtNoFRTVATvxbNXZJmMhHFL04TP
    rlojfBFH/3NO/Nvjg0d7vMkzU5Pq9LHlvK+9CmhJXzLzlFdWxXVVqwxLLvJGEZvD
    ... (rest of key) ...
    -----END RSA PRIVATE KEY-----

- Via URL in tool/workflow:
      ascp://vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz
- API Usage:
      # Galaxy will match the URL to the configured ascp plugin
      # and use the credentials from configuration
- Download Command Generated:
      ascp -T -l 300m -P 33001 \
        -i /tmp/tmpXXXXXX.key \
        era-fasp@fasp.sra.ebi.ac.uk:vol1/fastq/ERR164/ERR164407/ERR164407.fastq.gz \
        /path/to/galaxy/dataset.dat
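The mapping from the example URL to the generated command can be sketched as a pure function. `build_ascp_command` is a hypothetical name, and the sketch assumes that everything after the scheme is treated as the remote path, with user and host coming from the plugin configuration, as the usage example implies.

```python
import shlex
from typing import List


def build_ascp_command(
    url: str,
    key_path: str,
    local_path: str,
    *,
    user: str,
    host: str,
    ascp_path: str = "ascp",
    rate_limit: str = "300m",
    port: int = 33001,
    disable_encryption: bool = True,
) -> List[str]:
    """Turn an ascp:// or fasp:// URL into an argument list for subprocess."""
    scheme, sep, remote_path = url.partition("://")
    if not sep or scheme not in ("ascp", "fasp"):
        raise ValueError(f"Not an ascp/fasp URL: {url!r}")
    cmd = [ascp_path]
    if disable_encryption:
        cmd.append("-T")
    cmd += ["-l", rate_limit, "-P", str(port), "-i", key_path]
    # Assumption: the URL remainder is the remote path on the configured host.
    cmd += [f"{user}@{host}:{remote_path}", local_path]
    return cmd


def format_for_log(cmd: List[str]) -> str:
    """Render the argument list as one shell-style line for debugging."""
    return shlex.join(cmd)
```

Returning a list rather than a string keeps the command suitable for `subprocess.run()` without `shell=True`; `format_for_log` is only for display.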
lib/galaxy/files/sources/
├── ascp_fsspec.py # ~150 lines - Custom fsspec filesystem
├── ascp.py # ~100 lines - Plugin class
├── ascp_README.md # Documentation
└── ascp_example_config.yml # Configuration examples
test/unit/files/
├── test_ascp.py # ~200 lines - Unit tests
└── ascp_file_sources_conf.yml # Test configuration
- Create ascp_fsspec.py with AscpFileSystem class
- Create ascp.py with plugin class and configuration models
- Create ascp_README.md documentation
- Create ascp_example_config.yml with examples
- Create test_ascp.py with comprehensive unit tests
- Create ascp_file_sources_conf.yml for testing
- Test plugin loading and configuration
- Test URL matching
- Test download functionality (with mocks)
- Test error handling
- Test temp file security and cleanup
- Integration test with actual ascp binary (manual)
- Documentation review
- Code review
The following features are not part of this initial implementation but could be added later:
- Upload Support: Implement _put_file() for bidirectional transfers
- Directory Browsing: Implement ls() using SSH/SFTP
- Stock Plugin: Make it load automatically for fasp:// URLs
- Progress Monitoring: Add callback support for transfer progress
- Multiple Authentication: Support user preferences and environment variables
- Endpoint Discovery: Auto-detect Aspera endpoints from URLs
- Bandwidth Policies: Per-user or per-group rate limiting
- Resume Support: Use ascp's resume capabilities for interrupted transfers
- Galaxy File Sources Documentation: lib/galaxy/files/sources/__init__.py
- Fsspec Documentation: https://filesystem-spec.readthedocs.io/
- Aspera ascp Documentation: IBM Aspera CLI documentation
- Similar Implementation: GridFTP plugin (if available)
- S3FS Implementation: lib/galaxy/files/sources/s3fs.py
- This is a configured plugin only, not a stock plugin
- No modifications needed to Galaxy core files
- Plugin will be available after Galaxy restart
- Requires ascp binary to be installed and accessible
- SSH key must be valid for the target endpoint