I'm reviewing a PR. It adds the file lib/galaxy/files/sources/omero.py, which implements a Galaxy File Source plugin. I'm worried about the size of the file: it seems large, and it intertwines plugin implementation logic and API communication a lot. Would it make sense to extract the API interaction layer and reduce the size of the plugin, or does it make more sense as is?
Based on my analysis, here's my assessment:
The OMERO file source (569 lines) is actually comparable in size to similar plugins:
- invenio.py: 555 lines
- dataverse.py: 534 lines
- elabftw.py: 959 lines (largest)
Looking at the OMERO plugin vs similar ones (Invenio/Dataverse), I see two different architectural patterns:
- Invenio/Dataverse (plugin plus interactor):
  - Plugin class (~100-200 lines): Handles Galaxy integration, path parsing, delegating
  - Interactor class (~300-400 lines): Handles all API communication
- OMERO (single class):
  - Single plugin class (569 lines): Everything in one place
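To make the difference between the two patterns concrete, here is a minimal sketch. All class and method names (`RecordInteractor`, `SplitPlugin`, `SinglePlugin`, `list_files`) are hypothetical and not taken from the Galaxy code base, and an in-memory dict stands in for the remote service.

```python
class RecordInteractor:
    """Hypothetical API layer: only knows how to talk to the remote service."""

    def __init__(self, records: dict[str, list[str]]):
        # Stand-in for a remote repository; a real interactor would issue requests.
        self._records = records

    def list_files(self, record_id: str) -> list[str]:
        return sorted(self._records.get(record_id, []))


class SplitPlugin:
    """Hypothetical plugin layer: parses the path, then delegates."""

    def __init__(self, interactor: RecordInteractor):
        self._interactor = interactor

    def list(self, path: str) -> list[str]:
        record_id = path.strip("/").split("/")[0]
        return self._interactor.list_files(record_id)


class SinglePlugin:
    """Hypothetical single-class variant: parsing and data access live together."""

    def __init__(self, records: dict[str, list[str]]):
        self._records = records

    def list(self, path: str) -> list[str]:
        record_id = path.strip("/").split("/")[0]
        return sorted(self._records.get(record_id, []))


records = {"rec1": ["b.txt", "a.txt"]}
split_result = SplitPlugin(RecordInteractor(records)).list("/rec1")
single_result = SinglePlugin(records).list("/rec1")
```

Both variants behave the same from the caller's side; the split version only pays off if the interactor can be reused or tested independently.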
The current structure makes sense as-is for these reasons:
- OMERO API is tightly coupled to the hierarchy: The BlitzGateway connection, object retrieval, and HQL queries are specific to how OMERO organizes data (Projects → Datasets → Images). Separating this would create an artificial boundary.
- Less abstraction overhead: Unlike Invenio/Dataverse, which use generic REST APIs (records, files, containers), OMERO uses domain-specific concepts that don't map cleanly to a generic interactor pattern.
- Connection management: The _connection() context manager is used throughout. Splitting would require passing connections between classes or duplicating this logic.
- Size is not excessive: At 569 lines, it's well within an acceptable range for a focused plugin. The separation in Invenio/Dataverse makes sense there because they inherit from a common RDM base class, but OMERO has unique requirements.
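The connection-management point can be illustrated with a rough sketch. This is not the code from the PR: `FakeGateway` is a hypothetical stand-in for a gateway object (it does not reproduce the real BlitzGateway API), and `OmeroLikePlugin` and `list_projects` are made-up names. Only the `_connection()` name comes from the discussion above.

```python
from contextlib import contextmanager


class FakeGateway:
    """Hypothetical stand-in for a gateway; not the real OMERO client."""

    def __init__(self):
        self.connected = False
        self.closed = False

    def connect(self) -> bool:
        self.connected = True
        return True

    def close(self) -> None:
        self.connected = False
        self.closed = True

    def get_objects(self, kind: str) -> list[str]:
        if not self.connected:
            raise RuntimeError("not connected")
        return [f"{kind}-1", f"{kind}-2"]


class OmeroLikePlugin:
    """Hypothetical single-class plugin: every operation opens its own connection."""

    def __init__(self, gateway_factory=FakeGateway):
        self._gateway_factory = gateway_factory

    @contextmanager
    def _connection(self):
        conn = self._gateway_factory()
        if not conn.connect():
            raise ConnectionError("could not connect")
        try:
            yield conn
        finally:
            # Always release the connection, even if the caller raised.
            conn.close()

    def list_projects(self) -> list[str]:
        with self._connection() as conn:
            return conn.get_objects("Project")


created = []


def tracking_factory():
    gateway = FakeGateway()
    created.append(gateway)
    return gateway


projects = OmeroLikePlugin(tracking_factory).list_projects()
```

If the API calls moved into a separate class, either that class would need its own copy of `_connection()`, or each method would have to accept `conn` as a parameter, which is the trade-off described in the list above.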
Keep as-is. The file is well-organized with clear method grouping (listing, counting, downloading), good documentation, and reasonable complexity. Extracting an API layer would add complexity without clear benefits.