
@dannon
Created February 24, 2026 13:21
Triage: galaxyproject/galaxy #21873 - Dataset name can be changed before job starts

Issue #21873: Dataset name can be changed before job starts

  • State: OPEN
  • Author: kostrykin
  • Labels: (none)
  • Comments: 0
  • Projects: Galaxy Dev - weeklies (Triage/Discuss)
  • Milestone: (none)

Description

Galaxy lets me change the name of a dataset before the job starts (it is still gray). When the job finishes, the dataset name reverts to the default.

Galaxy Version

Galaxy Version: version_major 25.1, version_minor 2.dev0 (usegalaxy.eu)

Browser and Operating System

  • Operating System: macOS
  • Browser: Firefox

To Reproduce

Video: https://github.com/user-attachments/assets/d86b26c5-c293-464b-bcd2-5be1efc217ad

Expected behavior

Galaxy should not allow changing the name of a dataset before the job finishes, I think.

Issue #21873: Code Research - Dataset name reversion on job completion

Bug Summary

When a user renames a dataset output while its creating job is still queued/running (the dataset appears gray in the history panel), the name reverts to the tool-generated default once the job completes.

Key Code Paths Investigated

1. Initial Output Name Setting (Job Creation)

File: lib/galaxy/tools/actions/__init__.py (lines 623-625)

During tool execution, output datasets are created and named:

data.name = self.get_output_name(
    output, data, tool, on_text, trans, incoming, history, wrapped_params.params, job_params
)

The name is typically "Tool Name on data X" (from _get_default_data_name, lines 1086-1092) or a label-based template (lines 1070-1073).

2. Dataset Name Update API

File: lib/galaxy/webapps/galaxy/services/history_contents.py (lines 623-659, 898-914, 938-941)

The API allows updating dataset attributes (including name) at any time. The only state check is error_if_uploading (line 934) -- there is no check preventing renames while a job is running. The user's rename via the "Edit Dataset Attributes" UI does update the database successfully.
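One direction discussed later is to guard renames in this service layer, next to the existing error_if_uploading check. A minimal sketch of such a guard, assuming the caller can see the creating job's state (the function name ensure_renamable and its wiring are illustrative, not existing Galaxy code):

```python
# Hypothetical API-side guard that rejects renames while the creating job is
# still active. The state names mirror Galaxy's job states, but this function
# is a sketch, not part of the actual history_contents service.

ACTIVE_JOB_STATES = {"new", "queued", "running"}


def ensure_renamable(creating_job_state):
    """Raise if the dataset's creating job has not reached a terminal state."""
    if creating_job_state in ACTIVE_JOB_STATES:
        raise ValueError(
            f"Cannot rename dataset while its job is {creating_job_state!r}; "
            "wait for the job to finish."
        )
```

Note this trades away user flexibility (see the group discussion questions below), which is why the fix plan instead targets the import path.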

3. Extended Metadata Strategy (PRIMARY CAUSE)

The extended metadata strategy is the default for modern Galaxy deployments (including usegalaxy.eu). This is the most likely cause.

Serialization at job prep time - lib/galaxy/metadata/__init__.py (lines 224-235):

export_directory = os.path.join(metadata_dir, "outputs_new")
with DirectoryModelExportStore(export_directory, for_edit=True, ...) as export_store:
    export_store.export_job(job, tool=tool)
    for dataset in datasets_dict.values():
        export_store.add_dataset(dataset)  # Serializes dataset attributes including name

This happens during setup_external_metadata, which is called as part of command building during prepare() -- BEFORE the job runs. The dataset's name at this point is the tool-generated default.

Metadata setting process - lib/galaxy/metadata/set_metadata.py (lines 295-301):

export_store = store.DirectoryModelExportStore(
    tool_job_working_directory / "metadata/outputs_populated",
    serialize_dataset_objects=True,
    for_edit=True,
    strip_metadata_files=False,
    serialize_jobs=True,
)

The dataset is loaded from outputs_new (which has the OLD pre-rename name), processed, then serialized to outputs_populated.

Re-import on job finish - lib/galaxy/jobs/__init__.py (lines 2128-2138):

if extended_metadata:
    import_options = store.ImportOptions(allow_dataset_object_edit=True, allow_edit=True)
    import_model_store = store.get_import_model_store_for_directory(
        os.path.join(self.working_directory, "metadata", "outputs_populated"),
        app=self.app,
        import_options=import_options,
        ...
    )
    import_model_store.perform_import(history=job.history, job=job)

The overwrite - lib/galaxy/model/store/__init__.py (lines 517-539):

if "id" in dataset_attrs and self.import_options.allow_edit and not self.sessionless:
    model_class = getattr(model, dataset_attrs["model_class"])
    dataset_instance = self.sa_session.get(model_class, dataset_attrs["id"])
    attributes = [
        "name",       # <-- THIS IS THE PROBLEM
        "extension",
        "info",
        "blurb",
        "peek",
        "designation",
        "visible",
        "metadata",
        "tool_version",
        "validated_state",
        "validated_state_message",
    ]
    for attribute in attributes:
        if attribute in dataset_attrs:
            value = dataset_attrs[attribute]
            setattr(dataset_instance, attribute, value)  # Overwrites the user's renamed name

The import fetches the dataset from the real database (which has the user's renamed name) and then overwrites it with the serialized name from outputs_populated (which has the old tool-generated name).

4. Non-Extended Metadata Strategy (SECONDARY CAUSE)

File: lib/galaxy/jobs/__init__.py (lines 2019-2022) and lib/galaxy/job_execution/setup.py (line 36):

TOOL_PROVIDED_JOB_METADATA_KEYS = ["name", "info", "dbkey", "created_from_basename"]

for context_key in TOOL_PROVIDED_JOB_METADATA_KEYS:
    if context_key in context:
        context_value = context[context_key]
        setattr(dataset, context_key, context_value)

In the non-extended (directory) metadata strategy, the name would only be overwritten if the tool provides a "name" key in its galaxy.json metadata file. Most tools do NOT write to galaxy.json, so this path would typically preserve the user's rename. However, tools that do write galaxy.json with a name entry would also exhibit this bug.
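To make the galaxy.json path concrete, here is a sketch of a tool emitting a name entry and Galaxy applying the allowed keys. The "type"/"dataset_id" framing of the entry is illustrative; the applied keys are exactly those in TOOL_PROVIDED_JOB_METADATA_KEYS quoted above:

```python
import json

# A tool writes one JSON object per line to galaxy.json. The entry shape here
# is a sketch; only keys in TOOL_PROVIDED_JOB_METADATA_KEYS are applied.
entry = {"type": "dataset", "dataset_id": 42, "name": "Tool-chosen output name"}
line = json.dumps(entry)

# Galaxy-side application of the allowed keys (mirrors the loop quoted above):
TOOL_PROVIDED_JOB_METADATA_KEYS = ["name", "info", "dbkey", "created_from_basename"]
context = json.loads(line)
dataset_attrs = {}
for context_key in TOOL_PROVIDED_JOB_METADATA_KEYS:
    if context_key in context:
        dataset_attrs[context_key] = context[context_key]
```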

Theories

Theory 1 (MOST PROBABLE): Extended metadata serialization race condition

The extended metadata strategy serializes dataset attributes (including name) at job preparation time, before the job runs. If the user renames the dataset after this serialization but before the job finishes, the import on job completion overwrites the user's rename with the stale serialized name. This is the most likely cause given that usegalaxy.eu uses the extended metadata strategy.

The timeline:

  1. Tool executed, output HDA created with name "Tool X on data 1"
  2. setup_external_metadata serializes dataset to outputs_new (name = "Tool X on data 1")
  3. Job is queued, user renames dataset to "My Custom Name" (DB updated)
  4. Job runs, set_metadata_portable loads from outputs_new, writes to outputs_populated
  5. finish() imports from outputs_populated, overwrites name back to "Tool X on data 1"
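The timeline above can be simulated with plain dictionaries standing in for the database row and the on-disk exports (a sketch of the race, not Galaxy code):

```python
# Minimal simulation of the race: the export captures the name at job prep,
# the user renames in the DB afterwards, and the import clobbers the rename.

db_row = {"name": "Tool X on data 1"}    # step 1: output HDA created

export = dict(db_row)                    # step 2: serialized to outputs_new

db_row["name"] = "My Custom Name"        # step 3: user rename updates the DB

populated = dict(export)                 # step 4: set_metadata_portable copies
                                         #         outputs_new -> outputs_populated

db_row.update(populated)                 # step 5: finish() imports and overwrites
```

After step 5, db_row["name"] is back to the stale "Tool X on data 1".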

Theory 2: Tool-provided metadata override

For tools that write galaxy.json with a "name" entry, both the extended and non-extended metadata strategies will overwrite the dataset name. This is less likely for the general case but could affect specific tools.

Theory 3: Post-job actions from workflows

If the dataset was created by a workflow step with a RenameDatasetAction post-job action, the name would be set during job finish (lines 2210-2212) regardless of what the user renamed it to. However, this is workflow-specific and wouldn't explain the general case reported.

Conclusion

Theory 1 is the most probable root cause. The fix should either:

  • (a) Exclude "name" from the attributes overwritten during the extended metadata import when the user has modified it since job creation, or
  • (b) Re-read the current name from the DB at serialization time (during set_metadata_portable) rather than relying on the stale outputs_new export, or
  • (c) Skip overwriting "name" in the import if it hasn't changed from the tool-generated default (i.e., only overwrite name if the tool/metadata process explicitly set a new one).

Issue #21873: Importance Assessment - Dataset name reversion

Severity: Low

This is a data presentation / cosmetic issue, not a data integrity or security issue. The actual dataset content, metadata, and provenance are unaffected. Only the user-facing display name is reverted. No data is lost or corrupted.

Blast Radius: All users (with extended metadata strategy)

  • Extended metadata strategy users: All users on Galaxy instances using the extended metadata strategy (which includes usegalaxy.eu, usegalaxy.org, and most modern deployments) are affected when they rename an output dataset while the creating job is still pending or running.
  • Directory metadata strategy users: Only affected if the specific tool writes a "name" entry to galaxy.json, which is uncommon.
  • Practical impact: The number of users who actually rename datasets while jobs are still running is likely a small subset of total users. Most users wait until the job completes before organizing/renaming outputs. However, for power users who queue many jobs and organize results proactively, this is a recurring annoyance.

Workaround Existence: Acceptable

Users can simply rename the dataset again after the job completes. The name will persist correctly once the job is in a terminal state. This is a minor inconvenience, not a blocker.

Regression Status: Long-standing

This behavior has been present since the extended metadata strategy was introduced. It is not a recent regression from a specific version. The extended metadata architecture inherently serializes dataset state at job preparation time, and the re-import on completion has always overwritten all listed attributes including name. The relevant code in lib/galaxy/model/store/__init__.py (the import attributes list) and lib/galaxy/metadata/__init__.py (the export at setup time) has been stable for multiple release cycles.

User Impact Signals

  • Issue reactions: 0 reactions on the issue (newly filed)
  • Duplicate reports: No known duplicates found
  • Comments: 0 comments
  • The low engagement suggests this is noticed but not a high-priority pain point for most users.

Recommendation: Backlog (next release or opportunistic fix)

Rationale: This is a legitimate UX bug that should be fixed, but it does not warrant a hotfix or urgent prioritization. The workaround is trivial (re-rename after job completion), no data is lost, and the blast radius in practice is limited to a specific user workflow pattern. It should be addressed in the next release cycle as a quality-of-life improvement.

Priority: Low-medium. Good candidate for a contributor looking for a well-scoped bug fix. The fix is relatively contained (a few files) but requires careful thought about which attributes should and shouldn't be overwritten during the import step.

Issue #21873: Fix Plan - Preserve user-modified dataset name across job completion

Root Cause

When the extended metadata strategy is used, dataset attributes (including name) are serialized at job preparation time into the outputs_new directory. On job completion, these attributes are imported back and unconditionally overwrite the current database values, including any user-made renames.

Proposed Fix

The most robust approach: do not overwrite name during the extended metadata import unless the metadata process itself explicitly changed it (e.g., via tool-provided galaxy.json metadata).

Approach: Skip name overwrite in the import path

The core change is in lib/galaxy/model/store/__init__.py in the _import_datasets method. When importing with allow_edit=True for an existing dataset (the extended metadata completion path), the name attribute should not be blindly overwritten from the serialized data. Instead, the name should only be updated if it was explicitly changed by the tool/metadata process (i.e., if it differs from the name that was originally serialized into outputs_new).

However, the simplest and most correct approach is to recognize that name is a user-facing attribute that should not be managed by the metadata collection system at all. The metadata system's job is to set technical metadata (extension, peek, blurb, metadata, tool_version, etc.), not to manage the display name.

Implementation

Option A (Recommended): Remove name from the extended metadata import attribute list

File: lib/galaxy/model/store/__init__.py, around line 521

Change:

attributes = [
    "name",
    "extension",
    "info",
    ...
]

To:

attributes = [
    "extension",
    "info",
    ...
]

Then separately handle the name attribute only if it was explicitly provided by tool-provided metadata (galaxy.json). This can be done by checking if the serialized name differs from what was originally set during job creation, or by having the metadata process flag when it has explicitly set a name.

However, this simple removal could break tools that legitimately set the output name via the metadata process. TOOL_PROVIDED_JOB_METADATA_KEYS already handles tool-provided name setting in set_metadata.py (lines 516-519), where setattr(dataset, context_key, context_value) is called for keys from galaxy.json, including "name". As a result, the dataset's name in the outputs_populated export already reflects the tool-provided name, so simply not importing name here would lose tool-provided name changes.

Option B (Safer): Only overwrite name if it actually changed via the metadata process

Track whether the name was changed during the metadata process. Compare the name in outputs_populated against the name in outputs_new (the pre-metadata-process state). If they differ, the metadata process changed the name (e.g., via galaxy.json), and it should be applied. If they are the same, the metadata process did not change the name, and the current DB value (which may reflect user edits) should be preserved.

File changes for Option B:

  1. lib/galaxy/metadata/set_metadata.py - During extended metadata collection, record the original name from outputs_new alongside the final name in outputs_populated. This could be done by adding an _original_name field to the exported dataset attributes, or by exporting a separate manifest.

  2. lib/galaxy/model/store/__init__.py - In _import_datasets, when processing the name attribute, check if it was changed by the metadata process:

    if attribute == "name":
        original_name = dataset_attrs.get("_original_name", value)
        if value == original_name:
            # Name wasn't changed by metadata process, preserve DB value
            continue

Option C (Simplest practical fix): Store user-modified flag

Add a boolean column or flag to track whether the user has manually modified the dataset name. During the import, skip overwriting name if this flag is set.

File changes for Option C:

  1. lib/galaxy/model/__init__.py - Add a name_user_set boolean field to HistoryDatasetAssociation
  2. Database migration in lib/galaxy/model/migrations/alembic/versions_gxy/
  3. lib/galaxy/webapps/galaxy/services/history_contents.py - Set name_user_set = True when user updates the name
  4. lib/galaxy/model/store/__init__.py - Skip name overwrite if name_user_set is True
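The Option C guard during import could look like the following sketch. Note that name_user_set is the proposed, not-yet-existing flag, and apply_imported_attributes is an illustrative stand-in for the attribute loop in _import_datasets:

```python
# Sketch of the Option C import guard. `name_user_set` is the proposed flag
# (it does not exist in Galaxy's model today); `dataset_attrs` is the
# serialized attribute dict loaded from outputs_populated.

def apply_imported_attributes(dataset_instance, dataset_attrs, attributes):
    for attribute in attributes:
        if attribute not in dataset_attrs:
            continue
        # Skip the name if the user explicitly set it via the API/UI.
        if attribute == "name" and getattr(dataset_instance, "name_user_set", False):
            continue
        setattr(dataset_instance, attribute, dataset_attrs[attribute])
```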

Recommended Approach: Option B

Option B is the safest because:

  • It does not require a database migration
  • It preserves tool-provided name changes (from galaxy.json)
  • It preserves user renames
  • It is contained to the metadata/store layer

Affected Files

| File | Change |
| --- | --- |
| lib/galaxy/metadata/set_metadata.py | Record original dataset name before metadata processing |
| lib/galaxy/model/store/__init__.py | Conditionally skip name overwrite during import |

Testing Strategy

Unit Tests

  1. Test that user-renamed dataset name persists after job completion with extended metadata:

    • Create a job with an output dataset
    • Simulate the metadata export (outputs_new) with the default name
    • Update the dataset name in the DB (simulating user rename)
    • Import from outputs_populated with the default name
    • Verify the user's rename is preserved
  2. Test that tool-provided name changes are still applied:

    • Create a job with an output dataset
    • Simulate tool writing a new name via galaxy.json
    • Verify the tool-provided name is applied after import
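The two unit tests above can be sketched in pytest style against the Option B decision rule. Everything here is an illustrative stand-in for Galaxy's real objects; import_name models only the name-resolution logic, not the actual store import:

```python
# Pytest-style sketch of the two unit tests, modeling the Option B rule:
# apply the serialized name only if the metadata process changed it,
# otherwise preserve the current DB value (which may be a user rename).


def import_name(db_name, outputs_new_name, outputs_populated_name):
    """Return the name the dataset should end up with after import."""
    if outputs_populated_name != outputs_new_name:
        # Metadata process changed the name (e.g., via galaxy.json) -> apply it.
        return outputs_populated_name
    # Name unchanged by the metadata process -> preserve the DB value.
    return db_name


def test_user_rename_preserved():
    assert import_name("My Custom Name", "Tool X on data 1", "Tool X on data 1") == "My Custom Name"


def test_tool_provided_name_applied():
    assert import_name("My Custom Name", "Tool X on data 1", "Tool-set name") == "Tool-set name"
```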

Integration Tests

  1. Run a simple tool, rename the output while the job is queued, verify the name persists after completion
  2. Run a tool that sets output name via galaxy.json, verify the tool-provided name is applied

Manual Testing on usegalaxy.eu (or test instance with extended metadata)

Reproduce the exact scenario from the issue report:

  1. Run a tool
  2. While the output is gray (job queued/running), rename it via the edit attributes UI
  3. Wait for job to complete
  4. Verify the renamed name persists

Migration Considerations

  • No database migration required for Option B
  • No API changes required
  • No client-side changes required
  • Backward compatible -- existing behavior for tools that don't rename datasets is unchanged

Risk Assessment

  • Low risk: The change is contained to the metadata import path
  • Edge cases: Tools that write "name" to galaxy.json will still work as expected (the name change will be detected as a metadata-process change and applied)
  • Regression risk: Minimal, as the current behavior is clearly incorrect (overwriting user input)

Issue #21873: Triage Summary - Dataset name can be changed before job starts

Top-Line Summary

When a user renames a dataset output while its creating job is still pending or running, the name reverts to the tool-generated default upon job completion. The most probable root cause is the extended metadata strategy's serialization/import cycle: dataset attributes (including name) are serialized to disk at job preparation time, and when the job completes, the import unconditionally overwrites the database with the stale serialized values -- clobbering any user edits made in the interim. The recommended fix is to detect whether the name attribute was actually changed by the metadata process (e.g., via tool-provided galaxy.json) and only apply it in that case, preserving user renames otherwise. The key files are lib/galaxy/model/store/__init__.py (the import attribute overwrite at lines 521-539) and lib/galaxy/metadata/set_metadata.py (the serialization/processing pipeline).

Importance Assessment Summary

| Dimension | Assessment |
| --- | --- |
| Severity | Low -- cosmetic/UX issue, no data loss or corruption |
| Blast radius | All users on extended metadata strategy (most modern deployments including usegalaxy.eu/org), but only when renaming during active jobs |
| Workaround | Acceptable -- rename the dataset again after job completion |
| Regression status | Long-standing, not a recent regression; inherent to the extended metadata architecture |
| User signals | 0 reactions, 0 comments, 0 duplicates |
| Priority recommendation | Backlog / next release -- good quality-of-life fix, no urgency |

Questions for Group Discussion

  1. Should renaming be prevented while the job is running (the reporter's suggestion), or should the rename be preserved? Preventing renaming is simpler to implement (client or API guard) but reduces user flexibility. Preserving the rename is more correct but touches the metadata import path.

  2. Are there other user-modifiable attributes that suffer the same overwrite problem? The import list includes name, info, visible, and others. If users can edit info (annotation) while a job runs, that would also be overwritten. Should we address all such attributes together?

  3. Should name be in the metadata import attribute list at all? The metadata system is designed for technical metadata (extension, peek, dbkey, etc.), not user-facing display attributes. Removing name from the import list might be the cleanest fix, but we need to verify that no tools depend on the metadata import path for name-setting (as opposed to galaxy.json tool-provided metadata, which is handled separately).

  4. What about the info field? info is also in the import list and is also user-editable. The same race condition likely applies. If we fix name, we should consider info too.

Effort Estimate

  • Implementation: Small -- 1-2 files, ~20-50 lines changed
  • Testing: Medium -- need to test both extended and directory metadata strategies, with and without tool-provided metadata name changes
  • Total: 1-2 days of focused work including tests

Difficulty of Recreating/Testing

  • Easy to reproduce: Run any tool on a Galaxy instance with extended metadata, rename the output before the job completes, observe the name revert.
  • Automated testing: Moderately straightforward. Integration tests can exercise the full flow. Unit tests can mock the serialization/import cycle. The existing test infrastructure for metadata strategies should provide good scaffolding.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment