Triage documents for Galaxy Issue #21536: Upload does not respect sharing

Issue #21536 Code Research: Upload Does Not Respect Sharing

Bug Summary

  • Issue: #21536
  • Title: "25.0 upload does not respect sharing"
  • Reporter: bernt-matthias
  • Galaxy Version: 25.0

When a user has a non-shareable (private) scratch object store configured, uploading data fails with:

galaxy.objectstore.ObjectCreationProblemSharingDisabled

Error Message: "Job attempted to create sharable output datasets in a storage location with sharing disabled"

Complete Traceback

File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/jobs/runners/__init__.py", line 206, in put
  queue_job = job_wrapper.enqueue()
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/jobs/__init__.py", line 1799, in enqueue
  self._set_object_store_ids(job)
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/jobs/__init__.py", line 1825, in _set_object_store_ids
  self._set_object_store_ids_full(job)
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/jobs/__init__.py", line 1916, in _set_object_store_ids_full
  object_store_populator.set_object_store_id(dataset, require_shareable=require_shareable)
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/objectstore/__init__.py", line 2099, in set_object_store_id
  self.set_dataset_object_store_id(data.dataset, require_shareable=require_shareable)
File "/gpfs1/data/galaxy_server/galaxy/lib/galaxy/objectstore/__init__.py", line 2109, in set_dataset_object_store_id
  raise ObjectCreationProblemSharingDisabled()
galaxy.objectstore.ObjectCreationProblemSharingDisabled

Code Path Analysis

1. Job Runner - put() Method

File: /lib/galaxy/jobs/runners/__init__.py Lines: 203-214

def put(self, job_wrapper: "MinimalJobWrapper") -> None:
    """Add a job to the queue (by job identifier), indicate that the job is ready to run."""
    put_timer = ExecutionTimer()
    try:
        queue_job = job_wrapper.enqueue()
    except Exception as e:
        queue_job = False
        # Required for exceptions thrown by object store incompatibility.
        # tested by test/integration/objectstore/test_private_handling.py
        message = e.client_message if hasattr(e, "client_message") else str(e)
        job_wrapper.fail(message, exception=e)
        log.debug(f"Job [{job_wrapper.job_id}] failed to queue {put_timer}")
        return

2. Job Wrapper - enqueue() Method

File: /lib/galaxy/jobs/__init__.py Lines: 1781-1794

def enqueue(self):
    job = self.get_job()
    # Change to queued state before handing to worker thread so the runner won't pick it up again
    if self.is_task:
        self.change_state(Job.states.QUEUED, flush=False, job=job)
    elif not self.queue_with_limit(job, self.job_destination):
        return False
    job.update_output_states(self.app.application_stack.supports_skip_locked())
    # Set object store after job destination so can leverage parameters...
    self._set_object_store_ids(job)  # <-- Entry point for object store assignment
    # Now that we have the object store id, check if we are over the limit
    self._pause_job_if_over_quota(job)
    self.sa_session.commit()
    return True

3. _set_object_store_ids_full() - The Core Logic

File: /lib/galaxy/jobs/__init__.py Lines: 1845-1924

Key snippet showing require_shareable determination:

def _set_object_store_ids_full(self, job: Job):
    user = job.user
    object_store_id = self.get_destination_configuration("object_store_id", None)
    # ... object_store_id resolution logic ...

    require_shareable = job.requires_shareable_storage(self.app.security_agent)  # <-- KEY LINE

    if split_object_stores is None:
        object_store_populator = ObjectStorePopulator(self.app, user)
        if object_store_id:
            object_store_populator.object_store_id = object_store_id
        for dataset_assoc in job.output_datasets + job.output_library_datasets:
            dataset = dataset_assoc.dataset
            object_store_populator.set_object_store_id(dataset, require_shareable=require_shareable)

4. Job.requires_shareable_storage() - Shareability Check

File: /lib/galaxy/model/__init__.py Lines: 2215-2226

def requires_shareable_storage(self, security_agent):
    # An easy optimization would be to calculate this in galaxy.tools.actions when the
    # job is created and all the output permissions are already known. Having to reload
    # these permissions in the job code shouldn't strictly be needed.

    requires_sharing = False
    for dataset_assoc in self.output_datasets + self.output_library_datasets:
        if not security_agent.dataset_is_private_to_a_user(dataset_assoc.dataset.dataset):
            requires_sharing = True
            break

    return requires_sharing

5. dataset_is_private_to_a_user() - Privacy Check

File: /lib/galaxy/model/security.py Lines: 1152-1163

def dataset_is_private_to_a_user(self, dataset):
    """
    If the Dataset object has exactly one access role and that is
    the current user's private role then we consider the dataset private.
    """
    access_roles = dataset.get_access_roles(self)

    if len(access_roles) != 1:
        return False
    else:
        access_role = access_roles[0]
        return access_role.type == Role.types.PRIVATE
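
Concretely, the rule above plays out as follows (hypothetical role sets, shown for illustration only):

# Illustrative outcomes of dataset_is_private_to_a_user (hypothetical roles):
#   access_roles == [alice_private_role]            -> True  (exactly one role, PRIVATE type)
#   access_roles == []                              -> False (no access restrictions, i.e. "public")
#   access_roles == [alice_private_role, lab_role]  -> False (more than one access role)
#   access_roles == [lab_role]                      -> False (one role, but not PRIVATE type)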

6. ObjectStorePopulator.set_dataset_object_store_id() - Where Exception is Raised

File: /lib/galaxy/objectstore/__init__.py Lines: 2105-2116

def set_dataset_object_store_id(self, dataset: "Dataset", require_shareable: bool = True) -> None:
    # Create an empty file immediately.  The first dataset will be
    # created in the "default" store, all others will be created in
    # the same store as the first.
    dataset.object_store_id = self.object_store_id
    try:
        concrete_store = self.object_store.create(dataset)
        if concrete_store.private and require_shareable:  # <-- EXCEPTION TRIGGER
            raise ObjectCreationProblemSharingDisabled()
    except ObjectInvalid:
        raise ObjectCreationProblemStoreFull()
    self.object_store_id = dataset.object_store_id

Upload Dataset Permission Flow

Upload creates HDA via upload_common.py

File: /lib/galaxy/tools/actions/upload_common.py Lines: 131-151

def __new_history_upload(trans, uploaded_dataset, history=None, state=None):
    if not history:
        history = trans.history
    hda = HistoryDatasetAssociation(
        name=uploaded_dataset.name,
        extension=uploaded_dataset.file_type,
        dbkey=uploaded_dataset.dbkey,
        history=history,
        create_dataset=True,
        sa_session=trans.sa_session,
    )
    trans.sa_session.add(hda)
    # ... state setup ...
    history.add_dataset(hda, genome_build=uploaded_dataset.dbkey, quota=False)
    permissions = trans.app.security_agent.history_get_default_permissions(history)  # <-- KEY LINE
    trans.app.security_agent.set_all_dataset_permissions(hda.dataset, permissions, new=True, flush=False)
    trans.sa_session.commit()
    return hda

History Default Permissions

File: /lib/galaxy/model/security.py Lines: 865-873

def history_get_default_permissions(self, history):
    permissions = {}
    for dhp in history.default_permissions:
        action = self.get_action(dhp.action)
        if action in permissions:
            permissions[action].append(dhp.role)
        else:
            permissions[action] = [dhp.role]
    return permissions

User Default Permissions Setup

File: /lib/galaxy/model/security.py Lines: 717-726

def create_user_role(self, user, app):
    # Create private user role if necessary
    self.get_private_user_role(user, auto_create=True)
    # Create default user permissions if necessary
    if not user.default_permissions:
        if hasattr(app.config, "new_user_dataset_access_role_default_private"):
            permissions = app.config.new_user_dataset_access_role_default_private
            self.user_set_default_permissions(user, default_access_private=permissions)
        else:
            self.user_set_default_permissions(user, history=True, dataset=True)

Configuration Option

File: /lib/galaxy/config/schemas/config_schema.yml Lines: 2888-2896

new_user_dataset_access_role_default_private:
  type: bool
  default: false
  required: false
  desc: |
    By default, users' data will be public, but setting this to true will cause
    it to be private.  Does not affect existing users and data, only ones created
    after this option is set.  Users may still change their default back to
    public.
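
For example, an administrator who wants new accounts to default to private data would set the option in galaxy.yml (option name from the schema above; placement under the top-level galaxy section is assumed):

galaxy:
  # New users' datasets default to private; existing users are unaffected.
  new_user_dataset_access_role_default_private: true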

Existing Test Coverage

File: /test/integration/objectstore/test_private_handling.py

Two test classes exist:

  1. TestPrivatePreventsSharingObjectStoreIntegration - Tests with new_user_dataset_access_role_default_private = True (works)
  2. TestPrivateCannotWritePublicDataObjectStoreIntegration - Tests with new_user_dataset_access_role_default_private = False (expects error)

class TestPrivateCannotWritePublicDataObjectStoreIntegration(BaseObjectStoreIntegrationTestCase):
    @classmethod
    def handle_galaxy_config_kwds(cls, config):
        config["new_user_dataset_access_role_default_private"] = False  # <-- This is the bug scenario
        cls._configure_object_store(PRIVATE_OBJECT_STORE_CONFIG_TEMPLATE, config)

    def test_both_types(self):
        with self.dataset_populator.test_history() as history_id:
            response = self.dataset_populator.new_dataset_request(
                history_id, content=TEST_INPUT_FILES_CONTENT, wait=True, assert_ok=False
            )
            job = response.json()["jobs"][0]
            final_state = self.dataset_populator.wait_for_job(job["id"])
            assert final_state == "error"  # <-- Currently this is "expected" behavior

Root Cause Analysis

Problem Summary

The issue occurs when:

  1. User has a private (non-shareable) object store configured (e.g., scratch storage)
  2. User's new_user_dataset_access_role_default_private is False (default Galaxy config)
  3. User uploads a file

Why it Fails

  1. Upload creates HDA: When uploading, __new_history_upload() creates a new HDA and applies history's default permissions
  2. Permissions are public by default: If new_user_dataset_access_role_default_private = False, the dataset has no DATASET_ACCESS role restrictions (or multiple roles), making it "shareable"
  3. Job enqueue checks shareability: job.requires_shareable_storage() checks if any output dataset is NOT private to a single user
  4. Dataset appears non-private: Since there's no single private access role, dataset_is_private_to_a_user() returns False
  5. require_shareable becomes True: The job requires shareable storage
  6. Object store is private: When trying to create the dataset in the private object store, the check concrete_store.private and require_shareable triggers ObjectCreationProblemSharingDisabled
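
Condensed into one place, the failing interaction looks roughly like this (a simplified paraphrase of the Galaxy code quoted above, not the actual implementation):

# Simplified paraphrase of the code paths quoted above (not actual Galaxy code).
from galaxy.objectstore import ObjectCreationProblemSharingDisabled

def sketch_enqueue(job, security_agent, object_store):
    # Steps 3-5: with public default permissions, no output has a single
    # private access role, so the job is deemed to need shareable storage.
    require_shareable = any(
        not security_agent.dataset_is_private_to_a_user(assoc.dataset.dataset)
        for assoc in job.output_datasets + job.output_library_datasets
    )
    # Step 6: each output lands in the private scratch store, tripping the check.
    for assoc in job.output_datasets + job.output_library_datasets:
        concrete_store = object_store.create(assoc.dataset.dataset)
        if concrete_store.private and require_shareable:
            raise ObjectCreationProblemSharingDisabled()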

Theories for Root Cause

Theory 1: Missing Object Store Awareness in Permission Logic

The upload permission setup doesn't consider what object store will be used. When a private object store is the target, the dataset permissions should automatically be set to private.

Evidence:

  • __new_history_upload() applies history default permissions without checking object store
  • No coordination between object store selection and permission assignment

Theory 2: Object Store Selection Happens Too Late

The object store is selected during job.enqueue() which happens after the dataset permissions are already set during the upload action.

Evidence:

  • Dataset created with permissions in upload_common.py
  • Object store assignment happens in _set_object_store_ids() called from enqueue()
  • By the time object store is known, permissions are already committed

Theory 3: Check Logic Should Be Inverted

The current logic requires shareable storage if dataset is NOT private. Perhaps for upload jobs specifically, the logic should be:

  • If object store is private, automatically make dataset permissions private
  • OR: For upload jobs, don't require shareable storage since the output hasn't been shared yet

Evidence:

  • The comment in requires_shareable_storage() says: "An easy optimization would be to calculate this in galaxy.tools.actions when the job is created"
  • This suggests awareness that the current timing/location of the check is suboptimal

Proposed Investigation Areas

  1. Should upload tool automatically set private permissions when targeting private object store?

    • Would require knowing object store at dataset creation time
    • May need new API/mechanism for this
  2. Should requires_shareable_storage() have different logic for upload vs other jobs?

    • Upload outputs haven't been shared yet, so they don't "require" sharing
    • Could check whether the dataset has actually been shared rather than just its permission state (see the sketch after this list)
  3. Should user/history preferred_object_store_id influence default permissions?

    • If user's preferred store is private, should their default permissions be private?
    • This could be a configuration option
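
As a rough illustration of investigation area 2, the check could be driven by actual sharing state instead of permission state. A minimal sketch, where has_been_shared() is an assumed helper with no existing counterpart in Galaxy:

# Hypothetical alternative to Job.requires_shareable_storage(); sketch only.
def has_been_shared(dataset) -> bool:
    """Assumed helper: would inspect real sharing records (histories shared
    with other users, published items, etc.). No such Galaxy API exists today."""
    raise NotImplementedError

def requires_shareable_storage(self, security_agent) -> bool:
    # Require shareable storage only when an output has actually been shared,
    # not merely when its permissions would permit sharing later.
    return any(
        has_been_shared(assoc.dataset)
        for assoc in self.output_datasets + self.output_library_datasets
    )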

Key Files for Fixing

  1. /lib/galaxy/tools/actions/upload_common.py - Dataset permission setup
  2. /lib/galaxy/jobs/__init__.py - _set_object_store_ids_full() and requires_shareable_storage() call
  3. /lib/galaxy/model/__init__.py - requires_shareable_storage() logic
  4. /lib/galaxy/objectstore/__init__.py - set_dataset_object_store_id() check logic

Issue #21536 Git History Analysis: Upload Does Not Respect Sharing

Issue Summary

  • Issue: #21536
  • Title: "25.0 upload does not respect sharing"
  • Reporter: bernt-matthias
  • Galaxy Version: 25.0
  • Created: 2026-01-05
  • State: Open

Key Finding: Not a Regression

After thorough git history analysis, this is NOT a regression in 25.0. The behavior described in the issue has been present since the private object store feature was introduced in Galaxy 23.1. The existing test case TestPrivateCannotWritePublicDataObjectStoreIntegration explicitly tests for and expects this "failure" behavior.

Timeline of Relevant Changes

February 7, 2023 - Initial Private Object Store Implementation

Commit: e2530b46a99 - "private objectstores & dataset.sharable"
Author: John Chilton
PR: #14073 - "Empower Users to Select Storage Destination"

This commit introduced:

  1. private attribute on object stores
  2. requires_shareable_storage() method on Job model
  3. ObjectCreationProblemSharingDisabled exception
  4. The check in set_dataset_object_store_id() that raises the exception
  5. Integration tests including TestPrivateCannotWritePublicDataObjectStoreIntegration

Key code added to lib/galaxy/jobs/__init__.py:

require_shareable = job.requires_shareable_storage(self.app.security_agent)
# ...
object_store_populator.set_object_store_id(dataset, require_shareable=require_shareable)

Key code added to lib/galaxy/objectstore/__init__.py:

if concrete_store.private and require_shareable:
    raise ObjectCreationProblemSharingDisabled()

February 9, 2023 - Full Object Store Selection Implementation

Commit: 46b6d85c675 - "implement preferred object store id"
Author: John Chilton

This commit added _set_object_store_ids_full() which is the code path mentioned in the issue traceback. It integrated the requires_shareable_storage check into the full object store selection logic.

March 19, 2024 - Improved Error Messages

Commit: 6cd4f68c364 - "Improved error messages for private object stores."
Author: John Chilton

Added better error messages for ObjectCreationProblemSharingDisabled, making the error clearer to users.

July 29, 2024 - Type Annotations

Commit: bc250177c35 - "Fixes for errors reported by mypy 1.11.0 in BaseObjectStore"
Author: Nicola Soranzo

Added type annotations to set_object_store_id() and set_dataset_object_store_id(). No functional changes.

Release Timeline

| Version | Date         | Contains Feature         |
|---------|--------------|--------------------------|
| v23.1   | Sep 25, 2023 | Yes - Feature introduced |
| v24.0   | Apr 2, 2024  | Yes                      |
| v24.2.0 | -            | Yes                      |
| v25.0.0 | Jun 18, 2025 | Yes                      |

Code Authors

| Author         | Contributions                                                                                                    |
|----------------|------------------------------------------------------------------------------------------------------------------|
| John Chilton   | Original implementation of private object stores, requires_shareable_storage, full object store selection logic   |
| Nicola Soranzo | Type annotations                                                                                                   |

Analysis

Why This Behavior Exists

The private object store feature was intentionally designed to prevent jobs from creating "public" (non-private) datasets in private object stores. The logic is:

  1. When a job is enqueued, requires_shareable_storage() checks if any output dataset is NOT private to a single user
  2. If new_user_dataset_access_role_default_private = False (Galaxy default), uploaded datasets are "public" by default
  3. When the job tries to create the dataset in a private object store, it fails because public data shouldn't be stored in private storage

The Test Case Proves Intent

The test TestPrivateCannotWritePublicDataObjectStoreIntegration in test/integration/objectstore/test_private_handling.py was added alongside the feature and explicitly expects this to fail:

class TestPrivateCannotWritePublicDataObjectStoreIntegration(BaseObjectStoreIntegrationTestCase):
    @classmethod
    def handle_galaxy_config_kwds(cls, config):
        config["new_user_dataset_access_role_default_private"] = False  # <-- Public by default
        cls._configure_object_store(PRIVATE_OBJECT_STORE_CONFIG_TEMPLATE, config)  # <-- Private store

    def test_both_types(self):
        # ...
        assert final_state == "error"  # <-- Expected to fail

PR #14073 Documentation

From the PR description, the author explicitly noted:

"Integration test case to ensure cannot upload public datasets to a private objectstore."

And included a TODO item:

"Test Case: Ensure a private objectstore isn't selected as user default if user default for new histories is public."

Conclusions

  1. Not a Regression: This behavior has existed since Galaxy 23.1 when private object stores were introduced.

  2. By Design (Partially): The test case shows this was known behavior, but the user experience is poor. The author noted potential improvements in the code comments.

  3. Configuration Mismatch: The issue occurs when:

    • User has new_user_dataset_access_role_default_private = False (default)
    • User's preferred object store is private
    • This combination should probably be prevented or handled more gracefully
  4. Potential Fixes:

    • Make upload jobs automatically set private permissions when targeting private object store
    • Prevent users from selecting a private object store as preferred when their data defaults to public
    • Change requires_shareable_storage() logic for upload jobs specifically

Related Links

  • Issue: galaxyproject/galaxy#21536
  • Original PR: galaxyproject/galaxy#14073
  • Test File: test/integration/objectstore/test_private_handling.py
  • Key Files:
    • lib/galaxy/jobs/__init__.py (lines ~1850-1925)
    • lib/galaxy/objectstore/__init__.py (lines ~2095-2116)
    • lib/galaxy/model/__init__.py (lines ~2215-2226)

Issue #21536 Plan Assessment

1. Plan Comparison Table

| Criteria | Plan 1: Auto-Private Permissions | Plan 2: Special Upload Handling | Plan 3: Defer Shareability |
|----------|----------------------------------|---------------------------------|----------------------------|
| Complexity | Medium | Low | High |
| Lines of Code | ~50-60 (3 new helper functions + modify 1 function) | ~5-10 (modify 1 function) | ~80-100 (modify 5+ files, add schema field) |
| Files Modified | 1 (upload_common.py) | 1 (jobs/__init__.py) | 5+ (objectstore/__init__.py, jobs/__init__.py, model/__init__.py, managers/histories.py, schema/schema.py) |
| Test Modifications | Modify 1 test class | Modify 1 test class | Modify 1 test class + add 4+ new tests |
| Risk Level | Low-Medium | Low | Medium-High |
| UX Impact | Silent permission change, dataset becomes private | No permission change, sharing blocked later | No permission change, sharing blocked later |

2. Plan Quality Assessment

Plan 1: Auto-Private Permissions

Completeness: 7/10

  • Addresses the immediate upload failure issue
  • Does NOT handle job destination object_store_id overrides (happens after dataset creation)
  • Missing library upload handling
  • May not handle __DATA_FETCH__ tool explicitly

Correctness: 8/10

  • Technically sound approach - preemptively sets correct permissions
  • Mirrors the object store resolution logic from _set_object_store_ids_full
  • Risk: Object store resolution could diverge from what job actually uses

Test Coverage: 6/10

  • Proposes modifying existing test
  • Missing tests for:
    • Edge case when all stores are private
    • Library uploads
    • __DATA_FETCH__ tool
    • Job destination override scenario

Edge Cases Handled:

  • History preferred object store
  • User preferred object store
  • All-private object stores scenario
  • Anonymous users (noted but not solved)
  • Library uploads (not handled; see Completeness above)
  • Job destination overrides (not handled; see Completeness above)

Unresolved Questions Impact: Medium

  • Question about warning users is UX concern, not blocking
  • Library uploads and job destination override are significant gaps

Plan 2: Special Upload Handling

Completeness: 8/10

  • Directly addresses the root cause for upload jobs
  • Handles both upload1 and __DATA_FETCH__
  • Relies on existing ensure_shareable() safeguards for share-time checks

Correctness: 9/10

  • Simple, targeted fix with minimal blast radius
  • Leverages existing safeguards already in codebase
  • Clear justification: upload data hasn't been shared yet

Test Coverage: 8/10

  • Proposes comprehensive test modification
  • Includes test for upload success AND sharing failure
  • Provides specific assertions for error codes/messages

Edge Cases Handled:

  • upload1 tool
  • __DATA_FETCH__ tool
  • Post-upload sharing attempts blocked
  • Library uploads (noted as question)
  • Other data source tools (question raised)

Unresolved Questions Impact: Low-Medium

  • Question about other tool IDs is valid but likely minor
  • Library uploads noted as consideration

Plan 3: Defer Shareability Check

Completeness: 9/10

  • Most comprehensive solution
  • Adds UI feedback via ShareHistoryExtra.non_shareable field
  • Updates all relevant share paths

Correctness: 7/10

  • Sound in principle but invasive
  • Deprecating requires_shareable_storage is significant
  • Schema changes have API implications
  • May break existing integrations relying on early failure

Test Coverage: 7/10

  • Proposes 4+ new tests covering various share scenarios
  • However, test descriptions are high-level without implementation detail
  • Missing tests for deprecation warnings

Edge Cases Handled:

  • History sharing with private datasets
  • Publishing blocked
  • Share with users blocked
  • Library copying blocked
  • Workflows with private outputs
  • Pages embedding private datasets (noted as failing at access time)

Unresolved Questions Impact: High

  • UI indication question affects user experience
  • share_option=no_changes bypass is significant
  • Dataset movement between stores is out of scope but relevant
  • Need to audit all set_object_store_id callers

3. Probability Assessment

| Plan | Fix Probability | Reasoning |
|------|-----------------|-----------|
| Plan 1 | 65% | Will fix common upload cases. Risk of divergence between permission-time and job-time object store resolution. May not handle all paths (job destination override, library uploads). |
| Plan 2 | 90% | Directly targets the issue with minimal code change. Relies on well-tested existing safeguards. Clear scope limited to upload tools. Small surface area = less room for bugs. |
| Plan 3 | 75% | Will fix the issue but introduces new complexity. Higher risk of regressions due to touching 5+ files. Deprecation may cause downstream issues. Most thorough but most risky. |

Detailed Reasoning

Plan 1 (65%):

  • The object store resolution in upload_common.py may not perfectly match the actual resolution in _set_object_store_ids_full due to job destination parameters only available later
  • The job runner's put() method can override object store selection via object_store_id in destination params
  • Two independent implementations of "which object store will be used" creates risk of divergence

Plan 2 (90%):

  • Single point of change with clear condition (tool_id in ("upload1", "__DATA_FETCH__"))
  • The existing test already verifies the exact failure path
  • Existing ensure_shareable() calls provide verified safety net
  • Minimal code change = minimal risk

Plan 3 (75%):

  • Comprehensive but touches critical paths
  • Schema changes require API version considerations
  • Deprecating a method used in job processing is risky
  • More moving parts = more potential for bugs

4. Recommendation

Primary Recommendation: Implement Plan 2 First

Rationale:

  1. Lowest risk, highest probability of success - 5-10 lines of code vs 50-100
  2. Targeted fix - Only affects upload tools, doesn't change permission system
  3. Leverages existing safeguards - ensure_shareable() already tested and working
  4. Fastest to implement and verify - Can be done in <1 hour
  5. Reversible - Easy to revert if issues found

Secondary Consideration: Plan 1 as Enhancement

After Plan 2 is validated, Plan 1 could be implemented as a UX enhancement:

  • Auto-setting private permissions provides cleaner UX (user doesn't see "cannot share" errors)
  • Could be added as optional behavior controlled by config
  • Not required for correctness, just improves experience

Plan 3: Not Recommended for This Issue

Plan 3 is architecturally interesting but:

  • Overkill for the specific bug report
  • High risk of regressions
  • Should be considered as separate refactoring effort if desired

Suggested Combined Approach

  1. Phase 1: Implement Plan 2 (immediate fix)
  2. Phase 2: Consider Plan 1 as UX enhancement (optional, config-controlled)
  3. Phase 3: If broader architectural changes desired, evaluate Plan 3 as separate effort

5. Critical Questions

Must Answer Before Implementation

  1. Are there other upload-like tools besides upload1 and __DATA_FETCH__?

    • Search codebase for tool_type="data_source" or similar
    • Impact: May need to expand the tool ID list in Plan 2
  2. Does the existing test TestPrivateCannotWritePublicDataObjectStoreIntegration reflect intentional policy or was it testing the status quo?

    • If intentional policy, changing behavior requires stakeholder approval
    • The git history suggests it was documenting current behavior, not asserting correctness
  3. Should we consult with @jmchilton (original author) before changing this behavior?

    • Original PR #14073 introduced this behavior with specific tests
    • May have context on why early failure was chosen over late failure

Risks Requiring Mitigation

| Risk | Mitigation |
|------|------------|
| Library uploads may have different requirements | Search for library upload paths and verify ensure_shareable() coverage |
| Users accustomed to early failure | Document behavior change in release notes |
| Other callers of requires_shareable_storage | Search codebase for callers before deprecating (Plan 3 only) |
| Workflows creating data in private storage | Verify workflow output handling with private stores |

Pre-Implementation Checklist

  • Search for all tools with tool_type="data_source"
  • Verify ensure_shareable() is called in all share paths
  • Review library upload permission flow
  • Check for external callers of requires_shareable_storage in plugins/tools
  • Confirm test behavior with maintainers if possible

Summary

Recommended Action: Implement Plan 2 (Special Upload Handling) as the immediate fix. It has the highest probability of success (90%), lowest complexity, and smallest blast radius. The fix directly addresses the bug by recognizing that upload jobs create new data that hasn't been shared yet, making the preemptive shareability check unnecessary. Existing safeguards already prevent sharing from private storage at share-time.

Issue #21536 Implementation Plan: Auto-Private Permissions for Private Object Stores

Overview

When uploading files to Galaxy, the system currently fails if the user's default permissions are "public" but their preferred object store is private (non-shareable). This plan implements Theory 1: Missing Object Store Awareness - automatically setting dataset permissions to private when the target object store requires it. The fix modifies the upload dataset creation flow (upload_common.py) to check whether the resolved object store is private and, if so, override the default permissions to ensure the dataset is private. This enables uploads to succeed without requiring users to manually change their default permission settings.

Code Changes Required

1. lib/galaxy/tools/actions/upload_common.py

Function: __new_history_upload()

Change description:

  1. Determine the effective object store that will be used for this upload
  2. Check if that object store is private
  3. If private, override permissions to make the dataset private to the user

Modified code:

def __new_history_upload(trans, uploaded_dataset, history=None, state=None):
    if not history:
        history = trans.history
    hda = HistoryDatasetAssociation(
        name=uploaded_dataset.name,
        extension=uploaded_dataset.file_type,
        dbkey=uploaded_dataset.dbkey,
        history=history,
        create_dataset=True,
        sa_session=trans.sa_session,
    )
    trans.sa_session.add(hda)
    if state:
        hda.state = state
    else:
        hda.state = hda.states.QUEUED
    history.add_dataset(hda, genome_build=uploaded_dataset.dbkey, quota=False)

    # Get default permissions
    permissions = trans.app.security_agent.history_get_default_permissions(history)

    # Check if target object store requires private datasets
    if _target_object_store_requires_private(trans, history):
        permissions = _make_permissions_private(trans, permissions)

    trans.app.security_agent.set_all_dataset_permissions(hda.dataset, permissions, new=True, flush=False)
    trans.sa_session.commit()
    return hda

New helper function: _resolve_effective_object_store_id()

def _resolve_effective_object_store_id(trans, history):
    """
    Resolve the effective object store ID that will be used for uploads.

    Priority order (matching _set_object_store_ids_full in jobs/__init__.py):
    1. History's preferred_object_store_id
    2. User's preferred_object_store_id
    3. Default object store (return None to let object store handle it)
    """
    if history and history.preferred_object_store_id:
        return history.preferred_object_store_id
    user = trans.user
    if user and user.preferred_object_store_id:
        return user.preferred_object_store_id
    return None

New helper function: _target_object_store_requires_private()

def _target_object_store_requires_private(trans, history):
    """
    Check if the target object store for this upload requires private datasets.
    """
    object_store = trans.app.object_store

    if not object_store.object_store_allows_id_selection():
        if hasattr(object_store, 'private'):
            return object_store.private
        return False

    object_store_id = _resolve_effective_object_store_id(trans, history)

    if object_store_id is None:
        private_stores = object_store.object_store_ids(private=True)
        non_private_stores = object_store.object_store_ids(private=False)
        if private_stores and not non_private_stores:
            return True
        return False

    concrete_store = object_store.get_concrete_store_by_object_store_id(object_store_id)
    if concrete_store:
        return concrete_store.private

    return False

New helper function: _make_permissions_private()

def _make_permissions_private(trans, permissions):
    """
    Modify permissions dict to make dataset private to the current user.
    """
    security_agent = trans.app.security_agent
    private_role = security_agent.get_private_user_role(trans.user, auto_create=True)

    private_permissions = dict(permissions)
    dataset_access_action = security_agent.permitted_actions.DATASET_ACCESS
    private_permissions[dataset_access_action] = [private_role]

    manage_action = security_agent.permitted_actions.DATASET_MANAGE_PERMISSIONS
    if manage_action not in private_permissions:
        private_permissions[manage_action] = [private_role]
    elif private_role not in private_permissions[manage_action]:
        private_permissions[manage_action] = list(private_permissions[manage_action]) + [private_role]

    return private_permissions
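
For illustration, with Galaxy's public defaults the transformation performed by _make_permissions_private() would look roughly like this (hypothetical role objects):

# Hypothetical before/after for _make_permissions_private:
#   before: {DATASET_MANAGE_PERMISSIONS: [users_private_role]}
#           (no DATASET_ACCESS entry, so the dataset is unrestricted/"public")
#   after:  {DATASET_MANAGE_PERMISSIONS: [users_private_role],
#            DATASET_ACCESS: [users_private_role]}
#           (access now restricted to the uploading user's private role)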

Implementation Steps

  1. Add helper functions to upload_common.py
  2. Modify __new_history_upload()
  3. Run existing tests - TestPrivateCannotWritePublicDataObjectStoreIntegration should now PASS
  4. Add new tests
  5. Manual testing

Testing Strategy

Existing Tests to Modify

File: test/integration/objectstore/test_private_handling.py

Class: TestPrivateCannotWritePublicDataObjectStoreIntegration

This test currently expects the upload to FAIL. After the fix, it should PASS and verify that the dataset is private.

Manual Testing Steps

  1. Setup test environment with private object store
  2. Configure user with public default permissions
  3. Test basic upload - verify upload succeeds, dataset is private
  4. Test fetch upload - verify same behavior
  5. Test with user/history preference

Risks and Considerations

Backward Compatibility

  1. Behavior change: Upload fails → Upload succeeds but dataset is private
  2. This is better UX - no data loss, user can still access their data
  3. Risk level: LOW

Edge Cases

  1. Anonymous users: Don't have private role - need to handle
  2. Library uploads: Different permission logic
  3. Multiple object stores (distributed): Non-deterministic if mixed private/public
  4. Job destination overrides: Happens after dataset creation

User Experience Implications

  1. Silent permission change: Users won't be warned
  2. May be confusing if they try to share later

Unresolved Questions

  1. Should we warn users when auto-setting private permissions?
  2. Should library uploads be handled the same way?
  3. What about the job destination object_store_id override case?
  4. Should we add a config option to control this behavior?
  5. Is the test TestPrivateCannotWritePublicDataObjectStoreIntegration intentional?

Issue #21536 Implementation Plan: Defer Shareability Check

Overview

This plan implements Theory 3: defer the shareability check from upload/object-store-assignment time to share time. Instead of failing uploads when a private object store is used with public default permissions, we allow any dataset to be stored in private storage but prevent sharing operations (making public, privately sharing with users, enabling link access, publishing) on datasets stored in non-shareable object stores. This is a more permissive approach that fixes the bug while maintaining the integrity guarantee that datasets in private storage cannot be accessed by unauthorized users.

Code Changes Required

1. Remove require_shareable Check at Object Store Assignment

File: /lib/galaxy/objectstore/__init__.py Function: ObjectStorePopulator.set_dataset_object_store_id() (Lines 2105-2116)

New Code:

def set_dataset_object_store_id(self, dataset: "Dataset") -> None:
    dataset.object_store_id = self.object_store_id
    try:
        self.object_store.create(dataset)
    except ObjectInvalid:
        raise ObjectCreationProblemStoreFull()
    self.object_store_id = dataset.object_store_id

Also update set_object_store_id() (Lines 2102-2103) to remove require_shareable parameter.
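
The corresponding change might look like the following (an assumed shape, derived from the call chain in the traceback at the top of this document; the real signature may differ):

# Assumed shape of the updated wrapper after removing require_shareable.
def set_object_store_id(self, data: "DatasetInstance") -> None:
    self.set_dataset_object_store_id(data.dataset)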

2. Remove requires_shareable_storage Call from Job Wrapper

File: /lib/galaxy/jobs/__init__.py Function: _set_object_store_ids_full() (Lines ~1900-1916)

Remove the require_shareable calculation and parameter passing.

3. Deprecate requires_shareable_storage Method

File: /lib/galaxy/model/__init__.py Function: Job.requires_shareable_storage() (Lines 2215-2226)

Add deprecation warning and have it always return False.
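
A minimal sketch of that deprecation (wording is illustrative):

import warnings

def requires_shareable_storage(self, security_agent) -> bool:
    # Deprecated under this plan: shareability is enforced at share time via
    # Dataset.ensure_shareable() rather than at object store assignment.
    warnings.warn(
        "Job.requires_shareable_storage is deprecated and always returns False; "
        "shareability is now enforced at share time.",
        DeprecationWarning,
        stacklevel=2,
    )
    return False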

4. Add Shareability Check to History Sharing

File: /lib/galaxy/managers/histories.py Function: HistoryManager.get_sharing_extra_information() (Lines 422-476)

Add check for datasets in non-shareable storage before the user permissions loop.

5. Update ShareHistoryExtra Schema

File: /lib/galaxy/schema/schema.py

Add new field:

non_shareable: List[HDABasicInfo] = Field(
    default_factory=list,
    title="Non-shareable datasets",
    description="Datasets stored in private storage that cannot be shared",
)

6. Add Shareability Check to make_members_public

File: /lib/galaxy/managers/histories.py Function: HistoryManager.make_members_public() (Lines 486-499)

Add first-pass check for non-shareable datasets before attempting to make them public.

7. Existing Share-Time Checks (Already in Place)

These methods already call dataset.ensure_shareable():

  • make_dataset_public() in /lib/galaxy/model/security.py (line 1196)
  • privately_share_dataset() in /lib/galaxy/model/security.py (line 973)
  • set_dataset_permission() in /lib/galaxy/model/security.py (lines 930-931)

Implementation Steps

  1. Update schema with non_shareable field
  2. Modify ObjectStorePopulator (remove shareability check)
  3. Update Job Wrapper (remove requires_shareable_storage call)
  4. Deprecate requires_shareable_storage method
  5. Update History Manager sharing methods
  6. Update exception type in ensure_shareable (optional)
  7. Update/add tests
  8. Run full test suite
  9. Manual testing

Testing Strategy

Existing Tests to Modify

File: /test/integration/objectstore/test_private_handling.py

  • TestPrivateCannotWritePublicDataObjectStoreIntegration: Change to expect upload success but sharing failure

New Tests to Add

  1. History sharing blocked when containing private datasets
  2. History publishing blocked when containing private datasets
  3. Share with users blocked when containing private datasets
  4. Copying dataset to library blocked from private storage

Risks and Considerations

Backward Compatibility

  1. API behavior change: Uploads succeed where they previously failed
  2. Error location change: Errors at share-time instead of upload-time
  3. requires_shareable_storage method deprecation

Already Shared Datasets

Not possible - existing ensure_shareable() checks already prevent this.

Implications for Published Histories, Workflows, Pages

  1. Publishing calls make_members_public() which checks shareability
  2. Workflow outputs can be in private storage but cannot be shared
  3. Pages embedding private datasets will fail at access time

Unresolved Questions

  1. Should we add UI indication that a dataset is non-shareable?
  2. Should share_option=no_changes bypass the non-shareable check?
  3. Should we add a mechanism to move datasets between object stores?
  4. Should error messages guide users to solutions?
  5. Need to search for other callers of set_object_store_id with require_shareable param

Plan: Fix Issue #21536 - Special Handling for Upload Jobs

Overview

Upload jobs should not require shareable storage for newly created datasets. The current logic checks job.requires_shareable_storage() which returns True when dataset permissions are "public" (no private access role). But for upload jobs, the dataset hasn't been shared yet - it simply has default permissions that could allow sharing later. The fix inverts the check logic for upload jobs: allow non-shareable storage since the permission system already prevents sharing from private storage via dataset.ensure_shareable() calls throughout Galaxy. This approach treats upload jobs as creating inherently "new" data where shareability requirements should be deferred until actual sharing is attempted.

Code Changes Required

1. File: lib/galaxy/jobs/__init__.py

Function: _set_object_store_ids_full (lines 1845-1924)

Current Code (line 1904):

require_shareable = job.requires_shareable_storage(self.app.security_agent)

Change Description: For upload jobs (and __DATA_FETCH__), override require_shareable to False since the data is new and hasn't been shared. The existing pattern in Galaxy uses tool_id in ("upload1", "__DATA_FETCH__") for upload detection (see line 2236).

New Code:

# Upload jobs create new data that hasn't been shared yet - don't require
# shareable storage. The permission system will prevent sharing from private
# storage later via dataset.ensure_shareable() checks.
if job.tool_id in ("upload1", "__DATA_FETCH__"):
    require_shareable = False
else:
    require_shareable = job.requires_shareable_storage(self.app.security_agent)

Implementation Steps

  1. Add upload job detection in _set_object_store_ids_full:

    • Edit lib/galaxy/jobs/__init__.py
    • Before line 1904, add check for upload tool IDs
    • Skip requires_shareable_storage() call for upload jobs
  2. Update test expectations:

    • Edit test/integration/objectstore/test_private_handling.py
    • Change TestPrivateCannotWritePublicDataObjectStoreIntegration to expect success
    • Add verification that upload works but subsequent sharing fails
  3. Add new test cases:

    • Add test verifying upload to private storage succeeds with public default permissions
    • Add test verifying uploaded dataset cannot be shared afterward
    • Add test for __DATA_FETCH__ tool if not covered
  4. Manual testing:

    • Configure Galaxy with private object store
    • Set new_user_dataset_access_role_default_private = False
    • Upload a file - should succeed
    • Try to make the uploaded dataset public - should fail

Testing Strategy

Existing Tests to Modify

File: test/integration/objectstore/test_private_handling.py

Class: TestPrivateCannotWritePublicDataObjectStoreIntegration

Current Expectation: Upload fails with error
New Expectation: Upload succeeds, sharing fails

Modified Test:

class TestPrivateCannotWritePublicDataObjectStoreIntegration(BaseObjectStoreIntegrationTestCase):
    @classmethod
    def handle_galaxy_config_kwds(cls, config):
        config["new_user_dataset_access_role_default_private"] = False
        cls._configure_object_store(PRIVATE_OBJECT_STORE_CONFIG_TEMPLATE, config)

    def test_upload_succeeds_but_sharing_blocked(self):
        """Test that upload works but sharing is blocked for private storage."""
        with self.dataset_populator.test_history() as history_id:
            # Upload should succeed now
            hda = self.dataset_populator.new_dataset(
                history_id, content=TEST_INPUT_FILES_CONTENT, wait=True
            )
            content = self.dataset_populator.get_history_dataset_content(history_id, hda["id"])
            assert content.startswith(TEST_INPUT_FILES_CONTENT)

            # But making it public should fail since it's in private storage
            response = self.dataset_populator.make_dataset_public_raw(history_id, hda["id"])
            api_asserts.assert_status_code_is(response, 400)
            api_asserts.assert_error_code_is(response, 400008)
            api_asserts.assert_error_message_contains(response, "Attempting to share a non-shareable dataset.")

Manual Testing Steps

  1. Setup:

    • Configure object_store_conf.xml with private="true" (see the example after this list)
    • Set new_user_dataset_access_role_default_private: false in galaxy.yml
    • Restart Galaxy
  2. Test Upload:

    • Create new user or use existing user with public defaults
    • Upload any file via web UI
    • Verify upload completes successfully (dataset goes to "ok" state)
  3. Test Sharing Block:

    • Go to dataset permissions
    • Try to "Remove restrictions" / make public
    • Verify error message about non-shareable dataset
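
A minimal sketch of such an object_store_conf.xml for step 1 (paths and attributes other than private="true" are placeholders):

<!-- Minimal sketch of a private disk object store; paths are placeholders. -->
<object_store type="disk" store_by="uuid" private="true">
    <files_dir path="database/objects"/>
</object_store>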

Risks and Considerations

Backward Compatibility

Risk: Users relying on current behavior (upload failing) may have workarounds that break.

Mitigation: This is a bug fix - the current behavior is incorrect. Users should not have to change new_user_dataset_access_role_default_private just to upload files.

Impact: Low - the change makes Galaxy work as expected; no API changes.

Edge Cases

1. Library uploads:

  • Check if library upload (library_folder_id on job) needs same handling
  • Library datasets may have different permission semantics

2. Data fetch via collection/URL:

  • __DATA_FETCH__ tool creates data similarly to upload
  • Should have same exemption
  • Already handled in proposed fix

3. Post-job-action changes permissions:

  • Some workflows may change permissions after job completes
  • Low risk: These actions run after object store assignment

What Happens When User Tries to Share Dataset from Private Storage Later?

Existing Safeguards (already in Galaxy):

  1. Dataset.ensure_shareable() - Called by:

    • make_dataset_public() in lib/galaxy/model/security.py:1196
    • set_dataset_permission() when setting DATASET_ACCESS in lib/galaxy/model/security.py:931
    • privately_share_dataset() in lib/galaxy/model/security.py:973
  2. set_all_dataset_permissions() - Checks shareability for non-new datasets

  3. API level - make_dataset_public_raw already tested in existing integration test

Conclusion: Multiple layers of protection exist. The user will get a clear error message when attempting to share.
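
For reference, the safeguard relied on here behaves roughly as follows (a paraphrase of the behavior described above; the error text matches the integration test assertion, but the exact Galaxy implementation may differ):

# Paraphrase of the share-time guard; not the exact Galaxy implementation.
def ensure_shareable(dataset) -> None:
    # `dataset.shareable` is False when the backing object store is private.
    if not dataset.shareable:
        raise Exception("Attempting to share a non-shareable dataset.")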

Security Considerations

No security regression: The fix does not bypass security. It changes when the shareability check occurs:

  • Before: At job enqueue time (preemptive, based on permission state)
  • After: At sharing attempt time (reactive, based on actual user action)

The dataset remains in private storage with enforced access restrictions.

Unresolved Questions

  1. Should this apply to all tools that create "new" data, or only upload tools?

    • Current approach: Only upload1 and __DATA_FETCH__
    • Alternative: Check if job has no input datasets (pure data creation)
  2. Are there other tool IDs that should be included?

    • upload1 - main upload tool
    • __DATA_FETCH__ - fetch data tool
    • Any others? Check for tools with tool_type="data_source"

Issue #21536 Triage Summary: Upload Does Not Respect Sharing

Top-Line Summary

Issue: When a user whose configured object store is a non-shareable scratch store uploads data while their default permissions are public (new_user_dataset_access_role_default_private = False), the upload fails with ObjectCreationProblemSharingDisabled. This is not a regression - the behavior has existed since Galaxy 23.1 when private object stores were introduced (PR #14073). However, it represents a design gap where the system preemptively blocks uploads based on permission state rather than actual sharing status.

Most Probable Fix: Modify _set_object_store_ids_full() in lib/galaxy/jobs/__init__.py to skip the require_shareable check for upload tools (upload1, __DATA_FETCH__). This is a ~5-10 line change with 90% probability of success. Existing ensure_shareable() safeguards already prevent sharing from private storage at share-time.

Root Cause: Upload jobs create new data that hasn't been shared yet. The current logic treats "could potentially be shared" (based on permission state) the same as "has been shared", causing the preemptive failure.

Discussion Questions

  1. Was the current behavior intentional policy? The test TestPrivateCannotWritePublicDataObjectStoreIntegration expects failure, but it may document status quo rather than assert correctness. Should we consult @jmchilton (original author) before changing?

  2. Should we prevent the problematic configuration? Should Galaxy warn/prevent users from setting a private object store as preferred when their default permissions are public?

  3. Are there other data-source tools? Beyond upload1 and __DATA_FETCH__, should other tools with tool_type="data_source" receive the same treatment?

  4. Library uploads? Do library uploads to private storage need separate handling?

Effort & Difficulty Assessment

| Aspect | Assessment |
|--------|------------|
| Effort to Fix | Low - ~5-10 LOC, single file change |
| Difficulty to Recreate | Easy - Configure private object store + public default permissions |
| Difficulty to Test | Easy - Existing integration test covers this exact scenario |
| Risk of Regression | Low - Minimal code change, existing safeguards verified |

Artifacts Generated

  • ISSUE_21536_CODE_RESEARCH.md - Detailed code path analysis
  • ISSUE_21536_HISTORY.md - Git history showing this isn't a regression
  • ISSUE_21536_PLAN_AUTO_PRIVATE_PERMISSIONS.md - Plan 1: Auto-set private permissions
  • ISSUE_21536_PLAN_SPECIAL_UPLOAD_HANDLING.md - Plan 2: Special upload handling (recommended)
  • ISSUE_21536_PLAN_DEFER_SHAREABILITY.md - Plan 3: Defer shareability check
  • ISSUE_21536_PLAN_ASSESSMENT.md - Comparative analysis of all plans

Recommended Implementation

Phase 1 (Immediate): Implement Plan 2 - Special Upload Handling

# In lib/galaxy/jobs/__init__.py, _set_object_store_ids_full():
if job.tool_id in ("upload1", "__DATA_FETCH__"):
    require_shareable = False
else:
    require_shareable = job.requires_shareable_storage(self.app.security_agent)

Phase 2 (Optional): Consider Plan 1 as UX enhancement to auto-set private permissions, preventing the "cannot share" error message users would see when trying to share later.
