I have a 25.1 test Galaxy instance set up and as an anon user I have a quota of 100 MB.
However there is seemingly no limit on how many things I can fetch from remote repositories. At the moment my history has 3 GB and I can request more datasets to be fetched without any issues.
Looking at the logs, the fetching jobs never seem to be paused, so this is likely a different code path than galaxyproject/galaxy#20637 (it also does not go through Pulsar).
Issue #21642: Code Research - Remote Repository Data Fetching Not Respecting Storage Quota
Problem Statement
Galaxy 25.1 test instance with an anonymous user having 100 MB quota allows fetching unlimited data (3 GB observed) from remote repositories. Fetching jobs are never paused. The code path differs from issue #20637 and doesn't involve Pulsar.
Key Code Locations
File Structure:
/lib/galaxy/tools/data_fetch.py (Lines 51-73) - Main do_fetch() function
/lib/galaxy/tools/execute.py (Lines 206-235) - Tool execution with Celery decision
The Problem: When is_fetch_with_celery_enabled() returns True (Galaxy 25.1 default), the __DATA_FETCH__ tool execution bypasses the traditional JobWrapper.enqueue() method that performs quota checks. Instead, it executes as a Celery task chain that completely skips quota validation.
Theory 1: Missing Quota Check in Celery Tasks (MOST LIKELY - 95% probability)
In /lib/galaxy/celery/tasks.py, the setup_fetch_data() function (line 245) sets object store IDs but never calls _pause_job_if_over_quota(). Similarly, the fetch_data() function (line 334) transitions the job directly to RUNNING state without any quota verification.
The quota check method exists in /lib/galaxy/jobs/__init__.py lines 1804-1809 but is only called from JobWrapper.enqueue() which is bypassed entirely for Celery-based fetch jobs.
Theory 2: Session Management Issue in Setup Task (60% probability)
The setup_fetch_data() is a setup callback that returns values to the main task. Even if quota check were added here, changes to job state might not persist properly due to database session lifecycle in async contexts. The actual quota check needs to occur in the main fetch_data() task with proper session management.
Theory 3: Quota Tracking Before Job Execution (30% probability)
The quota agent's is_over_quota() method only checks current disk usage. If output datasets created during job setup don't have their sizes reflected in the quota calculation, quota checks might incorrectly show available space. Multiple concurrent fetch jobs could each pass checks individually.
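The race in Theory 3 can be simulated with a minimal, self-contained sketch (the `QuotaLedger` class and all names below are illustrative stand-ins, not Galaxy's real classes): two fetch jobs each check quota against *committed* disk usage before either records its output size, so both pass even though their combined output far exceeds the quota.

```python
# Hypothetical simulation of the Theory 3 race: quota checks that only
# look at committed usage let concurrent in-flight jobs all pass.

class QuotaLedger:
    def __init__(self, quota_bytes: int):
        self.quota_bytes = quota_bytes
        self.committed_usage = 0  # bytes already counted against the quota

    def is_over_quota(self) -> bool:
        # Mirrors a check that only sees current committed usage and
        # ignores in-flight jobs whose sizes are not yet recorded.
        return self.committed_usage >= self.quota_bytes

    def commit(self, size_bytes: int) -> None:
        self.committed_usage += size_bytes


ledger = QuotaLedger(quota_bytes=100 * 1024**2)  # 100 MB quota

# Both jobs pass the check before either commits its output size...
job_a_allowed = not ledger.is_over_quota()
job_b_allowed = not ledger.is_over_quota()

# ...then each fetches 2 GB, blowing far past the 100 MB quota.
ledger.commit(2 * 1024**3)
ledger.commit(2 * 1024**3)
```

A real fix for this aspect would need some form of quota reservation at check time, not just a read-only comparison.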
Critical Code Sections
Normal Quota Check (JobWrapper.enqueue, lines 1800-1809):
def _pause_job_if_over_quota(self, job):
    quota_source_map = self.app.object_store.get_quota_source_map()
    if self.app.quota_agent.is_over_quota(quota_source_map, job):
        log.info("(%d) User (%s) is over quota: job paused", job.id, job.user_id)
        message = "Execution of this dataset's job is paused because you were over your disk quota at the time it was ready to run"
        self.pause(job, message)
Celery Setup Task (Missing Check, line 245):
def setup_fetch_data(...):
    mini_job_wrapper._set_object_store_ids(job)  # Object store assigned
    # *** MISSING: _pause_job_if_over_quota() call here ***
    return mini_job_wrapper.working_directory, ...
Celery Main Task (No Check Before Execution, lines 334-335): fetch_data() transitions the job directly to RUNNING without any quota verification.
The root cause is that Celery-based data fetch jobs bypass the traditional job enqueue path where quota checks are performed. The fix should add quota checking to the Celery task chain, preferably in setup_fetch_data() or at the start of fetch_data() before changing the job state to RUNNING.
April 1, 2022 - Initial Celery Fetch Implementation
Commit: aa67d9dd387 · Author: mvdbeek
Created setup_fetch_data and fetch_data Celery tasks
No quota checks were included in the new path
This was the initial implementation of Celery-based data fetching
March 10, 2023 - Celery Fetch Made Configurable
Commit: f35e8f8288e · Author: John Davis
PR: #15767
Target: release_23.0
Added is_fetch_with_celery_enabled() function
Allowed disabling Celery fetch as a workaround, but not a fix
Made Celery fetch the default behavior
February 10, 2025 - Quota Check Added to Job Enqueue
Commit: ecaa747104a · Author: davelopez
Added _pause_job_if_over_quota() to MinimalJobWrapper.enqueue()
CRITICAL ISSUE: This only fixes the traditional path, NOT the Celery path
Celery tasks still bypass enqueue() entirely
The Core Problem
The Celery path in execute() at line 206 of /lib/galaxy/tools/execute.py completely bypasses the job enqueue mechanism:
# Celery Path (NO quota check)
setup_fetch_data.s() | fetch_data.s() | set_job_metadata.s() | finish_job.si()

# Traditional Path (WITH quota check as of Feb 10, 2025)
tool.app.job_manager.enqueue(job2, tool=tool)
Key Authors

| Author | Contribution |
| --- | --- |
| mvdbeek | Original Celery fetch implementation, currently assigned to issue |
| John Davis | Made Celery fetch configurable (PR #15767) |
| davelopez | Added quota check to MinimalJobWrapper.enqueue() (Feb 2025) |
Regression Assessment
Type: REGRESSION
Introduced: Galaxy 23.0 (March 2023) when Celery fetch became default
Duration: ~2 years undetected (Mar 2023 → Feb 2025)
Severity: High (quota is a security/fairness control)
Root Cause Summary
The Celery implementation didn't replicate all job enqueue-time checks. When the Celery path was introduced, it was designed for performance but inadvertently bypassed the quota enforcement that exists in the traditional job handling path.
The recent fix by davelopez (Feb 2025) added quota checking to MinimalJobWrapper.enqueue(), but this doesn't help the Celery path because Celery tasks never call enqueue() - they directly execute the fetch operations.
Key Files

| File | Line | Description |
| --- | --- | --- |
| /lib/galaxy/tools/execute.py | 206 | Decides which path (Celery vs traditional) to use |
| /lib/galaxy/celery/tasks.py | 227-335 | Celery tasks with no quota checks |
| /lib/galaxy/jobs/__init__.py | 1612 | New quota check (doesn't help Celery path) |
| /lib/galaxy/config/__init__.py | 1428 | is_fetch_with_celery_enabled() function |
Related PRs and Issues
PR #15767 - Made Celery fetch configurable (March 2023)
Issue #20637 - Similar quota issue with different code path (mentioned in original issue)
Tests: /test/integration/test_quota.py - add Celery data_fetch-specific tests
Summary
Issue #21642 represents a critical vulnerability in Galaxy's quota enforcement system that allows storage exhaustion on public instances. The problem is architectural: Celery-based data fetch jobs bypass the traditional job enqueue path where quota checks are performed. This requires a hotfix that enforces quotas in the Celery task chain. All public Galaxy instances with quotas enabled are at risk of disk space exhaustion.
GitHub Issue #21642: Remote repository data fetching not respecting storage quota in Galaxy 25.1 when Celery-based fetch is enabled (default).
Root Cause Analysis
Confirmed Root Cause (95% confidence):
When is_fetch_with_celery_enabled() returns True (Galaxy 25.1 default), the __DATA_FETCH__ tool execution completely bypasses the traditional JobWrapper.enqueue() method where quota checks are performed.
Step 1: Add Quota Check Method to MinimalJobWrapper
File: /lib/galaxy/jobs/__init__.py
Add a new method check_and_pause_if_over_quota() to MinimalJobWrapper class (around line 1560, after the pause() method):
def check_and_pause_if_over_quota(self, job=None) -> bool:
    """Check if user is over quota and pause job if so.

    Returns True if job was paused due to quota, False otherwise.
    """
    if job is None:
        job = self.get_job()
    # Get quota source map from object store
    quota_source_map = self.app.object_store.get_quota_source_map()
    # Check quota using the quota agent
    if self.app.quota_agent.is_over_quota(quota_source_map, job):
        log.info("(%d) User (%s) is over quota: job paused", job.id, job.user_id)
        message = "Execution of this dataset's job is paused because you were over your disk quota at the time it was ready to run"
        self.pause(job, message)
        return True
    return False
Step 2: Modify setup_fetch_data() to Check Quota
File: /lib/galaxy/celery/tasks.py
Modify setup_fetch_data() function (starting at line 227) to check quota after setting object store IDs:
@galaxy_task(bind=True)
def setup_fetch_data(
    self,
    job_id: int,
    raw_tool_source: str,
    app: MinimalManagerApp,
    sa_session: galaxy_scoped_session,
    task_user_id: Optional[int] = None,
):
    tool = cached_create_tool_from_representation(app=app, raw_tool_source=raw_tool_source)
    job = sa_session.get(Job, job_id)
    assert job
    job.handler = self.request.hostname
    job.job_runner_name = "celery"
    mini_job_wrapper = MinimalJobWrapper(job=job, app=app, tool=tool)
    mini_job_wrapper.change_state(model.Job.states.QUEUED, flush=False, job=job)
    mini_job_wrapper._set_object_store_ids(job)
    # NEW: Check quota after object store is assigned
    if mini_job_wrapper.check_and_pause_if_over_quota(job):
        sa_session.commit()
        # Return None to signal the task chain should not continue
        return None
    # ... rest of the function unchanged
Step 3: Modify fetch_data() to Handle Paused Jobs
File: /lib/galaxy/celery/tasks.py
Modify fetch_data() function (starting at line 323) to handle None return from setup_fetch_data():
@galaxy_task(action="Run fetch_data")
def fetch_data(
    setup_return,
    job_id: int,
    app: MinimalManagerApp,
    sa_session: galaxy_scoped_session,
    task_user_id: Optional[int] = None,
) -> str:
    # NEW: If setup_return is None, job was paused due to quota
    if setup_return is None:
        log.info("(%d) Fetch job was paused (likely due to quota), skipping execution", job_id)
        return None
    job = sa_session.get(Job, job_id)
    assert job
    # NEW: Double-check job state - don't proceed if paused
    if job.state == model.Job.states.PAUSED:
        log.info("(%d) Job is paused, skipping fetch execution", job_id)
        return None
    # ... rest of the function unchanged
Step 4: Update finish_job() to Handle Paused Jobs
File: /lib/galaxy/celery/tasks.py
Modify finish_job() function to handle paused jobs:
@galaxy_task
def finish_job(
    job_id: int,
    raw_tool_source: str,
    app: MinimalManagerApp,
    sa_session: galaxy_scoped_session,
    task_user_id: Optional[int] = None,
):
    tool = cached_create_tool_from_representation(app=app, raw_tool_source=raw_tool_source)
    job = sa_session.get(Job, job_id)
    assert job
    # NEW: Don't finish if job is paused (quota exceeded)
    if job.state == model.Job.states.PAUSED:
        log.info("(%d) Job is paused, skipping finish", job_id)
        return
    # ... rest of the function unchanged
Step 5: Update set_job_metadata() to Handle Paused Jobs
File: /lib/galaxy/celery/tasks.py
Modify set_job_metadata() function to handle None input:
@galaxy_task(action="set metadata for job")
def set_job_metadata(
    tool_job_working_directory,
    extended_metadata_collection: bool,
    job_id: int,
    sa_session: galaxy_scoped_session,
    task_user_id: Optional[int] = None,
) -> None:
    # NEW: If working directory is None, job was paused
    if tool_job_working_directory is None:
        log.info("(%d) Job metadata skipped - job was paused", job_id)
        return None
    # ... rest of the function unchanged
Test Strategy
Unit Tests
File: /test/unit/celery/test_fetch_data_quota.py (new file)
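Since this file doesn't exist yet, one possible sketch of such a unit test exercises the proposed check_and_pause_if_over_quota() contract against mocked quota components. `FakeJobWrapper` and `make_app` are hypothetical stand-ins for test purposes; a real test would build on Galaxy's actual MinimalJobWrapper and app fixtures.

```python
# Hypothetical unit-test sketch for the proposed quota-pause contract.
from unittest.mock import MagicMock


class FakeJobWrapper:
    """Stand-in implementing the proposed contract: pause and return True when over quota."""

    def __init__(self, app):
        self.app = app
        self.paused_with = None

    def pause(self, job, message):
        self.paused_with = message

    def check_and_pause_if_over_quota(self, job) -> bool:
        quota_source_map = self.app.object_store.get_quota_source_map()
        if self.app.quota_agent.is_over_quota(quota_source_map, job):
            self.pause(job, "job paused: over disk quota")
            return True
        return False


def make_app(over_quota: bool):
    # Mocked app exposing just the object_store / quota_agent surface used above.
    app = MagicMock()
    app.quota_agent.is_over_quota.return_value = over_quota
    return app


def test_pauses_when_over_quota():
    wrapper = FakeJobWrapper(make_app(over_quota=True))
    assert wrapper.check_and_pause_if_over_quota(job=MagicMock()) is True
    assert wrapper.paused_with is not None


def test_no_pause_when_under_quota():
    wrapper = FakeJobWrapper(make_app(over_quota=False))
    assert wrapper.check_and_pause_if_over_quota(job=MagicMock()) is False
    assert wrapper.paused_with is None
```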
Issue: Remote repository data fetching does not respect storage quota in Galaxy 25.1+
Root Cause: When Celery-based data fetch is enabled (default since Galaxy 23.0), the __DATA_FETCH__ tool execution bypasses the traditional JobWrapper.enqueue() method where quota checks are performed. The Celery task chain (setup_fetch_data → fetch_data → set_job_metadata → finish_job) directly executes without calling _pause_job_if_over_quota(). This is a regression introduced in Galaxy 23.0 (March 2023, PR #15767) when Celery fetch became the default, and has been undetected for approximately 2 years.
Most Probable Fix: Add quota checking to the setup_fetch_data() Celery task in /lib/galaxy/celery/tasks.py by calling a new check_and_pause_if_over_quota() method on MinimalJobWrapper after object store IDs are set. All downstream tasks in the chain must be updated to handle paused jobs gracefully.
Importance Assessment Summary

| Criterion | Assessment |
| --- | --- |
| Severity | CRITICAL - Enables complete bypass of quota enforcement |
| Blast Radius | HIGH - Affects all public-facing Galaxy instances with quotas enabled |
| Workaround | PAINFUL - No practical workaround without disabling core functionality |
| Regression Status | REGRESSION - Introduced in Galaxy 23.0 (March 2023) |
| Priority Recommendation | HOTFIX - Should be backported to all supported releases |
Discussion Questions
Concurrent fetch jobs: Multiple simultaneous fetch requests could each pass quota checks before any complete. Should we implement a quota reservation mechanism to prevent race conditions?
Unknown file sizes: For remote URL fetches, the final file size isn't known until download completes. Should we implement:
A Content-Length based pre-check?
Mid-stream cancellation if quota exceeded during download?
Backporting scope: The fix should be backported to 25.1, but should it also go to 25.0 and earlier supported releases?
Testing coverage: The existing quota tests don't cover the Celery fetch path. What integration test scenarios should be prioritized?
Anonymous user impact: Anonymous users with quotas appear to be the most affected. Are there specific configurations or use cases we should test?
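The Content-Length pre-check floated in question 2 could be sketched as a pure decision function. Everything below is illustrative (the function name and message format are not from Galaxy); a real fix would wire this into the fetch code and would still need mid-stream enforcement, since responses without a declared length (chunked transfers, dynamic content) are inconclusive.

```python
# Hypothetical Content-Length pre-check: reject a remote fetch up front
# when the advertised size would exceed the user's remaining quota.
from typing import Mapping, Optional


def content_length_precheck(headers: Mapping[str, str], remaining_quota_bytes: int) -> Optional[str]:
    """Return a rejection reason, or None if the fetch may proceed.

    A missing or non-numeric Content-Length is inconclusive, so the
    download must still be guarded by a mid-stream size check.
    """
    raw = headers.get("Content-Length")
    if raw is None or not raw.isdigit():
        return None  # inconclusive: fall through to mid-stream enforcement
    if int(raw) > remaining_quota_bytes:
        return f"declared size {raw} bytes exceeds remaining quota {remaining_quota_bytes} bytes"
    return None


# A 2 GB declared download against ~50 MB of remaining quota is rejected.
reason = content_length_precheck({"Content-Length": str(2 * 1024**3)}, 50 * 1024**2)
```

Note that a Content-Length header can also be spoofed or simply wrong, which is another argument for pairing the pre-check with mid-stream cancellation.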
Effort Estimate

| Aspect | Assessment |
| --- | --- |
| Implementation Effort | Medium - 5 files to modify, well-scoped changes |
| Testing Complexity | Medium - Requires Celery worker setup for integration tests |
| Reproduction Difficulty | Easy - Set up quota, enable Celery fetch (default), fetch data |
| Risk Level | Low - Changes are additive, existing code paths unchanged |