@dannon
Created February 24, 2026 13:22
Triage: galaxyproject/galaxy #21904 - dbkey filtering of dynamic options does not work properly

Issue #21904: dbkey filtering of dynamic options does not work properly

  • State: OPEN
  • Author: bernt-matthias
  • Labels: none
  • Comments: 0
  • Projects: Galaxy Dev - weeklies (Triage/Discuss)
  • Galaxy Version: 25.1

Description

A dynamic select of fasta_indexes that is filtered by dbkey:

<options from_data_table="fasta_indexes">
    <filter type="data_meta" ref="input" key="dbkey" column="dbkey"/>
</options>

is supposed to present all choices if the user gives no dbkey in the referred input.

Apparently this does not work at the moment in the UI (galaxyproject/tools-iuc#7718) but it works in tool tests (galaxyproject/tools-iuc#7720).

The odd thing is if:

  • upload without dbkey set -> no selection shows
  • specify dbkey -> one choice shown
  • unset dbkey -> all choices shown

Also strange: when editing the dbkey, "unspecified" is shown 2x in the UI.

Steps to Reproduce

  1. Load samtools stats
  2. Upload a sam/bam file
  3. Select "Use a built-in genome" in reference conditional
  4. Observe that there is no selection possible
  5. Set/unset dbkey of the sam file and see difference

Expected Behavior

All choices are shown if there is no dbkey set.

Context

The tool affected is samtools_stats, which uses a macro optional_reference from IUC tools. The select param is inside a conditional (addref_cond) under the when value="cached" case, while the referenced input data param is at the tool's top level.

Issue #21904 - Code Research: dbkey filtering of dynamic options

Architecture Overview

The bug involves the interaction between three systems:

  1. Dynamic options (lib/galaxy/tools/parameters/dynamic_options.py) - Loads options from data tables and applies filters
  2. Tool form population (lib/galaxy/tools/parameters/populate_model.py) - Builds the JSON model consumed by the client
  3. Client form rendering (client/src/components/Form/Elements/FormSelect.vue) - Renders the select with options or "No options available"

Key Code Paths

DataMetaFilter.filter_options (dynamic_options.py:188-256)

This filter handles the <filter type="data_meta" ref="input" key="dbkey" column="dbkey"/> pattern:

  1. Gets referenced dataset via _get_ref_data(other_values, self.ref_name) (line 220)
  2. Checks if metadata key is set via r.metadata.element_is_set(self.key) (line 235)
  3. If dbkey is "?" (the no_value), element_is_set returns False, meta_value stays empty
  4. When meta_value is empty: returns ALL options via copy.deepcopy(options) (line 241-242)
  5. When meta_value has a real dbkey: filters to only matching options (line 244-249)

Error handling:

  • KeyError (ref not found in other_values) -> returns [] (line 221-223)
  • ValueError (ref is not a valid dataset type) -> returns [] (line 224-226)
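
The filter behaviour listed above can be condensed into a small stand-alone sketch. It is not Galaxy's code: `FakeDataset`, `filter_by_dbkey`, and the `(value, dbkey)` row layout are hypothetical stand-ins for illustration, and a missing key in a plain dict plays the role of the `KeyError` raised when the ref cannot be resolved.

```python
import copy

NO_VALUE = "?"  # dbkey's no_value, as described above


class FakeDataset:
    """Hypothetical stand-in for an HDA; it only carries a dbkey."""

    def __init__(self, dbkey=NO_VALUE):
        self.dbkey = dbkey


def filter_by_dbkey(options, other_values, ref_name="input"):
    """Mimic the described data_meta filter for key="dbkey".

    options: list of (value, dbkey) tuples standing in for data table rows.
    """
    try:
        ref = other_values[ref_name]
    except KeyError:
        # Described current behaviour: an unresolvable ref yields no options.
        return []
    if ref.dbkey == NO_VALUE:
        # Unset metadata: nothing to filter on, so every option is returned.
        return copy.deepcopy(options)
    return [opt for opt in options if opt[1] == ref.dbkey]


rows = [("hg38.fa", "hg38"), ("mm10.fa", "mm10")]
all_rows = filter_by_dbkey(rows, {"input": FakeDataset()})
only_hg38 = filter_by_dbkey(rows, {"input": FakeDataset("hg38")})
no_rows = filter_by_dbkey(rows, {})
```

The last call is the interesting one: an empty context produces an empty list, which is what the client would render as "No options available".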

_get_ref_data (dynamic_options.py:1053-1086)

Resolves the dataset reference from other_values:

  • If the ref value is None -> raises ValueError (it's not an HDA/HDCA/etc.)
  • If the ref value is a runtime value -> returns []
  • If the ref value is an HDA -> wraps in list and returns

element_is_set (model/metadata.py:180-199)

Checks if metadata value differs from the spec's no_value:

  • For dbkey: no_value = "?" (defined in datatypes/data.py:225)
  • Returns False when dbkey is "?" -> triggers "return all options" behavior
  • Returns True when dbkey is set to a real genome build
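
As a rough illustration of that check, the sketch below compares a stored value with its no_value. `FakeMetadata` is a hypothetical, heavily simplified stand-in; the real `element_is_set` may take more into account than this comparison.

```python
class FakeMetadata:
    """Hypothetical stand-in for a dataset's metadata collection."""

    def __init__(self, values, no_values):
        self._values = values        # e.g. {"dbkey": "hg38"}
        self._no_values = no_values  # e.g. {"dbkey": "?"}

    def element_is_set(self, name):
        # "Set" here means: present and different from the spec's no_value.
        if name not in self._values:
            return False
        return self._values[name] != self._no_values.get(name)


unset = FakeMetadata({"dbkey": "?"}, {"dbkey": "?"})
real = FakeMetadata({"dbkey": "hg38"}, {"dbkey": "?"})
```

Under this simplification, `unset.element_is_set("dbkey")` is false and `real.element_is_set("dbkey")` is true, matching the two cases above.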

populate_model (populate_model.py:12-72)

Builds the tool form JSON. For each non-grouped parameter:

try:
    initial_value = input.get_initial_value(request_context, other_values)
    tool_dict = input.to_dict(request_context, other_values=other_values)
    ...
except ImplicitConversionRequired:
    tool_dict = input.to_dict(request_context)
    tool_dict["textable"] = True
except Exception:
    tool_dict = input.to_dict(request_context)  # <-- NO other_values!
    log.exception("tools::to_json() - Skipping parameter expansion '%s'", input.name)

Critical: The generic except Exception handler calls to_dict WITHOUT other_values. This means:

  • SelectToolParameter.to_dict(trans, other_values=None) -> other_values = other_values or {} -> empty dict
  • _get_ref_data({}, 'input') raises KeyError (ref not found) -> filter returns []
  • Options are empty!
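
The chain of consequences above can be reproduced in isolation. The sketch below is a simulation under stated assumptions, not Galaxy code: `FakeSelect` and `build_param` are hypothetical names, the `RuntimeError` stands in for whatever exception actually occurs, and the `keep_context` flag only exists to compare a fallback that drops the context with one that passes it along.

```python
import logging

log = logging.getLogger(__name__)


class FakeSelect:
    """Hypothetical stand-in for a select param with a data_meta filter."""

    name = "ref"
    all_options = ["hg38", "mm10"]

    def get_initial_value(self, other_values):
        # Stand-in for whatever exception occurs in the real code path.
        raise RuntimeError("simulated failure")

    def to_dict(self, other_values=None):
        other_values = other_values or {}
        # Without the referenced dataset the filter yields no options.
        options = list(self.all_options) if "input" in other_values else []
        return {"name": self.name, "options": options}


def build_param(param, other_values, keep_context):
    """Mirror the try/except shape of populate_model for one parameter."""
    try:
        param.get_initial_value(other_values)
        return param.to_dict(other_values=other_values)
    except Exception:
        log.debug("Skipping parameter expansion '%s'", param.name)
        if keep_context:
            return param.to_dict(other_values=other_values)
        return param.to_dict()  # context dropped, as in the current handler


state = {"input": object()}  # any placeholder for the selected dataset
current = build_param(FakeSelect(), state, keep_context=False)
patched = build_param(FakeSelect(), state, keep_context=True)
```

In this simulation `current["options"]` is empty while `patched["options"]` still lists both genomes, even though the same exception was raised in both runs.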

Client-side select rendering (FormSelect.vue:229, 263-265)

<Multiselect v-if="hasOptions" .../>
<slot v-else name="no-options">
    <b-alert variant="warning" show>No options available.</b-alert>
</slot>

If options.length === 0, the "No options available" alert is shown instead of the dropdown.

Tool Structure (samtools_stats)

The affected select parameter is inside a conditional (non-default case):

<inputs>
    <param name="input" type="data" format="sam,bam,cram" />  <!-- top-level -->
    ...
    <conditional name="addref_cond">
        <param name="addref_select" type="select">
            <option value="no">No</option>           <!-- DEFAULT -->
            <option value="history">Use a genome/index from the history</option>
            <option value="cached">Use a built-in genome</option>
        </param>
        <when value="no"/>
        <when value="history">...</when>
        <when value="cached">
            <param name="ref" type="select">
                <options from_data_table="fasta_indexes">
                    <filter type="data_meta" ref="input" key="dbkey" column="dbkey"/>
                </options>
                <validator type="no_options" message="No reference genome is available..."/>
            </param>
        </when>
    </conditional>
</inputs>

On initial load, the default case is "no", so the "cached" case is inactive during populate_model.

Theories

Theory 1 (MOST PROBABLE): Exception in populate_model causes silent fallback to empty options

When populate_model processes the inactive "cached" conditional case, it creates current_state = {} (empty dict since it's not the active case). Inside the recursive call, other_values = ExpressionContext({}, parent_context). The parent context should still have the input HDA via the ExpressionContext chain.

However, if get_initial_value or to_dict throws ANY exception (e.g., the no_options validator raising an error during option generation, a database session issue, or a type mismatch), the generic except Exception handler fires:

tool_dict = input.to_dict(request_context)  # without other_values!

This results in empty options because the data_meta filter can't find the referenced dataset.

The no_options validator on the select would cause validation to fail on the options when they're empty, but it runs inside get_initial_value via get_options, which catches ImplicitConversionRequired but not general exceptions like those from validators.

Theory 2: Conditional inactive case has stale/missing state for referenced params

In populate_model, for inactive conditional cases, current_state = {}. The other_values chain should resolve input from the parent context, but there might be an edge case where the ExpressionContext lookup doesn't propagate correctly for some parameter configurations.

The fact that set/unset dbkey "fixes" it suggests the rebuild POST after the explicit action sends a different/more complete form state that avoids the error path.
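
To make the expected lookup behaviour concrete, the sketch below uses the standard library's `collections.ChainMap` as an analogy for the context chain. That ExpressionContext resolves keys in a comparable parent-fallback way is an assumption taken from the description above, not something verified here.

```python
from collections import ChainMap

parent_state = {"input": "<HDA placeholder>"}  # top-level tool state
inactive_case_state = {}                       # state of the inactive "cached" case

# Lookups fall through to the parent when the child has no such key.
other_values = ChainMap(inactive_case_state, parent_state)
resolved = other_values["input"]

# A bare empty dict, by contrast, has nothing to fall back on.
try:
    {}["input"]
    missing = False
except KeyError:
    missing = True
```

If the real chain behaves like this, an empty current_state alone should not lose the input reference; the reference is only lost when the chain itself is replaced by an empty dict.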

Theory 3: The initial GET request doesn't include dataset reference in incoming params

On the initial GET /api/tools/{tool_id}/build, incoming is essentially empty (just query string params). The populate_state function uses get_initial_value for the data param, which selects the most recent matching HDA. However, if the from_json call in check_param processes the value differently on initial GET vs POST rebuild, the state could contain different types of values.

  • On GET: state['input'] = HDA object (from get_initial_value)
  • On POST: state['input'] = HDA object (resolved from {values: [{id, src}]})

Both should be HDA objects, so this theory is less likely unless there's a subtle type difference.

Relevant Recent Commits

  • 0f72943993d "Skip data_meta filter in run form" - Added short-circuit for workflow run form (USE_HISTORY mode). Does NOT apply to regular tool form.
  • eae78408fbd "Fix dynamic filter option access when building command line" - Made trans parameter optional in filter_options to fix crash.
  • These changes are specifically for workflow contexts and shouldn't affect the regular tool form.

Key Files

  • /lib/galaxy/tools/parameters/dynamic_options.py - DataMetaFilter class (lines 146-256), _get_ref_data function (lines 1053-1086)
  • /lib/galaxy/tools/parameters/populate_model.py - populate_model function, especially exception handling (lines 54-68)
  • /lib/galaxy/tools/parameters/basic.py - SelectToolParameter.get_options (line 1000), get_initial_value (line 1154)
  • /lib/galaxy/model/metadata.py - MetadataCollection.element_is_set (line 180)
  • /lib/galaxy/datatypes/data.py - dbkey MetadataElement definition (line 218-225)
  • /client/src/components/Form/Elements/FormSelect.vue - Client rendering of options
  • /test/functional/tools/dbkey_filter_input.xml - Existing test tool for this pattern

Issue #21904 - Importance Assessment

Severity: Medium-High (Functional Breakage)

This is a functional breakage that prevents users from using built-in/cached reference genomes for tools that use the data_meta filter pattern with dbkey. It does NOT cause data loss, crashes, or security issues, but it renders a core workflow impossible through the UI.

Blast Radius: Broad - Affects All Users of Affected Tools

  • Scope: Any tool using the <filter type="data_meta" ref="..." key="dbkey" column="dbkey"/> pattern with from_data_table inside a conditional. This is the standard IUC macro pattern used in the optional_reference and mandatory_reference macros.
  • Affected tools include: samtools_stats, samtools_markdup, samtools_calmd, and many other samtools/bioinformatics tools that use cached genome references. This is a VERY common pattern in the Galaxy tool ecosystem.
  • Confirmed on: Main, AU, EU Galaxy servers (per IUC issue #7718)
  • Environments affected: All browsers, all operating systems, Galaxy 25.1

Workaround Existence: Acceptable but Inconvenient

Three workarounds exist:

  1. Set then unset the dbkey on the input dataset - This clears the state and allows all options to appear. Works reliably but requires the user to know about it.
  2. Use "genome from history" instead of "built-in genome" - Upload/provide the reference FASTA directly. Works but defeats the purpose of having cached genomes.
  3. Set the dbkey explicitly on the input dataset to match the desired genome - This shows the correct filtered option.

Regression Status: Likely New in 25.x

  • Tool tests pass (the backend logic is correct in the test execution path)
  • The issue is specific to the UI form building path
  • Multiple recent commits have modified related code paths in dynamic_options.py including 0f72943993d (Skip data_meta filter in run form) and eae78408fbd (Fix dynamic filter option access when building command line)
  • The issue was first reported against Galaxy 25.1 and confirmed on major public servers
  • The absence of prior reports suggests this is a regression rather than a long-standing issue

User Impact Signals

  • IUC Issue #7718: Opened by a different user (shiltemann), confirmed on Main/AU/EU servers
  • GTN Impact: A Galaxy Training Network issue was filed (training-material#6658) because users following tutorials can't complete exercises
  • Galaxy Issue #21904: Filed by bernt-matthias (IUC maintainer), indicating this is blocking tool maintenance work
  • No comments yet on the Galaxy issue, but the GTN connection means this is affecting many new users following tutorials

Recommendation: Next Release / High Priority Fix

  • NOT a hotfix - There's a viable workaround (set/unset dbkey) and no data loss risk
  • Should be fixed for next release - This affects a very common tool pattern, impacts training materials, and makes Galaxy look broken for new users
  • Priority: High within the normal release cycle. The GTN training impact elevates the urgency since it affects user onboarding
  • Fix should include a regression test that validates the tool form JSON response for a data_meta filtered select when the input dataset has unset dbkey

Issue #21904 - Fix Plan

Root Cause Analysis

Based on code research, the most probable root cause is in populate_model.py's exception handling. When the select parameter with data_meta filter is inside a non-active conditional case, any exception during get_initial_value() or to_dict() triggers the generic except Exception handler which calls to_dict(request_context) WITHOUT other_values. This causes the data_meta filter to not find the referenced dataset, returning empty options.

The secondary contributing factor may be in how the no_options validator interacts with the dynamic options generation. When options are generated for the inactive conditional case but the validator fires, it could raise an exception that gets caught by the generic handler.

Debugging Strategy (First Step)

Before implementing a fix, the exact exception needs to be identified:

  1. Check the logs: The log.exception() call in populate_model.py (lines 66-68) should already record the exception. Look in the Galaxy server logs for the message "tools::to_json() - Skipping parameter expansion 'ref'" when loading samtools_stats.

  2. Reproduce locally: Load the dbkey_filter_input test tool form via the API and inspect the response JSON:

    curl "http://localhost:8080/api/tools/dbkey_filter_input/build?key=API_KEY"

    Check if the options array for the select parameter is empty.

  3. Add temporary debug logging in DataMetaFilter.filter_options to log the ref resolution path:

    log.debug(f"DataMetaFilter: ref_name={self.ref_name}, key={self.key}, ref_type={type(ref)}")
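
For step 2, scanning the build response by hand gets tedious once conditionals are involved. A minimal sketch of a helper follows; `find_empty_selects` is a hypothetical name, and the JSON shape it walks (an "inputs" list, conditionals exposing "cases" that each carry their own "inputs") is an assumption to be checked against a real response.

```python
def find_empty_selects(inputs, path=""):
    """Yield "|"-joined names of select params whose options list is empty.

    Assumes (not verified here) that conditionals expose their branches
    under "cases" and that nested groups carry an "inputs" list.
    """
    for item in inputs:
        name = f"{path}{item.get('name', '?')}"
        if item.get("type") == "select" and not item.get("options"):
            yield name
        for case in item.get("cases", []):
            yield from find_empty_selects(case.get("inputs", []), f"{name}|")
        yield from find_empty_selects(item.get("inputs", []), f"{name}|")


# Hand-written sample mimicking the assumed response shape.
sample = {
    "inputs": [
        {"name": "input", "type": "data"},
        {
            "name": "addref_cond",
            "type": "conditional",
            "cases": [
                {"value": "no", "inputs": []},
                {
                    "value": "cached",
                    "inputs": [{"name": "ref", "type": "select", "options": []}],
                },
            ],
        },
    ]
}
empty = list(find_empty_selects(sample["inputs"]))
```

Against a live server, the curl output could be parsed with `json.loads` and its "inputs" value passed to the same function.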

Fix Plan

Approach A: Fix the silent exception handling in populate_model (Preferred)

File: lib/galaxy/tools/parameters/populate_model.py

The generic except Exception handler at lines 66-68 silently swallows exceptions and falls back to a degraded to_dict call without other_values. This means any exception during option generation results in empty options with no user-visible error.

Proposed fix: When the generic exception handler fires, preserve other_values in the fallback to_dict call:

except Exception:
    tool_dict = input.to_dict(request_context, other_values=other_values)
    log.exception("tools::to_json() - Skipping parameter expansion '%s'", input.name)

This is the safest minimal fix. Even if get_initial_value failed, to_dict should still be able to generate options with the correct other_values.

If the exception occurs IN to_dict itself (not in get_initial_value), then passing other_values again might cause the same exception. In that case, a two-step fallback:

except Exception:
    log.exception("tools::to_json() - Skipping parameter expansion '%s'", input.name)
    try:
        tool_dict = input.to_dict(request_context, other_values=other_values)
    except Exception:
        tool_dict = input.to_dict(request_context)

Approach B: Improve DataMetaFilter error handling

File: lib/galaxy/tools/parameters/dynamic_options.py

Currently, when _get_ref_data raises KeyError or ValueError, the filter returns [] (empty list). This is overly aggressive -- when the reference can't be found, the behavior should be the same as when the metadata is unset: return ALL options.

Proposed fix: Change the error handling in DataMetaFilter.filter_options:

try:
    ref = _get_ref_data(other_values, self.ref_name)
except KeyError:
    log.warning(f"could not filter by metadata: {self.ref_name} unknown")
    return copy.deepcopy(list(options))  # Return all options instead of empty
except ValueError:
    log.warning(f"could not filter by metadata: {self.ref_name} not a data or collection parameter")
    return copy.deepcopy(list(options))  # Return all options instead of empty

This is semantically correct: if we can't determine the metadata value to filter by, we should show all options (same as when metadata is unset). This matches the existing behavior at line 241-242 where len(meta_value) == 0 returns all options.

Caution: This changes long-standing behavior. Some tools might rely on the empty-return behavior when the reference is missing. However, returning all options is the documented expected behavior ("present all choices if the user gives no dbkey").

Approach C: Combined fix (Recommended)

Apply both fixes:

  1. Fix populate_model.py exception handler to preserve other_values
  2. Fix DataMetaFilter.filter_options to return all options when ref is unavailable

This provides defense in depth: even if one fix doesn't address the exact scenario, the other will catch it.

Affected Files

  1. lib/galaxy/tools/parameters/populate_model.py (lines 54-68)
  2. lib/galaxy/tools/parameters/dynamic_options.py (lines 219-226)

Testing Strategy

Unit Tests

  1. Test DataMetaFilter with missing ref: Verify that when other_values doesn't contain the referenced dataset, filter_options returns all options (not empty).

  2. Test DataMetaFilter with unset dbkey: Verify that when the referenced dataset has dbkey="?", filter_options returns all options.

  3. Test populate_model exception handling: Verify that when get_initial_value throws, the fallback to_dict still gets other_values.
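
The first two unit tests could look roughly like the sketch below. It exercises a hypothetical stand-in with Approach B semantics (`filter_options_proposed`, `FakeDataset`, and the `(value, dbkey)` rows are all invented for illustration), so real tests would construct an actual DataMetaFilter and dataset instead.

```python
import copy


class FakeDataset:
    """Hypothetical stand-in for an HDA; it only carries a dbkey."""

    def __init__(self, dbkey="?"):
        self.dbkey = dbkey


def filter_options_proposed(options, other_values, ref_name="input"):
    """Stand-in with Approach B semantics (not Galaxy code)."""
    ref = other_values.get(ref_name)
    if ref is None or ref.dbkey == "?":
        # Missing ref and unset dbkey are treated alike: no filtering.
        return copy.deepcopy(options)
    return [opt for opt in options if opt[1] == ref.dbkey]


ROWS = [("hg38.fa", "hg38"), ("mm10.fa", "mm10")]


def test_missing_ref_returns_all_options():
    assert filter_options_proposed(ROWS, {}) == ROWS


def test_unset_dbkey_returns_all_options():
    state = {"input": FakeDataset("?")}
    assert filter_options_proposed(ROWS, state) == ROWS


def test_real_dbkey_still_filters():
    state = {"input": FakeDataset("mm10")}
    assert filter_options_proposed(ROWS, state) == [("mm10.fa", "mm10")]
```

The third function guards against over-correcting: returning all options must stay limited to the missing-ref and unset-dbkey cases.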

Integration Tests

  1. Test tool form build with dbkey_filter_input test tool:

    • GET /api/tools/dbkey_filter_input/build with a dataset that has dbkey="?"
    • Verify the select options are NOT empty
    • Existing test at test/functional/tools/dbkey_filter_input.xml test 2 already covers the execution path but NOT the form build path
  2. Test tool form rebuild after conditional switch:

    • Build a tool form for a tool with data_meta filter inside a conditional
    • Switch the conditional to the filtered case
    • Verify options are populated

Selenium/Browser Tests

  1. Test the user-facing workflow described in the issue:
    • Upload a BAM file without dbkey
    • Open samtools stats tool form
    • Switch to "Use a built-in genome"
    • Verify genome options are shown

Migration Considerations

None. This is a behavior fix with no database or configuration changes.

Backwards Compatibility

  • Approach A (populate_model): No backwards compatibility concerns. The exception handler is a fallback that should ideally never fire.
  • Approach B (DataMetaFilter): Minor behavior change -- tools that previously got empty options when the ref was missing will now get all options. This is arguably the correct behavior and matches documentation, but could theoretically change behavior for tools that rely on the empty state.

Effort Estimate

  • Debugging to confirm root cause: 1-2 hours (add logging, reproduce, identify exception)
  • Implementing fix: 1-2 hours (depending on approach)
  • Writing tests: 2-3 hours (unit + integration + selenium)
  • Total: 4-7 hours (approximately 1 day)

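Related Issues/PRs

  • galaxyproject/tools-iuc#7718 - UI report of the missing options
  • galaxyproject/tools-iuc#7720 - shows the filter working in tool tests
  • training-material#6658 - GTN issue filed because tutorials are affected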

Issue #21904 - Triage Summary

Top-Line Summary

When a dataset with unset dbkey (dbkey="?") is used as input to a tool that has a data_meta filtered select (e.g., samtools_stats with a cached reference genome), the UI incorrectly shows "No options available" instead of listing all available options. The backend logic for handling unset dbkey is correct (and tool tests pass), so the bug is in the tool form building path. The most probable root cause is in populate_model.py's exception handling: when the select parameter is inside a non-active conditional case, any exception during option generation triggers a fallback that discards the other_values context, causing the data_meta filter to lose its dataset reference and return empty options. A secondary contributing factor is that DataMetaFilter.filter_options returns an empty list when the referenced dataset can't be found, whereas it should return all options (matching the behavior when metadata is simply unset). The recommended fix addresses both issues: (1) preserve other_values in the exception fallback, and (2) return all options when the dataset reference is unavailable.

Importance Assessment Summary

| Attribute | Assessment |
| --- | --- |
| Severity | Medium-High (functional breakage) |
| Blast Radius | Broad - affects all tools using the IUC optional_reference/mandatory_reference macro pattern (samtools suite, BWA, Bowtie2, etc.) |
| Regression Status | Likely new in 25.x based on timing and recent related commits |
| Workaround | Acceptable - set then unset dbkey on input dataset, or use genome from history |
| User Impact | High - confirmed on Main/AU/EU servers, blocking GTN tutorials |
| Priority Recommendation | Fix for next release, high priority within normal cycle |

Questions for Group Discussion

  1. Reproduction: Can someone reproduce this with the built-in dbkey_filter_input test tool, or only with IUC tools? If the test tool works, the issue might be specific to the conditional nesting pattern used in optional_reference.

  2. Exception logging: Has anyone checked Galaxy server logs for "tools::to_json() - Skipping parameter expansion" messages when loading affected tools? This would confirm Theory 1 (exception handler fallback).

  3. Scope of Approach B: Changing DataMetaFilter.filter_options to return all options (instead of []) when the reference dataset is unavailable -- does anyone know of tools that intentionally rely on the empty-return behavior?

  4. Validator interaction: The no_options validator on the select fires when options are empty. Could this validator be causing a secondary exception during get_initial_value that triggers the populate_model fallback? Should we check if the validator interacts poorly with the option generation for inactive conditional cases?

  5. Duplicate dbkey entry: The reporter notes "unspecified is shown 2x" when editing dbkey. Is this a separate client-side UI issue or related to the same root cause?

Effort Estimate and Difficulty

| Aspect | Estimate |
| --- | --- |
| Total effort | ~1 day (4-7 hours) |
| Debugging/confirmation | 1-2 hours |
| Implementation | 1-2 hours |
| Testing | 2-3 hours |
| Difficulty | Medium - the code paths are well-understood but the exact exception triggering the fallback needs to be identified through debugging |
| Recreating | Easy on any Galaxy instance with cached genomes - just load samtools stats with a BAM file that has unset dbkey |
| Testing | Medium - needs both API-level test (form build JSON validation) and ideally a Selenium test for the full user workflow |