@jmchilton
Created February 28, 2026 17:19
CWL Conditional (pickValue) Support in Galaxy — Status, Plans, and Comparison

pickValue Implementation: Approach Comparison

Executive Summaries

Approach A — Synthetic Tool Steps: During CWL import, inject Galaxy's bundled pick_value expression tool as a synthetic workflow step for each workflow output that uses pickValue. The tool already handles first_non_null (via first_or_error) and the_only_non_null (via only). Parser-only change — no model migrations, no runtime changes. Does NOT cover all_non_null (tool can't return arrays/collections). Estimated 1-2 files touched in parser.py.

Approach B — Native Framework Support: Add a pick_value column to the WorkflowOutput model, create duplicate-label WorkflowOutput objects across source steps, then post-process pickValue semantics in run.py after all steps complete. Covers all three modes including all_non_null. Requires DB migration, model changes, parser changes, import changes, runtime changes, and export changes. Estimated 5-7 files touched.

Pros/Cons

Approach A: Synthetic Tool Steps

| Pros | Cons |
|---|---|
| Parser-only change, no model/runtime/migration | all_non_null not supported (tool returns scalar) |
| Reuses battle-tested pick_value tool | Synthetic steps visible in workflow editor |
| should_fail tests handled by tool's error modes | No round-trip CWL re-export fidelity |
| Low risk — no changes to execution engine | File[] output via all_non_null impossible |
| Small diff, fast to implement | Scatter+conditional pattern not addressed |
| Tool already tested in Galaxy workflow suite | cond-with-defaults (linkMerge+pickValue) unclear |

Approach B: Native Framework Support

| Pros | Cons |
|---|---|
| All 3 pickValue modes supported | DB migration required |
| Clean semantic model — pickValue is first-class | Null detection (skipped vs empty) is hard |
| Benefits Galaxy-native workflows long-term | Duplicate-label WorkflowOutputs may confuse editor |
| Scatter+conditional pattern addressable | 5-7 files, medium-large change |
| Correct CWL export round-trip possible | all_non_null returning list vs HDCA is unresolved |
| No synthetic steps polluting workflow graph | Higher regression risk across workflow subsystem |

Coverage Analysis (29 RED Tests)

By pickValue mode

| Mode | Tests | Pattern | Tool (A) | Framework (B) |
|---|---|---|---|---|
| first_non_null | 8 | multi-source | YES | YES |
| pass_through_required_{false,true}_when | x2 (+nojs) | multi-source | YES | YES |
| first_non_null_{first,second}_non_null | x2 (+nojs) | multi-source | YES | YES |
| the_only_non_null | 4 | multi-source | YES | YES |
| pass_through_required_the_only_non_null | (+nojs) | multi-source | YES | YES |
| the_only_non_null_single_true | (+nojs) | multi-source | YES | YES |
| all_non_null | 6 | multi-source | NO | YES |
| all_non_null_{all_null,one,multi}_non_null | x3 (+nojs) | multi-source | NO | YES |
| scatter+conditional | 7 | scatter | NO | PARTIAL |
| condifional_scatter_on_nonscattered_{false,true_nojs} | x3 | scatter+pickValue | NO | YES (Phase 6) |
| scatter_on_scattered_conditional | (+nojs) | scatter+pickValue | NO | YES (Phase 6) |
| conditionals_nested_cross_scatter | (+nojs) | nested scatter | NO | MAYBE |
| conditionals_multi_scatter | (+nojs) | hybrid multi+scatter | NO | MAYBE |
| Complex | 2 | multi+linkMerge | NO | PARTIAL |
| cond-with-defaults-{1,2} | | linkMerge+pickValue+File[] | NO | PARTIAL |

Summary

| | Tool (A) | Framework (B) |
|---|---|---|
| Covered | 12/29 (41%) | 18-24/29 (62-83%) |
| first_non_null + the_only_non_null | 12/12 | 12/12 |
| all_non_null (multi-source) | 0/6 | 6/6 |
| scatter+conditional | 0/9 | 4-9/9 (phases) |
| cond-with-defaults | 0/2 | 0-2/2 (depends on linkMerge) |

Implementation Effort

| Dimension | Tool (A) | Framework (B) |
|---|---|---|
| Files touched | 1 (parser.py) | 5-7 (model, migration, parser, import, run, export) |
| Lines of code | ~100-150 | ~300-500 |
| DB migration | No | Yes |
| Runtime changes | No | Yes (run.py post-processing) |
| Regression risk | Low (parser only) | Medium-High (execution path) |
| Time estimate | 1-2 days | 5-8 days |
| Reviewability | Easy — self-contained | Harder — cross-cutting |
| Hardest sub-problem | Type mapping CWL->param_type | Null detection (skipped vs empty) |

Recommendation

Pursue hybrid: Tool (A) first, Framework (B) later.

Rationale:

  1. Goal is CWL conformance, not Galaxy UX. Synthetic tool steps are invisible to CWL users — they never see the Galaxy workflow graph. 12 tests going green immediately is significant.
  2. Tool approach is low-risk and fast. Parser-only change, no migration, testable in 1-2 days.
  3. Framework approach has unsolved hard problems. Null detection, duplicate-label editor behavior, and all_non_null return type are each individually tricky. Stacking them makes the PR risky.
  4. The 12 easiest tests are the same 12 for both approaches. No wasted work — the parser's get_outputs_for_label() skip logic for pickValue outputs (needed by A) is compatible with later adding framework support (B) for the remaining tests.
  5. all_non_null and scatter patterns can wait. They're harder regardless of approach and may need the pick_value tool extended anyway (for expression/scalar types).

Phase 1 (this PR): Implement Approach A for first_non_null + the_only_non_null. Target: 12 tests RED->GREEN.

Phase 2 (future PR): Either extend pick_value tool with all_non_null mode (for string[] types) OR implement Framework support. Decision deferred until Phase 1 ships and scatter+conditional patterns are better understood.

Unresolved Questions

  • Is pick_value tool always available during CWL workflow import, or can it be missing from tool panel?
  • Does workflow import API accept tool_id: "pick_value" for synthetic steps, or must we use tool_uuid?
  • For should_fail tests currently GREEN because import crashes: after fixing import, will the pick_value tool's runtime error correctly satisfy should_fail?
  • For all_non_null returning string[]: can expression tools produce JSON arrays in expression.json, or is a collection required?
  • cond-with-defaults.cwl uses linkMerge: merge_flattened + pickValue: all_non_null + File[] output. This may be unreachable for both approaches without collection-producing expression tools.
  • cond-wf-009 pattern (single outputSource + scatter + pickValue: all_non_null) is a collection-filter, not a multi-source merge. Neither approach naturally handles this — it needs its own solution.
  • Should synthetic step labels use __cwl_pick_value_ prefix to hide from editor, and does Galaxy handle this convention?

pickValue: Native Framework Support Plan

Problem Summary

CWL v1.2 workflows use pickValue on workflow outputs (and step inputs) to merge multiple sources, selecting non-null values. Galaxy crashes when importing these workflows because parser.py:get_outputs_for_label() hardcodes multiple=False on outputSource, and no runtime logic exists to apply pickValue semantics when collecting workflow outputs.

27 CWL v1.2 conditional tests are RED because of this gap.

pickValue Patterns in CWL Tests

Two distinct patterns exist:

Pattern A: Multiple outputSource (most tests)

outputs:
  out1:
    type: string
    outputSource: [step1/out1, step2/out1]
    pickValue: first_non_null

Steps have when expressions; some produce null. The workflow output gathers from multiple steps and picks among them.

Pattern B: Single outputSource + scatter (cond-wf-009, 010, 011)

outputs:
  out1:
    type: string[]
    outputSource: step1/out1
    pickValue: all_non_null

A single step is scattered with when; some scatter elements produce null. pickValue filters nulls from the scatter result array.
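Viewed as plain values, this is just a null filter over the scatter result (a minimal sketch, not Galaxy code):

```python
def all_non_null(scatter_results):
    """CWL all_non_null over a scatter result: drop the null elements."""
    return [v for v in scatter_results if v is not None]

# A scatter of four iterations where iterations 1 and 3 were skipped (null):
all_non_null(["a", None, "c", None])  # -> ["a", "c"]
```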

Current Architecture

Workflow Model (model/__init__.py)

WorkflowOutput lives on a single WorkflowStep:

class WorkflowOutput(Base):
    workflow_step_id  # FK to workflow_step
    output_name       # which output of that step
    label             # the workflow output label

A workflow output is bound to one step and one output_name. There is no mechanism for a workflow output to reference outputs from multiple steps.

WorkflowStepConnection models data flow between steps — it connects an output step to a WorkflowStepInput on the consuming step. Multi-source connections (for step inputs) work via multiple WorkflowStepConnection rows pointing to the same WorkflowStepInput.

CWL Parser (tool_util/cwl/parser.py)

WorkflowProxy.get_outputs_for_label(label) iterates CWL workflow outputs, calls split_step_references(outputSource, multiple=False) which asserts a single reference. It returns a list of {"output_name": ..., "label": ...} dicts that get placed in the step dict's workflow_outputs list.

WorkflowProxy.to_dict() produces a Galaxy workflow dict where each step has a workflow_outputs key. The problem: a CWL workflow output with outputSource: [step1/out1, step2/out1] references TWO different steps, but Galaxy's dict format puts workflow_outputs inside each step dict — there's no place for a cross-step output.

Workflow Import (managers/workflows.py)

_workflow_from_raw_description() walks step dicts. For each step, if workflow_outputs exists, it creates WorkflowOutput model objects bound to that step (line ~1941-1964). There is no mechanism to create a workflow output that spans multiple steps.

Workflow Execution (workflow/run.py)

WorkflowProgress.set_step_outputs() iterates step.workflow_outputs and calls _record_workflow_output() for each. This records the output in the invocation via workflow_invocation.add_output(workflow_output, step, output).

get_replacement_workflow_output() looks up a workflow output by going to its step and finding the output by name in self.outputs[step.id].

Null/Skipped Outputs

When when_values == [False] (step skipped entirely), the tool still executes but produces "empty" datasets. These get hidden (output.visible = False). The outputs dict still has entries — they're just empty/hidden HDAs, not Python None.

For CWL, a skipped step should produce null for its outputs. Currently Galaxy represents this as an empty HDA, which is not the same thing. The WorkflowInvocationOutputValue table stores JSON values, so it can store None.

Proposed Approach

Strategy: Duplicate-Label WorkflowOutputs + Post-Processing

Rather than fundamentally restructuring WorkflowOutput to span multiple steps (which would be a massive model change touching export, import, editor, API, and every workflow feature), use a simpler approach:

Add pick_value metadata to WorkflowOutput and handle it during output collection in run.py.

The key insight: Galaxy's workflow model already supports a workflow output being on a specific step. For Pattern A (multiple outputSource), we need multiple WorkflowOutput objects with the same label on different steps. For Pattern B (scatter+pickValue), we need pickValue logic on a single WorkflowOutput.

Currently, the label uniqueness is not enforced at the DB level — it's just convention. And set_step_outputs() already iterates all WorkflowOutput objects per step. We can:

  1. Create multiple WorkflowOutput objects with the same label on different steps
  2. Add a pick_value column to WorkflowOutput
  3. At the end of execution, post-process outputs with the same label using pickValue semantics

Alternative Considered: Direct Model Restructuring

Adding multi-source workflow outputs to the model would require:

  • New join table workflow_output_source (workflow_output_id, step_id, output_name, position)
  • Changes to WorkflowOutput to remove step FK or make it nullable
  • Changes to every export format (ga, format2, editor dict, instance dict)
  • Changes to the workflow editor UI (which we're explicitly not touching)
  • Changes to run.py output collection
  • Changes to the API schema for WorkflowOutput

This is a much larger change with much broader impact. The duplicate-label approach is more contained.

Detailed Plan: Native pick_value on WorkflowOutput

Phase 1: Model Changes

Add column to workflow_output table:

# In model/__init__.py, class WorkflowOutput:
pick_value: Mapped[Optional[str]] = mapped_column(String(64), nullable=True)

Valid values: None, "first_non_null", "the_only_non_null", "all_non_null".
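A guard for the new column could look like this (hypothetical helper; the name and error type are illustrative):

```python
VALID_PICK_VALUES = frozenset({"first_non_null", "the_only_non_null", "all_non_null"})

def validate_pick_value(value):
    """Reject anything other than None or a known CWL pickValue mode."""
    if value is not None and value not in VALID_PICK_VALUES:
        raise ValueError(f"Unknown pick_value: {value!r}")
    return value
```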

Migration:

# New alembic migration (sketch)
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column("workflow_output", sa.Column("pick_value", sa.String(64), nullable=True))

Update copy():

def copy(self, copied_step):
    copied_output = WorkflowOutput(copied_step)
    copied_output.output_name = self.output_name
    copied_output.label = self.label
    copied_output.pick_value = self.pick_value
    return copied_output

Update _serialize():

def _serialize(self, id_encoder, serialization_options):
    d = dict_for(self, output_name=self.output_name, label=self.label)
    if self.pick_value:
        d["pick_value"] = self.pick_value
    return d

Phase 2: Parser Changes (parser.py)

Fix get_outputs_for_label() to handle multiple outputSource:

def get_outputs_for_label(self, label):
    outputs = []
    for output in self._workflow.tool["outputs"]:
        source = output["outputSource"]
        pick_value = output.get("pickValue")

        # Handle both single and list outputSource
        references = split_step_references(
            source,
            multiple=True,  # Changed from False
            workflow_id=self.cwl_id,
        )

        for step, output_name in references:
            if step == label:
                output_id = output["id"]
                if "#" not in self.cwl_id:
                    _, output_label = output_id.rsplit("#", 1)
                else:
                    _, output_label = output_id.rsplit("/", 1)

                out_dict = {
                    "output_name": output_name,
                    "label": output_label,
                }
                if pick_value:
                    out_dict["pick_value"] = pick_value
                outputs.append(out_dict)
    return outputs

This means if a CWL workflow output has outputSource: [step1/out1, step2/out1], then:

  • get_outputs_for_label("step1") returns [{"output_name": "out1", "label": "out1", "pick_value": "first_non_null"}]
  • get_outputs_for_label("step2") returns [{"output_name": "out1", "label": "out1", "pick_value": "first_non_null"}]

Both steps get a WorkflowOutput with the same label but pick_value set.

Also handle input steps: The CWL output in cond-wf-003.cwl references both a step output AND a workflow input (def). The cwl_input_to_galaxy_step() method already calls get_outputs_for_label(label), so input steps would also get WorkflowOutput objects if referenced in a multi-source outputSource. This already works.

Phase 3: Import Changes (managers/workflows.py)

Update __module_from_dict to read pick_value:

In the workflow_outputs loop (~line 1944-1964), add:

for workflow_output in workflow_outputs:
    if not isinstance(workflow_output, dict):
        workflow_output = {"output_name": workflow_output}
    output_name = workflow_output["output_name"]
    # ... existing validation ...
    uuid = workflow_output.get("uuid", None)
    label = workflow_output.get("label", None)
    m = step.create_or_update_workflow_output(
        output_name=output_name,
        uuid=uuid,
        label=label,
    )
    # NEW: set pick_value
    pick_value = workflow_output.get("pick_value", None)
    if pick_value:
        m.pick_value = pick_value
    if not dry_run:
        trans.sa_session.add(m)

Relax duplicate label check: Currently found_output_names checks for duplicate output_name within a step, not duplicate labels across steps. This should be fine — duplicate labels across steps are the whole point.

Phase 4: Execution Changes (workflow/run.py)

Add a post-processing step after all steps are scheduled.

Currently set_step_outputs() calls _record_workflow_output() for each WorkflowOutput on each step as it completes. For pickValue, we need to defer final output recording until all source steps have completed, then apply pickValue logic.

Option A: Post-process at invocation completion

After all steps are scheduled in WorkflowInvoker.invoke(), before setting state to SCHEDULED, iterate all workflow outputs with pick_value set and apply the logic:

# In WorkflowProgress or WorkflowInvoker, after all steps scheduled:
def apply_pick_value_outputs(self):
    """Post-process workflow outputs that have pick_value set."""
    # Group WorkflowOutput objects by label
    outputs_by_label = defaultdict(list)
    for step in self.workflow_invocation.workflow.steps:
        for wo in step.workflow_outputs:
            if wo.pick_value:
                outputs_by_label[wo.label].append(wo)

    for label, workflow_outputs in outputs_by_label.items():
        pick_value = workflow_outputs[0].pick_value
        values = []
        for wo in workflow_outputs:
            step_outputs = self.outputs.get(wo.workflow_step.id, {})
            output = step_outputs.get(wo.output_name)
            values.append(output)

        result = apply_pick_value(pick_value, values, label)
        # Record the final aggregated output
        # Use the first workflow_output as the "primary" record
        self.workflow_invocation.add_output(
            workflow_outputs[0], workflow_outputs[0].workflow_step, result
        )

The apply_pick_value function:

def apply_pick_value(pick_value, values, label):
    """Apply CWL pickValue semantics to a list of values."""

    def is_null(v):
        # A value is null if it's None, NO_REPLACEMENT,
        # or a hidden empty HDA from a skipped step
        if v is None or v is NO_REPLACEMENT:
            return True
        if isinstance(v, dict) and v.get("__class__") == "NoReplacement":
            return True
        # For HDA outputs from skipped steps:
        if hasattr(v, "dataset") and not v.producing_job_finished:
            # Skipped step - output is null
            return True
        return False

    non_null = [(i, v) for i, v in enumerate(values) if not is_null(v)]

    if pick_value == "first_non_null":
        if not non_null:
            raise FailWorkflowEvaluation(...)  # "All sources are null"
        return non_null[0][1]

    elif pick_value == "the_only_non_null":
        if len(non_null) != 1:
            raise FailWorkflowEvaluation(...)
        return non_null[0][1]

    elif pick_value == "all_non_null":
        return [v for _, v in non_null]  # Return as list/collection

Option B: Modify _record_workflow_output to defer pick_value outputs

In set_step_outputs(), when encountering a workflow output with pick_value set, don't record it immediately — instead, accumulate it in a pending_pick_value_outputs dict. Then at the end of scheduling, resolve them.

This is cleaner because it doesn't double-record outputs.
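Option B could be sketched as a small accumulator that defers recording until resolution time (hypothetical class; error handling for first_non_null is elided):

```python
from collections import defaultdict

class PendingPickValueOutputs:
    """Accumulate per-label source values during scheduling; resolve once at the end."""

    def __init__(self):
        self._pending = defaultdict(list)  # label -> [(pick_value, value), ...]

    def record(self, label, pick_value, value):
        self._pending[label].append((pick_value, value))

    def resolve(self):
        results = {}
        for label, entries in self._pending.items():
            pick_value = entries[0][0]
            non_null = [v for _, v in entries if v is not None]
            if pick_value == "first_non_null":
                results[label] = non_null[0]          # empty -> error, elided here
            elif pick_value == "the_only_non_null":
                (results[label],) = non_null          # raises unless exactly one
            else:  # all_non_null
                results[label] = non_null
        return results
```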

Phase 5: Null Detection

The hardest part is reliably detecting "this output is null because the step was skipped."

Currently when a CWL step has when=False:

  • The tool still executes (with __when_value__: False)
  • Galaxy creates output HDAs that are empty and hidden
  • These are not semantically "null" — they're empty datasets

For pickValue to work correctly, we need to distinguish:

  • "Step produced an empty dataset" (not null, just empty)
  • "Step was skipped, output is null" (should be null)

Proposed approach: Track skipped-step outputs explicitly.

In set_step_outputs(), when progress.when_values == [False]:

if progress.when_values == [False]:
    for output_name in outputs:
        self._null_outputs[(step.id, output_name)] = True

Then in apply_pick_value, check _null_outputs instead of trying to infer nullness from HDA state.
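With that marker in place, collecting source values can consult the tracked nulls rather than inspecting HDA state. A plain-data sketch, using dicts in place of WorkflowOutput objects:

```python
def collect_source_values(workflow_outputs, step_outputs_by_id, null_outputs):
    """Substitute None for any source output recorded as null by a skipped step."""
    values = []
    for wo in workflow_outputs:
        key = (wo["step_id"], wo["output_name"])
        if null_outputs.get(key):
            values.append(None)
        else:
            values.append(step_outputs_by_id[wo["step_id"]][wo["output_name"]])
    return values
```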

For Pattern B (scatter+pickValue on single source), the null detection is different — individual scatter elements may be null while others are not. The scatter produces a collection, and the collection elements with when=False are the null ones. Galaxy already has skipped state for collection elements (see migration c39f1de47a04_add_skipped_state_to_collection_job_), so this may already work for detecting null elements within a scatter result.

Phase 6: Pattern B — Scatter + pickValue

For all_non_null on a scattered output, the expected behavior is:

  • Scatter produces a list collection
  • Elements from skipped iterations are null
  • all_non_null filters out null elements, returning a smaller list

Galaxy represents scatter results as HistoryDatasetCollectionAssociation (list collections). The filtered result would be a new collection with only the non-null elements.

This requires:

  1. After scatter execution, identify which collection elements came from skipped iterations
  2. Create a new filtered collection excluding those elements
  3. The filtered collection becomes the workflow output

This is more complex than Pattern A and may warrant a separate implementation phase.
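In miniature, the three steps amount to the following (plain-data sketch; the skipped flag stands in for Galaxy's skipped element state, and a real implementation would materialize a new HDCA):

```python
def filter_skipped_elements(elements):
    """Steps 1+2: keep only elements whose producing job was not skipped."""
    return [e for e in elements if not e["skipped"]]

collection = [
    {"skipped": False, "value": "foo 3"},
    {"skipped": True, "value": None},
    {"skipped": False, "value": "bar 3"},
]
# Step 3: the filtered collection becomes the workflow output.
[e["value"] for e in filter_skipped_elements(collection)]  # -> ["foo 3", "bar 3"]
```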

Phase 7: Export Changes

Workflow export (ga format, format2) needs to serialize pick_value:

In _workflow_to_dict_export() (managers/workflows.py), the workflow_outputs serialization already includes output_name and label. Add pick_value:

# In the step dict construction for export
for workflow_output in step.unique_workflow_outputs:
    wo_dict = {
        "output_name": workflow_output.output_name,
        "label": workflow_output.label,
        "uuid": str(workflow_output.uuid),
    }
    if workflow_output.pick_value:
        wo_dict["pick_value"] = workflow_output.pick_value

Phase 8: should_fail Tests

Several CWL v1.2 tests expect workflow execution to FAIL:

  • first_non_null_all_null — all sources null, first_non_null should error
  • the_only_non_null_multi_true — multiple non-null, the_only_non_null should error
  • all_non_null_multi_with_non_array_output — all_non_null on non-array type should error

These currently pass because the import crashes (so the test "succeeds" as a should_fail). After fixing the import, the pickValue runtime logic must produce the correct errors for these to keep passing.

Benefit to Galaxy-Native Workflows

Galaxy-native workflows already support when expressions (added in 23.0). If pickValue were added to the runtime layer:

  1. Galaxy workflows could express "take first available output" — e.g., two conditional branches where exactly one runs, merged into a single output via the_only_non_null. Currently Galaxy users must use a "pick value" tool or restructure their workflow.

  2. all_non_null for filtered scatter results — Galaxy workflows with conditional scatter could produce filtered output collections.

  3. The UI integration could come later — the runtime layer would work, and the Galaxy workflow editor could add pickValue configuration in a future release.

  4. Format2 support — Galaxy's format2 workflow format could natively express pickValue on outputs, making conditional workflow patterns cleaner.

The infrastructure cost is low: one new column, one new post-processing function. The conceptual fit is good since Galaxy already has when, linkMerge/merge_type, and conditional step support.

Size Estimate

| Component | Effort | Risk |
|---|---|---|
| Model: add column + migration | Small | Low |
| Parser: fix get_outputs_for_label | Small | Low |
| Import: read pick_value from dict | Small | Low |
| Execution: Pattern A (multi-source) | Medium | Medium |
| Execution: null detection | Medium | High |
| Execution: Pattern B (scatter filter) | Large | High |
| Export: serialize pick_value | Small | Low |
| should_fail test compatibility | Small | Low |

Total: Medium-sized change. Pattern A (multi-source pickValue) is the primary blocker for most tests. Pattern B (scatter+pickValue) is more complex and could be a separate phase.

Implementation Order

  1. Model + migration (pick_value column on workflow_output)
  2. Parser fix (multiple=True in get_outputs_for_label, pass pickValue through)
  3. Import fix (read pick_value from workflow dict)
  4. Null tracking in execution (when_values==[False] -> mark outputs null)
  5. pickValue post-processing in run.py (Pattern A: multi-source)
  6. Export serialization
  7. Pattern B: scatter + pickValue filtering (separate PR if needed)

Testing Plan

  • Red-to-green on CWL conformance tests: The 27 RED tests listed in CWL_CONDITIONALS_STATUS.md are the primary targets.
  • Start with Pattern A tests (cond-wf-003 through 007 variants): these are the simplest — two sources, one skipped.
  • Then Pattern B tests (cond-wf-009 through 013): scatter+conditional+pickValue.
  • Verify should_fail tests stay green after import no longer crashes.
  • Run Galaxy-native workflow tests to confirm no regressions (the new column and execution logic should be no-ops when pick_value is NULL).

Review Notes

Reviewed against CWL_CONDITIONALS_STATUS.md and source code.

Factual Corrections

  1. Pattern B scope is wrong. Plan says Pattern B is "cond-wf-009, 010, 011" but cond-wf-011 (conditionals_nested_cross_scatter) retains null values in nested arrays — pickValue: all_non_null applies only at the outermost level. cond-wf-013 (conditionals_multi_scatter) is a Pattern A+B hybrid (multiple outputSource + scatter + linkMerge + pickValue). These are distinct patterns the plan doesn't distinguish.

  2. Step input pickValue is deprioritized. Grep of all v1.2 conditional test workflows confirms pickValue appears ONLY on workflow outputs, never on step inputs. Zero conformance tests exercise it. Deprioritize the WorkflowStepInput.pick_value question.

  3. condifional_scatter_on_nonscattered_false semantics. This test expects out1: [] when ALL scatter elements are skipped. The entire collection is null, not individual elements — different from Phase 6's "filtering null elements from a collection."

Missing Considerations

  1. SubworkflowStepProxy when bug (from status doc). SubworkflowStepProxy.to_dict() does NOT extract when. Not a pickValue blocker but a related gap.

  2. Editor duplicate-label warning. _workflow_to_dict_editor() tracks output_label_index across steps and flags duplicates as upgrade_message_dict["output_label_duplicate"]. CWL-imported workflows with pickValue will trigger this. May need to suppress when pick_value is set.

  3. Import output_name uniqueness guard. workflows.py:1949-1952 raises ObjectAttributeInvalidException for duplicate output_name within a step. Not triggered for Pattern A (different steps), but an implicit constraint.

Approach Correctness

  1. Double-recording risk with Option A. add_output() appends without checking for duplicate labels. If set_step_outputs() records per-step AND apply_pick_value_outputs() records aggregated, there'll be duplicates. Option B (defer recording) is strongly preferred.

  2. all_non_null list result type. apply_pick_value returns a Python list. add_output() dispatches on history_content_type — a list has none, so it'd be WorkflowInvocationOutputValue (JSON blob). May work for CWL conformance but for Galaxy-native use should be HDCA.

  3. linkMerge + pickValue composition order is answered. CWL spec: linkMerge applies first (merge/flatten), then pickValue filters nulls. Plan should incorporate this.
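That composition order can be pinned down with a small sketch over plain values (merge one level of arrays first, then filter nulls):

```python
def merge_flattened(sources):
    """linkMerge: merge_flattened — concatenate sources, flattening one level of arrays."""
    merged = []
    for source in sources:
        if isinstance(source, list):
            merged.extend(source)
        else:
            merged.append(source)
    return merged

def all_non_null(values):
    """pickValue: all_non_null — drop nulls from the merged result."""
    return [v for v in values if v is not None]

# linkMerge applies first, then pickValue:
all_non_null(merge_flattened([["a", None], None, ["b"]]))  # -> ["a", "b"]
```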

Unresolved Questions

  • How to detect "output is null from skipped step" vs "output is an empty dataset"? The when_values tracking is per-invocation, not per-output. Need reliable null marker.
  • For Pattern B, should the filtered collection be a new HDCA or should Galaxy support "sparse" collections with null elements?
  • Should pick_value on WorkflowOutput also support step inputs? CWL allows pickValue on step inputs too (not just workflow outputs). Galaxy's WorkflowStepInput already has merge_type — should we add pick_value there as well?
  • The duplicate-label WorkflowOutput approach — will the workflow editor handle two outputs with the same label gracefully, or will it need special-casing?
  • For the all_non_null mode returning a list: if the original output type is File but all_non_null returns File[], should this create a list collection? The CWL type system expects this, but Galaxy would need to dynamically produce a collection from scalar outputs.
  • Should the first_non_null/the_only_non_null failures produce CWL-spec-compliant error messages? The spec says specific error conditions for each mode.
  • cond-with-defaults.cwl uses both linkMerge: merge_flattened AND pickValue: all_non_null on the same output. How do these compose? Does pickValue operate before or after linkMerge?

CWL pickValue via Synthetic pick_value Tool Step

Problem

CWL v1.2 workflows can declare pickValue on workflow outputs with multiple outputSource entries. Galaxy crashes at parser.py:607 because get_outputs_for_label() hardcodes multiple=False, so split_step_references() asserts on multi-element lists. 27 CWL v1.2 conditional conformance tests are red because of this.
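A stripped-down illustration of the failure mode (a hypothetical simplification of split_step_references, not the real Galaxy code):

```python
def split_step_references(output_source, multiple=False):
    """Simplified: normalize outputSource into (step, output_name) pairs."""
    sources = output_source if isinstance(output_source, list) else [output_source]
    if not multiple:
        # This is the assertion a multi-element outputSource trips at import time.
        assert len(sources) == 1, "outputSource has multiple entries"
    return [tuple(s.split("/", 1)) for s in sources]

split_step_references(["step1/out1", "step2/out1"], multiple=True)
# -> [("step1", "out1"), ("step2", "out1")]
# With multiple=False (the current hardcoded value), the same call asserts.
```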

Approach: Synthetic Tool Insertion

During CWL workflow import (WorkflowProxy.to_dict()), when a workflow output has pickValue + multiple outputSource, inject a synthetic pick_value tool step that:

  1. Receives connections from all the source steps
  2. Applies the pickValue logic
  3. Produces a single output that becomes the workflow output

This reuses Galaxy's existing pick_value expression tool rather than implementing new runtime semantics.

Research Findings

Galaxy's Bundled pick_value Tool

Location: tools/expression_tools/pick_value.xml (v0.1.0, bundled)
Also at: toolshed iuc/pick_value (v0.2.0, adds format_source)

Tool type: expression (ECMAScript 5.1, runs via Galaxy's expression engine, no container needed)

Parameters:

  • style_cond.pick_style — one of: first, first_or_default, first_or_error, only
  • style_cond.type_cond.param_type — one of: data, text, integer, float, boolean
  • style_cond.type_cond.pick_from — repeat of {value: <optional>} entries

JS logic summary:

for (var i = 0; i < pickFrom.length; i++) {
    if (pickFrom[i].value !== null) {
        if (pickStyle == 'only' && out !== null) {
            return { '__error_message': 'Multiple null values found, only one allowed.' };
        } else if (out == null) {
            out = pickFrom[i].value;
        }
    }
}
// first_or_default: fall back to default_value
// first_or_error / only: error if out is still null

Outputs: One of text_param, integer_param, float_param, boolean_param, data_param (filtered by param_type).

Key: The tool does NOT have an all_non_null mode. It always returns a single value, never an array.

Mapping CWL pickValue Modes to pick_value Tool

| CWL pickValue | pick_value pick_style | Notes |
|---|---|---|
| first_non_null | first_or_error | CWL spec says error if all null; first silently returns null |
| the_only_non_null | only | Direct match: errors if 0 or >1 non-null |
| all_non_null | NO MATCH | Returns string[]/File[] — pick_value can't produce arrays |
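The mapping the parser would apply can be written down directly (hypothetical helper; all_non_null deliberately has no mapping):

```python
PICK_VALUE_TO_PICK_STYLE = {
    "first_non_null": "first_or_error",  # CWL: error when every source is null
    "the_only_non_null": "only",         # CWL: error when 0 or >1 sources are non-null
}

def pick_style_for(pick_value):
    """Translate a CWL pickValue mode into a pick_value tool pick_style."""
    try:
        return PICK_VALUE_TO_PICK_STYLE[pick_value]
    except KeyError:
        raise NotImplementedError(f"No pick_value tool mode for {pick_value!r}")
```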

first_non_null Details

CWL spec: "Return first non-null. Error if all null." Maps to first_or_error:

  • All null -> tool error (matches CWL spec for required outputs)
  • For optional outputs: could use first (returns null silently)

The first_non_null_all_null conformance test has should_fail: true, confirming the error behavior.

But there's a subtlety: cond-wf-003.cwl has outputSource: [step1/out1, def] where def is a workflow input with default: "Direct". When step1 is skipped, the first_non_null should return the def value. This requires the def workflow input to be wired as a pick_from input alongside step1/out1.

the_only_non_null Details

Direct match to only mode. Errors if 0 non-null or >1 non-null. Conformance tests pass_through_required_fail, the_only_non_null_multi_true are should_fail: true.

all_non_null Details — THE HARD CASE

CWL all_non_null returns an array of all non-null values. Example from cond-wf-007.cwl:

outputs:
  out1:
    type: string[]
    outputSource: [step1/out1, step2/out1]
    pickValue: all_non_null

Expected outputs:

  • val=0 (both skipped) -> out1: []
  • val=1 (step2 runs) -> out1: ["bar 1"]
  • val=3 (both run) -> out1: ["foo 3", "bar 3"]

The existing pick_value tool cannot do this. It returns a scalar. Options:

  1. Extend pick_value tool with an all_non_null mode that returns an array/collection
  2. Use Galaxy's multi-source-to-collection merging (already exists in replacement_for_input_connections)
  3. Write a new expression tool specifically for all_non_null
  4. Handle all_non_null as a runtime feature rather than a tool

How Skipped Step Outputs Work in Galaxy

When when=False, the step is skipped via __when_value__:

  1. modules.py:2771 — slice_dict["__when_value__"] = when_value (False)
  2. execute.py:301 — skip = slice_params.pop("__when_value__", None) is False
  3. execute.py:249 — skip=skip passed to handle_single_execution
  4. actions/__init__.py:794-803 — Job state set to SKIPPED, outputs handled:
if skip:
    job.state = job.states.SKIPPED
    for output_collection in output_collections.out_collections.values():
        output_collection.mark_as_populated()
    ...
    for data in out_data.values():
        data.set_skipped(object_store_populator, replace_dataset=False)
  5. model/__init__.py:5249-5265 — set_skipped():
self.extension = "expression.json"
self.state = self.states.OK  # state is OK, not error
self.blurb = "skipped"
self.peek = json.dumps(None)
self.visible = False
# File content is literally: null
with open(self.dataset.get_file_name(), "w") as out:
    out.write(json.dumps(None))

The output HDA exists, has state=OK, but contains null JSON and has expression.json extension.

When the pick_value tool receives this HDA as an optional="true" data input, Galaxy treats it as null because the dataset content is null. This is confirmed working in existing tests like test_pick_value_preserves_datatype_and_inheritance_chain.
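That null-detection behavior can be illustrated in isolation. This is a sketch mimicking what set_skipped() writes; the temporary file stands in for the skipped HDA's dataset file:

```python
import json
import tempfile

# Mimic set_skipped(): a skipped step's output file literally contains `null`.
with tempfile.NamedTemporaryFile(
    "w", suffix=".expression.json", delete=False
) as out:
    out.write(json.dumps(None))
    path = out.name

# A downstream consumer (e.g. the pick_value tool's optional data input)
# reads the file back and sees a JSON null, i.e. Python None.
with open(path) as f:
    value = json.load(f)

assert value is None  # the skipped-step output is treated as null
```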

How the Parser Constructs Workflow Dicts

WorkflowProxy.to_dict() at parser.py:700:

def to_dict(self):
    steps = {}
    step_proxies = self.step_proxies()
    input_connections_by_step = self.input_connections_by_step(step_proxies)
    index = 0
    # First: workflow input steps
    for i, input_dict in enumerate(self._workflow.tool["inputs"]):
        steps[index] = self.cwl_input_to_galaxy_step(input_dict, i)
        index += 1
    # Then: tool/subworkflow steps
    for i, step_proxy in enumerate(step_proxies):
        input_connections = input_connections_by_step[i]
        steps[index] = step_proxy.to_dict(input_connections)
        index += 1
    return {"name": name, "steps": steps, "annotation": ...}

Each step dict has workflow_outputs list (from get_outputs_for_label), which tells Galaxy which step outputs are workflow outputs. Currently get_outputs_for_label crashes on multi-source outputs.

pick_value Usage in Galaxy Tests

Already used extensively in Galaxy workflow tests for conditionals:

  • test_run_workflow_pick_value_bam_pja — basic pick_value with data
  • test_run_workflow_conditional_step_map_over_expression_tool_pick_value — pick_value with map-over, first_or_error style
  • test_pick_value_preserves_datatype_and_inheritance_chain — skipped step output -> pick_value -> preserves extension

The input wiring pattern in Galaxy workflows:

pick_value:
    tool_id: pick_value
    in:
      style_cond|type_cond|pick_from_0|value:
        source: step1/out1
      style_cond|type_cond|pick_from_1|value:
        source: step2/out1
    tool_state:
      style_cond:
        pick_style: first_or_error
        type_cond:
          param_type: data
          pick_from:
          - value:
              __class__: RuntimeValue
          - value:
              __class__: RuntimeValue

Implementation Plan

Phase 1: first_non_null and the_only_non_null via pick_value Tool

These two modes map cleanly to the existing pick_value tool.

Step 1: Modify WorkflowProxy.to_dict() to detect pickValue outputs

In parser.py, scan self._workflow.tool["outputs"] for pickValue + list outputSource:

def _pick_value_outputs(self):
    """Find workflow outputs that need synthetic pick_value steps."""
    pick_value_outputs = []
    for output in self._workflow.tool["outputs"]:
        pick_value = output.get("pickValue")
        output_source = output.get("outputSource")
        if pick_value and isinstance(output_source, list) and len(output_source) > 1:
            pick_value_outputs.append({
                "output": output,
                "pick_value": pick_value,
                "sources": output_source,
            })
    return pick_value_outputs

Step 2: Generate synthetic pick_value step dicts

For each pickValue output, create a Galaxy step dict for a pick_value tool invocation:

def _make_pick_value_step(self, pv_info, step_index, cwl_ids_to_index):
    pick_value = pv_info["pick_value"]
    sources = pv_info["sources"]
    output = pv_info["output"]

    # Map CWL pickValue to pick_value pick_style
    style_map = {
        "first_non_null": "first_or_error",
        "the_only_non_null": "only",
    }
    pick_style = style_map[pick_value]

    # Determine param_type from CWL output type
    cwl_type = output.get("type", "File")
    param_type = self._cwl_type_to_pick_param_type(cwl_type)

    # Build input_connections from sources
    input_connections = {}
    pick_from_entries = []
    for i, source in enumerate(sources):
        step_name, output_name = split_step_references(
            source, multiple=False, workflow_id=self.cwl_id
        )
        # Resolve step_name to index
        sep_on = "/" if "#" in self.cwl_id else "#"
        output_step_id = self.cwl_id + sep_on + step_name
        source_index = cwl_ids_to_index[output_step_id]

        conn_key = f"style_cond|type_cond|pick_from_{i}|value"
        input_connections[conn_key] = [{
            "id": source_index,
            "output_name": output_name,
            "input_type": "dataset",
        }]
        pick_from_entries.append({
            "__index__": i,
            "value": {"__class__": "RuntimeValue"},
        })

    # Build tool_state
    tool_state = {
        "style_cond": {
            "__current_case__": {"first": 0, "first_or_default": 1,
                                 "first_or_error": 2, "only": 3}[pick_style],
            "pick_style": pick_style,
            "type_cond": {
                "__current_case__": {"data": 0, "text": 1, "integer": 2,
                                     "float": 3, "boolean": 4}[param_type],
                "param_type": param_type,
                "pick_from": pick_from_entries,
            },
        },
    }

    # Output name for pick_value tool depends on param_type
    output_name_map = {
        "data": "data_param",
        "text": "text_param",
        "integer": "integer_param",
        "float": "float_param",
        "boolean": "boolean_param",
    }

    output_label = self.jsonld_id_to_label(output["id"])

    return {
        "id": step_index,
        "tool_id": "pick_value",
        "label": f"__cwl_pick_value_{output_label}",
        "position": {"left": 0, "top": 0},
        "type": "tool",
        "annotation": f"Synthetic pick_value for CWL pickValue: {pick_value}",
        "input_connections": input_connections,
        "tool_state": tool_state,
        "workflow_outputs": [{
            "output_name": output_name_map[param_type],
            "label": output_label,
        }],
    }

Step 3: Modify to_dict() to inject synthetic steps

def to_dict(self):
    name = ...
    steps = {}
    step_proxies = self.step_proxies()
    input_connections_by_step = self.input_connections_by_step(step_proxies)
    index = 0

    for i, input_dict in enumerate(self._workflow.tool["inputs"]):
        steps[index] = self.cwl_input_to_galaxy_step(input_dict, i)
        index += 1

    for i, step_proxy in enumerate(step_proxies):
        input_connections = input_connections_by_step[i]
        steps[index] = step_proxy.to_dict(input_connections)
        index += 1

    # NEW: inject synthetic pick_value steps
    cwl_ids_to_index = self.cwl_ids_to_index(step_proxies)
    for pv_info in self._pick_value_outputs():
        if pv_info["pick_value"] in ("first_non_null", "the_only_non_null"):
            steps[index] = self._make_pick_value_step(pv_info, index, cwl_ids_to_index)
            index += 1

    return {"name": name, "steps": steps, "annotation": ...}

Step 4: Remove pickValue outputs from original step's workflow_outputs

When we create a synthetic pick_value step for a workflow output, we need to ensure the original source steps don't also claim that output as a workflow_output. The get_outputs_for_label() method currently assigns workflow outputs to the step they come from. For pickValue outputs, we need to suppress this.

Options:

  • Skip pickValue outputs in get_outputs_for_label() entirely (they'll be on the synthetic step)
  • Or: modify get_outputs_for_label() to handle multiple=True but still return them, then remove them after synthetic step creation

The cleanest approach: add a check in get_outputs_for_label() — if the output has pickValue, skip it (it'll be handled by synthetic step).

def get_outputs_for_label(self, label):
    outputs = []
    for output in self._workflow.tool["outputs"]:
        # Skip pickValue outputs — handled by synthetic pick_value steps
        if output.get("pickValue") and isinstance(output.get("outputSource"), list):
            continue
        step, output_name = split_step_references(
            output["outputSource"],
            multiple=False,
            workflow_id=self.cwl_id,
        )
        if step == label:
            ...
    return outputs

Phase 2: all_non_null — Requires New/Extended Tool

The existing pick_value tool cannot produce arrays. Options:

Option A: Extend pick_value tool with all_non_null mode

Add a new pick_style value all that returns a list collection or JSON array. This is a tool change and would need coordination with the tools-iuc maintainers. The JS expression would be:

if (pickStyle == 'all') {
    var result = [];
    for (var i = 0; i < pickFrom.length; i++) {
        if (pickFrom[i].value !== null) {
            result.push(pickFrom[i].value);
        }
    }
    return { 'output': result };
}

But expression tools currently can't produce output collections (enforced at ExpressionTool.parse_outputs(): "Expression tools may not declare output collections at this time.").

For File[] outputs, the result needs to be a dataset collection (list). For string[] outputs, it could be an expression.json containing a JSON array.
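For the string[] case, the entire result can live in an expression.json payload. A sketch of what an all_non_null expression-tool output would contain (the variable names and file layout here are assumptions, not the tool's actual implementation):

```python
import json

# Values arriving from the two sources; a skipped step contributes None.
pick_from = [None, "bar 1"]

# all_non_null over scalars is plain filtering...
result = [v for v in pick_from if v is not None]

# ...and an expression tool could simply serialize the array as its
# expression.json output -- no dataset collection required.
payload = json.dumps(result)
assert payload == '["bar 1"]'
```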

Option B: Galaxy multi-source merging as the base

Galaxy already merges multiple connections into collections via replacement_for_input_connections in run.py:466-559. When multiple connections target a single input, Galaxy creates an EphemeralCollection of type list.

For all_non_null, we could:

  1. Let the multiple sources wire directly to a synthetic step that filters nulls
  2. Or use a new expression tool that receives the merged collection and filters out null elements

Option C: Handle all_non_null via runtime/native semantics

Instead of a tool step, implement all_non_null as a native workflow output collection mechanism. This would require changes to modules.py and run.py to collect the workflow outputs, filter nulls, and produce a collection. More invasive but avoids tool limitations.

Recommended: Option A (extend pick_value) for scalar types; Option C for File[]

For CWL string[] type: an expression tool can return a JSON array in expression.json. For CWL File[] type: need collection output, which expression tools can't produce. Must use runtime approach or a different tool type.

However, looking at the conformance tests:

  • cond-wf-005.cwl has type: string[] with all_non_null — CWL strings, which in Galaxy are expression.json parameters
  • cond-wf-007.cwl has type: string[] with all_non_null — same
  • cond-with-defaults.cwl has type: File[] with all_non_null — this is the hard case

Phase 3: first_non_null with Workflow Input Sources

cond-wf-003.cwl has outputSource: [step1/out1, def] where def is a workflow input (not a step output). The synthetic pick_value step needs to wire def's output as one of its pick_from inputs.

This should work naturally because workflow inputs are represented as input steps in Galaxy with an implicit output named "output". The cwl_ids_to_index map already includes input steps. So split_step_references("def") returns ("def", "output") and the index lookup finds the input step.
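That resolution can be sketched as follows. `split_ref` is a hypothetical simplification of split_step_references, assuming Galaxy's convention that input steps expose a single implicit output named "output":

```python
def split_ref(source):
    # "step1/out1" -> ("step1", "out1"); a bare workflow-input id like
    # "def" has no "/" and resolves to that input step's implicit output.
    if "/" in source:
        step, output_name = source.split("/", 1)
        return step, output_name
    return source, "output"

assert split_ref("step1/out1") == ("step1", "out1")
assert split_ref("def") == ("def", "output")
```

Because the cwl_ids_to_index map covers input steps as well as tool steps, both tuples resolve to a wireable connection.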

CWL Type to pick_value param_type Mapping

def _cwl_type_to_pick_param_type(self, cwl_type):
    """Map CWL type to pick_value param_type."""
    # Handle optional types like ["null", "string"]
    if isinstance(cwl_type, list):
        cwl_type = [t for t in cwl_type if t != "null"][0]
    type_map = {
        "File": "data",
        "string": "text",
        "int": "integer",
        "long": "integer",
        "float": "float",
        "double": "float",
        "boolean": "boolean",
    }
    return type_map.get(cwl_type, "data")
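A module-level copy of the mapping with a quick self-check. The union handling mirrors the method above; the extra guard for non-string types (records, array schemas) is an assumption about how complex types should degrade:

```python
def cwl_type_to_pick_param_type(cwl_type):
    """Map a CWL type to the pick_value tool's param_type."""
    # Optional types arrive as unions like ["null", "string"]; drop the null.
    if isinstance(cwl_type, list):
        non_null = [t for t in cwl_type if t != "null"]
        cwl_type = non_null[0] if non_null else "File"
    if not isinstance(cwl_type, str):
        # Complex types (record/array schema dicts) fall back to data.
        return "data"
    type_map = {
        "File": "data",
        "string": "text",
        "int": "integer",
        "long": "integer",
        "float": "float",
        "double": "float",
        "boolean": "boolean",
    }
    return type_map.get(cwl_type, "data")

assert cwl_type_to_pick_param_type("string") == "text"
assert cwl_type_to_pick_param_type(["null", "int"]) == "integer"
assert cwl_type_to_pick_param_type({"type": "array", "items": "File"}) == "data"
```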

Test Strategy

Red-to-green targets (Phase 1):

first_non_null tests:

  • pass_through_required_false_when / _nojs — val=1, step1 skipped, output = def = "Direct"
  • pass_through_required_true_when / _nojs — val=3, step1 runs, output = step1's output
  • first_non_null_first_non_null / _nojs — two steps, first runs
  • first_non_null_second_non_null / _nojs — two steps, second runs

the_only_non_null tests:

  • pass_through_required_the_only_non_null / _nojs — single non-null
  • the_only_non_null_single_true / _nojs — single non-null

should_fail tests (already green, must stay green):

  • first_non_null_all_null / _nojs — all null, should error
  • pass_through_required_fail / _nojs — >1 non-null with the_only_non_null
  • the_only_non_null_multi_true / _nojs — >1 non-null

Red-to-green targets (Phase 2 — all_non_null):

  • all_non_null_all_null / _nojs — empty array result
  • all_non_null_one_non_null / _nojs — single-element array
  • all_non_null_multi_non_null / _nojs — multi-element array

Risks and Edge Cases

  1. Tool availability: pick_value must be loaded at workflow import time. It's bundled in tools/expression_tools/pick_value.xml but needs to be in the tool panel. Check if CWL workflow import auto-loads tools.

  2. tool_id vs tool_uuid: Normal CWL step imports use tool_uuid (from the CWL tool proxy). The synthetic step uses tool_id: "pick_value" directly. Need to verify the workflow import API accepts tool_id for expression tools.

  3. should_fail test regression: Some tests are currently green because the workflow import crashes before execution. With the parser fix, these workflows will import successfully. The should_fail tests need the pick_value tool to error during execution, which first_or_error and only modes do correctly.

  4. tool_state format: The tool_state dict format for workflow import API may need JSON encoding or specific __current_case__ values. The existing Galaxy test examples (shown above) demonstrate the correct format.

  5. CWL outputs referencing workflow inputs as sources: Works because Galaxy input steps have index entries in cwl_ids_to_index and produce output named "output".

  6. Scatter + conditional + pickValue: cond-wf-009.cwl has outputSource: step1/out1 (single source) with pickValue: all_non_null. This is NOT a multi-source case — it's filtering nulls from a scattered step's output array. This may need different handling (the scatter already produces a collection; we need to filter null elements).

  7. linkMerge: merge_flattened combined with pickValue: all_non_null: cond-with-defaults.cwl uses both. The merge produces a flat list; then all_non_null filters nulls. This interaction adds complexity.

Review Notes

Reviewed against CWL_CONDITIONALS_STATUS.md and actual test file markers.

Factual Corrections

  1. Test count is wrong. Plan says "27 CWL v1.2 conditional conformance tests are red." Actual from test_cwl_conformance_v1_2.py: 29 red, 17 green (46 total). Status doc also wrong (says 13 green, 27 red). Discrepancies:

    • all_non_null_all_null / _nojs: status doc lists as GREEN (should_fail), but these are NOT should_fail — they expect out1: []. They're @pytest.mark.red.
    • condifional_scatter_on_nonscattered_true_nojs: RED in test file, not listed in status doc.
  2. get_outputs_for_label is also called from cwl_input_to_galaxy_step (line 746), not just tool steps. The Step 4 skip logic handles this correctly since it checks for pickValue, but this call path should be noted.

  3. __current_case__ values verified correct. pick_style: first(0), first_or_default(1), first_or_error(2), only(3). param_type: data(0), text(1), integer(2), float(3), boolean(4). Plan's mappings match pick_value.xml.

Approach Correctness

  1. Phase 1 approach is sound. first_or_error and only map correctly to CWL semantics. Synthetic step insertion at parse time avoids runtime changes.

  2. should_fail regression risk is manageable. After fix, import succeeds but first_or_error/only modes error at execution — still counts as failure for should_fail tests. Needs verification.

  3. tool_id should work. Workflow import API accepts tool_id for Galaxy-native tools. Normal CWL steps use tool_uuid (parser.py:1118), but synthetic steps can use tool_id: "pick_value" directly.

Additional Risks

  1. Workflow re-export fidelity. CWL→Galaxy import creates a real pick_value step. Galaxy→CWL export won't round-trip back to pickValue syntax. Acceptable for CWL conformance testing but worth noting.

  2. pickValue on step inputs. CWL spec allows it on step inputs too, not just workflow outputs. Zero conformance tests exercise this, so deprioritize.

Unresolved Questions

  • Does CWL workflow import API accept tool_id for non-CWL tools (like pick_value), or must we use tool_uuid? If the latter, we need to look up or generate a UUID for the bundled pick_value tool.
  • Is pick_value always loaded in Galaxy instances that run CWL workflows, or could it be missing?
  • For first_non_null on an optional CWL output, should we use first (returns null) or first_or_error (errors)? CWL spec says error for required outputs; spec unclear for optional.
  • For all_non_null with File[] output: should we extend pick_value tool, create a new tool, or implement as runtime logic? Expression tools can't produce collections.
  • For cond-wf-009.cwl (single-source scatter + pickValue: all_non_null): is this a collection-filter problem rather than a multi-source problem? Does the synthetic tool approach apply at all?
  • Should the synthetic step label be hidden from the user, or visible? Galaxy has __ prefix convention for internal steps — does the workflow editor handle this?
  • How should linkMerge + pickValue interaction work? cond-with-defaults.cwl uses both. Is linkMerge applied before pickValue?

CWL Conditional Workflow Support in Galaxy

CWL v1.2 Conditional Features

CWL v1.2 introduced three key conditional features:

  1. when — Step-level boolean expression; if false, step is skipped and outputs are null
  2. pickValue — Workflow output/step input directive for merging multiple sources:
    • first_non_null — return first non-null from source list
    • the_only_non_null — validate exactly one non-null, return it
    • all_non_null — return array of all non-null values
  3. MultipleInputFeatureRequirement — Required when outputSource or step input source is a list

What's Implemented

Multiple input connections — WORKING

Galaxy already supports multiple connections to step inputs. 7 non-conditional multiple_input tests pass across v1.1/v1.2 (e.g. wf_multiplesources_multipletypes, wf_wc_scatter_multiple_merge, valuefrom_wf_step_multiple). The plumbing:

  • input_connections_by_step() at parser.py:668 calls split_step_references() with default multiple=True, building lists of connections per input name
  • linkMerge is parsed at parser.py:1028-1029 and stored as merge_type
  • MultipleInputFeatureRequirement listed in SUPPORTED_TOOL_REQUIREMENTS (not enforced, but not rejected)

This confirms Galaxy's workflow model handles multi-source connections — the gap is specifically in how workflow outputs (not inputs) reference multiple sources.

when expressions — WORKING

Galaxy has full support for step-level when expressions:

Parsing (CWL→Galaxy workflow):

  • parser.py:1115 — ToolStepProxy.to_dict() extracts when from CWL step

Runtime evaluation:

  • modules.py:474-526 — evaluate_value_from_expressions() evaluates when via do_eval() (CWL JS engine)
  • modules.py:1137-1161 — Subworkflow module propagates when_values through slice_collections()
  • modules.py:2765-2771 — CWL tool module evaluates when per scatter slice
  • run.py:403-418 — WorkflowProgress tracks when_values list
  • modules.py:3024 — Steps with when_values == [False] have outputs hidden

Bug: SubworkflowStepProxy.to_dict() at parser.py:1137-1154 does NOT extract the when expression (only ToolStepProxy does). Conditional subworkflow steps lose their when condition during import.

What's NOT Implemented

Multiple outputSource — NOT SUPPORTED

The crash point for all_non_null_all_null and all pickValue tests:

parser.py:607  get_outputs_for_label()
  → calls split_step_references(output["outputSource"], multiple=False, ...)
parser.py:982  split_step_references()
  → assert len(split_references) == 1  ← CRASH

get_outputs_for_label() hardcodes multiple=False, but the list outputSource pattern (e.g. outputSource: [step1/out1, step2/out1]) is exclusively a CWL v1.2 conditional feature — no non-conditional tests exercise this path. All the existing multiple_input tests that pass use multiple sources on step inputs, not workflow outputs.
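The failure mode reduces to the following. This is a minimal sketch of the assertion's behavior, not the real function body:

```python
def split_step_references(output_source, multiple=False):
    # outputSource may be a plain string or, under CWL v1.2
    # conditionals, a list of sources.
    references = output_source if isinstance(output_source, list) else [output_source]
    if not multiple:
        # get_outputs_for_label() hardcodes multiple=False for workflow
        # outputs, so any list outputSource trips this assertion.
        assert len(references) == 1
    return [tuple(r.split("/", 1)) for r in references]

assert split_step_references("step1/out1") == [("step1", "out1")]
try:
    split_step_references(["step1/out1", "step2/out1"])
    raised = False
except AssertionError:
    raised = True
assert raised  # the multi-source case crashes at import time
```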

pickValue — NOT IMPLEMENTED

No parsing, serialization, or runtime logic for pickValue exists anywhere in Galaxy source.

Test Status (v1.2 conditional tests)

GREEN (passing) — 13 tests

Simple when + single outputSource (no pickValue needed):

  • direct_optional_null_result / _nojs / direct_required / _nojs — single step, when=false, output=null
  • direct_optional_nonnull_result / _nojs — single step, when=true
  • condifional_scatter_on_nonscattered_true — scatter with single source

should_fail validation tests (pass because workflow import crashes on multiple outputSource):

  • first_non_null_all_null / _nojs — all sources null with first_non_null
  • pass_through_required_fail / _nojs — multiple non-null with the_only_non_null
  • all_non_null_multi_with_non_array_output / _nojs — all_non_null on non-array type
  • the_only_non_null_multi_true / _nojs — multiple non-null with the_only_non_null
  • conditionals_non_boolean_fail / _nojs — non-boolean when result

RED (failing) — 27 tests

All tests requiring pickValue or multiple outputSource at the workflow output level:

pickValue: first_non_null (crash on multiple outputSource):

  • pass_through_required_false_when / _nojs / _true_when / _nojs
  • first_non_null_first_non_null / _nojs / _second_non_null / _nojs

pickValue: the_only_non_null (crash on multiple outputSource):

  • pass_through_required_the_only_non_null / _nojs
  • the_only_non_null_single_true / _nojs

pickValue: all_non_null (crash on multiple outputSource):

  • all_non_null_all_null / _nojs / _one_non_null / _nojs / _multi_non_null / _nojs

Scatter + conditional (various failures):

  • condifional_scatter_on_nonscattered_false / _nojs
  • scatter_on_scattered_conditional / _nojs
  • conditionals_nested_cross_scatter / _nojs
  • conditionals_multi_scatter / _nojs

Complex conditional + defaults:

  • cond-with-defaults-1 / cond-with-defaults-2

Architecture Gaps

To implement pickValue, Galaxy would need:

  1. Parser changes (parser.py):

    • get_outputs_for_label() must handle list outputSource (pass multiple=True)
    • Store pickValue directive on workflow output metadata
    • Multiple input connections already work, so the underlying model supports this
  2. Runtime changes (modules.py / run.py):

    • Apply pickValue semantics when collecting workflow outputs
    • Handle null-filtering (first_non_null, all_non_null) and validation (the_only_non_null)
  3. Scatter + conditional combination:

    • Some scatter tests produce null elements that need filtering
    • condifional_scatter_on_nonscattered_false expects out1: [] — all scatter elements skipped
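For the scatter case, the expected behavior reduces to null-filtering over the scattered step's per-slice outputs. A sketch of the semantics only, not Galaxy code:

```python
def all_non_null_over_scatter(slice_outputs):
    # Each scatter slice whose `when` evaluated false contributes None.
    return [v for v in slice_outputs if v is not None]

# Every slice skipped -> out1: [] (the empty-collection expectation).
assert all_non_null_over_scatter([None, None, None]) == []

# Mixed case: only outputs from non-skipped slices survive.
assert all_non_null_over_scatter([None, "kept", None]) == ["kept"]
```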

Unresolved Questions

  • Should pickValue be a Galaxy-level workflow feature or only CWL?
  • How should null outputs from skipped steps interact with Galaxy's collection model?