test_conformance_v1_1_secondary_files_in_output_records (xfail in v1.1 and v1.2)
test/functional/tools/cwl_tools/v1.1/tests/record-out-secondaryFiles.cwl:
```yaml
outputs:
  record_output:
    type:
      type: record
      fields:
        f1:
          type: File
          secondaryFiles: .s2
          outputBinding:
            glob: A
        f2:
          type: { type: array, items: File }
          secondaryFiles: .s3
          outputBinding:
            glob: [B, C]
baseCommand: touch
arguments: [A, A.s2, B, B.s3, C, C.s3]
```
No inputs. `touch` creates 6 files; the test expects a record output with:
- f1: File "A" with secondary file "A.s2"
- f2: array of [File "B" with "B.s3", File "C" with "C.s3"]
Expected:
```json
{
  "f1": {"class": "File", "location": "A", "secondaryFiles": [{"location": "A.s2"}]},
  "f2": [
    {"class": "File", "location": "B", "secondaryFiles": [{"location": "B.s3"}]},
    {"class": "File", "location": "C", "secondaryFiles": [{"location": "C.s3"}]}
  ]
}
```
Actual:
```json
{
  "f1": {"class": "File", "basename": "record-out-secondaryFiles.cwl", "secondaryFiles": [{"basename": "A.s2"}]},
  "f2": {"class": "File", "basename": "record-out-secondaryFiles.cwl"}
}
```
Problems:
- f1 wrong basename: "record-out-secondaryFiles.cwl" (the CWL tool file) instead of "A"
- f1 secondary files present: A.s2 IS found (secondary files partially work)
- f2 is a single File, not an array: should be [{B}, {C}], got a single File
- f2 no secondary files: missing B.s3, C.s3
- f2 wrong basename: same tool filename
lib/galaxy/model/dataset_collections/types/record.py:44-55:
```python
def prototype_elements(self, fields=None, **kwds):
    for field in fields:
        name = field.get("name", None)
        assert field.get("type", "File")  # NS: this assert doesn't make sense
        field_dataset = DatasetCollectionElement(
            element=HistoryDatasetAssociation(),
            element_identifier=name,
        )
        yield field_dataset
```
Every record field becomes a plain HDA. The CWL type info in `field.get("type")` is ignored. For f2 (an array of File), this should create a nested list collection, but it instead creates a single HDA.
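A minimal sketch (hypothetical helper, not Galaxy code) of how `prototype_elements` could inspect the declared CWL type to distinguish plain File fields from array-of-File fields before deciding what prototype element to create:

```python
def field_kind(field):
    """Classify a CWL record field by its declared type.

    Returns "file" for a plain File field, "array-of-file" for
    {type: array, items: File}, and "other" for anything else.
    """
    cwl_type = field.get("type")
    if cwl_type == "File":
        return "file"
    if (
        isinstance(cwl_type, dict)
        and cwl_type.get("type") == "array"
        and cwl_type.get("items") == "File"
    ):
        return "array-of-file"
    return "other"


# The two fields from the conformance tool:
f1 = {"name": "f1", "type": "File"}
f2 = {"name": "f2", "type": {"type": "array", "items": "File"}}
print(field_kind(f1))  # file
print(field_kind(f2))  # array-of-file
```

Whether the `fields` list reaching `prototype_elements` actually retains the type dict in this shape is one of the open questions below.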
lib/galaxy/tool_util/cwl/runtime_actions.py:193-201:
```python
elif isinstance(output, dict):
    prefix = f"{output_name}|__part__|"
    for record_key, record_value in output.items():
        record_value_output_key = f"{prefix}{record_key}"
        if isinstance(record_value, dict) and "class" in record_value:
            handle_known_output(record_value, record_value_output_key)
        else:
            handle_known_output_json(output, output_name)  # BUG
```
The else branch (line 201) passes TWO wrong variables:
- `output` (the entire record dict) instead of `record_value` (the field value)
- `output_name` ("record_output") instead of `record_value_output_key` ("record_output|__part__|f2")
output_name = "record_output" is the COLLECTION name, not a dataset. It's not in
_output_dict (which only has record_output|__part__|f1 and record_output|__part__|f2).
So job_proxy.output_path("record_output") raises KeyError, crashing handle_outputs()
before it writes provided_metadata (line 228).
Consequence: The metadata JSON is never written. ALL record field HDAs keep their default
names (the CWL tool filename) and created_from_basename is never set. This explains why
f1 has wrong basename even though move_output correctly copied file "A" to f1's path and
wrote secondary files.
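The crash can be reproduced in isolation with a plain dict standing in for the proxy's output map (names simplified; `JobProxy.output_path` is assumed here to be essentially a dict lookup):

```python
# Only the per-field keys exist; the collection name itself does not.
_output_dict = {
    "record_output|__part__|f1": "/job/outputs/f1.dat",
    "record_output|__part__|f2": "/job/outputs/f2.dat",
}

def output_path(output_name):
    # simplified stand-in for JobProxy.output_path
    return _output_dict[output_name]

# The buggy else branch effectively asks for the collection name:
try:
    output_path("record_output")
    crashed = False
except KeyError as exc:
    crashed = True
    print(f"KeyError: {exc}")
```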
Lines 203-216 handle arrays at the TOP level:
```python
elif isinstance(output, list):
    elements = []
    for index, el in enumerate(output):
        if isinstance(el, dict) and el["class"] == "File":
            elements.append({"name": str(index), "filename": output_path, ...})
    ...
    provided_metadata[output_name] = {"elements": elements}
```
But INSIDE the record loop (lines 195-201), only dict values with "class" are handled. List values (arrays) fall through to the broken else branch. Even if the else branch were fixed, a list field needs special handling; it can't just be JSON-dumped into a single HDA.
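To see why, here is what the fixed-but-naive path would produce for f2 (plain-Python illustration, element dict shape is an assumption): the whole array serialized into one dataset's contents, instead of one element per File.

```python
import json

record_value = [
    {"class": "File", "location": "B"},
    {"class": "File", "location": "C"},
]

# JSON-dumping the field yields one text blob for a single HDA...
blob = json.dumps(record_value)

# ...whereas a correct result needs a collection with one element per File.
wanted_elements = [
    {"name": str(i), "location": f["location"]} for i, f in enumerate(record_value)
]
print(blob)
print(wanted_elements)
```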
runtime_actions.py:128-129:
```python
for secondary_file in secondary_files:
    if output_name is None:
        raise NotImplementedError("secondaryFiles are unimplemented for dynamic list elements")
```
The top-level list handler (lines 203-216) doesn't pass output_name to move_output. Even if arrays-in-records were implemented, secondary files for array elements would hit this NotImplementedError.
CwlToolSource.parse_outputs()
→ _parse_output_record()
→ ToolOutputCollection(structure=ToolOutputCollectionStructure(collection_type="record", fields=...))
→ RecordDatasetCollectionType.prototype_elements(fields)
→ DatasetCollectionElement(element=HistoryDatasetAssociation(), element_identifier=name)
→ Creates plain HDA for EVERY field (f1, f2) regardless of CWL type
CwlToolEvaluation (tools/evaluation.py:1246-1270)
→ out_data = job.io_dicts() → {
"record_output|__part__|f1": HDA_f1,
"record_output|__part__|f2": HDA_f2,
}
→ output_dict = {name: {"id": ..., "path": ...} for name, dataset in out_data.items()}
→ cwl_job_proxy = JobProxy(input_json, output_dict, ...)
handle_outputs()
→ cwltool returns: {"record_output": {"f1": {class:File,...}, "f2": [{class:File,...}, ...]}}
→ record loop: f1 handled correctly by handle_known_output ✓
→ record loop: f2 is list → else → handle_known_output_json(output, "record_output") → KeyError!
→ provided_metadata never written → all HDAs keep defaults
CwlToolRun._output_name_to_object("record_output")
→ job["output_collections"]["record_output"] → GalaxyOutput(dataset_collection)
output_to_cwl_json()
→ collection_type "record" → iterate elements → element_to_cwl_json(element)
→ f1 element → single HDA → File (but wrong basename from missing metadata)
→ f2 element → single HDA → File (should be list of Files)
Relevant code locations:
- lib/galaxy/model/dataset_collections/types/record.py:44-55: prototype_elements, all fields become plain HDAs
- lib/galaxy/tool_util/cwl/runtime_actions.py:193-201: record output handling, broken else branch
- lib/galaxy/tool_util/cwl/runtime_actions.py:117-158: move_output with secondary files, NotImplementedError for lists
- lib/galaxy/tool_util/cwl/runtime_actions.py:203-216: top-level list output handling (not used in records)
- lib/galaxy/tool_util/parser/cwl.py:275-287: _parse_output_record
- lib/galaxy/tool_util/parser/output_objects.py:318-367: known_outputs, ToolOutputCollectionPart
- lib/galaxy/tools/evaluation.py:1246-1270: output_dict construction
- lib/galaxy/tool_util/cwl/util.py:683-694: record output to CWL JSON conversion
Fix the else branch to use the correct variables:
```python
handle_known_output_json(record_value, record_value_output_key)
```
This prevents the KeyError crash and fixes f1's metadata (basename). Non-File/non-array record fields (scalars, expressions) would also work correctly. f2 would still be wrong (a JSON-serialized list in a single HDA), but at least f1 works and the function doesn't crash.
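A self-contained sketch of the corrected record loop; the two handler functions are stubs standing in for the real ones in runtime_actions.py:

```python
def handle_record_output(output, output_name, handle_known_output, handle_known_output_json):
    """Corrected record loop: each field is routed with its own value and part key."""
    prefix = f"{output_name}|__part__|"
    for record_key, record_value in output.items():
        record_value_output_key = f"{prefix}{record_key}"
        if isinstance(record_value, dict) and "class" in record_value:
            handle_known_output(record_value, record_value_output_key)
        else:
            # fixed: record_value / record_value_output_key, not output / output_name
            handle_known_output_json(record_value, record_value_output_key)


# Exercise it with recording stubs:
calls = []
handle_record_output(
    {"f1": {"class": "File", "location": "A"}, "f2": [{"class": "File", "location": "B"}]},
    "record_output",
    lambda value, key: calls.append(("known", key, value)),
    lambda value, key: calls.append(("json", key, value)),
)
print(calls)
```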
- record.py: detect array-type fields from `field.get("type")` and create nested list collections instead of plain HDAs
- runtime_actions.py: add a list-handling case in the record loop (similar to lines 203-216), using `record_value_output_key` as the output name
- cwl.py: ensure the `fields` list passed through preserves type info
- util.py: handle nested collections within record elements during reconversion
- runtime_actions.py: implement secondary files for list elements in `move_output` (or a new handler for array elements with secondary files); needs a way to store secondary files per element, potentially using the element index in the path structure
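The list-handling case in the record loop could look roughly like this, mirroring the top-level list branch but keyed by the per-field name. The dict shapes are assumptions inferred from the snippet at lines 203-216, not the exact provided_metadata schema:

```python
def record_list_metadata(record_value, record_value_output_key):
    """Build provided_metadata-style elements for an array-of-File record field."""
    elements = []
    for index, el in enumerate(record_value):
        if isinstance(el, dict) and el.get("class") == "File":
            # name each element by its index, as the top-level list handler does
            elements.append({"name": str(index), "filename": el.get("location")})
    return {record_value_output_key: {"elements": elements}}


meta = record_list_metadata(
    [{"class": "File", "location": "B"}, {"class": "File", "location": "C"}],
    "record_output|__part__|f2",
)
print(meta)
```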
Instead of pre-creating nested collections, treat array-within-record fields as dynamic
outputs and use from_provided_metadata discovery (like top-level array outputs do).
This might be simpler since it avoids changing the pre-creation infrastructure.
- Existing tests for record outputs with plain File fields (no arrays, no secondaryFiles)? Could verify Bug 2 in isolation.
- Does the `fields` list passed to `record.py:prototype_elements` contain CWL type info, or is it stripped?
- Would dynamic output discovery (from_provided_metadata) work for nested collections within records?
- The `secondary_files_in_unnamed_records` test (also xfail): same root cause or different?
- Does anyone currently use CWL record outputs successfully for simpler cases (all-File fields)?