Currently `BatchDataInstance` (line 534) and `BatchDataInstanceInternal` (line 883) are simple `{src, id}` models. Add `map_over_type: Optional[str] = None` to both. Use `Optional[str]`, consistent with how `collection_type` is modeled elsewhere.

This is the core request-layer gap: `map_over_type` is how clients express subcollection-mapping intent in batch values, but the schema does not model it.
Files: lib/galaxy/tool_util_models/parameters.py
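A minimal sketch of the field addition, assuming a pydantic model; the `src` literals shown are assumptions for illustration, and the real models live in lib/galaxy/tool_util_models/parameters.py:

```python
# Hypothetical sketch -- not the real Galaxy model; src literals are assumed.
from typing import Literal, Optional
from pydantic import BaseModel, StrictStr


class BatchDataInstance(BaseModel):
    src: Literal["hda", "ldda", "hdca"]  # assumed; check the real union
    id: StrictStr  # encoded ID at the external request layer
    map_over_type: Optional[str] = None  # e.g. "paired" or "list:paired"; None = no subcollection mapping
```

`BatchDataInstanceInternal` would gain the identical field, with its decoded integer `id`.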
DCE is backend-produced during batch expansion — it does NOT belong in the external request layer.
Add `DataRequestInternalDce` with `src: Literal["dce"]`, `id: StrictInt` (if not already present).

Add `"dce"` to the internal-only types:
- `DataRequestInternalDereferencedT` union: add `DatasetCollectionElementReference` (already exists at parameters.py:1067) to cover `job_internal` DCE refs produced by subcollection mapping expansion.
- Verify the `MultiDataInstanceInternal` and `MultiDataInstanceInternalDereferenced` unions include `DataRequestInternalDce`.
- Do NOT add `DataRequestDce` to the external `DataRequest` union or `BatchDataInstance.src`.
- Do NOT add `"dce"` to `BatchDataInstanceInternal.src`: batch expansion happens after `request_internal`, so DCE never appears in Batch values at that layer.
Files: lib/galaxy/tool_util_models/parameters.py
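A minimal sketch of the internal-only DCE model and how the union might gain it; the union membership shown is illustrative only, and the real unions in parameters.py have more members:

```python
# Hypothetical sketch -- stand-ins for the real internal models.
from typing import Literal, Union
from pydantic import BaseModel, StrictInt


class DataRequestInternalHda(BaseModel):  # stand-in for the existing model
    src: Literal["hda"]
    id: StrictInt


class DataRequestInternalDce(BaseModel):
    src: Literal["dce"]
    id: StrictInt  # decoded integer ID; never the encoded external form


# Illustrative union wiring; the real DataRequestInternalDereferencedT has more members.
DataRequestInternalDereferencedT = Union[DataRequestInternalHda, DataRequestInternalDce]
```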
The `encode()` and `decode()` functions in convert.py work with a generic src-dict format. Verify they handle `src: "dce"` in internal representations without special-casing. The `dereference()` function may need DCE handling if a dereference step encounters stored DCE refs.

Fix `runtimeify` in convert.py (line 548): it currently hardcodes `DataRequestInternalHda(**value)`, which breaks on DCE src dicts. It needs to dispatch on `src` and handle DCE-to-dataset resolution.
Files: lib/galaxy/tool_util/parameters/convert.py
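A minimal dispatch sketch under stated assumptions: `resolve_dce` is a made-up callback standing in for however the real `runtimeify` resolves a collection element to its dataset; the real function in convert.py works with the pydantic models, not plain dicts:

```python
# Hypothetical dispatch-on-src sketch -- not the real convert.py implementation.
from typing import Any, Callable, Dict


def runtimeify_value(
    value: Dict[str, Any],
    resolve_dce: Callable[[int], Dict[str, Any]],
) -> Dict[str, Any]:
    """Dispatch a src dict instead of hardcoding one model.

    resolve_dce is an assumed callback mapping a DCE id to the dataset
    reference it wraps.
    """
    src = value["src"]
    if src == "dce":
        # Resolve the collection element to its underlying dataset ref.
        return resolve_dce(value["id"])
    if src in ("hda", "ldda"):
        return dict(value)  # already a dataset reference; pass through
    raise ValueError(f"unhandled src {src!r}")
```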
```shell
PYTHONPATH=lib python -m pytest test/unit/tool_util/test_parameter_specification.py -x --timeout=60
```

Existing tests should still pass; we are only adding new fields/types, not changing existing validation.
Add test cases to the `gx_data` entry. These validate the client-facing schema:

```yaml
# request_valid additions — map_over_type on batch values:
- parameter: {__class__: "Batch", values: [{src: hdca, id: abcdabcd, map_over_type: paired}]}
- parameter: {__class__: "Batch", values: [{src: hdca, id: abcdabcd, map_over_type: "list:paired"}]}
# map_over_type: null should also be valid (no subcollection mapping)
- parameter: {__class__: "Batch", values: [{src: hdca, id: abcdabcd, map_over_type: null}]}
# landing_request_valid additions — landing pages can pre-fill batch params with map_over_type:
- parameter: {__class__: "Batch", values: [{src: hdca, id: abcdabcd, map_over_type: paired}]}
# request_invalid additions — dce should NOT be valid in external request:
- parameter: {__class__: "Batch", values: [{src: dce, id: abcdabcd}]}
- parameter: {src: dce, id: abcdabcd}
```

These validate post-decode representations where `map_over_type` carries through:

```yaml
# request_internal_valid additions:
- parameter: {__class__: "Batch", values: [{src: hdca, id: 5, map_over_type: paired}]}
# request_internal_dereferenced_valid additions:
- parameter: {__class__: "Batch", values: [{src: hdca, id: 5, map_over_type: paired}]}
```

DCE does NOT belong in Batch values at `request_internal`: batch expansion has not happened yet, and reruns reconstruct HDCA refs via `build_for_rerun`.
After expansion, individual job params contain DCE refs (not wrapped in Batch — Batch is expanded away by this layer). Subcollection mapping over gx_data produces {"src": "dce", "id": <int>} via to_decoded_json — each expanded job gets a DCE representing one subcollection element whose child_collection contains the datasets the tool will process.
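A toy illustration of that expansion step; the HDCA id and the `DatasetCollectionElement` ids below are made up for the example:

```python
# Toy sketch: a Batch value mapped over a list:paired HDCA with
# map_over_type "paired" expands to one job per pair element,
# each job holding a bare DCE ref (the Batch wrapper is gone).
batch_value = {
    "__class__": "Batch",
    "values": [{"src": "hdca", "id": 7, "map_over_type": "paired"}],
}

# Assumed: the backend looked up HDCA 7 and found its top-level
# pair elements with these (made-up) DatasetCollectionElement ids.
pair_element_ids = [11, 12, 13]

expanded_job_params = [{"src": "dce", "id": dce_id} for dce_id in pair_element_ids]
```

Each entry in `expanded_job_params` is what one expanded job sees for the parameter: a DCE whose `child_collection` holds the datasets that job will process.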
```yaml
# job_internal_valid additions — subcollection mapping produces DCE refs:
- parameter: {src: dce, id: 5}
# job_internal_invalid — DCE with encoded ID should fail:
- parameter: {src: dce, id: abcdabcd}
```

The current `job_internal` schema for `gx_data` only allows `src: "hda"` or `src: "ldda"` (`DataRequestInternalDereferencedT`). `DatasetCollectionElementReference` must be added to the union.

```shell
PYTHONPATH=lib python -m pytest test/unit/tool_util/test_parameter_specification.py -x --timeout=60
```

Write specs first (red), then fix any model issues (green).
Files: test/unit/tool_util/parameter_specification.yml
Currently (meta.py:472) the async path rejects `src != "hdca"`. Change it to accept `"dce"` and resolve DatasetCollectionElement → child collection, matching the sync path.

This matters for job reruns, where stored job state contains DCE refs from a previous expansion.
```python
# Sketch of the change around meta.py:472; the exception message is elided here.
if src not in ("hdca", "dce"):
    raise exceptions.ToolMetaParameterException(...)
if src == "dce":
    # Resolve the collection element, then map over its child collection,
    # matching what the sync path already does.
    item = app.model.context.get(DatasetCollectionElement, item_id)
    collection = item.child_collection
else:
    item = app.model.context.get(HistoryDatasetCollectionAssociation, item_id)
    collection = item.collection
```

Files: lib/galaxy/tools/parameters/meta.py
The existing `test_map_over_with_nested_paired_output_format_actions` uses a manual dict. Refactor it to use the `tool_input_format` fixture (runs 3x: flat, nested, request) so it gains request-format coverage with `map_over_type`.

The request-format callback produces `{__class__: "Batch", values: [{src: "hdca", id: ..., map_over_type: "paired"}]}`. Check whether `DescribeToolInputs` supports this or whether the fluent API needs to be extended.
Migrate `test_simple_subcollection_mapping` from test_tools.py to test_tool_execute.py with request-format coverage:

```python
@requires_tool_id("cat1")
def test_simple_subcollection_mapping(
    target_history: TargetHistory,
    required_tool: RequiredTool,
    tool_input_format: DescribeToolInputs,
):
    hdca = target_history.with_example_list_of_pairs()
    # legacy/nested: {"f1": {"batch": True, "values": [{"src": "hdca", "map_over_type": "paired", "id": hdca_id}]}}
    # request: {"f1": {"__class__": "Batch", "values": [{"src": "hdca", "id": hdca_id, "map_over_type": "paired"}]}}
    ...
```

Refactor the existing `test_map_over_paired_or_unpaired_with_list_paired` to use the `tool_input_format` fixture so it covers all three input formats, including request.
Review `DescribeToolInputs` in populators.py to see if `.when.request()` callbacks can produce batch inputs with `map_over_type`. If not, extend the fluent API. A helper may be needed, such as:

```python
def batch_with_map_over(hdca, map_over_type):
    return {"__class__": "Batch", "values": [{**hdca.src_dict, "map_over_type": map_over_type}]}
```

```shell
PYTHONPATH=lib python -m pytest test/unit/tool_util/test_parameter_specification.py -x
./run_tests.sh -api lib/galaxy_test/api/test_tool_execute.py -k "subcollection or dce or map_over"
./run_tests.sh -api lib/galaxy_test/api/test_tool_execute.py
```

| Step | Phase | Description | Test First? |
|---|---|---|---|
| 1 | 2a-2b | Write parameter specification tests for map_over_type (expect failures) | Yes (red) |
| 2 | 1a | Add map_over_type to BatchDataInstance/BatchDataInstanceInternal | Green |
| 3 | 2d | Verify spec tests pass | Green check |
| 4 | 1b-1c | Add DCE to internal representations, fix runtimeify in convert.py | Implementation |
| 5 | 2c | Write job_internal spec tests for DCE (red→green) | Red→Green |
| 6 | 4a-4d | Write API execution tests (expect failures for request format) | Yes (red) |
| 7 | 3a | Fix async expansion for DCE | Green |
| 8 | 4d | Extend fluent API if needed | Green |
| 9 | 5a-5c | Full test runs | Regression |