Working with Galaxy Workflow Models#
This guide describes the typed model layers available for programmatically working with Galaxy workflows. The layers are organized from highest convenience (top) to lowest level (bottom):
┌─────────────────────────────────────────────────────┐
│ ensure_format2() / ensure_native() │
│ "Give me any workflow, I'll give you typed models" │
│ Auto-detects format, converts, normalizes, expands │
│ ↳ from gxformat2.normalized │
├─────────────────────────────────────────────────────┤
│ Expanded Models │
│ ExpandedFormat2 / ExpandedNativeWorkflow │
│ All subworkflow refs resolved to inline models │
│ ↳ from gxformat2.normalized │
├─────────────────────────────────────────────────────┤
│ Normalized Models │
│ NormalizedFormat2 / NormalizedNativeWorkflow │
│ Structural guarantees: lists, ids, parsed state │
│ ↳ from gxformat2.normalized │
├─────────────────────────────────────────────────────┤
│ Schema Models (lax / strict) │
│ GalaxyWorkflow / NativeGalaxyWorkflow │
│ Direct pydantic parse — no normalization │
│ ↳ from gxformat2.schema.gxformat2 │
│ ↳ from gxformat2.schema.native │
├─────────────────────────────────────────────────────┤
│ Raw dicts │
│ yaml.ordered_load() / json.load() │
│ No validation, no types │
└─────────────────────────────────────────────────────┘
The ensure_ Layer — Recommended Entry Point#
The ensure_format2() and
ensure_native() functions are the primary way to
get typed workflow models. They accept any workflow representation and
return normalized (or expanded) models in the target format, handling
format detection and conversion automatically.
ensure_format2#
from gxformat2.normalized import ensure_format2
# From a file path (native .ga or Format2 .gxwf.yml)
nf2 = ensure_format2("workflow.ga")
nf2 = ensure_format2("workflow.gxwf.yml")
# From a raw dict (auto-detects native vs Format2)
nf2 = ensure_format2(workflow_dict)
# From any typed model — passes through or converts
nf2 = ensure_format2(native_galaxy_workflow)
nf2 = ensure_format2(normalized_native_workflow)
nf2 = ensure_format2(galaxy_workflow_schema_model)
# With expansion (resolves @import, URLs, subworkflow refs)
from gxformat2.options import ConversionOptions
expanded = ensure_format2(workflow, options=ConversionOptions(), expand=True)
Returns: NormalizedFormat2 by default,
ExpandedFormat2 when expand=True.
Overloads provide correct static types based on the expand literal.
ensure_native#
from gxformat2.normalized import ensure_native
# Same flexibility — any input format
nnw = ensure_native("workflow.gxwf.yml")
nnw = ensure_native(native_dict)
nnw = ensure_native(normalized_format2_model)
# With expansion
expanded = ensure_native(workflow, options=ConversionOptions(), expand=True)
Returns: NormalizedNativeWorkflow by
default, ExpandedNativeWorkflow when
expand=True.
Format Detection#
For raw dicts, the functions detect the format by inspecting keys:
a_galaxy_workflow == "true"→ native Galaxy formatclass == "GalaxyWorkflow"→ Format2
For file paths (str or Path), the file is loaded and then inspected.
Typed model inputs (NormalizedFormat2,
NativeGalaxyWorkflow, etc.) are
routed by isinstance checks — already-normalized models pass through
without re-processing.
Access Patterns for Applications#
Once you have a NormalizedFormat2, the
model provides structured typed access to all workflow components:
nf2 = ensure_format2(workflow)
# Workflow metadata
nf2.label # str | None
nf2.doc # str | None (joined if originally a list)
nf2.tags # list[str] | None
nf2.creator # list[CreatorPerson | CreatorOrganization] | None
nf2.license # str | None
# Inputs — always a list, shorthands expanded
for inp in nf2.inputs:
inp.id # str — always populated
inp.type_ # GalaxyType | list[GalaxyType] | None
inp.optional # bool | None
inp.default # Any
inp.doc # str | None
inp.position # StepPosition | None
# Steps — always a list, ids populated
for step in nf2.steps:
step.id # str — always populated
step.label # str | None
step.tool_id # str | None
step.type_ # WorkflowStepType | None
step.doc # str | None (joined if originally a list)
step.position # StepPosition | None
step.run # NormalizedFormat2 | str | dict | None
step.in_ # list[WorkflowStepInput] — always a list
step.out # list[WorkflowStepOutput] — always a list
step.connected_paths # frozenset[str] — input ids with a source connection
# Workflow-level tool inventory (recurses into inline subworkflows)
nf2.unique_tools # frozenset[ToolReference] — (tool_id, tool_version) pairs
# Outputs — always a list
for out in nf2.outputs:
out.id # str | None
out.outputSource # str | None
out.type_ # GalaxyType | list[GalaxyType] | None
Source Reference Resolution#
Format2 workflows reference step outputs using strings like
"step_label/output_name". When step labels themselves contain /,
parsing is ambiguous. The normalized layer provides helpers:
# Resolve a source reference against this workflow's steps
ref = nf2.resolve_source("filter_step/output1")
ref.step_label # "filter_step"
ref.output_name # "output1"
# known_labels is a cached set of all step labels/ids + input ids
nf2.known_labels # {"filter_step", "input1", "map_step", ...}
# Works with ambiguous labels containing "/"
# e.g. step labeled "Mass Spec Runs (MS/MS)"
ref = nf2.resolve_source("Mass Spec Runs (MS/MS)/spectra")
ref.step_label # "Mass Spec Runs (MS/MS)"
ref.output_name # "spectra"
The resolver tries known labels longest-first, then falls back to splitting
on the first /. Bare references (no /) default to output_name="output".
The standalone function resolve_source_reference()
is also available:
from gxformat2.normalized import resolve_source_reference, SourceReference
ref: SourceReference = resolve_source_reference("step/out", known_labels)
ConversionOptions#
Both ensure_ functions and the underlying
to_format2() /
to_native() converters accept an optional
ConversionOptions:
from gxformat2.options import ConversionOptions
options = ConversionOptions(
workflow_directory="/path/to/dir", # For resolving @import paths
url_resolver=my_resolver, # Custom fetcher for URL refs
)
expanded = ensure_format2(workflow, options=options, expand=True)
Key options:
workflow_directory— base path for@importresolutionurl_resolver— callable for fetching URL / TRS referencesencode_tool_state_json— JSON-encode tool_state in native output (default True)state_encode_to_native— Galaxy-provided callback to encode Format2 state back to nativetool_state(accepts(step_dict, state_dict), returns clean dict orNonefor default dict passthrough — no JSON encoding)state_encode_to_format2— Galaxy-provided callback to decode nativetool_stateto Format2state(accepts step dict, returns state dict orNonefor default)strict_structure— validate input against the strict schema (extra="forbid") before normalization and sanity-check conversion output against the target strict schema. RaisesValidationErroron any unrecognised keys. Threaded into nested sub-workflow normalization ($graphdedup entries and inlinerun:dicts). Default False.
See ConversionOptions for the full parameter list.
Real-World Usage#
The application modules in this project all use the ensure_ layer:
# abstract.py — CWL export
nf2 = ensure_format2(workflow_dict)
for step in nf2.steps:
if isinstance(step.run, NormalizedFormat2):
# Recurse into subworkflow
...
# cytoscape.py — visualization
nf2 = ensure_format2(workflow_path)
for step in nf2.steps:
for step_input in step.in_:
ref = nf2.resolve_source(step_input.source)
# Build edge from ref.step_label → current step
# lint.py — validation
nf2 = ensure_format2(workflow_dict, expand=True) # ExpandedFormat2
nnw = ensure_native(workflow_dict) # For native-specific checks
Normalized and Expanded Models#
For cases where you know the input format and don’t need auto-detection,
the gxformat2.normalized package provides direct constructors.
For cross-format conversion, to_format2()
and to_native() convert between formats and
return normalized (or expanded) models.
Cross-Format Conversion#
to_format2() converts native workflows to
Format2, and to_native() converts Format2
workflows to native. Both accept the source format as a raw dict, file
path, or typed model, and return normalized models in the target format:
from gxformat2.normalized import to_format2
from gxformat2.normalized import to_native
# Native → Format2
nf2 = to_format2(native_dict)
nf2 = to_format2(normalized_native_workflow)
# Format2 → Native
nnw = to_native(format2_dict)
nnw = to_native(normalized_format2_model)
# With expansion (resolves subworkflow refs in the result)
ef2 = to_format2(native_dict, options=options, expand=True)
enw = to_native(format2_dict, options=options, expand=True)
Unlike the ensure_ functions, these only accept the other format as
input — to_format2() expects native input,
to_native() expects Format2 input. Use
ensure_ when you don’t know (or don’t care about) the input format.
Normalized Models#
NormalizedFormat2 guarantees:
steps,inputs,outputsare alwayslist(never dict)Step and input
idfields are always populatedInput type shorthands expanded (
"File"→"data", etc.)docfields joined from list to string$graphmulti-workflow documents resolved$linkentries in stepstateresolved — replaced with{"__class__": "ConnectedValue"}in state, connection source added tostep.in_. After normalization,step.in_contains ALL connections andstep.stateis cleanStep
type_always populated — inferred from context when not explicit in the YAML (runpresent →subworkflow, otherwise →tool). Convenience propertiesis_tool_step,is_subworkflow_step,is_pause_step,is_pick_value_step, andconnected_paths(frozenset of input ids with a source) are also availableunique_toolsproperty returnsfrozenset[ToolReference]of all(tool_id, tool_version)pairs, recursing into inline subworkflows (unresolved@import/URL refs are skipped; use expanded models for full coverage)
NormalizedNativeWorkflow guarantees:
tool_statealways a parseddict(JSON strings auto-decoded)Optional containers default to empty (never
None)input_connectionsvalues alwayslist[NativeInputConnection](single connections wrapped during construction — noisinstancechecks needed)connected_pathsproperty returnsfrozenset[str]ofinput_connectionskeys — O(1) membership test for checking whether a state path has an incoming connectionunique_toolsproperty returnsfrozenset[ToolReference]of all(tool_id, tool_version)pairs, recursing into inline subworkflowsTags normalized (empty string → empty list)
Subworkflows recursively normalized
All normalized models provide a to_dict() method that serializes to
a JSON/YAML-compatible dict with aliases resolved and None values
stripped:
nf2 = ensure_format2(workflow)
workflow_dict = nf2.to_dict() # ready for json.dump() or yaml.dump()
nnw = ensure_native(workflow)
native_dict = nnw.to_dict() # includes "a_galaxy_workflow", "format-version", etc.
from gxformat2.normalized import normalized_format2, normalized_native
# From same-format input
nf2 = normalized_format2(format2_dict)
nf2 = normalized_format2("/path/to/workflow.gxwf.yml")
nf2 = normalized_format2(galaxy_workflow_model)
nnw = normalized_native(native_dict)
nnw = normalized_native("/path/to/workflow.ga")
nnw = normalized_native(native_galaxy_workflow_model)
# For native dicts, use ensure_format2 or to_format2 instead:
from gxformat2.normalized import ensure_format2
nf2 = ensure_format2(native_dict)
Normalized models use extra="ignore": unknown keys in the input dict
are silently dropped and never appear in to_dict() output. To reject
them instead, pass strict_structure=True — the raw dict is validated
against the strict schema (extra="forbid") before normalization and
a pydantic.ValidationError is raised on any unrecognised keys:
from pydantic import ValidationError
try:
nf2 = normalized_format2(format2_dict, strict_structure=True)
nnw = normalized_native(native_dict, strict_structure=True)
except ValidationError as e:
... # extra/unknown keys present
GalaxyUserToolStub and ImportReference (opaque passthroughs) keep
extra="allow" regardless of this flag. The flag is also available on
ConversionOptions for the to_* / ensure_* / expanded_* functions.
Expanded Models#
ExpandedFormat2 and
ExpandedNativeWorkflow inherit from their
normalized counterparts and additionally guarantee all external references
are resolved to inline model instances:
ExpandedFormat2: everystep.runisExpandedFormat2 | None(no URL strings, no@importdicts)ExpandedNativeWorkflow: everystep.subworkflowisExpandedNativeWorkflow | None(nocontent_idURL refs)
from gxformat2.normalized import expanded_format2, expanded_native
from gxformat2.options import ConversionOptions
options = ConversionOptions(workflow_directory="/path/to/dir")
ef2 = expanded_format2(format2_dict, options)
enw = expanded_native(native_dict, options)
# Also accepts already-normalized models
ef2 = expanded_format2(nf2, options)
Expansion fetches @import paths, HTTP/TRS URLs, and $graph references.
Circular references raise ValueError. Max depth is 10.
Use expanded models when you need to traverse into subworkflows (linting, full-tree analysis). Use normalized models when you only need the top-level structure (visualization, simple metadata extraction).
Schema Models — Direct Pydantic Validation#
The schema models are auto-generated from schema-salad YAML definitions
using schema-salad-plus-pydantic. They provide direct pydantic parsing
with no normalization — fields appear exactly as in the source document.
Lax Models (extra=”allow”)#
Accept unknown fields. Use for real-world workflows that may contain Galaxy-version-specific or undocumented fields:
from gxformat2.schema.gxformat2 import GalaxyWorkflow
from gxformat2.schema.native import NativeGalaxyWorkflow
gw = GalaxyWorkflow.model_validate(format2_dict)
ngw = NativeGalaxyWorkflow.model_validate(native_dict)
Strict Models (extra=”forbid”)#
Reject unknown fields. Use for schema compliance checking:
from gxformat2.schema.gxformat2_strict import GalaxyWorkflow as StrictFormat2
from gxformat2.schema.native_strict import NativeGalaxyWorkflow as StrictNative
StrictFormat2.model_validate(format2_dict) # raises on extra fields
StrictNative.model_validate(native_dict) # raises on extra fields
The linter uses both: strict failure with lax success means only extra fields are present (warning), while lax failure indicates fundamental schema errors (error).
Key Types from the Schema#
These types are used throughout the model hierarchy:
from gxformat2.schema.gxformat2 import (
GalaxyType, # Enum: data, collection, string, int, float, ...
WorkflowStepType, # Enum: tool, subworkflow, pause, pick_value
WorkflowInputParameter, # Input definition
WorkflowOutputParameter, # Output definition
WorkflowStepInput, # Step input (source, default)
WorkflowStepOutput, # Step output (id)
CreatorPerson, # Creator with name, identifier, ...
CreatorOrganization, # Organization creator
Report, # Invocation report template
)
from gxformat2.schema.native import (
NativeStepType, # Enum: data_input, data_collection_input, ...
NativeInputConnection, # Step input connection (id, output_name)
NativePostJobAction, # Post-job action definition
)
Raw Dict Access#
If you need to work with workflow dicts without typed models (e.g., for forward-compatibility with fields not yet in the schema), load them directly:
from gxformat2.yaml import ordered_load, ordered_load_path
# From file
workflow_dict = ordered_load_path("workflow.ga")
# From stream
with open("workflow.gxwf.yml") as f:
workflow_dict = ordered_load(f)
# Format detection
is_native = workflow_dict.get("a_galaxy_workflow") == "true"
is_format2 = workflow_dict.get("class") == "GalaxyWorkflow"
The schema models document what fields to expect — refer to
gxformat2.schema.gxformat2 and gxformat2.schema.native
for field names, types, and aliases.
Slicing into Workflow Components#
Sometimes you don’t need the full workflow model — you just want the
inputs, the outputs, or the steps. The gxformat2.normalize
module provides focused accessors that extract individual pieces from
any workflow representation, handling format detection, conversion, and
normalization behind the scenes.
from gxformat2.normalize import inputs, outputs, steps
# From a file path, a raw dict, or any typed model
workflow_inputs = inputs("workflow.ga")
workflow_outputs = outputs(format2_dict)
all_steps = steps(normalized_native_workflow)
inputs() returns a list of
WorkflowInputParameter models,
outputs() returns
WorkflowOutputParameter models,
and steps() returns input parameters
followed by NormalizedWorkflowStep
models — the same objects you’d get from the full
NormalizedFormat2 model, just without
needing to build one yourself.
All three accept the same arguments:
from gxformat2.options import ConversionOptions
# With conversion options and expansion
opts = ConversionOptions(workflow_directory="/path/to/dir")
expanded_inputs = inputs(workflow_dict=wf, options=opts, expand=True)
Dict-returning variants (deprecated)#
The older inputs_normalized(),
outputs_normalized(), and
steps_normalized() functions return the
same data as plain dicts instead of typed models. These are retained for
backward compatibility with tools like Planemo:
from gxformat2.normalize import inputs_normalized
# Returns list[dict] — each dict has "id", "type", "default", etc.
input_dicts = inputs_normalized(workflow_path="workflow.gxwf.yml")
New code should prefer the typed accessors above.