abstract_dataloader.ext.graph
Composing transforms for data processing pipelines.
Programming Model
- Data is represented as a dictionary with string keys and arbitrary values which are atomic from the perspective of transform composition.
- Transforms are created from a directed acyclic graph (DAG) of nodes, where each node (`Node`) is a callable which takes a set of inputs and produces a set of outputs.
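The two points above can be illustrated without the library at all. The following is a minimal hand-written sketch, not the module's API: the function names (`scale`, `add`) and the data keys (`raw`, `gain`, `scaled`, `total`) are hypothetical and chosen only for illustration.

```python
from typing import Any


def scale(x: float, factor: float) -> float:
    """First node: consumes two input values, produces one output."""
    return x * factor


def add(a: float, b: float) -> float:
    """Second node: depends on the output of the first node."""
    return a + b


# Data is a dictionary with string keys; the values are treated as opaque.
data: dict[str, Any] = {"raw": 2.0, "gain": 3.0}

# A two-node DAG written out by hand: "scaled" is computed from the inputs,
# then "total" is computed from "scaled" and one of the original inputs.
data["scaled"] = scale(x=data["raw"], factor=data["gain"])
data["total"] = add(a=data["scaled"], b=data["raw"])
```

A graph-based transform automates the bookkeeping shown here: which keys feed which callable, and in what order the callables can run.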
abstract_dataloader.ext.graph.Node
dataclass
Node specification for a graph-based data processing transform.
Example Hydra Config
Attributes:
Name | Type | Description |
---|---|---|
callable | | callable to apply to the inputs. |
output | `str \| Sequence[str]` | output data key (or output data keys for a node which returns multiple outputs). |
inputs | `Mapping[str, str]` | mapping of data keys to input argument names. |
optional | `Mapping[str, str]` | mapping of optional data keys to input argument names. |
Source code in src/abstract_dataloader/ext/graph.py
apply
Apply the node.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `dict[str, Any]` | input data to process. | required |
name | `str` | node name (for error messages). | `''` |
Returns:
Type | Description |
---|---|
`dict[str, Any]` | Updated data, with any new keys added to the input data. |
abstract_dataloader.ext.graph.Transform
Bases: Transform[dict[str, Any], dict[str, Any]]
Compose multiple callables forming a DAG into a transform.
Warning
Because the input data specification is not provided at initialization, the graph execution order (and graph validity) cannot be determined statically; an invalid graph results in runtime errors.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
outputs | `Mapping[str, str] \| None` | output data keys to produce as a mapping of output keys to graph data keys. If … | `None` |
keep_all | `bool` | keep references to all intermediate values instead of decref-ing values which are no longer needed. | `False` |
nodes | `Node \| dict[str, Any]` | nodes in the graph, as keyword arguments where the key indicates a reference name for the node; any … | `{}` |
__call__
Execute the transform graph on the input data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | `dict[str, Any]` | input data to process. | required |
Returns:
Type | Description |
---|---|
`dict[str, Any]` | Processed data. |
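To make the runtime-ordering behaviour concrete, the following sketch shows one simple way a DAG of callables could be executed when the execution order is only discovered at call time. It is an illustration under stated assumptions, not the library's algorithm: `run_graph` and the `(function, output key, input keys)` node format are hypothetical, and the sketch always requires an explicit `outputs` mapping of output keys to graph data keys.

```python
from typing import Any, Callable

# Hypothetical simplified node format: (function, output key, input keys).
Spec = tuple[Callable[..., Any], str, tuple[str, ...]]


def run_graph(
    nodes: dict[str, Spec], data: dict[str, Any], outputs: dict[str, str]
) -> dict[str, Any]:
    data = dict(data)  # do not mutate the caller's dictionary
    pending = dict(nodes)
    while pending:
        # A node is ready once all of its input keys are available.
        ready = [
            name
            for name, (_, _, ins) in pending.items()
            if all(key in data for key in ins)
        ]
        if not ready:
            # Invalid graphs are only detected here, at execution time.
            raise ValueError(
                f"cannot schedule nodes {sorted(pending)}: "
                "missing inputs or cyclic dependency"
            )
        for name in ready:
            fn, out, ins = pending.pop(name)
            data[out] = fn(*(data[key] for key in ins))
    # Select and rename the requested outputs.
    return {out_key: data[graph_key] for out_key, graph_key in outputs.items()}


# Nodes are deliberately listed out of dependency order.
graph: dict[str, Spec] = {
    "sum": (lambda s, r: s + r, "total", ("scaled", "raw")),
    "scale": (lambda r: r * 3, "scaled", ("raw",)),
}
result = run_graph(graph, {"raw": 2}, {"y": "total"})
```

Because readiness is checked against the keys actually present in the input, the same graph can succeed for one input dictionary and fail for another, which is why validity cannot be checked at construction time in this style of design.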