Debugging DAG computations

As you saw throughout the plotting tutorial, the data transformation framework can be a very powerful tool to prepare data for plotting.

But what about the case where this goes wrong? How can the DAG be debugged?

This page presents some approaches to addressing errors in DAG computations.

Read the error log

This is the first step towards understanding what’s going on. The data transformation framework aims to make error messages as understandable and helpful as possible.

Let’s look at some examples.

Invalid operation name

select:
  susceptible:
    path: densities
    transform:
      - .sel: [!dag_prev , {kind: susceptible}]
      - square: [!dag_prev ]  # does not exist

Here, we are trying to select some data from the SEIRD model but have used an operation name (square) that does not exist. Creating a plot with this operation will definitely fail and generate an error message like this:

BadOperationName: Could not find an operation or meta-operation named 'square'!

No operation 'square' registered! Did you mean: squared, sqrt ?
Available operations:
  !=                                       .
  .()                                      .T
  .all                                     .any
  .argmax                                  .argmin
  .argpartition                            .argsort
  .assign                                  .assign_attrs

  . . .

From the error message, it’s quite clear what’s going on: We need to choose the correct operation name. We also get a list of available operations and even get suggestions for similar names – and we can just follow those: Using squared instead of square will solve our problems.
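Applying the suggestion to the configuration above gives the following sketch. It only swaps the operation name and assumes that squared is indeed the operation you intended to use:

```yaml
select:
  susceptible:
    path: densities
    transform:
      - .sel: [!dag_prev , {kind: susceptible}]
      - squared: [!dag_prev ]  # name taken from the "Did you mean" hint
```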

Controlling exceptions in the plotting framework

Errors that occur during plotting are typically caught and re-raised by the plotting framework, resulting in PlotCreatorError messages like this:

PlotCreatorError: An error occurred during plotting with
UniversePlotCreator for 'debug_DAG_bad_op_name'!
To ignore the error message and continue plotting with the other
plots, specify `debug: False` in the plot configuration or disable
debug mode for the PlotManager.

This also tells us how to control whether an error will be raised or not:

my_plot:
  debug: true   # raise an error and stop plotting

As usual throughout Utopia, the CLI --debug flag also controls this behavior.
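Conversely, the error message above suggests how to keep going with the remaining plots. A minimal sketch, assuming that setting debug to false behaves as that message describes:

```yaml
my_plot:
  debug: false  # assumed: do not raise; continue with the other plots
```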

Failing operation

The case of an operation failing is a bit trickier, as it depends on what the operation does in particular. Let’s look at an example where we pass a wrong argument to an operation:

select:
  susceptible:
    path: densities
    transform:
      - .data   # to resolve the utopya XarrayDC into an xr.DataArray
      - .sel: [!dag_prev , {kind: SuSCePTIble}]

The error output from that will be something like the following:

DataOperationFailed: Operation '.sel' failed with a KeyError, see below!
It was called with the following arguments:
  args:
     0:  <xarray.DataArray 'densities' (time: 151, kind: 8)>
array([[0.5968, 0.4032, 0.    , ..., 0.    , 0.    , 0.    ],
       [0.5968, 0.4032, 0.    , ..., 0.    , 0.    , 0.    ],
       [0.5968, 0.4016, 0.0012, ..., 0.    , 0.    , 0.    ],
       ...,
       [0.5968, 0.    , 0.    , ..., 0.    , 0.    , 0.    ],
       [0.5968, 0.    , 0.    , ..., 0.    , 0.    , 0.    ],
       [0.5968, 0.    , 0.    , ..., 0.    , 0.    , 0.    ]])
Coordinates:
  * time  (time) int64 0 1 2 3 4 5 6 7 8 ... 143 144 145 146 147 148 149 150
  * kind  (kind) <U11 'empty' 'susceptible' 'exposed' ... 'source' 'inert'

     1:  {'kind': 'SuSCePTIble'}

  kwargs:  {}

KeyError: 'SuSCePTIble'

What can we learn from that message?

- Operation .sel failed, so we know where the error occurred.
- We got a KeyError for the given key SuSCePTIble.
- We see the arguments that were passed to .sel, marked as positional args 0 and 1. The given xarray.DataArray does not have a key SuSCePTIble in the kind coordinate dimension!

From that we can deduce: The key actually has to be susceptible.
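With that, the corrected selection could look like this. It is the same configuration as above, with only the key changed to match the label shown in the kind coordinate:

```yaml
select:
  susceptible:
    path: densities
    transform:
      - .data   # to resolve the utopya XarrayDC into an xr.DataArray
      - .sel: [!dag_prev , {kind: susceptible}]  # key as listed in `kind`
```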

Now this was comparatively straightforward, but you get the idea.

Hint

In the data operation above, we have added the .data operation, which resolves the previous object from a utopya.eval.containers.XarrayDC into a regular xarray.DataArray object. This makes debugging much easier because it shows the actual content of the array.

Look at the DAG visualization

What about a case where it’s harder to locate where an error comes from, e.g. if there are multiple operations?

select:
  kind:
    path: densities

transform:
  - .sel: [!dag_tag kind, {kind: susceptible}]
    kwargs: {drop: true}
    tag: susceptible
  - .sel: [!dag_tag kind, {kind: exposed}]
    kwargs: {drop: true}
    tag: exposed
  - .sel: [!dag_tag kind, {kind: infected}]
    kwargs: {drop: true, bAd_ArGuMeNT: i should not be here! }
    tag: infected
  - .sel: [!dag_tag kind, {kind: recovered}]
    kwargs: {drop: true}
    tag: recovered

  - xr.Dataset:
    - susceptible: !dag_tag susceptible
      exposed: !dag_tag exposed
      infected: !dag_tag infected
      recovered: !dag_tag recovered
    tag: data

Again, one operation is obviously wrong here, but because there are many .sel operations, it’s not immediately clear which one.

Let’s have a look at the terminal log again. You may have already noticed that something like the following is printed alongside the error:

NOTE     base              Creating DAG visualization (scenario: 'compute_error') ...
NOTE     dag               Generating DAG representation for 7 tags ...

. . .

CAUTION  base              Created DAG visualization for scenario 'compute_error'. For debugging, inspecting the generated plot and the traceback information may be helpful: ... dag_compute_error.pdf
ERROR    plot_mngr         An error occurred during plotting with UniversePlotCreator ...

Here, the plotting framework automatically created a visualization of the DAG to help with debugging. It calls this scenario a compute_error, because that’s what happened: The DAG computation failed, which is why the visualization was created. The log also tells you where the file was saved. Typically, it ends up right beside where the plot should have been created.

Let’s look at the generated DAG visualization:

A DAG visualization in a failed scenario

This tells us a lot:

- The light red node is where the operation failed, while computing the infected tag.
- Subsequently, the xr.Dataset operation in the end cannot be carried out.
- The remaining node colors show which operations succeeded and which ones were only prepared for computation but not actually carried out.
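Having located the failing node, the fix belongs to the operation tagged infected. A sketch of the corrected entry, assuming the other operations stay as they are:

```yaml
transform:
  # ... other .sel operations unchanged
  - .sel: [!dag_tag kind, {kind: infected}]
    kwargs: {drop: true}   # removed the unexpected bAd_ArGuMeNT entry
    tag: infected
  # ... remaining operations unchanged
```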

The DAG visualization is a powerful way of understanding what is going on and whether the DAG structure is actually the way you expected it to be.

The visualization feature is controlled via the dag_visualization entry of your plot configuration. Read more about this feature and available parameters in the dantro docs.

How to always create a DAG visualization

By default, DAG visualizations are only created if the DAG computation failed for whatever reason.

To always create a DAG visualization, regardless of that, inherit the .dag.vis.always base configuration.

my_plot:
  based_on:
    # ...
    - .dag.vis.always
    # ...

… which translates to …

my_plot:
  dag_visualization:
    enabled: true
    when:
      always: true
      only_once: true

DAG visualization with multiverse plots

For multiverse plots, DAG visualization may not generate a usable figure, given the potentially large number of nodes. In such cases it makes sense to temporarily restrict the plot to a subspace:

my_multiverse_plot:
  select_and_combine:
    subspace:
      some_dim: [0, 1]
      another_dim: [foo, bar]

Ideally, use the same dimensionality as in the case you want to debug.
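For debugging, the subspace restriction can be combined with the .dag.vis.always base configuration described above. The following is only a sketch: the dimension names and values are the placeholders from the example above, and your plot will likely list further base configurations:

```yaml
my_multiverse_plot:
  based_on:
    # ... other base configurations
    - .dag.vis.always        # always create the DAG visualization
  select_and_combine:
    subspace:
      some_dim: [0, 1]       # placeholder dimension and values
      another_dim: [foo, bar]
```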

Hint

Node positioning drastically improves with pygraphviz installed in the utopia-env.

Warning

DAG visualization is only available after the DAG has been fully constructed. If you make errors during construction, like setting a tag multiple times, the visualization will not be able to help you.
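As an illustration of such a construction error, consider this hypothetical fragment, in which the same tag is assigned twice. Here, the error log is the place to look:

```yaml
transform:
  - .sel: [!dag_tag kind, {kind: susceptible}]
    tag: susceptible
  - .sel: [!dag_tag kind, {kind: exposed}]
    tag: susceptible   # tag set a second time -> construction error
```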

Print debug information

Even with the DAG visualization helping us in understanding the DAG structure, we may need to probe the computation progress in more detail.

To the rescue: The good ol’ print operation!

Basically, you can insert the print operation at any point between two operations to probe the state in between. A few examples:

# Example 1 --- Tagged operation to probe
- .mean: [!dag_tag some_data, [foo, bar]]
  tag: my_result

#    ... and again with print added before and after
- print: [!dag_tag some_data]
- .mean: [!dag_prev , [foo, bar]]
- print: !dag_prev
  tag: my_result

#    ... more verbose example that allows commenting out of prints
- pass: [!dag_tag some_data]
- print                                  # can comment this out
- .mean: [!dag_prev , [foo, bar]]
- print                                  # can comment this out
- pass: !dag_prev
  tag: my_result


# Example 2 --- Untagged operation to probe
- .mean: [!dag_tag some_data, [foo, bar]]

#    ... and again with print added before and after
- print: [!dag_tag some_data]
- .mean: [!dag_prev , [foo, bar]]
- print

As you see, there are many different ways of doing this. Choose the one that suits you best. The most important aspect is to get the print in there.

Note

In case the computation is actually succeeding but does not have the expected results, you can elicit an error simply by adding a dependent operation that fails:

transform:
  # ...
  - print
  - .mean: [!dag_prev , [foo, bar]]  # operation to probe
  - print
  - raise                            # operation doesn't exist -> error

The underlying function for print is actually not Python’s print(), but dantro.data_ops.ctrl_ops.print_data(), which has further capabilities:

- print: [!dag_tag some_data]
  kwargs:
    fstr: "Data before .mean operation:\n{}"

- .mean   # <-- operation to probe, could of course also have arguments

- print: !dag_prev
  kwargs:
    fstr: "Data after .mean operation:\n{}"
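Applied to the failing .sel operation from the DAG visualization example, such labelled prints might look as follows. This is a sketch that reuses the kind tag and the print patterns shown above, and only the fstr texts are new:

```yaml
transform:
  - print: [!dag_tag kind]
    kwargs:
      fstr: "Data before .sel:\n{}"

  - .sel: [!dag_prev , {kind: infected}]   # operation to probe
    kwargs: {drop: true}

  - print: !dag_prev
    kwargs:
      fstr: "Data after .sel:\n{}"
    tag: infected
```

The labels make it easier to tell the outputs apart in the terminal log when several prints are active.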

Further approaches

If the above does not help in isolating the error, there are a bunch of other things you can try:

- Check the definition of a failing operation in the operations database and how certain operations are defined.
  - dantro-based operations are documented in dantro.data_ops.
  - Documentation for operations that perform method calls (like .mean does) needs to be checked in the respective package documentation. For instance, this may be xarray.DataArray.mean(), xarray.Dataset.mean(), pandas.DataFrame.mean(), or other packages’ .mean methods.
- Have a look at the dantro DAG troubleshooting section.
- In case the error appears not during computation but in the plot function, check the format (dimensionality, shape, coordinate labels etc.) in which the plot function expects to receive data.
- If all of that does not work out, you can try to create a toy example in an interactive Python session to find out how the objects behave.

Open an issue and ask for help

If all of the above approaches did not succeed, we are more than happy to assist. Feel free to open an issue in the Utopia GitLab project.

For bug reports or suggestions to improve the DAG framework, we welcome your feedback in the dantro GitLab project.
