Line plots and Errorbars

Summary

On this page, you will learn how to

  • use .plot.facet_grid.line for line plots, and

  • use .plot.facet_grid.errorbars or .plot.facet_grid.errorbands for errorbars and errorbands.

A simple line plot

Let us begin with a very simple, yet very common example: line plots, and line plots with errorbars. We will use the Utopia SEIRD model as an example, but it should be straightforward to adapt the configurations to your specific case.

The SEIRD model simulates the densities of different classes of agents over time: those who are susceptible to the disease, those who are infected, those who have recovered, and so on. Let’s say we just want to plot the density of infected agents over time. Here is a plot configuration that generates such a plot:

infected_density:

  based_on:
    - .creator.universe
    - .plot.facet_grid.line

  select:
    data:
      path: data/SEIRD/densities
      transform:
        - .sel: [ !dag_prev , { kind: [infected] }]

There are two top-level entries to this config: based_on and select:

  • based_on is telling the plotting framework which underlying functions to use for this plot.

    • .creator.universe is telling us that this is a universe plot. If you were to perform a multiverse run, the universe creator would then create a separate plot for every run.

    • .plot.facet_grid.line is the actual plot function we are using. This is a pre-implemented function, and will plot a simple line.

  • select selects the data. We include the path to the data (you should adapt this to your own case), and select the kind of agents we want (here: only the infected ones). The !dag_prev tag tells the data transformation framework to apply the selection to the previous node in the DAG, which in this case is the densities dataset (more specifically: its result, an xr.DataArray).

Hint

You can think of the syntax as roughly analogous to the following Python function call:

data = select(path="data/SEIRD/densities").sel({"kind": ["infected"]})

with select being a custom data selection function and !dag_prev having turned into the attribute call on its return value.
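To make the role of !dag_prev more concrete, here is a toy model in plain Python. It is only a sketch of the idea that each transform entry is applied to the result of the previous step; it is not how the data transformation framework is implemented. The densities dictionary, its values, and the sel helper are hypothetical stand-ins for the real dataset and the xarray selection.

```python
# Hypothetical stand-in for the densities dataset: one time series per kind.
densities = {
    "susceptible": [0.90, 0.70, 0.50],
    "infected": [0.10, 0.25, 0.30],
    "recovered": [0.00, 0.05, 0.20],
}


def sel(data, kind):
    """Keep only the requested kinds, mimicking a label-based selection."""
    return {k: data[k] for k in kind}


# Each transform entry is (operation, keyword arguments). The operation is
# applied to the result of the previous step, which is the role !dag_prev plays.
transform = [(sel, {"kind": ["infected"]})]

result = densities  # the node produced by the `path` entry
for operation, kwargs in transform:
    result = operation(result, **kwargs)

print(result)
```

Adding a second entry to the transform list would apply it to the already filtered data, just as a second operation with !dag_prev would.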

Hint

If you’re wondering why the plot function is called facet_grid.line and not something simpler like just line, take a look at the Stacked plots and Facet grids section. In brief: The facet_grid() plot function is capable of much more and we are using only a small aspect of it here.

Note

You must always leave a space after a DAG tag, e.g. after !dag_tag or !dag_prev .

This is the output:

A simple line plot

Not bad! By default, you’ll get an infected_density.pdf output in your output directory. If, for example, you want a png file instead, add the following entries:

infected_density:

  # all the previous entries ...

  file_ext: png
  style:
    savefig.dpi: 300

The savefig.dpi key is optional; you can use it to increase the resolution of your plots, e.g. for publications.
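As a rough guide to what a given savefig.dpi value means for a raster format like png, the pixel dimensions are the figure size in inches multiplied by the dpi. The short sketch below works this out for an assumed figure size of 5 by 4 inches; the actual size depends on your figure.figsize setting, and options that crop the figure on saving would change the result.

```python
# Assumed figure size in inches (width, height); adapt to your figure.figsize.
figsize_inches = (5, 4)
dpi = 300  # the value given for savefig.dpi

# Pixel dimensions of the saved raster image: inches times dots per inch.
width_px, height_px = (round(side * dpi) for side in figsize_inches)
print(f"{width_px} x {height_px} pixels")
```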

Another thing we may want to do is plot several lines all in one plot – for that, see the next section on facet grids.

Changing the appearance

Now let’s make the whole thing a bit prettier by adding a title and axis labels, changing the color, and using LaTeX:

infected_density:
  # ...
  # Add this to the configuration from above:
  style:
    text.usetex: true
    figure.figsize: [5, 4]
    font.size: 10

  color: crimson

  helpers:
    set_labels:
      y: Density [1/A]
    set_title:
      title: Density of infected agents

The helpers entry sets labels and titles for your axes, among other things. We’ll go into more detail about customising the aesthetics in Customising plot styles; for now, these few changes are enough to create a much cleaner plot:

A simple but prettier line plot

Plotting errorbars

In probabilistic modelling, you naturally want to be sure that your outputs are not just a coincidence, an artefact of running the model with some ‘lucky’ seed, but actually statistically significant effects. To get some statistics on your outputs, you may therefore wish to run the model over several different seeds, and plot an averaged output with some errorbars.

Let’s run our SEIRD model over a number of different seeds, and plot the resulting curve of infected agents with errorbars:

averaged_infected_density:
  based_on:
    - .creator.multiverse
    - .plot.facet_grid.errorbars

  select_and_combine:
    fields:
      infected:
        path: data/SEIRD/densities
        transform:
          - .sel: [ !dag_prev , { kind: [infected] }]

  transform:
    # Get the x-axis
    - .coords: [!dag_tag infected , time]
      tag: time

    # Calculate mean and standard deviation along the 'seed' dimension
    - .mean: [!dag_tag infected, seed]
      tag: infected_mean
    - .std: [!dag_tag infected, seed]
      tag: infected_std

    # Bundle everything together
    - xr.Dataset:
        data_vars:
          avg: !dag_tag infected_mean
          err: !dag_tag infected_std
      tag: data

  x: time
  y: avg
  yerr: err

  # Additional kwargs, passed to the plot function
  elinewidth: 0.5
  capsize: 2
  color: crimson

Several things are important:

  1. First, this is a multiverse plot, so we must base the plot on .creator.multiverse, as well as on .plot.facet_grid.errorbars to get the errorbars.

  2. For multiverse plots, you must use the select_and_combine key to select data: This will assemble a multidimensional dataset with labelled axes, enabling selection along parameter dimensions.

  3. We have added a new block to our configuration: the transform block. This is the transformation part of our data analysis, and is telling the DAG how to process the data. Let’s go through it step by step:

    • First, we extract the x-axis of the plot by selecting the time coordinate:

      - .coords: [!dag_tag infected, time]
        tag: time
      
    • Then we calculate the averages and errors using the .mean and .std operations. Note how these operations are applied to the !dag_tag infected node of the DAG:

      - .mean: [!dag_tag infected, seed]
        tag: infected_mean
      

      We are averaging the density of infected agents over the seed dimension, and giving the result a tag, so that we can later reference this step in the transformation. Calculating the standard deviation is analogous.

    • Lastly, we bundle everything up into an xarray.Dataset:

      - xr.Dataset:
          data_vars:
            avg: !dag_tag infected_mean
            err: !dag_tag infected_std
        tag: data
      

      The data variables (data_vars) are the averages and standard deviations we calculated previously, and we can reference them via !dag_tag, using the tags we assigned to them.

  4. Then we call the plot function, telling it which data variables to plot where by specifying the x, y, and yerr keys. Any additional keys (such as the errorbar line width) are passed to the low-level plot function, matplotlib.pyplot.errorbar(), giving us this output:

An errorbar plot
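The transform steps in the configuration above can be sketched in plain Python. This is a minimal illustration of what is being computed, not the framework's implementation: nested lists stand in for the xr.DataArray, a dictionary stands in for the tagged DAG nodes, and the toy values are made up. pstdev is the population standard deviation, which corresponds to the default (ddof=0) of the std operation in NumPy and xarray.

```python
from statistics import fmean, pstdev

# Hypothetical toy data: infected density for 3 seeds at 4 time steps.
# The outer index is the seed, the inner index is the time step.
infected = [
    [0.10, 0.20, 0.30, 0.20],  # seed 0
    [0.12, 0.24, 0.27, 0.18],  # seed 1
    [0.08, 0.22, 0.33, 0.22],  # seed 2
]
time = [0, 1, 2, 3]

# Tagged results, playing the role of the nodes referenced via !dag_tag.
tags = {"infected": infected, "time": time}

# zip(*...) groups the values of all seeds that belong to the same time step,
# so reducing each group reduces along the 'seed' dimension.
per_time_step = list(zip(*tags["infected"]))
tags["infected_mean"] = [fmean(values) for values in per_time_step]
tags["infected_std"] = [pstdev(values) for values in per_time_step]

# Bundle mean and error under the names used by the plot's y and yerr keys.
tags["data"] = {"avg": tags["infected_mean"], "err": tags["infected_std"]}

for t, avg, err in zip(tags["time"], tags["data"]["avg"], tags["data"]["err"]):
    print(f"t={t}: avg={avg:.3f}, err={err:.3f}")
```

After the reduction, the seed dimension is gone and only the time dimension remains, which is why the result can be drawn as a single line with one errorbar per time step.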

The errorbar plot looks pretty neat, but with such a high number of data points, continuous errorbands are the better choice. All you need to do is change the plot function the configuration is based on:

based_on:
  # ...
  # - .plot.facet_grid.errorbars
  - .plot.facet_grid.errorbands         # sets use_bands: true
  # ...

An errorbands plot

Much better!
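For intuition, the edges of such a band can be computed from the same avg and err values. The sketch below assumes the band is symmetric around the mean and extends one err on each side, which is the usual convention; this page does not spell out the exact behaviour of the errorbands function, and the numbers are made up.

```python
# Hypothetical mean and standard deviation per time step.
avg = [0.10, 0.22, 0.30, 0.20]
err = [0.02, 0.02, 0.03, 0.02]

# Assumption: the band is symmetric around the mean, one 'err' on each side.
lower = [a - e for a, e in zip(avg, err)]
upper = [a + e for a, e in zip(avg, err)]

for a, lo, hi in zip(avg, lower, upper):
    print(f"avg={a:.2f}, band=[{lo:.2f}, {hi:.2f}]")
```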