Line plots and Errorbars
Summary
On this page, you will learn how to use .plot.facet_grid.line for line plots, and how to use .plot.facet_grid.errorbars or .plot.facet_grid.errorbands for errorbars and errorbands.
Complete example: Line plot
line_plot:
  based_on:
    - .creator.universe
    - .plot.facet_grid.line

  # Select only the 'infected' agents
  select:
    data:
      path: densities
      transform:
        - .sel: [!dag_prev , { kind: [infected] }]

  # --- Optional additions ---------------------------------------------
  # Use latex and set font and figure sizes. We recommend using YAML
  # to globally define such features across plots.
  # See the 'Customising plot styles' page for details.
  style:
    text.usetex: True
    figure.figsize: [5, 4]
    font.size: 10

  # Set the line color
  color: '#CC3333'

  # Set labels and titles using the PlotHelper
  helpers:
    set_labels:
      y: Density [1/A]
    set_title:
      title: Density of infected agents
Complete example: Errorbar plot
errorbars:
  based_on:
    - .creator.multiverse
    - .plot.facet_grid.errorbars

  # Select only the 'infected' kind of agents
  select_and_combine:
    fields:
      infected:
        path: densities
        transform:
          - .sel: [!dag_prev , { kind: infected }]

    # Select a subspace of the sweep parameters that was performed here
    # (not needed if you are only sweeping over 'seed'. See the 'subspace' page
    # for details).
    subspace:
      transmission rate: [0.6]
      immunity rate: [0]

  # Get time coordinates, and calculate mean and std
  transform:
    - .coords: [!dag_tag infected, time]
      tag: time
    - .mean: [!dag_tag infected, [seed]]
      tag: infected_mean
    - .std: [!dag_tag infected, [seed]]
      tag: infected_std
    - xr.Dataset:
        - avg: !dag_tag infected_mean
          err: !dag_tag infected_std
      tag: data  # Don't forget to define the 'data' tag

  # Distribute the data dimensions
  x: time
  y: avg
  yerr: err

  # These kwargs are passed to matplotlib.pyplot.errorbar
  capsize: 2
  color: '#CC3333'
  elinewidth: 0.5

  # Set the helpers
  helpers:
    set_labels:
      x: Time [steps]
      y: Density [1/A]
    set_title:
      title: Density of infected agents
A simple line plot
Let us begin with a very simple, yet very common example: line plots, and line plots with errorbars. We will use the Utopia SEIRD model as an example, but it should always be very obvious how to adapt the configurations to your specific case.
The SEIRD model simulates the densities of different classes of agents over time: those who are susceptible to the disease, those who are infected, those who have recovered, and so on. Let’s say we just want to plot the number of infected agents over time. Here is a plot configuration that generates such a plot:
infected_density:
  based_on:
    - .creator.universe
    - .plot.facet_grid.line

  select:
    data:
      path: data/SEIRD/densities
      transform:
        - .sel: [ !dag_prev , { kind: [infected] }]
There are two top-level entries to this config: based_on and select:
based_on tells the plotting framework which underlying functions to use for this plot. .creator.universe tells us that this is a universe plot. If you were to perform a multiverse run, the universe creator would then create a separate plot for every run. .plot.facet_grid.line is the actual plot function we are using. This is a pre-implemented function, and will plot a simple line.
select selects the data. We include the path to the data (you should adapt this to your own case), and select the kind of agents we want (here: only the infected ones). The !dag_prev tag tells the data transformation framework to apply the selection filter to the previous node in the DAG, the previous node in this case being the densities dataset (more specifically: its result, an xr.DataArray).
Hint
You can think of the syntax as analogous to a Python function call:
data = select(path="data/SEIRD/densities").sel({"kind": ["infected"]})
with select being a custom data selection function and !dag_prev having turned into the attribute call on its return value.
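To make the analogy concrete, here is a minimal xarray sketch of what such a selection does. The toy densities array and its kind labels are hypothetical stand-ins, not actual SEIRD output. The sketch also shows a detail that is easy to miss: selecting with a list (kind: [infected]) keeps the kind dimension, whereas selecting with a scalar (kind: infected) drops it.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the 'densities' dataset, with dims (time, kind)
kinds = ["susceptible", "exposed", "infected", "recovered", "deceased"]
densities = xr.DataArray(
    np.random.default_rng(0).random((4, len(kinds))),
    dims=("time", "kind"),
    coords={"time": np.arange(4), "kind": kinds},
    name="densities",
)

# Counterpart of `.sel: [!dag_prev, {kind: [infected]}]`:
# a list-valued indexer keeps 'kind' as a dimension of length 1
as_list = densities.sel({"kind": ["infected"]})

# Counterpart of `.sel: [!dag_prev, {kind: infected}]`:
# a scalar indexer drops the 'kind' dimension
as_scalar = densities.sel({"kind": "infected"})

print(as_list.dims)    # ('time', 'kind')
print(as_scalar.dims)  # ('time',)
```

Both forms select the same values; which one is more convenient depends on whether a later step still expects a kind dimension.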
Hint
If you’re wondering why the plot function is called facet_grid.line and not something simpler like just line, take a look at the Stacked plots and Facet grids section.
In brief: The facet_grid() plot function is capable of much more and we are using only a small aspect of it here.
Note
You must always leave a space after a DAG tag, e.g. after !dag_tag or !dag_prev.
This is the output:
Not bad! By default, you’ll get an infected_density.pdf output in your output directory.
If, for example, you want a png file instead, add the following entries:
infected_density:
  # all the previous entries ...
  file_ext: png
  style:
    savefig.dpi: 300
The savefig.dpi key is optional; you can use it to increase the resolution of your plots, e.g. for publications.
Another thing we may want to do is plot several lines all in one plot – for that, see the next section on facet grids.
Changing the appearance
Now let’s make the whole thing a bit prettier by adding a title and axis labels, changing the color, and using LaTeX:
infected_density:
  # ...
  # Add this to the configuration from above:
  style:
    text.usetex: true
    figure.figsize: [5, 4]
    font.size: 10
  color: crimson
  helpers:
    set_labels:
      y: Density [1/A]
    set_title:
      title: Density of infected agents
The helpers entry sets labels and titles for your axes, among other things.
We’ll go into more detail about customising the aesthetics in Customising plot styles; for now, these few changes are enough to create a much cleaner plot:
Plotting errorbars
In probabilistic modelling, you naturally want to be sure that your outputs are not just a coincidence, an artefact of running the model with some ‘lucky’ seed, but actually statistically significant effects. To get some statistics on your outputs, you may therefore wish to run the model over several different seeds, and plot an averaged output with some errorbars.
Let’s run our SEIRD model over a number of different seeds, and plot the resulting curve of infected agents with errorbars:
averaged_infected_density:
  based_on:
    - .creator.multiverse
    - .plot.facet_grid.errorbars

  select_and_combine:
    fields:
      infected:
        path: data/SEIRD/densities
        transform:
          - .sel: [ !dag_prev , { kind: [infected] }]

  transform:
    # Get the x-axis
    - .coords: [!dag_tag infected , time]
      tag: time

    # Calculate mean and standard deviations along the 'seed' dimension
    - .mean: [!dag_tag infected, seed]
      tag: infected_mean
    - .std: [!dag_tag infected, seed]
      tag: infected_std

    # Bundle everything together
    - xr.Dataset:
        - avg: !dag_tag infected_mean
          err: !dag_tag infected_std
      tag: data

  x: time
  y: avg
  yerr: err

  # Additional kwargs, passed to the plot function
  elinewidth: 0.5
  capsize: 2
  color: crimson
Several things are important:
First, this is a multiverse plot, so we must base the plot on the .creator.multiverse, as well as on .plot.facet_grid.errorbars to get the errorbars.
For multiverse plots, you must use the select_and_combine key to select data: This will assemble a multidimensional dataset with labelled axes, enabling selection along parameter dimensions.
We have added a new block to our configuration: the transform block. This is the transformation part of our data analysis, and tells the DAG how to process the data. Let’s go through it step by step:
First, we extract the x-axis of the plot by selecting the time coordinate:
    - .coords: [!dag_tag infected, time]
      tag: time
Then we calculate the averages and errors using the .mean and .std operations. Note how these operations are applied to the !dag_tag infected node of the DAG:
    - .mean: [!dag_tag infected, seed]
      tag: infected_mean
We are averaging the density of infected agents over the seed dimension, and giving the result a tag, so that we can later reference this step in the transformation. Calculating the standard deviation is analogous.
Lastly, we bundle everything up into an xarray.Dataset:
    - xr.Dataset:
        - avg: !dag_tag infected_mean
          err: !dag_tag infected_std
      tag: data
The data variables (data_vars) are the averages and standard deviations we calculated previously, and we can reference them using the !dag_tags we assigned them.
Then we call the plot function, telling it which data variables to plot where by specifying the x, y, and yerr keys. Any additional keys (such as the errorbar line width) are passed to the low-level plot function, matplotlib.pyplot.errorbar(), giving us this output:
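In plain xarray terms, the transform block corresponds roughly to the following sketch. The random (seed, time) array is a hypothetical stand-in for the combined multiverse data, and the mapping of each DAG operation onto an xarray call is an assumption made for illustration, not a description of the framework's internals.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the combined multiverse data: 5 seeds, 10 time steps
rng = np.random.default_rng(42)
infected = xr.DataArray(
    rng.random((5, 10)),
    dims=("seed", "time"),
    coords={"seed": np.arange(5), "time": np.arange(10)},
)

# `.coords: [!dag_tag infected, time]`  (tag: time)
time = infected.coords["time"]

# `.mean` and `.std` along the 'seed' dimension
# (tags: infected_mean, infected_std)
infected_mean = infected.mean("seed")
infected_std = infected.std("seed")

# `xr.Dataset` bundling both results as data variables (tag: data)
data = xr.Dataset({"avg": infected_mean, "err": infected_std})

print(data)
```

After the reduction, both data variables depend on time only, which is why x: time, y: avg, and yerr: err are sufficient to describe the plot.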
Pretty neat – but it really looks like continuous errorbands are the way to go with such a high number of data points. All you need to do is to change the plot everything is based on:
based_on:
  # ...
  # - .plot.facet_grid.errorbars
  - .plot.facet_grid.errorbands   # sets use_bands: true
  # ...
Much better!
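For readers who want to see the difference between the two representations outside the plotting framework, here is a minimal matplotlib sketch. The synthetic avg and err arrays are made up for illustration, and drawing the band as a filled region between avg - err and avg + err is an assumption about how errorbands are typically rendered, not a statement about the framework's implementation.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no display required
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only: a smooth peak with a varying error
time = np.arange(50)
avg = np.exp(-((time - 20) / 10.0) ** 2)
err = 0.05 + 0.1 * avg

fig, (ax_bars, ax_bands) = plt.subplots(1, 2, figsize=(8, 3), sharey=True)

# Discrete errorbars, using the same kwargs as in the configuration above
ax_bars.errorbar(time, avg, yerr=err, capsize=2, elinewidth=0.5, color="crimson")
ax_bars.set_title("errorbars")

# Continuous band: the mean as a line, the error as a shaded region
ax_bands.plot(time, avg, color="crimson")
ax_bands.fill_between(time, avg - err, avg + err, color="crimson", alpha=0.3)
ax_bands.set_title("errorbands")
```

Calling fig.savefig with a file name of your choice writes the comparison to disk. With many data points, the caps and bars of the left panel overlap, while the shaded band on the right stays readable.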