Distributions

This module provides plotting functions to visualize distributions.

utopya.plot_funcs.distribution.histogram(dm: utopya.datamanager.DataManager, *, uni: utopya.datagroup.UniverseGroup, hlpr: utopya.plotting.PlotHelper, model_name: str, path_to_data: str, histogram_kwargs: Optional[dict] = None, use_unique: bool = False, preprocess: Optional[Tuple[Union[dict, str]]] = None, postprocess: Optional[Tuple[Union[dict, str]]] = None, mask_repeated: bool = False, show_histogram_info: bool = True, transformations_log_level: int = 10, pyplot_func_name: str = 'bar', **pyplot_func_kwargs)

Calculates a histogram from the data and plots it.

This function is versatile: it can produce anything from a plain histogram (only the required arguments set) to a plot of a complementary cumulative probability distribution function.

Don’t despair. The argument descriptions below explain what each parameter does.
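To illustrate the far end of that range, the following is a minimal sketch of how a complementary cumulative distribution can be derived from histogram counts using plain NumPy. It is independent of utopya and does not show how histogram() arrives at such a plot internally; the helper name ccdf_from_histogram and the sample data are made up for this example.

```python
import numpy as np


def ccdf_from_histogram(data, bins=10):
    """Hypothetical helper: bin edges and the complementary cumulative
    distribution, i.e. the probability mass in each bin and all bins
    to its right."""
    counts, edges = np.histogram(data, bins=bins)
    probs = counts / counts.sum()  # normalize counts to probabilities
    # Reverse, accumulate, reverse back: sum over this bin and all higher bins
    ccdf = np.cumsum(probs[::-1])[::-1]
    return edges, ccdf


# Made-up sample data
data = np.array([1, 1, 2, 3, 5, 8, 13, 21])
edges, ccdf = ccdf_from_histogram(data, bins=4)
```

By construction, the first entry of ccdf is 1 and the values never increase from one bin to the next.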

Parameters
  • dm (DataManager) – The data manager from which to retrieve the data

  • uni (UniverseGroup) – The selected universe data

  • hlpr (PlotHelper) – The PlotHelper that instantiates the figure and takes care of plot aesthetics (labels, title, …) and saving

  • model_name (str) – The name of the model in which the data resides

  • path_to_data (str) – The path to the data relative to the model data output

  • histogram_kwargs (dict, optional) – Passed to np.histogram. This can be used to adjust the number of bins or to set the range over which the bins are spread. The range may also be given as a 2-tuple containing None, in which case None is resolved to data.min() or data.max(). See the np.histogram documentation for other arguments.

  • use_unique (bool, optional) – If set, counts unique values instead of computing a regular histogram.

  • preprocess (Tuple[Union[dict, str]], optional) – Pre-processing transformations to apply to the selected data before the histogram is calculated. Multiple transformations can be specified. They can be used for dimensionality reduction of the data, but also for other operations, e.g. selecting a slice. For available parameters, see utopya.dataprocessing.transform().

  • postprocess (Tuple[Union[dict, str]], optional) – Same as preprocess, but applied after the histogram has been computed.

  • mask_repeated (bool, optional) – In use_unique mode, masks the counts such that repeated values are not shown.

  • show_histogram_info (bool, optional) – Whether to show an info box in the top right-hand corner

  • transformations_log_level (int, optional) – The log level at which the transformations are performed. Useful for debugging.

  • pyplot_func_name (str, optional) – The name of the matplotlib.pyplot function to use for plotting. By default, a bar plot is drawn. For unique data, a line or scatter plot might make more sense. Note that for the bar plot, the bar widths are automatically passed to the plot call and cannot be adjusted.

  • **pyplot_func_kwargs – The kwargs passed on to the pyplot function chosen via the pyplot_func_name argument.

Raises

ValueError – When trying to make a bar plot with use_unique option enabled.
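The two counting modes described in the parameter list can be sketched with plain NumPy as follows. This is an illustration under assumptions, not the utopya implementation: the helper resolve_range is hypothetical and only mimics the documented behavior of replacing None in the range tuple with data.min() or data.max(), and the sample data is made up.

```python
import numpy as np


def resolve_range(data, value_range):
    """Hypothetical helper: replace None entries of a (low, high) tuple
    with the minimum or maximum of the data, respectively."""
    low, high = value_range
    return (
        data.min() if low is None else low,
        data.max() if high is None else high,
    )


# Made-up sample data
data = np.array([0.5, 1.0, 1.0, 2.5, 4.0, 4.0, 4.0])

# Regular histogram: lower bound fixed at 0.0, upper bound taken from the data
rng = resolve_range(data, (0.0, None))
counts, edges = np.histogram(data, bins=4, range=rng)

# Counterpart to use_unique: count occurrences of each distinct value
values, unique_counts = np.unique(data, return_counts=True)
```

In the binned case, the number of edges is one more than the number of counts. In the unique case, values and unique_counts have the same length, so they can be plotted directly against each other as a line or scatter plot.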

