API documentation

Generating a series of forecasts

Model estimation and forecasting is provided as a single function:

pypfilt.forecast(params, start, end, streams, dates, summary, filename)

Generate forecasts from various dates during a simulation.

Parameters:
  • params (dict) – The simulation parameters.
  • start – The start of the simulation period.
  • end – The (exclusive) end of the simulation period.
  • streams – A list of observation streams.
  • dates – The dates at which forecasts should be generated.
  • summary – An object that generates summaries of each simulation.
  • filename – The output file to generate (can be None).
Returns:

The simulation state for each forecast date.

This function returns a dictionary that contains the following keys:

  • 'obs': a (flattened) list of every observation;
  • 'complete': the simulation state obtained by assimilating every observation; and
  • datetime.datetime instances: the simulation state obtained for each forecast, identified by the forecasting date.

The simulation states are generated by pypfilt.run() and contain the following keys:

  • 'params': the simulation parameters;
  • 'summary': the dictionary of summary statistics; and
  • 'hist': the matrix of particle state vectors, including individual particle weights (hist[..., -2]) and the index of each particle at the previous time-step (hist[..., -1]), since these can change due to resampling.

The matrix has dimensions \(N_{Steps} \times N_{Particles} \times (N_{SV} + 2)\) for state vectors of size \(N_{SV}\).
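
Resampling reorders particles, so the parent-index column is what links a particle to its ancestors. The sketch below uses plain nested lists in place of the NumPy array, and the helper names `weights_and_parents` and `ancestor_index` are hypothetical. It assumes the parent column holds integer-valued indices into the previous time-step, and it follows those indices backwards through the history.

```python
def weights_and_parents(hist, t):
    """Return the particle weights and parent indices at time-step t.

    hist[t][p] is the state vector of particle p at step t, with the
    weight in the second-last column and the parent index in the last.
    """
    weights = [row[-2] for row in hist[t]]
    parents = [int(row[-1]) for row in hist[t]]
    return weights, parents


def ancestor_index(hist, t, p, t_back):
    """Follow parent indices from particle p at step t back to step t_back."""
    ix = p
    for step in range(t, t_back, -1):
        # The parent index stored at `step` refers to step - 1.
        ix = int(hist[step][ix][-1])
    return ix


# Two time-steps, three particles, one state variable (N_SV = 1),
# so every row has N_SV + 2 = 3 columns: [x, weight, parent].
hist = [
    [[1.0, 1 / 3, 0], [2.0, 1 / 3, 1], [3.0, 1 / 3, 2]],
    # After resampling, particles 0 and 1 both descend from particle 2.
    [[3.1, 1 / 3, 2], [3.2, 1 / 3, 2], [1.1, 1 / 3, 0]],
]
print(ancestor_index(hist, 1, 0, 0))  # 2
```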

Note: if max_days > 0 was passed to pypfilt.default_params(), only a fraction of the entire simulation period will be available.

Particle filter parameters

Default values for the particle filter parameters are provided:

pypfilt.default_params(model, time_scale, max_days=0, px_count=0)

The default particle filter parameters.

Memory usage can reach extreme levels with a large number of particles, and so it may be necessary to keep only a sliding window of the entire particle history matrix in memory.

Parameters:
  • model – The system model.
  • time_scale – The simulation time scale.
  • max_days – The number of contiguous days that must be kept in memory (e.g., the largest observation period).
  • px_count – The number of particles.
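
To judge when a sliding window is worthwhile, note that the history matrix size is the product of the number of time-steps, the number of particles, and the state-vector length plus two. The estimate below makes two assumptions: each entry is an 8-byte float64, and a window of `max_days` at one step per day holds `max_days` rows. Neither is a statement of pypfilt's exact memory layout.

```python
def hist_bytes(n_steps, px_count, n_sv, bytes_per_value=8):
    """Approximate size of an N_steps x N_particles x (N_SV + 2) array."""
    return n_steps * px_count * (n_sv + 2) * bytes_per_value


# A full year at one step per day, versus a 35-day sliding window,
# for 100,000 particles and an 8-element state vector.
full = hist_bytes(n_steps=365, px_count=100_000, n_sv=8)
window = hist_bytes(n_steps=35, px_count=100_000, n_sv=8)
print(f'{full / 2**30:.2f} GiB vs {window / 2**30:.2f} GiB')  # 2.72 GiB vs 0.26 GiB
```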

The bootstrap particle filter

The bootstrap particle filter is exposed as a single-step function, which will update particle weights and perform resampling as necessary:

pypfilt.step(params, hist, hist_ix, step_num, when, step_obs, max_back, is_fs)

Perform a single time-step for every particle.

Parameters:
  • params – The simulation parameters.
  • hist – The particle history matrix.
  • hist_ix – The index of the current time-step in the history matrix.
  • step_num – The time-step number.
  • when – The current simulation time.
  • step_obs – The list of observations for this time-step.
  • max_back – The number of time-steps into the past when the most recent resampling occurred; must be either a positive integer or None (no limit).
  • is_fs – Indicate whether this is a forecasting simulation (i.e., no observations). For deterministic models it is useful to add some random noise when estimating, to allow identical particles to differ in their behaviour, but this is not desirable when forecasting.
Returns:

True if resampling was performed, otherwise False.
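
The weighting and resampling part of a bootstrap step can be sketched independently of pypfilt. Propagation by the model's update() is omitted here. The Gaussian likelihood, the effective-sample-size threshold of 0.25, and systematic resampling are illustrative assumptions, not pypfilt's settings. The hypothetical `bootstrap_step` returns the resampling flag together with the parent indices (the analogue of hist[..., -1]).

```python
import math
import random


def bootstrap_step(states, weights, obs, obs_sd, rng, threshold=0.25):
    """Reweight particles against one observation and resample if needed.

    states  -- list of particle states (floats here, for simplicity)
    weights -- list of normalised particle weights (modified in place)
    Returns (resampled, parents), where parents[i] is the previous index
    of particle i. Assumes at least one particle has non-zero likelihood.
    """
    n = len(states)
    # Multiply each weight by a Gaussian likelihood of the observation.
    for i, x in enumerate(states):
        weights[i] *= math.exp(-0.5 * ((obs - x) / obs_sd) ** 2)
    total = sum(weights)
    for i in range(n):
        weights[i] /= total
    # Effective sample size, as a fraction of the particle count.
    ess = 1.0 / sum(w * w for w in weights) / n
    parents = list(range(n))
    if ess >= threshold:
        return False, parents
    # Systematic resampling: one uniform offset, n evenly spaced points.
    u0 = rng.random() / n
    cumulative, j = weights[0], 0
    for i in range(n):
        u = u0 + i / n
        while u > cumulative and j < n - 1:
            j += 1
            cumulative += weights[j]
        parents[i] = j
    states[:] = [states[p] for p in parents]
    weights[:] = [1.0 / n] * n
    return True, parents


rng = random.Random(1)
states = [0.0, 10.0, 10.0, 10.0, 10.0]
weights = [0.2] * 5
resampled, parents = bootstrap_step(states, weights, 0.0, 1.0, rng)
print(resampled, parents)  # True [0, 0, 0, 0, 0]
```

Systematic resampling uses a single random draw per step, which keeps the resampling noise low; other schemes (multinomial, stratified) would slot into the same place.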

Running a single simulation

pypfilt.run(params, start, end, streams, summary, state=None, save_when=None, save_to=None)

Run the particle filter against any number of data streams.

Parameters:
  • params (dict) – The simulation parameters.
  • start – The start of the simulation period.
  • end – The (exclusive) end of the simulation period.
  • streams – A list of observation streams.
  • summary – An object that generates summaries of each simulation.
  • state – A previous simulation state as returned by, e.g., this function.
  • save_when – Times at which to save the particle history matrix.
  • save_to – The filename for saving the particle history matrix.
Returns:

The resulting simulation state: a dictionary that contains the simulation parameters ('params'), the particle history matrix ('hist'), and the summary statistics ('summary').

Simulation models

All simulation models should derive from the following base class:

class pypfilt.Model

The base class for simulation models, which defines the minimal set of methods that are required.

init(params, vec)

Initialise a matrix of state vectors.

Parameters:
  • params – Simulation parameters.
  • vec – An uninitialised \(P \times S\) matrix of state vectors, for \(P\) particles and state vectors of length \(S\) (as defined by state_size()). To set, e.g., the first element of each state vector to \(1\), you can use an ellipsis slice: vec[..., 0] = 1.
state_size()

Return the size of the state vector.

priors(params)

Return a dictionary of model parameter priors. Each key must identify a parameter by name. Each value must be a function that returns samples from the associated prior distribution, and should have the following form:

lambda r, size=None: r.uniform(1.0, 2.0, size=size)

Here, the argument r is a PRNG instance and size specifies the output shape (by default, a single value).

Parameters:
  • params – Simulation parameters.
update(params, step_date, dt, is_fs, prev, curr)

Perform a single time-step.

Parameters:
  • params – Simulation parameters.
  • step_date – The date and time of the current time-step.
  • dt – The time-step size (days).
  • is_fs – Indicates whether this is a forecasting simulation.
  • prev – The state before the time-step.
  • curr – The state after the time-step (destructively updated).
describe()

Describe each component of the state vector with a tuple of the form (name, smooth, min, max), where name is a descriptive name for the variable/parameter, smooth is a boolean that indicates whether the parameter admits continuous sampling (e.g., post-regularisation), and min and max define the (inclusive) range of valid values. These tuples must be in the same order as the state vector itself.

stat_info()

Describe each statistic that can be calculated by this model as a (name, stat_fn) tuple, where name is a string that identifies the statistic and stat_fn is a function that calculates the value of the statistic.

is_valid(hist)

Identify particles whose state and parameters can be inspected. By default, this function returns True for all particles. Override this function to ensure that inchoate particles are correctly ignored.
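
The sketch below shows the shape of a model that implements these methods. `GrowthModel` is a hypothetical exponential-growth model. So that it runs on its own, it does not import pypfilt and it stores state in nested lists. Real code would derive from pypfilt.Model and receive NumPy arrays. The `params['rng']` key, the prior range for `r`, and the smoothing flags are all illustrative assumptions.

```python
import math
import random


class GrowthModel:
    """Hypothetical model: x grows exponentially at a per-particle rate r."""

    def state_size(self):
        # State vector: [x, r].
        return 2

    def init(self, params, vec):
        # Every particle starts at x = 1 with a growth rate drawn from a
        # uniform prior; params['rng'] is assumed to be a random.Random.
        rng = params['rng']
        for row in vec:
            row[0] = 1.0
            row[1] = rng.uniform(0.1, 0.5)

    def update(self, params, step_date, dt, is_fs, prev, curr):
        # Destructively write the new state into curr.
        for p, (x, r) in enumerate(prev):
            curr[p][0] = x * math.exp(r * dt)
            curr[p][1] = r

    def describe(self):
        # (name, smooth, min, max) for each state vector component; only
        # the rate parameter is marked as smooth (an arbitrary choice).
        return [('x', False, 0.0, math.inf), ('r', True, 0.1, 0.5)]


model = GrowthModel()
params = {'rng': random.Random(0)}  # hypothetical parameter layout
prev = [[0.0] * model.state_size() for _ in range(3)]
model.init(params, prev)
curr = [[0.0] * model.state_size() for _ in range(3)]
model.update(params, None, 1.0, False, prev, curr)
```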

Weighted statistics

The pypfilt.stats module provides functions for calculating weighted statistics across particle populations.

pypfilt.stats.cov_wt(x, wt, cor=False)

Estimate the weighted covariance matrix, based on a NumPy pull request.

Equivalent to cov.wt(x, wt, cor, center=TRUE, method="unbiased") as provided by the stats package for R.

Parameters:
  • x – A 2-D array; columns represent variables and rows represent observations.
  • wt – A 1-D array of observation weights.
  • cor – Whether to return a correlation matrix instead of a covariance matrix.
Returns:

The covariance matrix (or correlation matrix, if cor=True).
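
For readers checking results against R, the computation behind cov.wt's "unbiased" method can be written out directly. The weights are normalised to sum to one, the data are centred at the weighted mean, and the weighted cross-products are divided by 1 - sum(w^2). This pure-Python sketch illustrates that formula; it is not pypfilt's NumPy implementation.

```python
import math


def cov_wt(x, wt, cor=False):
    """Weighted covariance (or correlation) matrix of the rows of x.

    x  -- list of observations, each a list of variable values
    wt -- list of non-negative observation weights
    """
    total = sum(wt)
    w = [wi / total for wi in wt]  # normalise weights to sum to 1
    n_vars = len(x[0])
    centre = [sum(wi * row[j] for wi, row in zip(w, x)) for j in range(n_vars)]
    # The "unbiased" correction used by R's cov.wt.
    denom = 1.0 - sum(wi * wi for wi in w)
    cov = [[sum(wi * (row[j] - centre[j]) * (row[k] - centre[k])
                for wi, row in zip(w, x)) / denom
            for k in range(n_vars)]
           for j in range(n_vars)]
    if cor:
        sd = [math.sqrt(cov[j][j]) for j in range(n_vars)]
        cov = [[cov[j][k] / (sd[j] * sd[k]) for k in range(n_vars)]
               for j in range(n_vars)]
    return cov


x = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
print(cov_wt(x, [1, 1, 1]))
```

With equal weights the result reduces to the ordinary sample covariance (divisor n - 1), which is a convenient sanity check.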

pypfilt.stats.avg_var_wt(x, weights, biased=True)

Return the weighted average and variance (based on a Stack Overflow answer).

Parameters:
  • x – The data points.
  • weights – The normalised weights.
  • biased – Use a biased variance estimator.
Returns:

A tuple that contains the weighted average and weighted variance.
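
A minimal sketch of the two estimators, assuming the weights already sum to one. The biased estimate is the weighted mean of squared deviations. The unbiased branch divides by 1 - sum(w^2), which is the usual correction for normalised (reliability) weights; whether pypfilt applies exactly this correction is an assumption.

```python
def avg_var_wt(x, weights, biased=True):
    """Weighted mean and variance; weights are assumed to sum to 1."""
    avg = sum(w * xi for w, xi in zip(weights, x))
    var = sum(w * (xi - avg) ** 2 for w, xi in zip(weights, x))
    if not biased:
        # Assumed correction for normalised (reliability) weights.
        var /= 1.0 - sum(w * w for w in weights)
    return avg, var


print(avg_var_wt([1.0, 2.0, 3.0], [0.25, 0.5, 0.25]))  # (2.0, 0.5)
```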

pypfilt.stats.qtl_wt(x, weights, probs)

Equivalent to wtd.quantile(x, weights, probs, normwt=TRUE) as provided by the Hmisc package for R.

Parameters:
  • x – The numerical data.
  • weights – The weight of each data point.
  • probs – The quantile(s) to compute.
Returns:

The array of weighted quantiles.
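
The Hmisc algorithm can be sketched in pure Python. The weights are rescaled to sum to the number of points, tied values are merged, and each quantile interpolates linearly between the two weighted order statistics that bracket position 1 + (n - 1)p. This follows the algorithm as we understand it and is not pypfilt's implementation. With equal weights it reduces to R's default (type 7) quantile.

```python
import math


def qtl_wt(x, weights, probs):
    """Weighted quantiles, following Hmisc's wtd.quantile(normwt=TRUE)."""
    # Rescale the weights so that they sum to the number of data points.
    scale = len(x) / sum(weights)
    # Sort the unique values and sum the weights of tied values.
    totals = {}
    for xi, wi in zip(x, weights):
        totals[xi] = totals.get(xi, 0.0) + wi * scale
    values = sorted(totals)
    cum, running = [], 0.0
    for v in values:
        running += totals[v]
        cum.append(running)
    n = cum[-1]

    def order_stat(pos):
        # Smallest value whose cumulative weight is >= pos (else the max).
        for v, c in zip(values, cum):
            if c >= pos:
                return v
        return values[-1]

    result = []
    for p in probs:
        order = 1 + (n - 1) * p
        low = max(math.floor(order), 1)
        high = min(low + 1, n)
        frac = order - math.floor(order)
        result.append((1 - frac) * order_stat(low) + frac * order_stat(high))
    return result


print(qtl_wt([1, 2, 3, 4], [1, 1, 1, 1], [0.25, 0.5, 0.75]))  # [1.75, 2.5, 3.25]
```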

pypfilt.stats.cred_wt(x, weights, creds)

Calculate weighted credible intervals.

Parameters:
  • x – The numerical data.
  • weights – The weight of each data point.
  • creds (List(int)) – The credible interval(s) to compute (0..100, where 0 represents the median and 100 the entire range).
Returns:

A dictionary that maps credible intervals to the lower and upper interval bounds.
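
The 0..100 convention corresponds to central (equal-tailed) intervals. A c% interval spans the weighted quantiles at 0.5 - c/200 and 0.5 + c/200, so 0 collapses to the median and 100 covers the full range. The hypothetical helper below shows only this probability mapping and assumes that cred_wt() uses central intervals. The bounds themselves would then come from a weighted quantile function such as qtl_wt().

```python
def cred_probs(creds):
    """Map each central credible interval (0..100) to its quantile pair."""
    return {c: (0.5 - c / 200, 0.5 + c / 200) for c in creds}


print(cred_probs([0, 50, 100]))
# {0: (0.5, 0.5), 50: (0.25, 0.75), 100: (0.0, 1.0)}
```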

Simulation metadata

Every simulation data file should include metadata that documents the simulation parameters and working environment. The Metadata class provides the means for generating such metadata:

class pypfilt.summary.Metadata

Document the simulation parameters and system environment for a set of simulations. A black-list (ignore_dict) defines which members of the parameters dictionary will be excluded from this metadata; see filter() for details.

build(params, pkgs=None)

Construct a metadata dictionary that documents the simulation parameters and system environment. Note that this should be generated at the start of the simulation, and that the git metadata will only be valid if the working directory is located within a git repository.

Parameters:
  • params – The simulation parameters.
  • pkgs – A dictionary that maps package names to modules that define appropriate __version__ attributes, used to record the versions of additional relevant packages (see the example below).

By default, the versions of pypfilt, h5py, numpy and scipy are recorded. The following example demonstrates how to also record the installed version of the epifx package:

import epifx
import pypfilt.summary
params = ...
meta = pypfilt.summary.Metadata()
metadata = meta.build(params, {'epifx': epifx})
filter(values, ignore, encode_fn)

Recursively filter items from a dictionary, used to remove parameters from the metadata dictionary that, e.g., have no meaningful representation.

Parameters:
  • values – The original dictionary.
  • ignore – A dictionary that specifies which values to ignore.
  • encode_fn – A function that encodes the remaining values (see encode_value()).

For example, to ignore ['px_range'], ['resample']['rnd'], and 'expect_fn' and 'log_llhd_fn' for every observation system when using epifx:

m = pypfilt.summary.Metadata()
ignore = {
    'px_range': None,
    'resample': {'rnd': None},
    # Note the use of ``None`` to match any key under 'obs'.
    'obs': {None: {'expect_fn': None, 'log_llhd_fn': None}}
}
m.filter(params, ignore, m.encode)
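
The matching rules in this example can be sketched as a short recursive function. The name `filter_dict` is hypothetical. Its semantics are inferred from the example rather than taken from pypfilt's source: a None value drops the key and everything nested under it, a dict value recurses, and a None key matches any key at that level.

```python
_NO_RULE = object()  # sentinel: no ignore rule applies to this key


def filter_dict(values, ignore, encode_fn):
    """Recursively drop ignored entries and encode the remaining values."""
    result = {}
    for key, val in values.items():
        # An exact key match takes precedence over the None wildcard.
        rule = ignore.get(key, ignore.get(None, _NO_RULE))
        if rule is None:
            continue  # ignore this entry (and anything nested under it)
        if isinstance(val, dict):
            sub_ignore = rule if isinstance(rule, dict) else {}
            result[key] = filter_dict(val, sub_ignore, encode_fn)
        else:
            result[key] = encode_fn(val)
    return result


params = {
    'px_range': (0, 10),
    'resample': {'rnd': object(), 'threshold': 0.25},
    'obs': {'cases': {'expect_fn': len, 'log_llhd_fn': abs, 'bg_obs': 5}},
    'steps_per_unit': 1,
}
ignore = {
    'px_range': None,
    'resample': {'rnd': None},
    'obs': {None: {'expect_fn': None, 'log_llhd_fn': None}},
}
print(filter_dict(params, ignore, str))
# {'resample': {'threshold': '0.25'}, 'obs': {'cases': {'bg_obs': '5'}}, 'steps_per_unit': '1'}
```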
encode(value)

Encode values in a form suitable for serialisation in HDF5 files.

  • Integer values are converted to numpy.int32 values.
  • Floating-point values and arrays retain their data type.
  • All other (i.e., non-numerical) values are converted to UTF-8 strings.
object_name(obj)

Return the fully qualified name of the object as a byte string.

priors(params)

Return a dictionary that describes the model parameter priors.

Each key identifies a parameter (by name); the corresponding value is a byte string representation of the prior distribution, which is typically a numpy.random.RandomState method call.

For example:

{'R0': b'random.uniform(1.0, 2.0)',
 'gamma': b'(1 / random.uniform(1.0, 3.0))'}
pkg_version(module)

Attempt to obtain the version of a Python module.

git_data()

Record the status of the git repository within which the working directory is located (if such a repository exists).

run_cmd(args, all_lines=False, err_val=u'')

Run a command and return the (Unicode) output. By default, only the first line is returned; set all_lines=True to receive all of the output as a list of Unicode strings. If the command returns a non-zero exit status, return err_val instead.
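
The behaviour described above can be sketched with the standard library's subprocess.run. Treating a missing executable (OSError) the same way as a non-zero exit status is an assumption made here so that, for example, git metadata degrades gracefully when git is not installed.

```python
import subprocess
import sys


def run_cmd(args, all_lines=False, err_val=''):
    """Run a command, returning its first output line (or all lines)."""
    try:
        proc = subprocess.run(args, stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL,
                              universal_newlines=True)
    except OSError:
        # Assumption: treat a missing executable like a failed command.
        return err_val
    if proc.returncode != 0:
        return err_val
    lines = proc.stdout.splitlines()
    if all_lines:
        return lines
    return lines[0] if lines else ''


version = run_cmd([sys.executable, '--version'])
# The current commit, or 'unknown' outside a git repository.
commit = run_cmd(['git', 'rev-parse', 'HEAD'], err_val='unknown')
```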

Summary data files

The HDF5 class encapsulates the process of calculating and recording summary statistics for each simulation.

class pypfilt.summary.HDF5(params, obs_list, meta=None, first_day=False, only_fs=False)

Save tables of summary statistics to an HDF5 file.

Parameters:
  • params – The simulation parameters.
  • obs_list – A list of all observations.
  • meta – The simulation metadata; by default the output of Metadata.build() is used.
  • first_day – If False (the default) statistics are calculated from the date of the first observation. If True, statistics are calculated from the very beginning of the simulation period.
  • only_fs – If False (the default) statistics are calculated for the initial estimation simulation and for forecasting simulations. If True, statistics are only calculated for forecasting simulations.
add_tables(*tables)

Add summary statistic tables that will be included in the output file.

save_forecasts(fs, filename)

Save forecast summaries to disk in the HDF5 binary data format.

This function creates the following datasets that summarise the estimation and forecasting outputs:

  • 'data/TABLE' for each table.

The provided metadata will be recorded under 'meta/'.

If dataset creation timestamps are enabled, two simulations that produce identical outputs will not result in identical files. Timestamps will be disabled where possible (requires h5py >= 2.2):

  • 'hdf5_track_times': Presence of creation timestamps.
Parameters:
  • fs – Simulation outputs, as returned by pypfilt.forecast().
  • filename – The filename to which the data will be written.

Summary statistic tables

Summary statistics are stored in tables, each of which comprises a set of named columns and a specific number of rows.

The Table class

To calculate a summary statistic, you need to define a subclass of the Table class and provide implementations of each method.

class pypfilt.Table(name)

The base class for summary statistic tables.

Tables are used to record rows of summary statistics as a simulation progresses.

Parameters:
  • name – the name of the table in the output file.
dtype(params, obs_list)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • params – The simulation parameters.
  • obs_list – A list of all observations.
n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.
add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
finished(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics at the end of a simulation.

The parameters are as per add_rows().

Derived classes only need to implement this method if they record rows at the end of a simulation; the default implementation does nothing.

monitors()

Return a list of monitors required by this Table.

Derived classes should implement this method if they require one or more monitors; the provided method returns an empty list.
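
A sketch of a table that records the weighted mean of the first state variable on each day. `WeightedMeanTable` is hypothetical. So that it runs standalone, it only mirrors the method names rather than deriving from pypfilt.Table, and it uses nested lists instead of arrays. The toy `insert_fn` accepts the two calling forms shown above.

```python
import datetime


class WeightedMeanTable:
    """Hypothetical table: the weighted mean of state variable 0 per day."""

    def __init__(self, name='x_mean'):
        self.name = name

    def dtype(self, params, obs_list):
        # NumPy dtype codes: a 20-byte string date and a 64-bit float.
        return [('date', 'S20'), ('mean', 'f8')]

    def n_rows(self, start_date, end_date, n_days, n_sys, forecasting):
        # One row per day of the simulation.
        return n_days

    def add_rows(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        for date, ix, hist_ix in dates:
            # weights[ix] holds the weight of every particle on this day.
            mean = sum(w * px[0] for w, px in zip(weights[ix], hist[hist_ix]))
            insert_fn((date.strftime('%Y-%m-%d'), mean))


rows = []


def insert_fn(row_or_rows, n=1):
    # Toy stand-in for the insertion function supplied by the HDF5 class.
    if n == 1:
        rows.append(row_or_rows)
    else:
        rows.extend(row_or_rows)


table = WeightedMeanTable()
day = datetime.datetime(2017, 1, 1)
hist = [[[1.0, 0.5, 0], [3.0, 0.5, 1]]]  # one step, two particles
weights = [[0.25, 0.75]]
table.add_rows(hist, weights, day, [(day, 0, 0)], set(), insert_fn)
print(rows)  # [('2017-01-01', 2.5)]
```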

Predefined statistics

The following derived classes are provided to calculate basic summary statistics of any generic simulation model.

class pypfilt.summary.ModelCIs(probs=None, name=u'model_cints')

Calculate fixed-probability central credible intervals for all state variables and model parameters.

Parameters:
  • probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
  • name – the name of the table in the output file.
class pypfilt.summary.ParamCovar(name=u'param_covar')

Calculate the covariance between all pairs of model parameters during each simulation.

Parameters:
  • name – the name of the table in the output file.

Utility functions

The following column types are provided for convenience when defining custom Table subclasses.

pypfilt.summary.dtype_unit(obs_list, name=u'unit')

The dtype for columns that store observation units.

pypfilt.summary.dtype_period(name=u'period')

The dtype for columns that store observation periods.

pypfilt.summary.dtype_value(value, name=u'value')

The dtype for columns that store observation values.

pypfilt.summary.dtype_names_to_str(dtypes, encoding=u'utf-8')

Ensure that dtype field names are native strings, as required by NumPy. Unicode strings are not valid field names in Python 2, and this can cause problems when using Unicode string literals.

Parameters:
  • dtypes – A list of fields where each field is either a string, or a tuple of length 2 or 3 (see the NumPy docs for details).
  • encoding – The encoding for converting Unicode strings to native strings in Python 2.
Returns:

A list of fields, where each field name is a native string (str type).

Raises:

ValueError – If a name cannot be converted to a native string.

The following functions are provided for converting column types in structured arrays.

pypfilt.summary.convert_cols(data, converters)

Convert columns in a structured array from one type to another.

Parameters:
  • data – The input structured array.
  • converters – A dictionary that maps (unicode) column names to (convert_fn, new_dtype) tuples, which contain a conversion function and define the output dtype.
Returns:

A new structured array.

pypfilt.summary.default_converters(time_scale)

Return a dictionary of converters for the 'fs_date' and 'date' columns, suitable for passing to convert_cols().

Retrospective statistics

In some cases, the Table model is not sufficiently flexible, since it assumes that statistics can be calculated during the course of a simulation. For some statistics, it may be necessary to observe the entire simulation before the statistics can be calculated.

In this case, you need to define a subclass of the Monitor class, which will observe (“monitor”) each simulation and, upon completion of each simulation, can calculate the necessary summary statistics.

Note that a Table subclass is also required to define the table columns, the number of rows, and to record each row at the end of the simulation.

class pypfilt.Monitor

The base class for simulation monitors.

Monitors are used to calculate quantities that:

  • Are used by multiple Tables (i.e., avoiding repeated computation); or
  • Require a complete simulation for calculation (as distinct from Tables, which incrementally record rows as a simulation progresses).

The quantities calculated by a Monitor can then be recorded by Table.add_rows() and/or Table.finished().

prepare(params, obs_list)

Perform any required preparation prior to a set of simulations.

Parameters:
  • params – The simulation parameters.
  • obs_list – A list of all observations.
begin_sim(start_date, end_date, n_days, n_sys, forecasting)

Perform any required preparation at the start of a simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.
monitor(hist, weights, fs_date, dates, obs_types)

Monitor the simulation progress.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
end_sim(hist, weights, fs_date, dates, obs_types)

Finalise the data as required for the relevant summary statistics.

The parameters are as per monitor().

Derived classes should only implement this method if finalisation of the monitored data is required; the provided method does nothing.

load_state(grp)

Load the monitor state from a cache file.

Parameters:
  • grp – The h5py Group object from which to load the state.
save_state(grp)

Save the monitor state to a cache file.

Parameters:
  • grp – The h5py Group object in which to save the state.

Tables and Monitors

The methods of each Table and Monitor will be called in the following sequence by the HDF5 summary class:

  1. Before any simulations are performed:

    • Table.dtype()
    • Monitor.prepare()

    In addition to defining the column types for each Table, this allows objects to store the simulation parameters and observations.

  2. At the start of each simulation:

    • Monitor.begin_sim()
    • Table.n_rows()

    This notifies each Monitor and each Table of the simulation period, the number of observation systems (i.e., data sources), and whether it is a forecasting simulation (where no resampling will take place).

  3. During each simulation:

    • Monitor.monitor()
    • Table.add_rows()

    This provides a portion of the simulation period for analysis by each Monitor and each Table. Because all of the Monitor.monitor() methods are called before the Table.add_rows() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.monitor().

  4. At the end of each simulation:

    • Monitor.end_sim()
    • Table.finished()

    This allows each Monitor and each Table to perform any final calculations once the simulation has completed. Because all of the Monitor.end_sim() methods are called before the Table.finished() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.end_sim().
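
Steps 3 and 4 of this sequence can be illustrated with a hypothetical monitor that tracks a peak value and a table that reads it in finished(). The classes and the `summarise` driver are standalone sketches, not the HDF5 class's code. Because every end_sim() call completes before any finished() call, the table can safely read the monitor's result.

```python
class PeakMonitor:
    """Hypothetical monitor: tracks the peak value of state variable 0."""

    def begin_sim(self, start_date, end_date, n_days, n_sys, forecasting):
        self.peak = None

    def monitor(self, hist, weights, fs_date, dates, obs_types):
        for _, _, hist_ix in dates:
            step_max = max(px[0] for px in hist[hist_ix])
            if self.peak is None or step_max > self.peak:
                self.peak = step_max

    def end_sim(self, hist, weights, fs_date, dates, obs_types):
        pass  # nothing to finalise for this statistic


class PeakTable:
    """Hypothetical table: records the monitor's peak once per simulation."""

    def __init__(self, monitor):
        self.monitor = monitor

    def add_rows(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        pass  # nothing to record until the simulation has finished

    def finished(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        # Safe to read the monitor: every end_sim() has already run.
        insert_fn((fs_date, self.monitor.peak))


def summarise(monitors, tables, chunks, fs_date, insert_fn):
    """Drive steps 3 and 4 of the call sequence for one simulation."""
    for hist, weights, dates in chunks:
        for m in monitors:
            m.monitor(hist, weights, fs_date, dates, set())
        for t in tables:
            t.add_rows(hist, weights, fs_date, dates, set(), insert_fn)
    hist, weights, dates = chunks[-1]
    for m in monitors:
        m.end_sim(hist, weights, fs_date, dates, set())
    for t in tables:
        t.finished(hist, weights, fs_date, dates, set(), insert_fn)


rows = []
monitor = PeakMonitor()
table = PeakTable(monitor)
monitor.begin_sim(0, 2, 2, 1, False)
# Two chunks of a simulation, each covering one step of two particles.
chunks = [
    ([[[1.0, 0.5, 0], [4.0, 0.5, 1]]], [[0.5, 0.5]], [(0, 0, 0)]),
    ([[[2.0, 0.5, 0], [3.0, 0.5, 1]]], [[0.5, 0.5]], [(1, 0, 0)]),
]
summarise([monitor], [table], chunks, fs_date=1, insert_fn=rows.append)
print(rows)  # [(1, 4.0)]
```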

Time scales

Two pre-defined simulation time scales are provided.

class pypfilt.Scalar(np_dtype=None)

A dimensionless time scale.

__init__(np_dtype=None)
Parameters:
  • np_dtype – The data type used for serialisation; the default is np.float64.
set_period(start, end, steps_per_unit)

Define the simulation period and time-step size.

Parameters:
  • start (float) – The start of the simulation period.
  • end (float) – The end of the simulation period.
  • steps_per_unit (int) – The number of time-steps per unit time.
Raises:

ValueError – if start and/or end are not floats, or if steps_per_unit is not a positive integer.

with_observations(*streams)

Return a generator that yields a sequence of tuples that contain: the time-step number, the current time, and a list of observations.

Parameters:
  • streams – Any number of observation streams (each of which is assumed to be sorted chronologically).
with_observations_from_time(start, *streams)

Return a generator that yields a sequence of tuples that contain: the time-step number, the current time, and a list of observations.

Parameters:
  • start – The starting time (set to None to use the start of the simulation period).
  • streams – Any number of observation streams (each of which is assumed to be sorted chronologically).
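
The sketch below shows how a generator with this contract might pair time-steps with observations drawn from several chronologically sorted streams. It uses heapq.merge to interleave the streams. The 'date' key on each observation, and the rule that an observation is attached to the first step at or after its time, are assumptions for illustration; they are not pypfilt's behaviour.

```python
import heapq


def with_observations(start, end, steps_per_unit, *streams):
    """Yield (step, time, observations) for each step in (start, end]."""
    # Merge the sorted streams into a single chronological iterator.
    merged = heapq.merge(*streams, key=lambda obs: obs['date'])
    pending = next(merged, None)
    n_steps = round((end - start) * steps_per_unit)
    for step in range(1, n_steps + 1):
        time = start + step / steps_per_unit
        step_obs = []
        # Collect every observation that occurs no later than this step.
        while pending is not None and pending['date'] <= time:
            step_obs.append(pending)
            pending = next(merged, None)
        yield step, time, step_obs


obs_a = [{'date': 0.5, 'value': 1}, {'date': 1.5, 'value': 3}]
obs_b = [{'date': 1.0, 'value': 2}]
for step, time, step_obs in with_observations(0.0, 2.0, 2, obs_a, obs_b):
    print(step, time, [o['value'] for o in step_obs])
```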
class pypfilt.Datetime(fmt=None)

A datetime scale where the time unit is days.

__init__(fmt=None)
Parameters:
  • fmt – The format string used to serialise datetime objects; the default is '%Y-%m-%d %H:%M:%S'.
set_period(start, end, steps_per_unit)

Define the simulation period and time-step size.

Parameters:
  • start (datetime.datetime) – The start of the simulation period.
  • end (datetime.datetime) – The end of the simulation period.
  • steps_per_unit (int) – The number of time-steps per day.
Raises:

ValueError – if start and/or end are not datetime.datetime instances, or if steps_per_unit is not a positive integer.

with_observations(*streams)

Return a generator that yields a sequence of tuples that contain: the time-step number, the current time, and a list of observations.

Parameters:
  • streams – Any number of observation streams (each of which is assumed to be sorted chronologically).
with_observations_from_time(start, *streams)

Return a generator that yields a sequence of tuples that contain: the time-step number, the current time, and a list of observations.

Parameters:
  • start – The starting time (set to None to use the start of the simulation period).
  • streams – Any number of observation streams (each of which is assumed to be sorted chronologically).

Custom time scales

If neither of the above time scales is suitable, you can define a custom time scale, which should derive from the following base class and define the methods listed here:

class pypfilt.time.Time

The base class for simulation time scales, which defines the minimal set of methods that are required.

dtype(name)

Define the dtype for columns that store times.

native_dtype()

Define the Python type used to represent times in NumPy arrays.

is_instance(value)

Return whether value is an instance of the native time type.

to_dtype(time)

Convert from time to a dtype value.

from_dtype(dval)

Convert from a dtype value to time.

to_unicode(time)

Convert from time to a Unicode string.

This is used to define group names in HDF5 files, and for logging.

steps()

Return a generator that yields a sequence of time-step numbers and times (represented as tuples) that span the simulation period.

The first time-step should be numbered 1 and occur at a time that is one time-step after the beginning of the simulation period.

step_count()

Return the number of time-steps required for the simulation period.

step_of(time)

Return the time-step number that corresponds to the specified time.

add_scalar(time, scalar)

Add a scalar quantity to the specified time.

time_of_obs(obs)

Return the time associated with an observation.

to_scalar(time)

Convert the specified time into a scalar quantity, defined as the time-step number divided by the number of time-steps per time unit.
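
A minimal sketch of a custom time scale whose times are float day numbers. `DayNumber` is hypothetical and mirrors only a subset of the pypfilt.time.Time methods. It omits dtype(), native_dtype(), to_dtype() and from_dtype() because those involve NumPy, and it assumes observations are dicts with a 'date' entry.

```python
class DayNumber:
    """Hypothetical time scale where times are float day numbers."""

    def set_period(self, start, end, steps_per_unit):
        if not isinstance(steps_per_unit, int) or steps_per_unit < 1:
            raise ValueError('steps_per_unit must be a positive integer')
        self.start = float(start)
        self.end = float(end)
        self.steps_per_unit = steps_per_unit

    def is_instance(self, value):
        return isinstance(value, float)

    def to_unicode(self, time):
        return str(time)

    def step_count(self):
        return round((self.end - self.start) * self.steps_per_unit)

    def steps(self):
        # Step 1 occurs one time-step after the start of the period.
        for n in range(1, self.step_count() + 1):
            yield n, self.start + n / self.steps_per_unit

    def step_of(self, time):
        return round((time - self.start) * self.steps_per_unit)

    def add_scalar(self, time, scalar):
        return time + scalar

    def time_of_obs(self, obs):
        # Assumption: observations are dicts with a 'date' entry.
        return obs['date']

    def to_scalar(self, time):
        # Time-step number divided by the number of steps per time unit.
        return self.step_of(time) / self.steps_per_unit


scale = DayNumber()
scale.set_period(0.0, 3.0, 2)
print(list(scale.steps())[:2])  # [(1, 0.5), (2, 1.0)]
```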

Plotting

Several plotting routines, built on top of matplotlib, are provided in the pypfilt.plot module (matplotlib must be installed in order to use this module).

To generate plots non-interactively (i.e., without having a window appear) use the 'Agg' backend:

import matplotlib
matplotlib.use('Agg')
import matplotlib.pyplot as plt

See the matplotlib FAQ for more details.

Styles and colour palettes

pypfilt.plot.default_style()

The style sheet provided by pypfilt.

pypfilt.plot.apply_style(*args, **kwds)

Temporarily apply a style sheet.

Parameters:
  • style – The style sheet to apply (default: default_style()).
with apply_style():
    make_plots()
pypfilt.plot.n_colours(name, n)

Extract a fixed number of colours from a colour map.

Parameters:
  • name – The colour map name (or a matplotlib.colors.Colormap instance).
  • n – The number of colours required.
colours = n_colours('Blues', 3)
pypfilt.plot.brewer_qual(name)

Qualitative palettes from the ColorBrewer project: 'Accent', 'Dark2', 'Paired', 'Pastel1', 'Pastel2', 'Set1', 'Set2', 'Set3'.

Raises:

ValueError – if the palette name is invalid.
pypfilt.plot.colour_iter(col, palette, reverse=False)

Iterate over the unique (sorted) values in an array, returning a (value, colour) tuple for each of the values.

Parameters:
  • col – The column of (unsorted, repeated) values.
  • palette – The colour map name or a list of colours.
  • reverse – Whether to sort the values in ascending (default) or descending order.
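
The pairing of values with colours can be sketched for the case where the palette is a list of colours. Resolving a colour map name requires matplotlib and is not shown. Cycling back to the start of the palette when there are more values than colours is an assumption.

```python
import itertools


def colour_iter(col, palette, reverse=False):
    """Yield (value, colour) for each unique value in col, in sorted order."""
    values = sorted(set(col), reverse=reverse)
    # Reuse the palette from the start if there are more values than colours.
    return zip(values, itertools.cycle(palette))


for value, colour in colour_iter([3, 1, 2, 1], ['red', 'green', 'blue']):
    print(value, colour)
```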

Plotting functions

pypfilt.plot.cred_ints(ax, data, x, ci, palette=u'Blues', **kwargs)

Plot credible intervals as shaded regions.

Parameters:
  • ax – The plot axes.
  • data – The NumPy array containing the credible intervals.
  • x – The name of the x-axis column.
  • ci – The name of the credible interval column.
  • palette – The colour map name or a list of colours.
  • **kwargs – Extra arguments to pass to Axes.plot and Axes.fill_between.
Returns:

A list of the series that were plotted.

pypfilt.plot.observations(ax, data, label=u'Observations', future=False, **kwargs)

Plot observed values.

Parameters:
  • ax – The plot axes.
  • data – The NumPy array containing the observation data.
  • label – The label for the observation data.
  • future – Whether the observations occur after the forecasting date.
  • **kwargs – Extra arguments to pass to Axes.plot.
Returns:

A list of the series that were plotted.

pypfilt.plot.series(ax, data, x, y, scales, legend_cols=True, **kwargs)

Add multiple series to a single plot, each of which is styled according to values in other columns.

Parameters:
  • ax – The axes on which to draw the line series.
  • data – The structured array that contains the data to plot.
  • x – The name of the column that corresponds to the x-axis.
  • y – The name of the column that corresponds to the y-axis.
  • scales

    A list of “scales” to apply to each line series; each scale is a tuple (column, kwarg, kwvals, label_fmt) where:

    • column is the name of a column in data;
    • kwarg is the name of a keyword argument passed to plot();
    • kwvals is a list of values that the keyword argument will take; and
    • label_fmt is a format string for the legend keys or a function that returns the legend key.
  • legend_cols – Whether to show each scale in a separate column.
  • **kwargs – Extra arguments to pass to Axes.plot.
Returns:

A list of the series that were plotted.

scales = [
    # Colour lines according to the dispersion parameter.
    ('disp', 'color', brewer_qual('Set1'), r'$k = {:.0f}$'),
    # Vary line style according to the background signal.
    ('bg_obs', 'linestyle', ['-', '--', ':'], r'$bg_{{obs}} = {}$'),
]
series(ax, data, 'x_col', 'y_col', scales)
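
To make the scales concrete, the sketch below computes the keyword arguments and legend labels that such a call would use for each distinct combination of scale-column values. The helper `series_styles` is hypothetical and works on a list of dicts rather than a structured array. Matching the i-th sorted unique value of a column to the i-th entry of kwvals (cycling if needed) is an assumption.

```python
def series_styles(rows, scales):
    """Map each combination of scale values to (plot kwargs, legend labels)."""
    # Sorted unique values for each scale column.
    uniques = {col: sorted({row[col] for row in rows})
               for col, _, _, _ in scales}
    styles = {}
    for row in rows:
        key = tuple(row[col] for col, _, _, _ in scales)
        if key in styles:
            continue
        kwargs, labels = {}, []
        for col, kwarg, kwvals, label_fmt in scales:
            value = row[col]
            ix = uniques[col].index(value)
            kwargs[kwarg] = kwvals[ix % len(kwvals)]
            # label_fmt may be a format string or a function.
            if callable(label_fmt):
                labels.append(label_fmt(value))
            else:
                labels.append(label_fmt.format(value))
        styles[key] = (kwargs, labels)
    return styles


rows = [{'disp': 10, 'bg_obs': 1}, {'disp': 100, 'bg_obs': 1},
        {'disp': 10, 'bg_obs': 2}]
scales = [
    ('disp', 'color', ['red', 'blue'], r'$k = {:.0f}$'),
    ('bg_obs', 'linestyle', ['-', '--', ':'], r'$bg_{{obs}} = {}$'),
]
for key, (kwargs, labels) in series_styles(rows, scales).items():
    print(key, kwargs, labels)
```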

Faceted plots

This package provides a base class (Plot) for plots that comprise any number of subplots, and three subclasses for specific types of plots:

  • Wrap for plots where a single variable identifies each subplot.
  • Grid for plots where two variables are used to identify each subplot.
  • Single for single plots.

The key method of these classes is Plot.subplots(), which returns an iterator that yields (axes, data) tuples for each subplot. By looping over these tuples, one set of plotting commands can be used to generate all of the subplots. For examples, see the plot_forecasts() and plot_params() functions in the Plotting the results section of Getting Started.

class pypfilt.plot.Plot(**kwargs)

The base class for plots that comprise multiple subplots.

Parameters:

**kwargs – Extra arguments to pass to pyplot.subplots.

Variables:
  • fig – The matplotlib.figure.Figure instance for the plot.
  • axs – The \(M \times N\) array of matplotlib.axes.Axes instances for each of the sub-plots (\(M\) rows and \(N\) columns).
subplots()

Return an iterator that yields (axes, data) tuples for each subplot.

add_to_legend(objs, replace=False)

Add plot objects to the list of items to show in the figure legend.

Parameters:
  • replace – Whether to ignore objects which share a label with any object already in this list (default) or to replace such objects (set to True).
legend(**kwargs)

Add a figure legend that lists the objects registered with add_to_legend().

Parameters:

**kwargs – Extra arguments to pass to Figure.legend.
set_xlabel(text, dy, **kwargs)

Add an x-axis label that is centred across all subplots.

Parameters:
  • text – The label text.
  • dy – The vertical position of the label.
  • **kwargs – Extra arguments to pass to Figure.text.
set_ylabel(text, dx, **kwargs)

Add a y-axis label that is centred across all subplots.

Parameters:
  • text – The label text.
  • dx – The horizontal position of the label.
  • **kwargs – Extra arguments to pass to Figure.text.
expand_x_lims(xs, pad_frac=0.05, pad_abs=None)

Increase the range of the x-axis, relative to the plot data.

Parameters:
  • xs – The x-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
expand_y_lims(ys, pad_frac=0.05, pad_abs=None)

Increase the range of the y-axis, relative to the plot data.

Parameters:
  • ys – The y-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
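The padding arithmetic can be sketched as follows. The documentation does not say how the extra range is distributed, so this hypothetical expand_lims function assumes that it is split evenly between the lower and upper limits, and that pad_abs, when given, is used instead of pad_frac.

```python
def expand_lims(values, pad_frac=0.05, pad_abs=None):
    """Return (lower, upper) axis limits that enclose values with padding.

    Assumptions (not confirmed by the pypfilt docs): the increase in range
    is split evenly between both ends, and pad_abs, when given, takes
    precedence over pad_frac.
    """
    lo, hi = min(values), max(values)
    if pad_abs is None:
        increase = pad_frac * (hi - lo)  # fractional increase in range
    else:
        increase = pad_abs               # absolute increase in range
    return lo - increase / 2, hi + increase / 2
```

Under these assumptions, expand_lims([0, 10]) gives limits of (-0.25, 10.25).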
scale_x_date(lbl_fmt, day=None, month=None, year=None)

Use a datetime scale to locate and label the x-axis ticks.

Parameters:
  • lbl_fmt – The strftime() format string for tick labels.
  • day – Locate ticks at every N days.
  • month – Locate ticks at every N months.
  • year – Locate ticks at every N years.
Raises:

ValueError – unless exactly one of day, month, and year is specified.

scale_y_date(lbl_fmt, day=None, month=None, year=None)

Use a datetime scale to locate and label the y-axis ticks.

Parameters:
  • lbl_fmt – The strftime() format string for tick labels.
  • day – Locate ticks at every N days.
  • month – Locate ticks at every N months.
  • year – Locate ticks at every N years.
Raises:

ValueError – unless exactly one of day, month, and year is specified.
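Both date-scale methods accept exactly one of day, month and year. A minimal sketch of that argument check, using a hypothetical choose_date_unit helper; the selected unit and interval could then be used to construct a matplotlib.dates locator (DayLocator, MonthLocator or YearLocator), alongside a DateFormatter built from lbl_fmt.

```python
def choose_date_unit(day=None, month=None, year=None):
    """Return (unit, interval), requiring exactly one of day, month, year.

    Hypothetical helper; the unit would then select a locator such as
    matplotlib.dates.DayLocator(interval=n), MonthLocator(interval=n)
    or YearLocator(base=n).
    """
    given = [(unit, n) for (unit, n) in
             (('day', day), ('month', month), ('year', year))
             if n is not None]
    if len(given) != 1:
        raise ValueError('exactly one of day, month, and year is required')
    return given[0]
```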

save(filename, format, width, height, **kwargs)

Save the plot to disk (a thin wrapper for savefig).

Parameters:
  • filename – The output filename or a Python file-like object.
  • format – The output format.
  • width – The figure width in inches.
  • height – The figure height in inches.
  • **kwargs – Extra arguments for savefig; the defaults are transparent=True and bbox_inches='tight'.
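Arguments passed to save() are combined with the savefig defaults listed above. Assuming that explicitly passed arguments take precedence over the defaults, the merge can be sketched as follows (savefig_kwargs is a hypothetical helper).

```python
def savefig_kwargs(**kwargs):
    """Merge caller arguments with the documented savefig defaults.

    Sketch assuming that explicit keyword arguments override the defaults.
    The result would then be used as, e.g.:
        fig.set_size_inches(width, height)
        fig.savefig(filename, format=format, **opts)
    """
    opts = {'transparent': True, 'bbox_inches': 'tight'}
    opts.update(kwargs)
    return opts
```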
class pypfilt.plot.Wrap(data, xlbl, ylbl, fac, nr=None, nc=None, **kwargs)

Faceted plots similar to those produced by ggplot2’s facet_wrap().

Parameters:
  • data – The NumPy array containing the data to plot.
  • xlbl – The label for the x-axis.
  • ylbl – The label for the y-axis.
  • fac – The faceting variable, represented as a tuple (column_name, label_fmt) where column_name is the name of a column in data and label_fmt is the format string for facet labels or a function that returns the facet label.
  • nr – The number of rows; one of nr and nc must be specified.
  • nc – The number of columns; one of nr and nc must be specified.
  • **kwargs – Extra arguments for Plot.
Raises:

ValueError – if nr and nc are both None or are both specified.
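Because only one of nr and nc is given, the other dimension has to be derived from the number of facets. A sketch of that calculation, assuming that the missing dimension is the smallest one that fits every facet; wrap_shape is hypothetical.

```python
import math


def wrap_shape(n_facets, nr=None, nc=None):
    """Return (rows, cols) for n_facets subplots given exactly one of nr, nc.

    Hypothetical sketch: the unspecified dimension is the smallest value
    that fits every facet.
    """
    if (nr is None) == (nc is None):
        raise ValueError('exactly one of nr and nc must be specified')
    if nr is not None:
        return nr, math.ceil(n_facets / nr)
    return math.ceil(n_facets / nc), nc
```

For example, seven facets with nc=3 would be laid out as three rows of three subplots, with the last row only partly filled.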

expand_x_lims(col, pad_frac=0.05, pad_abs=None)

Increase the range of the x-axis, relative to the plot data.

Parameters:
  • col – The column name for the x-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
expand_y_lims(col, pad_frac=0.05, pad_abs=None)

Increase the range of the y-axis, relative to the plot data.

Parameters:
  • col – The column name for the y-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
subplots(hide_axes=False, dx=0.055, dy=0.025)

Return an iterator that yields (axes, data) tuples for each subplot.

Parameters:
  • hide_axes – Whether to hide the x axis of subplots that are not on the bottom edge of the figure, and the y axis of subplots that are not on the left edge.
  • dx – The horizontal location for the y-axis label.
  • dy – The vertical location for the x-axis label.
class pypfilt.plot.Grid(data, xlbl, ylbl, xfac, yfac, **kwargs)

Faceted plots similar to those produced by ggplot2’s facet_grid().

Parameters:
  • data – The NumPy array containing the data to plot.
  • xlbl – The label for the x-axis.
  • ylbl – The label for the y-axis.
  • xfac – The horizontal faceting variable, represented as a tuple (column_name, label_fmt) where column_name is the name of a column in data and label_fmt is the format string for facet labels or a function that returns the facet label.
  • yfac – The vertical faceting variable (see xfac).
  • **kwargs – Extra arguments for Plot.
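With two faceting variables, each subplot corresponds to one combination of values. The sketch below, a hypothetical grid_cells helper, assumes that the distinct values of the yfac column index the rows and those of the xfac column index the columns, both in sorted order; combinations absent from the data yield empty subsets.

```python
import numpy as np


def grid_cells(data, xcol, ycol):
    """Yield (row, col, subset) for every combination of facet values.

    Hypothetical sketch: distinct values of ycol index the rows and
    distinct values of xcol index the columns, in sorted order.
    """
    x_vals = np.unique(data[xcol])
    y_vals = np.unique(data[ycol])
    for row, y in enumerate(y_vals):
        for col, x in enumerate(x_vals):
            mask = (data[xcol] == x) & (data[ycol] == y)
            yield row, col, data[mask]
```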
expand_x_lims(col, pad_frac=0.05, pad_abs=None)

Increase the range of the x-axis, relative to the plot data.

Parameters:
  • col – The column name for the x-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
expand_y_lims(col, pad_frac=0.05, pad_abs=None)

Increase the range of the y-axis, relative to the plot data.

Parameters:
  • col – The column name for the y-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
subplots(hide_axes=False, dx=0.055, dy=0.025)

Return an iterator that yields (axes, data) tuples for each subplot.

Parameters:
  • hide_axes – Whether to hide the x axis of subplots that are not on the bottom edge of the figure, and the y axis of subplots that are not on the left edge.
  • dx – The horizontal location for the y-axis label.
  • dy – The vertical location for the x-axis label.
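For a fully populated grid, the hide_axes rule reduces to checking each subplot's row and column. A minimal sketch with the hypothetical helper visible_axes; in matplotlib, an axis could then be hidden with ax.xaxis.set_visible(False) or ax.yaxis.set_visible(False).

```python
def visible_axes(n_rows, n_cols):
    """Return {(row, col): (show_x, show_y)} for a full grid of subplots.

    Sketch of the hide_axes=True rule: only subplots on the bottom row keep
    their x axis and only subplots in the left column keep their y axis.
    """
    return {(r, c): (r == n_rows - 1, c == 0)
            for r in range(n_rows) for c in range(n_cols)}
```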

For consistency, a class is also provided for single plots.

class pypfilt.plot.Single(data, xlbl, ylbl, **kwargs)

Faceted plots that contain only one subplot; i.e., a single plot that provides the same methods as faceted plots that contain many subplots.

Parameters:
  • data – The NumPy array containing the data to plot.
  • xlbl – The label for the x-axis.
  • ylbl – The label for the y-axis.
  • **kwargs – Extra arguments for Plot.
expand_x_lims(col, pad_frac=0.05, pad_abs=None)

Increase the range of the x-axis, relative to the plot data.

Parameters:
  • col – The column name for the x-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
expand_y_lims(col, pad_frac=0.05, pad_abs=None)

Increase the range of the y-axis, relative to the plot data.

Parameters:
  • col – The column name for the y-axis data.
  • pad_frac – The fractional increase in range.
  • pad_abs – The absolute increase in range.
subplots(hide_axes=False, dx=0.055, dy=0.025)

Return an iterator that yields (axes, data) tuples for each subplot.

Parameters:
  • hide_axes – Whether to hide the x axis of subplots that are not on the bottom edge of the figure, and the y axis of subplots that are not on the left edge.
  • dx – The horizontal location for the y-axis label.
  • dy – The vertical location for the x-axis label.

Checking invariants

The pypfilt.check module provides convenience functions for checking invariants.

pypfilt.check.is_entire_matrix(params, hist, raise_exc=True)

Check whether the history matrix includes all columns (including, e.g., the particle weights and parent indices).

Parameters:
  • params – The simulation parameters.
  • hist – The history matrix.
  • raise_exc – Whether to raise an exception if the check fails.
Returns:

True if the check is successful. If the check fails, either a ValueError exception is raised (if raise_exc == True) or False is returned (if raise_exc == False).

pypfilt.check.is_only_statevec(params, hist, raise_exc=True)

Check whether the history matrix contains only the particle state vector columns.

Parameters:
  • params – The simulation parameters.
  • hist – The history matrix.
  • raise_exc – Whether to raise an exception if the check fails.
Returns:

True if the check is successful. If the check fails, either a ValueError exception is raised (if raise_exc == True) or False is returned (if raise_exc == False).
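Both checks compare the size of the last dimension of the history matrix, which is \(N_{SV} + 2\) for the entire matrix and \(N_{SV}\) for the state vectors alone. The sketch below assumes that the state vector size is passed directly as n_sv; how pypfilt derives it from params is not shown here, and check_columns is hypothetical.

```python
import numpy as np


def check_columns(hist, n_sv, extra, raise_exc=True):
    """Return True if hist has n_sv + extra columns in its last dimension.

    Hypothetical sketch: extra=2 mirrors is_entire_matrix() (state vector
    plus weight and parent-index columns); extra=0 mirrors
    is_only_statevec().
    """
    expected = n_sv + extra
    if hist.shape[-1] == expected:
        return True
    if raise_exc:
        raise ValueError('expected {} columns, found {}'.format(
            expected, hist.shape[-1]))
    return False
```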