5.8. pypfilt.summary

5.8.1. Simulation metadata

Every simulation data file should include metadata that documents the simulation parameters and working environment. The Metadata class provides the means for generating such metadata:

class pypfilt.summary.Metadata

Document the simulation parameters and system environment for a set of simulations. A black-list (ignore_dict) defines which members of the parameters dictionary will be excluded from this metadata; see filter() for details.

build(params)

Construct a metadata dictionary that documents the simulation parameters and system environment. Note that this should be generated at the start of the simulation, and that the git metadata will only be valid if the working directory is located within a git repository.

Parameters: params – The simulation parameters.

By default, the versions of pypfilt, h5py, numpy and scipy are recorded. The following example demonstrates how to also record the installed version of the epifx package:

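import epifx
import pypfilt.summary

params = ...
# Record the installed version of epifx, in addition to the default packages.
params['summary']['meta']['packages'].append('epifx')
meta = pypfilt.summary.Metadata()
metadata = meta.build(params)

filter(values, ignore, encode_fn)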

Recursively filter items from a dictionary, used to remove parameters from the metadata dictionary that, e.g., have no meaningful representation.

Parameters:
  • values – The original dictionary.
  • ignore – A dictionary that specifies which values to ignore.
  • encode_fn – A function that encodes the remaining values (see encode()).

For example, to ignore the 'px_range' and 'random' parameters, as well as the 'obs_model' and 'obs_llhd' entries of every observation system, when using epifx:

import pypfilt.summary

# params is the simulation parameters dictionary.
m = pypfilt.summary.Metadata()
ignore = {
    'px_range': None,
    'random': None,
    # Note the use of ``None`` to match any key under 'obs'.
    'obs': {None: {'obs_model': None, 'obs_llhd': None}}
}
m.filter(params, ignore, m.encode)
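
To make the matching rules concrete, here is a minimal standalone sketch of a recursive filter that follows the semantics described above: a None value drops an entry, and a None key matches any key at that level. The function name filter_values and the example parameters are hypothetical, and this is not pypfilt's implementation.

```python
def filter_values(values, ignore, encode_fn):
    """Return a filtered, encoded copy of a (possibly nested) dictionary."""
    result = {}
    for key, value in values.items():
        if key in ignore:
            rule = ignore[key]
        elif None in ignore:
            # A None key matches any key at this level.
            rule = ignore[None]
        else:
            rule = {}
        if rule is None:
            # A None value means "exclude this entry entirely".
            continue
        if isinstance(value, dict):
            result[key] = filter_values(value, rule, encode_fn)
        else:
            result[key] = encode_fn(value)
    return result


# Hypothetical parameters; str stands in for the encoding function.
params = {
    'px_range': [1, 2],
    'random': object(),
    'steps_per_unit': 1,
    'obs': {'cases': {'obs_model': object(), 'bg_obs': 5}},
}
ignore = {
    'px_range': None,
    'random': None,
    'obs': {None: {'obs_model': None, 'obs_llhd': None}},
}
meta = filter_values(params, ignore, str)
# meta == {'steps_per_unit': '1', 'obs': {'cases': {'bg_obs': '5'}}}
```

Explicit keys are checked before the None wildcard here, so a specific rule for one observation system would take precedence over the generic one; that ordering is a design choice of this sketch.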

encode(value)

Encode values in a form suitable for serialisation in HDF5 files.

  • Integer values are converted to numpy.int32 values.
  • Floating-point values and arrays retain their data type.
  • All other (i.e., non-numerical) values are converted to UTF-8 strings.
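
A minimal sketch of these encoding rules, written as a standalone function rather than the Metadata.encode() method. It assumes that "UTF-8 strings" means UTF-8 encoded byte strings, and that Python and NumPy integers are treated alike; both are assumptions rather than documented behaviour.

```python
import numpy as np


def encode(value):
    """Encode a single value for storage as HDF5 metadata (a sketch)."""
    if isinstance(value, (int, np.integer)):
        # Integers are stored as 32-bit integers.
        return np.int32(value)
    if isinstance(value, (float, np.floating, np.ndarray)):
        # Floating-point values and arrays keep their data type.
        return value
    # Everything else becomes a UTF-8 encoded byte string.
    return str(value).encode('utf-8')
```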

object_names(object_dict)

Return the fully qualified name of each object in a (possibly nested) dictionary.

object_name(obj)

Return the fully qualified name of the object as a byte string.
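
One plausible way to build such names, assuming a fully qualified name is the defining module followed by the object's qualified name, and that instances are named after their class. The helpers below are hypothetical stand-ins for object_name() and object_names(), not the pypfilt code.

```python
def object_name(obj):
    """Return the fully qualified name of obj as a UTF-8 byte string."""
    # Classes and functions have a __qualname__; instances do not, so
    # fall back to the name of the instance's class.
    target = obj if hasattr(obj, '__qualname__') else type(obj)
    return '{}.{}'.format(target.__module__, target.__qualname__).encode('utf-8')


def object_names(object_dict):
    """Apply object_name() to every value in a (possibly nested) dictionary."""
    return {
        key: object_names(value) if isinstance(value, dict) else object_name(value)
        for key, value in object_dict.items()
    }


# For example, object_name(dict) returns b'builtins.dict'.
```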

priors(params)

Return a dictionary that describes the model parameter priors.

Each key identifies a parameter (by name); the corresponding value is a byte string representation of the prior distribution, which is typically a numpy.random.Generator method call.

For example:

{'R0': b'random.uniform(1.0, 2.0)',
 'gamma': b'(1 / random.uniform(1.0, 3.0))'}

pkg_version(module)

Attempt to obtain the version of a Python module.
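
A sketch of how such a lookup might work, assuming it first checks the conventional __version__ attribute and then falls back to the installed distribution metadata via importlib.metadata (Python 3.8 or later). The fallback assumes that the module and distribution names match, which is not always true (e.g., a module may be installed from a distribution with a different name).

```python
import importlib.metadata
import types


def pkg_version(module):
    """Return the version of a module, or None if it cannot be determined."""
    # Most packages expose a conventional __version__ attribute.
    version = getattr(module, '__version__', None)
    if version is not None:
        return version
    # Otherwise, consult the installed distribution metadata.
    try:
        return importlib.metadata.version(module.__name__)
    except importlib.metadata.PackageNotFoundError:
        return None


# Hypothetical module objects, used only to exercise both branches.
with_version = types.ModuleType('example_pkg')
with_version.__version__ = '1.2.3'
without_version = types.ModuleType('example_pkg_not_installed')
```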

git_data()

Record the status of the git repository within which the working directory is located (if such a repository exists).

run_cmd(args, all_lines=False, err_val='')

Run a command and return the (Unicode) output. By default, only the first line is returned; set all_lines=True to receive all of the output as a list of Unicode strings. If the command returns a non-zero exit status, return err_val instead.
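
The documented behaviour of run_cmd() can be approximated with subprocess.run(), as in the hypothetical sketch below. Treating a missing executable as a failure (returning err_val) and returning an empty string when there is no output are assumptions, not documented behaviour.

```python
import subprocess
import sys


def run_cmd(args, all_lines=False, err_val=''):
    """Run a command and return its (Unicode) output, or err_val on failure."""
    try:
        proc = subprocess.run(args, stdout=subprocess.PIPE,
                              stderr=subprocess.DEVNULL, check=True)
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit status (or, as an assumption, a missing executable).
        return err_val
    lines = proc.stdout.decode('utf-8').splitlines()
    if all_lines:
        return lines
    # By default, return only the first line of output.
    return lines[0] if lines else ''


# For example, query the current commit: run_cmd(['git', 'rev-parse', 'HEAD'])
first = run_cmd([sys.executable, '-c', 'print("a"); print("b")'])
```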

5.8.2. Summary data files

The HDF5 class encapsulates the process of calculating and recording summary statistics for each simulation.

class pypfilt.summary.HDF5(params, obs_list)

Save tables of summary statistics to an HDF5 file.

Parameters:
  • params – The simulation parameters.
  • obs_list – A list of all observations.
save_forecasts(fs, filename)

Save forecast summaries to disk in the HDF5 binary data format.

This function creates the following datasets that summarise the estimation and forecasting outputs:

  • 'data/TABLE' for each table.

The provided metadata will be recorded under 'meta/'.

If dataset creation timestamps are enabled, two simulations that produce identical outputs will not result in identical files. Timestamps will be disabled where possible (requires h5py >= 2.2):

  • 'hdf5_track_times': Presence of creation timestamps.

Parameters:
  • fs – Simulation outputs, as returned by pypfilt.forecast().
  • filename – The filename to which the data will be written.

5.8.3. Summary statistic tables

Summary statistics are stored in tables, each of which comprises a set of named columns and a specific number of rows.

5.8.3.1. The Table class

To calculate a summary statistic, you need to define a subclass of the Table class and provide implementations of each method.

class pypfilt.Table

The base class for summary statistic tables.

Tables are used to record rows of summary statistics as a simulation progresses.

dtype(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
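
As an illustration of the expected return value, the hypothetical dtype() below returns a list of (name, data type) tuples, which NumPy accepts directly as a structured dtype. The column names are invented for this example.

```python
import numpy as np


def dtype(ctx, obs_list, name):
    # Hypothetical columns: a date label, an observation unit and a value.
    return [('date', 'S10'), ('unit', 'S20'), ('mean', np.float64)]


# The list of tuples can be used directly as a NumPy structured dtype.
table = np.zeros(3, dtype=dtype(None, [], 'example'))
table[0] = (b'2020-01-01', b'cases', 1.5)
```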

n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
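
The following standalone sketch shows how the Table methods fit together. It mirrors the method names but does not subclass pypfilt.Table, so it runs without pypfilt; it assumes that hist is indexed as hist[hist_ix, particle, column], and it uses a hypothetical stand-in for insert_fn that collects rows in a list.

```python
import datetime

import numpy as np


class WeightedMean:
    """Hypothetical table: the weighted mean of one state variable per day."""

    def __init__(self, column=0):
        self.column = column

    def dtype(self, ctx, obs_list, name):
        return [('date', 'S10'), ('mean', np.float64)]

    def n_rows(self, start_date, end_date, n_days, n_sys, forecasting):
        # One row per day of the simulation.
        return n_days

    def add_rows(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        for (date, ix, hist_ix) in dates:
            # Assumed layout: hist[hist_ix, particle, column].
            values = hist[hist_ix, :, self.column]
            mean = np.dot(weights[ix], values)
            insert_fn((date.strftime('%Y-%m-%d').encode('ascii'), mean))

    def finished(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        # No rows are recorded at the end of the simulation.
        pass


rows = []


def insert_fn(row, n=1):
    """Hypothetical stand-in: accepts one tuple, or a list of n tuples."""
    if n == 1:
        rows.append(row)
    else:
        rows.extend(row)


start = datetime.datetime(2020, 1, 1)
dates = [(start + datetime.timedelta(days=i), i, i) for i in range(2)]
hist = np.array([[[1.0], [3.0]], [[2.0], [4.0]]])  # 2 days, 2 particles, 1 column
weights = np.full((2, 2), 0.5)

table = WeightedMean()
table.add_rows(hist, weights, dates[-1][0], dates, set(), insert_fn)
data = np.array(rows, dtype=table.dtype(None, [], 'mean'))
# data['mean'] is [2.0, 3.0]
```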

finished(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics at the end of a simulation.

The parameters are as per add_rows().

Derived classes only need to implement this method if they record rows at the end of a simulation; the default implementation does nothing.

5.8.3.2. Predefined statistics

The following derived classes are provided to calculate basic summary statistics of any generic simulation model.

class pypfilt.summary.ModelCIs(probs=None)

Calculate fixed-probability central credible intervals for all state variables and model parameters.

Parameters: probs – An array of probabilities, expressed as percentages, that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
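
A central credible interval of probability p spans the weighted quantiles (1 - p/100)/2 and 1 - (1 - p/100)/2 of the particle ensemble, so p = 0 collapses to the median and p = 100 spans the minimum and maximum. The sketch below uses a simple step-function weighted quantile; pypfilt may use a different quantile rule, and the function names are hypothetical.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Smallest value whose normalised cumulative weight is at least q."""
    order = np.argsort(values)
    cum_weights = np.cumsum(weights[order], dtype=float)
    cum_weights /= cum_weights[-1]
    ix = np.searchsorted(cum_weights, q, side='left')
    return values[order][min(ix, len(values) - 1)]


def central_cis(values, weights, probs=np.uint8([0, 50, 90, 95, 99, 100])):
    """Return (prob, lower, upper) for each central credible interval."""
    cis = []
    for pr in probs:
        # Each interval excludes equal probability mass from both tails.
        tail = (1 - pr / 100) / 2
        lower = weighted_quantile(values, weights, tail)
        upper = weighted_quantile(values, weights, 1 - tail)
        cis.append((int(pr), lower, upper))
    return cis


# Hypothetical ensemble: values 1, 2, ..., 100 with equal weights.
cis = central_cis(np.arange(1.0, 101.0), np.ones(100))
```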

dtype(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.ParamCovar

Calculate the covariance between all pairs of model parameters during each simulation.
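
A weighted covariance between model parameters can be computed from the particle weights as sketched below. The array shapes and values are hypothetical, and this uses the plain weighted (biased) estimator, which may differ from the normalisation that pypfilt uses; it matches numpy.cov(..., aweights=weights, bias=True).

```python
import numpy as np

# Hypothetical ensemble: 4 particles (rows) and 2 model parameters (columns).
samples = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]])
weights = np.array([0.1, 0.2, 0.3, 0.4])


def weighted_covariance(samples, weights):
    """Weighted covariance matrix of the columns of samples."""
    w = weights / weights.sum()
    mean = w @ samples                  # weighted mean of each parameter
    centred = samples - mean
    return (w[:, None] * centred).T @ centred


covar = weighted_covariance(samples, weights)

# One row per unique pair of parameters, as a covariance table might record.
n_params = samples.shape[1]
pairs = [(i, j, covar[i, j])
         for i in range(n_params) for j in range(i + 1, n_params)]
```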

dtype(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.Obs

Record the basic details of each observation; the columns are: 'unit', 'period', 'source', 'date', 'value', 'incomplete', 'upper_bound'.

dtype(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

finished(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics at the end of a simulation.

The parameters are as per add_rows().

Derived classes only need to implement this method if they record rows at the end of a simulation; the default implementation does nothing.

class pypfilt.summary.SimulatedObs

Record simulated observations for each particle.

dtype(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.PredictiveCIs(exp_obs_monitor, probs=None)

Record fixed-probability central credible intervals for the observations.

Parameters:
  • exp_obs_monitor – A pypfilt.summary.ExpectedObsMonitor.
  • probs – An array of probabilities, expressed as percentages, that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 95]).

dtype(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
n_rows(start_date, end_date, n_days, n_sys, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(hist, weights, fs_date, dates, obs_types, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

5.8.3.3. Utility functions

The following column types are provided for convenience when defining custom Table subclasses.

pypfilt.summary.dtype_value(value, name='value')

The dtype for columns that store observation values.

The following functions are provided for converting column types in structured arrays.

pypfilt.summary.convert_cols(data, converters)

Convert columns in a structured array from one type to another.

Parameters:
  • data – The input structured array.
  • converters – A dictionary that maps (unicode) column names to (convert_fn, new_dtype) tuples, which contain a conversion function and define the output dtype.
Returns:

A new structured array.
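
A minimal sketch of the column conversion described above, using NumPy structured arrays. The function mirrors the documented convert_cols(data, converters) interface but is not the pypfilt implementation, and the date converter in the example is hypothetical.

```python
import numpy as np


def convert_cols(data, converters):
    """Return a copy of data with selected columns converted (a sketch)."""
    new_dtype = [
        (name, converters[name][1] if name in converters else data.dtype[name])
        for name in data.dtype.names
    ]
    result = np.empty(data.shape, dtype=new_dtype)
    for name in data.dtype.names:
        if name in converters:
            convert_fn = converters[name][0]
            result[name] = [convert_fn(value) for value in data[name]]
        else:
            result[name] = data[name]
    return result


data = np.array([(b'2020-01-01', 1.5), (b'2020-01-02', 2.5)],
                dtype=[('date', 'S10'), ('value', 'f8')])
# Hypothetical converter: decode byte-string dates into numpy.datetime64 values.
converters = {
    'date': (lambda d: np.datetime64(d.decode('utf-8')), 'datetime64[D]'),
}
out = convert_cols(data, converters)
```

The input array is left unchanged; unconverted columns are copied with their original dtype.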

pypfilt.summary.default_converters(time_scale)

Return a dictionary of converters for the 'fs_date' and 'date' columns, for use with convert_cols().

5.8.4. Retrospective statistics

In some cases, the Table model is not sufficiently flexible, since it assumes that statistics can be calculated during the course of a simulation. For some statistics, it may be necessary to observe the entire simulation before the statistics can be calculated.

In this case, you need to define a subclass of the Monitor class, which will observe (“monitor”) each simulation and, upon completion of each simulation, can calculate the necessary summary statistics.

Note that a Table subclass is still required to define the table columns and the number of rows, and to record each row at the end of the simulation.

class pypfilt.Monitor

The base class for simulation monitors.

Monitors are used to calculate quantities that:

  • Are used by multiple Tables (i.e., avoiding repeated computation); or
  • Require a complete simulation for calculation (as distinct from Tables, which incrementally record rows as a simulation progresses).

The quantities calculated by a Monitor can then be recorded by Table.add_rows() and/or Table.finished().

prepare(ctx, obs_list, name)

Perform any required preparation prior to a set of simulations.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The monitor’s name.
begin_sim(start_date, end_date, n_days, n_sys, forecasting)

Perform any required preparation at the start of a simulation.

Parameters:
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • n_sys – The number of observation systems (i.e., data sources).
  • forecasting – True if this is a forecasting simulation, otherwise False.

monitor(hist, weights, fs_date, dates, obs_types)

Monitor the simulation progress.

Parameters:
  • hist – The particle history matrix.
  • weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
  • obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
end_sim(hist, weights, fs_date, dates, obs_types)

Finalise the data as required for the relevant summary statistics.

The parameters are as per monitor().

Derived classes only need to implement this method if the monitored data must be finalised; the default implementation does nothing.

load_state(grp)

Load the monitor state from a cache file.

Parameters: grp – The h5py Group object from which to load the state.

save_state(grp)

Save the monitor state to a cache file.

Parameters: grp – The h5py Group object in which to save the state.

5.8.4.1. Predefined monitors

The PredictiveCIs summary table requires the following monitor:

class pypfilt.summary.ExpectedObsMonitor

Record expected observations for each particle.

This is typically an expensive operation, and this monitor allows multiple summary tables to obtain these values without recalculating them.

expected_obs = None

The expected observation for each particle for the duration of the current simulation window.

Note that tables may only inspect this value in their add_rows() method; it is not valid when finished() is called.

prepare(ctx, obs_list, name)

Perform any required preparation prior to a set of simulations.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The monitor’s name.
monitor(hist, weights, fs_date, dates, obs_types)

Record the expected observations for each particle in the current simulation window.

end_sim(hist, weights, fs_date, dates, obs_types)

Finalise the data as required for the relevant summary statistics.

The parameters are as per monitor().

Derived classes only need to implement this method if the monitored data must be finalised; the default implementation does nothing.

load_state(grp)

Load the monitor state from disk.

save_state(grp)

Save the monitor state to disk.

5.8.5. Tables and Monitors

The methods of each Table and Monitor will be called in the following sequence by the HDF5 summary class:

  1. Before any simulations are performed:

    • Table.dtype()
    • Monitor.prepare()

    In addition to defining the column types for each Table, this allows objects to store the simulation parameters and observations.

  2. At the start of each simulation:

    • Monitor.begin_sim()
    • Table.n_rows()

    This notifies each Monitor and each Table of the simulation period, the number of observation systems (i.e., data sources), and whether it is a forecasting simulation (where no resampling will take place).

  3. During each simulation:

    • Monitor.monitor()
    • Table.add_rows()

    This provides a portion of the simulation period for analysis by each Monitor and each Table. Because all of the Monitor.monitor() methods are called before the Table.add_rows() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.monitor().

  4. At the end of each simulation:

    • Monitor.end_sim()
    • Table.finished()

    This allows each Monitor and each Table to perform any final calculations once the simulation has completed. Because all of the Monitor.end_sim() methods are called before the Table.finished() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.end_sim().
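
The call sequence can be illustrated with a small standalone driver. The SumMonitor and SumTable classes and the run_simulation() function below are hypothetical: they only log each call and pass placeholder arguments, but they show why a table can safely read quantities that a monitor computed earlier in the same step.

```python
log = []


class SumMonitor:
    """Hypothetical monitor that sums the values in each window."""

    def prepare(self, ctx, obs_list, name):
        log.append('Monitor.prepare')

    def begin_sim(self, start_date, end_date, n_days, n_sys, forecasting):
        log.append('Monitor.begin_sim')
        self.latest, self.total = None, 0

    def monitor(self, hist, weights, fs_date, dates, obs_types):
        log.append('Monitor.monitor')
        self.latest = sum(hist)
        self.total += self.latest

    def end_sim(self, hist, weights, fs_date, dates, obs_types):
        log.append('Monitor.end_sim')


class SumTable:
    """Hypothetical table that records quantities computed by a SumMonitor."""

    def __init__(self, monitor):
        self.monitor = monitor
        self.rows = []

    def dtype(self, ctx, obs_list, name):
        log.append('Table.dtype')
        return [('value', 'f8')]

    def n_rows(self, start_date, end_date, n_days, n_sys, forecasting):
        log.append('Table.n_rows')
        return n_days + 1

    def add_rows(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        log.append('Table.add_rows')
        # Safe: every Monitor.monitor() has already run for this window.
        insert_fn((self.monitor.latest,))

    def finished(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        log.append('Table.finished')
        # Safe: every Monitor.end_sim() has already run.
        insert_fn((self.monitor.total,))


def run_simulation(monitors, tables, windows, n_days, n_sys=1, forecasting=False):
    """Call each Monitor and Table method in the documented order.

    Each window stands in for the hist argument; the other arguments are
    placeholders (None or empty) in this sketch.
    """
    # 1. Before any simulations are performed.
    for name, table in tables.items():
        table.dtype(None, [], name)
    for name, mon in monitors.items():
        mon.prepare(None, [], name)
    # 2. At the start of the simulation.
    for mon in monitors.values():
        mon.begin_sim(None, None, n_days, n_sys, forecasting)
    for table in tables.values():
        table.n_rows(None, None, n_days, n_sys, forecasting)
    # 3. During the simulation: all monitors before any tables, per window.
    for hist in windows:
        for mon in monitors.values():
            mon.monitor(hist, None, None, [], set())
        for table in tables.values():
            table.add_rows(hist, None, None, [], set(), table.rows.append)
    # 4. At the end of the simulation: all monitors before any tables.
    last = windows[-1]
    for mon in monitors.values():
        mon.end_sim(last, None, None, [], set())
    for table in tables.values():
        table.finished(last, None, None, [], set(), table.rows.append)


monitor = SumMonitor()
table = SumTable(monitor)
run_simulation({'sums': monitor}, {'sums': table}, [[1, 2], [3, 4]], n_days=4)
# table.rows == [(3,), (7,), (10,)]
```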