6.6. pypfilt.summary

6.6.1. Simulation metadata

Every simulation data file should include metadata that documents the simulation parameters and working environment. The Metadata class provides the means for generating such metadata:

class pypfilt.summary.Metadata

Document the simulation settings and system environment for a set of simulations.

build(ctx)

Construct a metadata dictionary that documents the simulation parameters and system environment. Note that this should be generated at the start of the simulation, and that the git metadata will only be valid if the working directory is located within a git repository.

Parameters: ctx – The simulation context.

By default, the versions of pypfilt, h5py, numpy and scipy are recorded.

encode_settings(values, encode_fn)

Recursively encode settings in a dictionary.

Parameters:
  • values – The original dictionary.
  • encode_fn – A function that encodes individual values (see encode()).
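A minimal sketch of recursive settings encoding. It assumes that nested dictionaries are walked depth-first and that a new dictionary is returned rather than the original being modified in place. The standalone function below is hypothetical and is not the pypfilt source.

```python
def encode_settings(values, encode_fn):
    """Return a copy of ``values`` with every leaf value passed through ``encode_fn``.

    Nested dictionaries are encoded recursively; all other values are treated
    as leaves. (Hypothetical sketch, not the pypfilt implementation.)
    """
    encoded = {}
    for key, value in values.items():
        if isinstance(value, dict):
            # Recurse into nested tables of settings.
            encoded[key] = encode_settings(value, encode_fn)
        else:
            encoded[key] = encode_fn(value)
    return encoded


settings = {"time": {"start": 0, "until": 10}, "name": "demo"}
print(encode_settings(settings, str))
# {'time': {'start': '0', 'until': '10'}, 'name': 'demo'}
```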

encode(value)

Encode values in a form suitable for serialisation in HDF5 files.

  • Integer values are converted to numpy.int32 values.
  • Floating-point values and arrays retain their data type.
  • All other (i.e., non-numerical) values are converted to UTF-8 strings.
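The encoding rules listed above can be sketched as a single function. This is an illustration under stated assumptions, not pypfilt's code: "UTF-8 strings" is taken to mean UTF-8 encoded bytes, and NumPy integer and floating-point scalars are treated like their Python counterparts.

```python
import numpy as np


def encode(value):
    """Encode a single value for storage in an HDF5 file (hypothetical sketch)."""
    # Integers become 32-bit NumPy integers.
    # Note: bool is a subclass of int, so True/False are also converted here.
    if isinstance(value, (int, np.integer)):
        return np.int32(value)
    # Floating-point scalars and NumPy arrays keep their data type.
    if isinstance(value, (float, np.floating, np.ndarray)):
        return value
    # Everything else is stored as UTF-8 encoded text (assumed to mean bytes).
    return str(value).encode("utf-8")


print(encode(7), encode(0.5), encode("x = 1"))
```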

pkg_version(module)

Attempt to obtain the version of a Python module.
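A version lookup of this kind could be approximated as follows. The lookup order (a `__version__` attribute first, then the installed distribution metadata via `importlib.metadata`) and the `None` fallback are assumptions, not necessarily what pypfilt does.

```python
import json
from importlib import metadata


def pkg_version(module):
    """Return the version of ``module`` as a string, or None if unknown.

    Hypothetical sketch: check for a ``__version__`` attribute, then fall
    back to the metadata of an installed distribution with the same name.
    """
    version = getattr(module, "__version__", None)
    if version is not None:
        return str(version)
    try:
        return metadata.version(module.__name__)
    except metadata.PackageNotFoundError:
        return None


print(pkg_version(json))
```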

git_data()

Record the status of the git repository within which the working directory is located (if such a repository exists).

run_cmd(args, all_lines=False, err_val='')

Run a command and return the (Unicode) output. By default, only the first line is returned; set all_lines=True to receive all of the output as a list of Unicode strings. If the command returns a non-zero exit status, return err_val instead.
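The behaviour described above maps naturally onto `subprocess.run`. In this sketch, also returning `err_val` when the command cannot be started at all (for example, when the executable is missing) is an assumption, as is returning an empty string for empty output.

```python
import subprocess
import sys


def run_cmd(args, all_lines=False, err_val=""):
    """Run a command and return its decoded output (hypothetical sketch)."""
    try:
        result = subprocess.run(args, capture_output=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        # Non-zero exit status, or the command could not be run at all.
        return err_val
    lines = result.stdout.decode("utf-8", errors="replace").splitlines()
    if all_lines:
        return lines
    # By default, only the first line of output is returned.
    return lines[0] if lines else ""


print(run_cmd([sys.executable, "-c", "print('one'); print('two')"]))
```

A helper like git_data() could then call, for example, `run_cmd(["git", "rev-parse", "HEAD"])` and receive `err_val` when the working directory is not inside a git repository.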

6.6.2. Summary data files

The HDF5 class encapsulates the process of calculating and recording summary statistics for each simulation.

class pypfilt.summary.HDF5(ctx)

Save tables of summary statistics to an HDF5 file.

Parameters:
  • ctx – The simulation context.

save_forecasts(ctx, fs, filename)

Save forecast summaries to disk in the HDF5 binary data format.

This function creates the following datasets that summarise the estimation and forecasting outputs:

  • 'tables/TABLE' for each table, where TABLE is the table’s name.

The provided metadata will be recorded under 'meta/'.

If dataset creation timestamps are enabled, two simulations that produce identical outputs will not result in identical files. Timestamps will be disabled where possible (requires h5py >= 2.2):

  • 'hdf5_track_times': Presence of creation timestamps.

Parameters:
  • ctx – The simulation context.
  • fs – Simulation outputs, as returned by pypfilt.forecast().
  • filename – The filename to which the data will be written.

6.6.3. Summary statistic tables

Summary statistics are stored in tables, each of which comprises a set of named columns and a specific number of rows.

6.6.3.1. The Table class

To calculate a summary statistic, you need to define a subclass of the Table class and provide implementations of each method.

class pypfilt.Table

The base class for summary statistic tables.

Tables are used to record rows of summary statistics as a simulation progresses.

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.
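A list of (name, data type) tuples is exactly what NumPy accepts as a structured dtype, which is why field_types() uses this form. The sketch below uses hypothetical column names. A real table should use pypfilt.io.time_field() for its time column, as the note above recommends, rather than the plain float used here.

```python
import numpy as np


def field_types(ctx, obs_list, name):
    """Hypothetical columns for a table of ensemble means."""
    return [
        ("time", np.float64),  # plain float here; see pypfilt.io.time_field()
        ("name", "S20"),       # fixed-width byte string
        ("mean", np.float64),
    ]


# The list of (name, type) tuples defines a structured array with named columns.
dtype = np.dtype(field_types(ctx=None, obs_list=[], name="means"))
table = np.zeros(3, dtype=dtype)
table[0] = (1.0, b"x", 2.5)
print(table.dtype.names)  # ('time', 'name', 'mean')
```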

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
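One way to picture insert_fn is as a closure over a table that was preallocated using n_rows(). The sketch below is an assumption about its behaviour, not pypfilt's implementation: a single tuple fills one row, and a list of tuples fills the next n rows.

```python
import numpy as np


def make_insert_fn(table):
    """Return a hypothetical insert_fn that fills ``table`` row by row."""
    next_row = 0

    def insert_fn(rows, n=1):
        nonlocal next_row
        if isinstance(rows, tuple):
            # A single row, represented as a tuple.
            table[next_row] = rows
            next_row += 1
        else:
            # Multiple rows, represented as a list of n tuples.
            table[next_row:next_row + n] = rows
            next_row += n

    return insert_fn


table = np.zeros(3, dtype=[("x", int), ("y", float)])
insert_fn = make_insert_fn(table)
insert_fn((1, 0.5))
insert_fn([(2, 1.5), (3, 2.5)], n=2)
print(table["x"])  # [1 2 3]
```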

finished(ctx, fs_date, window, insert_fn)

Record rows of summary statistics at the end of a simulation.

The parameters are as per add_rows().

Derived classes only need to implement this method if they record rows at the end of the simulation; the default implementation does nothing.
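The sketch below shows how these methods might fit together in a table that records the weighted mean of one state variable at each time unit. It is written as a plain class so that it runs without pypfilt (in practice it would derive from pypfilt.Table). The snapshot stand-in with `time`, `weights` and `values` fields, the one-row-per-day row count, and the use of `list.append` as a simplified insert_fn are all hypothetical.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for pypfilt's Snapshot; the real class differs.
FakeSnapshot = namedtuple("FakeSnapshot", ["time", "weights", "values"])


class WeightedMean:
    """Hypothetical table: one row per time unit with the weighted mean."""

    def field_types(self, ctx, obs_list, name):
        return [("time", np.float64), ("mean", np.float64)]

    def n_rows(self, ctx, start_date, end_date, n_days, forecasting):
        # Assumes exactly one snapshot (and so one row) per day.
        return n_days

    def add_rows(self, ctx, fs_date, window, insert_fn):
        # Record one row for each snapshot in this portion of the simulation.
        for snapshot in window:
            mean = np.average(snapshot.values, weights=snapshot.weights)
            insert_fn((snapshot.time, mean))

    def finished(self, ctx, fs_date, window, insert_fn):
        # Every row was already recorded by add_rows(), so nothing to do.
        pass


table = WeightedMean()
rows = []
window = [
    FakeSnapshot(1.0, np.array([0.5, 0.5]), np.array([2.0, 4.0])),
    FakeSnapshot(2.0, np.array([0.25, 0.75]), np.array([0.0, 4.0])),
]
table.add_rows(None, 2.0, window, rows.append)
print(rows)
```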

load_state(ctx, grp)

Load the table state from a cache file.

Parameters:
  • ctx – The simulation context.
  • grp – The h5py Group object from which to load the state.

save_state(ctx, grp)

Save the table state to a cache file.

Parameters:
  • ctx – The simulation context.
  • grp – The h5py Group object in which to save the state.

6.6.3.2. Predefined statistics

The following derived classes are provided to calculate basic summary statistics of any generic simulation model.

class pypfilt.summary.ModelCIs

Calculate fixed-probability central credible intervals for all state variables and model parameters.

Note

Credible intervals are only recorded for scalar fields. Non-scalar fields will be ignored.

The default intervals are: 0%, 50%, 60%, 70%, 80%, 90%, 95%, 99%, 100%. These can be overridden in the scenario settings. For example:

[summary.tables]
model_cints.component = "pypfilt.summary.ModelCIs"
model_cints.credible_intervals = [ 0, 50, 95 ]
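A central credible interval with probability p% spans the weighted quantiles (1 − p/100)/2 and 1 − (1 − p/100)/2 of the particle ensemble, so the 0% interval collapses to the median and the 100% interval spans the smallest and largest values. The sketch below computes these bounds with a simple weighted-quantile rule (the first value whose cumulative weight reaches the quantile). pypfilt's own quantile calculation may differ, for example in how it handles ties or interpolation.

```python
import numpy as np


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative (normalised) weight is at least q."""
    order = np.argsort(values)
    values = np.asarray(values)[order]
    cum_weights = np.cumsum(np.asarray(weights, dtype=float)[order])
    cum_weights = cum_weights / cum_weights[-1]
    # Guard against q falling beyond the last cumulative weight due to rounding.
    ix = min(np.searchsorted(cum_weights, q, side="left"), len(values) - 1)
    return values[ix]


def central_interval(values, weights, pct):
    """Lower and upper bounds of the central pct% credible interval."""
    tail = (1 - pct / 100) / 2
    return (weighted_quantile(values, weights, tail),
            weighted_quantile(values, weights, 1 - tail))


values = np.arange(101.0)        # hypothetical particle values 0, 1, ..., 100
weights = np.full(101, 1 / 101)  # equal particle weights
for pct in [0, 50, 95, 100]:
    print(pct, central_interval(values, weights, pct))
```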

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.ParamCovar

Calculate the covariance between all pairs of model parameters during each simulation.

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.SimulatedObs

Record simulated observations for a single observation unit, for each particle in the simulation.

The observation unit must be specified in the scenario settings. For example:

[summary.tables]
sim_obs.component = "pypfilt.summary.SimulatedObs"
sim_obs.observation_unit = "x"

This table uses a unique PRNG seed that is derived from the observation unit, so that simulated observations for different observation units are not correlated. It may be desirable to use the common PRNG seed instead (e.g., to preserve existing outputs for scenarios with only a single observation model). To do so, set the 'common_prng_seed' setting to true:

[summary.tables]
sim_obs.component = "pypfilt.summary.SimulatedObs"
sim_obs.observation_unit = "x"
sim_obs.common_prng_seed = true
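This section does not describe how the per-unit seed is derived. Purely as a hypothetical illustration, one way to obtain distinct, reproducible streams per observation unit is to hash the common seed together with the unit's name; the function `unit_rng` and its hashing scheme are assumptions, not pypfilt's method.

```python
import hashlib
import random


def unit_rng(common_seed, obs_unit, use_common_seed=False):
    """Return a PRNG for one observation unit (hypothetical scheme)."""
    if use_common_seed:
        # Analogous to 'common_prng_seed = true': every unit shares one seed.
        return random.Random(common_seed)
    digest = hashlib.sha256(f"{common_seed}:{obs_unit}".encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as a 64-bit integer seed.
    return random.Random(int.from_bytes(digest[:8], "big"))


rng_x = unit_rng(42, "x")
rng_y = unit_rng(42, "y")
print(rng_x.random(), rng_y.random())
```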

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

load_state(ctx, group)

Restore the state of each PRNG from the cache.

save_state(ctx, group)

Save the current state of each PRNG to the cache.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.PredictiveCIs

Record fixed-probability central credible intervals for the observations.

The default intervals are: 0%, 50%, 60%, 70%, 80%, 90%, 95%. These can be overridden in the scenario settings. For example:

[summary.tables]
forecasts.component = "pypfilt.summary.PredictiveCIs"
forecasts.credible_intervals = [ 0, 50, 95 ]

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.EnsembleSnapshot

Record the particle state vectors at each time unit of the estimation and forecasting passes.

Note

These snapshots capture the ensemble at each time unit. There is no relationship between the particle ordering at different times.

This table can be enabled in the scenario settings:

[summary.tables]
snapshot.component = "pypfilt.summary.EnsembleSnapshot"

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

class pypfilt.summary.ForecastSnapshot

Record the particle state vectors at the start of each forecasting pass and, optionally, at each time unit of the forecasting pass.

Note

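The particles will be resampled, so the state vectors will have uniform weights (which are not recorded in the table).

To also record the state vectors at each time unit of the forecasting pass, set the each_time_unit setting to true: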

[summary.tables]
snapshot.component = "pypfilt.summary.ForecastSnapshot"
snapshot.each_time_unit = true
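The note above says that the recorded particles are resampled and therefore carry uniform weights. The sketch below shows why, using systematic resampling as one common scheme; it is an illustration only and not necessarily the resampling method that pypfilt uses.

```python
import numpy as np


def systematic_resample(states, weights, rng):
    """Resample particles in proportion to their weights (systematic scheme)."""
    n = len(weights)
    cum_weights = np.cumsum(weights) / np.sum(weights)
    # One uniform offset, then n evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    ix = np.searchsorted(cum_weights, positions, side="right")
    ix = np.minimum(ix, n - 1)  # guard against rounding at the upper end
    # Every resampled particle now carries the same weight, 1 / n.
    return states[ix], np.full(n, 1.0 / n)


rng = np.random.default_rng(2024)
states = np.array([10.0, 20.0, 30.0, 40.0])
weights = np.array([0.0, 0.1, 0.8, 0.1])
new_states, new_weights = systematic_resample(states, weights, rng)
print(new_states, new_weights)
```

Particles with larger weights are duplicated and zero-weight particles are dropped, so the weights carry no further information after resampling.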

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

load_state(ctx, group)

Restore the state of each PRNG from the cache.

save_state(ctx, group)

Save the current state of each PRNG to the cache.

class pypfilt.summary.BackcastPredictiveCIs

Record fixed-probability central credible intervals for backcast observations.

This requires a BackcastMonitor, which should be specified in the scenario settings.

The default intervals are: 0%, 50%, 60%, 70%, 80%, 90%, 95%. These can be overridden in the scenario settings. For example:

[summary.monitors]
backcast_monitor.component = "pypfilt.summary.BackcastMonitor"

[summary.tables]
backcasts.component = "pypfilt.summary.BackcastPredictiveCIs"
backcasts.backcast_monitor = "backcast_monitor"
backcasts.credible_intervals = [ 0, 50, 95 ]

field_types(ctx, obs_list, name)

Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.

Note

Use pypfilt.io.time_field() for columns that will contain time values. This ensures that the time values will be converted as necessary when loading and saving tables.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The table’s name.

n_rows(ctx, start_date, end_date, n_days, forecasting)

Return the number of rows required for a single simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

add_rows(ctx, fs_date, window, insert_fn)

Record rows of summary statistics for some portion of a simulation.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.
  • insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.

The row insertion function can be used as follows:

# Insert a single row, represented as a tuple.
insert_fn((x, y, z))
# Insert multiple rows, represented as a list of tuples.
insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)

finished(ctx, fs_date, window, insert_fn)

Record rows of summary statistics at the end of a simulation.

The parameters are as per add_rows().

Derived classes only need to implement this method if they record rows at the end of the simulation; the default implementation does nothing.

6.6.4. Retrospective statistics

In some cases, the Table model is not sufficiently flexible, since it assumes that statistics can be calculated during the course of a simulation. For some statistics, it may be necessary to observe the entire simulation before the statistics can be calculated.

In this case, you need to define a subclass of the Monitor class, which will observe (“monitor”) each simulation and, upon completion of each simulation, can calculate the necessary summary statistics.

Note that a Table subclass is still required to define the table columns and the number of rows, and to record each row at the end of the simulation.

class pypfilt.Monitor

The base class for simulation monitors.

Monitors are used to calculate quantities that:

  • Are used by multiple Tables (i.e., avoiding repeated computation); or
  • Require a complete simulation for calculation (as distinct from Tables, which incrementally record rows as a simulation progresses).

The quantities calculated by a Monitor can then be recorded by Table.add_rows() and/or Table.finished().
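A sketch of this pattern: a hypothetical monitor finds when the weighted mean of one state variable peaks over a whole simulation, and a hypothetical table records that peak in finished(). Both are plain classes so that the example runs without pypfilt (in practice they would derive from pypfilt.Monitor and pypfilt.Table). The snapshot stand-in is an assumption, and because this section does not say how a table obtains a reference to its monitor, the monitor is simply passed to the table's constructor here.

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in for pypfilt's Snapshot.
FakeSnapshot = namedtuple("FakeSnapshot", ["time", "weights", "values"])


class PeakMonitor:
    """Hypothetical monitor: finds when the weighted mean peaks."""

    def begin_sim(self, ctx, start_date, end_date, n_days, forecasting):
        self.means = []
        self.peak = None

    def monitor(self, ctx, fs_date, window):
        # Accumulate one (time, mean) pair per snapshot as the simulation runs.
        for snap in window:
            mean = np.average(snap.values, weights=snap.weights)
            self.means.append((snap.time, mean))

    def end_sim(self, ctx, fs_date, window):
        # The peak is only known once the whole simulation has been observed.
        self.peak = max(self.means, key=lambda pair: pair[1])


class PeakTable:
    """Hypothetical table that records the monitor's result once."""

    def __init__(self, monitor):
        self.monitor = monitor

    def n_rows(self, ctx, start_date, end_date, n_days, forecasting):
        return 1

    def add_rows(self, ctx, fs_date, window, insert_fn):
        pass  # nothing to record until the simulation has finished

    def finished(self, ctx, fs_date, window, insert_fn):
        insert_fn(self.monitor.peak)


monitor = PeakMonitor()
table = PeakTable(monitor)
window = [FakeSnapshot(t, np.ones(2), np.array([v, v + 2.0]))
          for t, v in [(1.0, 1.0), (2.0, 5.0), (3.0, 2.0)]]
rows = []
monitor.begin_sim(None, 1.0, 3.0, 3, False)
monitor.monitor(None, 3.0, window)
table.add_rows(None, 3.0, window, rows.append)
monitor.end_sim(None, 3.0, window)
table.finished(None, 3.0, window, rows.append)
print(rows)
```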

prepare(ctx, obs_list, name)

Perform any required preparation prior to a set of simulations.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The monitor’s name.

begin_sim(ctx, start_date, end_date, n_days, forecasting)

Perform any required preparation at the start of a simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

monitor(ctx, fs_date, window)

Monitor the simulation progress.

Parameters:
  • ctx – The simulation context.
  • fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
  • window – A list of Snapshot instances that capture the particle states at each time unit in the simulation window.

end_sim(ctx, fs_date, window)

Finalise the data as required for the relevant summary statistics.

The parameters are as per monitor().

Derived classes only need to implement this method if the monitored data must be finalised; the default implementation does nothing.

load_state(ctx, grp)

Load the monitor state from a cache file.

Parameters:
  • ctx – The simulation context.
  • grp – The h5py Group object from which to load the state.

save_state(ctx, grp)

Save the monitor state to a cache file.

Parameters:
  • ctx – The simulation context.
  • grp – The h5py Group object in which to save the state.

class pypfilt.summary.BackcastMonitor

Record the backcast particle matrix at the end of each estimation simulation, so that it can be examined in forecasting simulations.

[summary.monitors]
backcast_monitor.component = "pypfilt.summary.BackcastMonitor"

backcast = None

The backcast simulation history (History).

Note that tables should only inspect this attribute during a forecasting simulation; it is not valid during an estimation simulation.

window = None

The backcast summary window (a list of Snapshot values).

Note that tables should only inspect this attribute during a forecasting simulation; it is not valid during an estimation simulation.

prepare(ctx, obs_list, name)

Perform any required preparation prior to a set of simulations.

Parameters:
  • ctx – The simulation context.
  • obs_list – A list of all observations.
  • name – The monitor’s name.

begin_sim(ctx, start_date, end_date, n_days, forecasting)

Perform any required preparation at the start of a simulation.

Parameters:
  • ctx – The simulation context.
  • start_date – The date at which the simulation starts.
  • end_date – The date at which the simulation ends.
  • n_days – The number of days for which the simulation runs.
  • forecasting – True if this is a forecasting simulation, otherwise False.

end_sim(ctx, start_date, end_date)

Finalise the data as required for the relevant summary statistics.

The parameters are as per monitor().

Derived classes only need to implement this method if the monitored data must be finalised; the default implementation does nothing.

load_state(ctx, group)

Load the monitor state from a cache file.

Parameters:
  • ctx – The simulation context.
  • group – The h5py Group object from which to load the state.

save_state(ctx, group)

Save the monitor state to a cache file.

Parameters:
  • ctx – The simulation context.
  • group – The h5py Group object in which to save the state.

6.6.5. Tables and Monitors

The methods of each Table and Monitor will be called in the following sequence by the HDF5 summary class:

  1. Before any simulations are performed:

    • Table.field_types()
    • Monitor.prepare()

    In addition to defining the column types for each Table, this allows objects to store the simulation parameters and observations.

  2. At the start of each simulation:

    • Monitor.begin_sim()
    • Table.n_rows()

    This notifies each Monitor and each Table of the simulation period (start date, end date, and number of days) and whether it is a forecasting simulation (in which no resampling takes place).

  3. During each simulation:

    • Monitor.monitor()
    • Table.add_rows()

    This provides a portion of the simulation period for analysis by each Monitor and each Table. Because all of the Monitor.monitor() methods are called before the Table.add_rows() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.monitor().

  4. At the end of each simulation:

    • Monitor.end_sim()
    • Table.finished()

    This allows each Monitor and each Table to perform any final calculations once the simulation has completed. Because all of the Monitor.end_sim() methods are called before the Table.finished() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.end_sim().
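The call sequence above can be summarised as a hypothetical driver loop. This sketches the documented ordering only and is not the HDF5 class's actual code. The `Recorder` class simply logs which methods are called, the list of windows stands in for the portions of the simulation passed to monitors and tables, and passing the final window to end_sim() and finished() is an assumption.

```python
class Recorder:
    """Logs every method call so that the ordering can be inspected."""

    def __init__(self, kind, log):
        self.kind, self.log = kind, log

    def __getattr__(self, method):
        def record(*args, **kwargs):
            self.log.append(f"{self.kind}.{method}")
            return 0
        return record


def run_simulation(tables, monitors, windows, ctx=None, obs_list=(),
                   start=0, end=1, n_days=1, forecasting=False, fs_date=1):
    # 1. Before any simulations (once per set; a single simulation here).
    for name, table in tables.items():
        table.field_types(ctx, obs_list, name)
    for name, monitor in monitors.items():
        monitor.prepare(ctx, obs_list, name)
    # 2. At the start of the simulation.
    for monitor in monitors.values():
        monitor.begin_sim(ctx, start, end, n_days, forecasting)
    for table in tables.values():
        table.n_rows(ctx, start, end, n_days, forecasting)
    # 3. During the simulation: all monitors see each window before any table.
    for window in windows:
        for monitor in monitors.values():
            monitor.monitor(ctx, fs_date, window)
        for table in tables.values():
            table.add_rows(ctx, fs_date, window, insert_fn=None)
    # 4. At the end: monitors finalise before tables record their final rows.
    for monitor in monitors.values():
        monitor.end_sim(ctx, fs_date, windows[-1])
    for table in tables.values():
        table.finished(ctx, fs_date, windows[-1], insert_fn=None)


log = []
run_simulation({"cis": Recorder("Table", log)},
               {"peak": Recorder("Monitor", log)}, windows=["w1", "w2"])
print(log)
```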