5.8. pypfilt.summary¶
5.8.1. Simulation metadata¶
Every simulation data file should include metadata that documents the
simulation parameters and working environment.
The Metadata class provides the means for generating such metadata:
-
class
pypfilt.summary.
Metadata
¶ Document the simulation parameters and system environment for a set of simulations. A black-list (ignore_dict) defines which members of the parameters dictionary will be excluded from this metadata; see filter() for details.
-
build
(params)¶ Construct a metadata dictionary that documents the simulation parameters and system environment. Note that this should be generated at the start of the simulation, and that the git metadata will only be valid if the working directory is located within a git repository.
Parameters: params – The simulation parameters.
By default, the versions of pypfilt, h5py, numpy and scipy are recorded. The following example demonstrates how to also record the installed version of the epifx package:
    import epifx
    import pypfilt.summary
    params = ...
    params['summary']['meta']['packages'].append('epifx')
    meta = pypfilt.summary.Metadata()
-
filter
(values, ignore, encode_fn)¶ Recursively filter items from a dictionary, used to remove parameters from the metadata dictionary that, e.g., have no meaningful representation.
Parameters: - values – The original dictionary.
- ignore – A dictionary that specifies which values to ignore.
- encode_fn – A function that encodes the remaining values (see encode_value()).
For example, to ignore ['px_range'], ['random'], and 'obs_model' and 'obs_llhd' for every observation system when using epifx:
    m = pypfilt.summary.Metadata()
    ignore = {
        'px_range': None,
        'random': None,
        # Note the use of ``None`` to match any key under 'obs'.
        'obs': {None: {'obs_model': None, 'obs_llhd': None}}
    }
    m.filter(params, ignore, m.encode)
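The matching rules described above can be illustrated with a small stand-alone sketch. This is not pypfilt's implementation: filter_dict and the sample params dictionary are hypothetical, and the sketch assumes that an exact key match takes precedence over the None wildcard.

```python
_KEEP = object()  # Sentinel: no rule applies to this key.


def filter_dict(values, ignore, encode_fn):
    """Recursively drop ignored keys and encode the remaining values.

    Illustrative re-implementation only; not taken from pypfilt.
    """
    result = {}
    for key, value in values.items():
        # ASSUMPTION: an exact key match wins over the ``None`` wildcard.
        rule = ignore.get(key, ignore.get(None, _KEEP))
        if rule is None:
            # A rule of ``None`` means "exclude this item entirely".
            continue
        sub_ignore = rule if isinstance(rule, dict) else {}
        if isinstance(value, dict):
            result[key] = filter_dict(value, sub_ignore, encode_fn)
        else:
            result[key] = encode_fn(value)
    return result


# Hypothetical parameters with a single observation system named 'flu'.
params = {
    'px_range': (0, 1),
    'random': 'generator',
    'steps': 4,
    'obs': {'flu': {'obs_model': 'f', 'obs_llhd': 'g', 'period': 7}},
}
ignore = {
    'px_range': None,
    'random': None,
    'obs': {None: {'obs_model': None, 'obs_llhd': None}},
}
filtered = filter_dict(params, ignore, str)
print(filtered)  # {'steps': '4', 'obs': {'flu': {'period': '7'}}}
```

Here str stands in for the encoding function; any callable that accepts a single value would do.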
-
encode
(value)¶ Encode values in a form suitable for serialisation in HDF5 files.
- Integer values are converted to numpy.int32 values.
- Floating-point values and arrays retain their data type.
- All other (i.e., non-numerical) values are converted to UTF-8 strings.
-
object_names
(object_dict)¶ Return the fully qualified name of each object in a (possibly nested) dictionary.
-
object_name
(obj)¶ Return the fully qualified name of the object as a byte string.
-
priors
(params)¶ Return a dictionary that describes the model parameter priors.
Each key identifies a parameter (by name); the corresponding value is a byte string representation of the prior distribution, which is typically a numpy.random.Generator method call.
For example:
    {'R0': b'random.uniform(1.0, 2.0)',
     'gamma': b'(1 / random.uniform(1.0, 3.0))'}
-
pkg_version
(module)¶ Attempt to obtain the version of a Python module.
-
git_data
()¶ Record the status of the git repository within which the working directory is located (if such a repository exists).
-
run_cmd
(args, all_lines=False, err_val='')¶ Run a command and return the (Unicode) output. By default, only the first line is returned; set all_lines=True to receive all of the output as a list of Unicode strings. If the command returns a non-zero exit status, return err_val instead.
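The documented behaviour of run_cmd() can be approximated with the standard library, as in the sketch below. The function name run_cmd_sketch is hypothetical, and two details are assumptions rather than documented behaviour: a command that cannot be started is treated like a failed command, and a command that prints nothing yields an empty string.

```python
import subprocess
import sys


def run_cmd_sketch(args, all_lines=False, err_val=''):
    """Run a command; return its first output line, or all lines."""
    try:
        proc = subprocess.run(args, capture_output=True, text=True,
                              check=True)
    except (OSError, subprocess.CalledProcessError):
        # Non-zero exit status (or, as an added assumption, a command that
        # could not be started): fall back to the caller's error value.
        return err_val
    lines = proc.stdout.splitlines()
    if all_lines:
        return lines
    # ASSUMPTION: a command that prints nothing yields an empty string.
    return lines[0] if lines else ''


# Use the running Python interpreter as a portable test command.
script = "print('first'); print('second')"
first_line = run_cmd_sketch([sys.executable, '-c', script])
every_line = run_cmd_sketch([sys.executable, '-c', script], all_lines=True)
failed = run_cmd_sketch([sys.executable, '-c', 'raise SystemExit(3)'],
                        err_val='n/a')
```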
-
5.8.2. Summary data files¶
The HDF5 class encapsulates the process of calculating and recording
summary statistics for each simulation.
-
class
pypfilt.summary.
HDF5
(params, obs_list)¶ Save tables of summary statistics to an HDF5 file.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
-
save_forecasts
(fs, filename)¶ Save forecast summaries to disk in the HDF5 binary data format.
This function creates the following datasets that summarise the estimation and forecasting outputs:
- 'data/TABLE' for each table.
The provided metadata will be recorded under 'meta/'.
If dataset creation timestamps are enabled, two simulations that produce identical outputs will not result in identical files. Timestamps will be disabled where possible (requires h5py >= 2.2):
- 'hdf5_track_times': Presence of creation timestamps.
Parameters: - fs – Simulation outputs, as returned by pypfilt.forecast().
- filename – The filename to which the data will be written.
5.8.3. Summary statistic tables¶
Summary statistics are stored in tables, each of which comprises a set of named columns and a specific number of rows.
5.8.3.1. The Table class¶
To calculate a summary statistic, you need to define a subclass of the Table class and provide implementations of each method.
-
class
pypfilt.
Table
¶ The base class for summary statistic tables.
Tables are used to record rows of summary statistics as a simulation progresses.
-
dtype
(ctx, obs_list, name)¶ Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The table’s name.
-
n_rows
(start_date, end_date, n_days, n_sys, forecasting)¶ Return the number of rows required for a single simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
add_rows
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics for some portion of a simulation.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
- insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.
The row insertion function can be used as follows:
    # Insert a single row, represented as a tuple.
    insert_fn((x, y, z))
    # Insert multiple rows, represented as a list of tuples.
    insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
-
finished
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics at the end of a simulation.
The parameters are as per add_rows().
Derived classes should only implement this method if rows must be recorded by this method; the provided method does nothing.
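As a rough illustration of the interface described above, the following sketch records the weighted mean of a single quantity for each day. It is written as a plain class so that it runs without pypfilt installed; in practice it would derive from Table. The class name WeightedMean, the column names, and the layout of hist (one scalar per particle for each day) are all assumptions made for this example, and the weights are assumed to sum to one for each day.

```python
class WeightedMean:
    """Record the weighted mean of a scalar quantity for each day.

    Hypothetical example; in real use this would subclass pypfilt.Table.
    """

    def dtype(self, ctx, obs_list, name):
        # Column names paired with NumPy-style type strings.
        return [('day', 'i4'), ('mean', 'f8')]

    def n_rows(self, start_date, end_date, n_days, n_sys, forecasting):
        # One row for each simulated day.
        return n_days

    def add_rows(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        for _date, ix, hist_ix in dates:
            # ASSUMPTION: hist[hist_ix] holds one scalar per particle.
            values = hist[hist_ix]
            mean = sum(w * v for w, v in zip(weights[ix], values))
            insert_fn((ix, mean))

    def finished(self, hist, weights, fs_date, dates, obs_types, insert_fn):
        # Nothing needs to be recorded once the simulation has ended.
        pass


# Exercise the table with a stand-in for the row insertion function.
rows = []


def collect(row_or_rows, n=1):
    if n == 1:
        rows.append(row_or_rows)
    else:
        rows.extend(row_or_rows)


table = WeightedMean()
hist = [[1.0, 3.0], [2.0, 4.0]]          # 2 days, 2 particles
weights = [[0.5, 0.5], [0.25, 0.75]]     # dimensions (d, p)
dates = [('2020-01-01', 0, 0), ('2020-01-02', 1, 1)]
table.add_rows(hist, weights, '2020-01-02', dates, set(), collect)
print(rows)  # [(0, 2.0), (1, 3.5)]
```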
-
5.8.3.2. Predefined statistics¶
The following derived classes are provided to calculate basic summary statistics of any generic simulation model.
-
class
pypfilt.summary.
ModelCIs
(probs=None)¶ Calculate fixed-probability central credible intervals for all state variables and model parameters.
Parameters: probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 90, 95, 99, 100]).
-
dtype
(ctx, obs_list, name)¶ Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The table’s name.
-
n_rows
(start_date, end_date, n_days, n_sys, forecasting)¶ Return the number of rows required for a single simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
add_rows
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics for some portion of a simulation.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
- insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.
The row insertion function can be used as follows:
    # Insert a single row, represented as a tuple.
    insert_fn((x, y, z))
    # Insert multiple rows, represented as a list of tuples.
    insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
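To make the notion of a fixed-probability central credible interval concrete, the sketch below computes one from weighted particle values. It shows one simple approach (stepping through the cumulative weights without interpolation) and is not necessarily how ModelCIs performs the calculation; the function names weighted_quantile and central_interval are hypothetical.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(weight for _, weight in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    # Guard against floating-point round-off in the cumulative sum.
    return pairs[-1][0]


def central_interval(values, weights, prob_pct):
    """Return (lower, upper) bounds of the central prob_pct% interval."""
    tail = (1.0 - prob_pct / 100.0) / 2.0
    lower = weighted_quantile(values, weights, tail)
    upper = weighted_quantile(values, weights, 1.0 - tail)
    return (lower, upper)


# Four equally weighted particles.
values = [4.0, 1.0, 3.0, 2.0]
weights = [0.25, 0.25, 0.25, 0.25]
ci_50 = central_interval(values, weights, 50)
ci_0 = central_interval(values, weights, 0)
print(ci_50)  # (1.0, 3.0)
print(ci_0)   # (2.0, 2.0)
```

A probability of 0 collapses the interval to a single point (the median under this definition), while 100 spans the full range of the particle values.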
-
-
class
pypfilt.summary.
ParamCovar
¶ Calculate the covariance between all pairs of model parameters during each simulation.
-
dtype
(ctx, obs_list, name)¶ Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The table’s name.
-
n_rows
(start_date, end_date, n_days, n_sys, forecasting)¶ Return the number of rows required for a single simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
add_rows
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics for some portion of a simulation.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
- insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.
The row insertion function can be used as follows:
    # Insert a single row, represented as a tuple.
    insert_fn((x, y, z))
    # Insert multiple rows, represented as a list of tuples.
    insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
-
-
class
pypfilt.summary.
Obs
¶ Record the basic details of each observation; the columns are: 'unit', 'period', 'source', 'date', 'value', 'incomplete', 'upper_bound'.
-
dtype
(ctx, obs_list, name)¶ Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The table’s name.
-
n_rows
(start_date, end_date, n_days, n_sys, forecasting)¶ Return the number of rows required for a single simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
add_rows
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics for some portion of a simulation.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
- insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.
The row insertion function can be used as follows:
    # Insert a single row, represented as a tuple.
    insert_fn((x, y, z))
    # Insert multiple rows, represented as a list of tuples.
    insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
-
finished
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics at the end of a simulation.
The parameters are as per add_rows().
Derived classes should only implement this method if rows must be recorded by this method; the provided method does nothing.
-
-
class
pypfilt.summary.
SimulatedObs
¶ Record simulated observations for each particle.
-
dtype
(ctx, obs_list, name)¶ Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The table’s name.
-
n_rows
(start_date, end_date, n_days, n_sys, forecasting)¶ Return the number of rows required for a single simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
add_rows
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics for some portion of a simulation.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
- insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.
The row insertion function can be used as follows:
    # Insert a single row, represented as a tuple.
    insert_fn((x, y, z))
    # Insert multiple rows, represented as a list of tuples.
    insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
-
-
class
pypfilt.summary.
PredictiveCIs
(exp_obs_monitor, probs=None)¶ Record fixed-probability central credible intervals for the observations.
Parameters: - exp_obs_monitor – a pypfilt.summary.ExpectedObsMonitor.
- probs – an array of probabilities that define the size of each central credible interval. The default value is numpy.uint8([0, 50, 95]).
-
dtype
(ctx, obs_list, name)¶ Return the column names and data types, represented as a list of (name, data type) tuples. See the NumPy documentation for details.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The table’s name.
-
n_rows
(start_date, end_date, n_days, n_sys, forecasting)¶ Return the number of rows required for a single simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
add_rows
(hist, weights, fs_date, dates, obs_types, insert_fn)¶ Record rows of summary statistics for some portion of a simulation.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
- insert_fn – A function that inserts one or more rows into the underlying data table; see the examples below.
The row insertion function can be used as follows:
    # Insert a single row, represented as a tuple.
    insert_fn((x, y, z))
    # Insert multiple rows, represented as a list of tuples.
    insert_fn([(x0, y0, z0), (x1, y1, z1)], n=2)
5.8.3.3. Utility functions¶
The following column types are provided for convenience when defining custom Table subclasses.
-
pypfilt.summary.
dtype_value
(value, name='value')¶ The dtype for columns that store observation values.
The following functions are provided for converting column types in structured arrays.
-
pypfilt.summary.
convert_cols
(data, converters)¶ Convert columns in a structured array from one type to another.
Parameters: - data – The input structured array.
- converters – A dictionary that maps (unicode) column names to (convert_fn, new_dtype) tuples, which contain a conversion function and define the output dtype.
Returns: A new structured array.
-
pypfilt.summary.
default_converters
(time_scale)¶ Return a dictionary for converting the 'fs_date' and 'date' columns from (see convert_cols()).
5.8.4. Retrospective statistics¶
In some cases, the Table model is not sufficiently flexible, since it
assumes that statistics can be calculated during the course of a simulation.
For some statistics, it may be necessary to observe the entire simulation
before the statistics can be calculated.
In this case, you need to define a subclass of the Monitor class,
which will observe (“monitor”) each simulation and, upon completion of each
simulation, can calculate the necessary summary statistics.
Note that a Table subclass is also required to define the table
columns, the number of rows, and to record each row at the end of the
simulation.
-
class
pypfilt.
Monitor
¶ The base class for simulation monitors.
Monitors are used to calculate quantities that:
- Are used by multiple Tables (i.e., avoiding repeated computation); or
- Require a complete simulation for calculation (as distinct from Tables, which incrementally record rows as a simulation progresses).
The quantities calculated by a Monitor can then be recorded by Table.add_rows() and/or Table.finished().
-
prepare
(ctx, obs_list, name)¶ Perform any required preparation prior to a set of simulations.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The monitor’s name.
-
begin_sim
(start_date, end_date, n_days, n_sys, forecasting)¶ Perform any required preparation at the start of a simulation.
Parameters: - start_date – The date at which the simulation starts.
- end_date – The date at which the simulation ends.
- n_days – The number of days for which the simulation runs.
- n_sys – The number of observation systems (i.e., data sources).
- forecasting – True if this is a forecasting simulation, otherwise False.
-
monitor
(hist, weights, fs_date, dates, obs_types)¶ Monitor the simulation progress.
Parameters: - hist – The particle history matrix.
- weights – The weight of each particle at each date in the simulation window; it has dimensions (d, p) for d days and p particles.
- fs_date – The forecasting date; if this is not a forecasting simulation, this is the date at which the simulation ends.
- dates – A list of (datetime, ix, hist_ix) tuples that identify each day in the simulation window, the index of that day in the simulation window, and the index of that day in the particle history matrix.
- obs_types – A set of (unit, period) tuples that identify each observation system from which observations have been taken.
-
end_sim
(hist, weights, fs_date, dates, obs_types)¶ Finalise the data as required for the relevant summary statistics.
The parameters are as per monitor().
Derived classes should only implement this method if finalisation of the monitored data is required; the provided method does nothing.
-
load_state
(grp)¶ Load the monitor state from a cache file.
Parameters: grp – The h5py Group object from which to load the state.
-
save_state
(grp)¶ Save the monitor state to a cache file.
Parameters: grp – The h5py Group object in which to save the state.
5.8.4.1. Predefined monitors¶
The PredictiveCIs summary table requires the following monitor:
-
class
pypfilt.summary.
ExpectedObsMonitor
¶ Record expected observations for each particle.
This is typically an expensive operation, and this monitor allows multiple summary tables to obtain these values without recalculating them.
-
expected_obs
= None¶ The expected observation for each particle for the duration of the current simulation window.
Note that this is only valid for tables to inspect in each call to add_rows(), and not in a call to finished().
-
prepare
(ctx, obs_list, name)¶ Perform any required preparation prior to a set of simulations.
Parameters: - params – The simulation parameters.
- obs_list – A list of all observations.
- name – The monitor’s name.
-
monitor
(hist, weights, fs_date, dates, obs_types)¶ Record the peak for each particle during a forecasting run.
-
end_sim
(hist, weights, fs_date, dates, obs_types)¶ Finalise the data as required for the relevant summary statistics.
The parameters are as per monitor().
Derived classes should only implement this method if finalisation of the monitored data is required; the provided method does nothing.
-
load_state
(grp)¶ Load the monitor state from disk.
-
save_state
(grp)¶ Save the monitor state to disk.
-
5.8.5. Tables and Monitors¶
The methods of each Table and Monitor will be called in the
following sequence by the HDF5 summary class:
Before any simulations are performed:
- Table.dtype()
- Monitor.prepare()
In addition to defining the column types for each Table, this allows objects to store the simulation parameters and observations.
At the start of each simulation:
- Monitor.begin_sim()
- Table.n_rows()
This notifies each Monitor and each Table of the simulation period, the number of observation systems (i.e., data sources), and whether it is a forecasting simulation (where no resampling will take place).
During each simulation:
- Monitor.monitor()
- Table.add_rows()
This provides a portion of the simulation period for analysis by each Monitor and each Table. Because all of the Monitor.monitor() methods are called before the Table.add_rows() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.monitor().
At the end of each simulation:
- Monitor.end_sim()
- Table.finished()
This allows each Monitor and each Table to perform any final calculations once the simulation has completed. Because all of the Monitor.end_sim() methods are called before the Table.finished() methods, tables can interrogate monitors to obtain any quantities of interest that are calculated by Monitor.end_sim().
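The calling sequence described above can be mimicked with stub objects that log each call. This is only a sketch of the ordering: StubMonitor, StubTable, and run_one_simulation are hypothetical stand-ins, the stubs ignore their arguments, and the real HDF5 class passes the parameters documented for each method.

```python
class StubMonitor:
    """Log each Monitor method call; arguments are ignored."""

    def __init__(self, log):
        self.log = log

    def prepare(self, *args):
        self.log.append('Monitor.prepare')

    def begin_sim(self, *args):
        self.log.append('Monitor.begin_sim')

    def monitor(self, *args):
        self.log.append('Monitor.monitor')

    def end_sim(self, *args):
        self.log.append('Monitor.end_sim')


class StubTable:
    """Log each Table method call; arguments are ignored."""

    def __init__(self, log):
        self.log = log

    def dtype(self, *args):
        self.log.append('Table.dtype')

    def n_rows(self, *args):
        self.log.append('Table.n_rows')

    def add_rows(self, *args):
        self.log.append('Table.add_rows')

    def finished(self, *args):
        self.log.append('Table.finished')


def run_one_simulation(tables, monitors, n_windows):
    """Call the methods in the documented order for one simulation."""
    # Before any simulations are performed.
    for table in tables:
        table.dtype()
    for monitor in monitors:
        monitor.prepare()
    # At the start of the simulation.
    for monitor in monitors:
        monitor.begin_sim()
    for table in tables:
        table.n_rows()
    # During the simulation: monitors always run before tables.
    for _ in range(n_windows):
        for monitor in monitors:
            monitor.monitor()
        for table in tables:
            table.add_rows()
    # At the end of the simulation.
    for monitor in monitors:
        monitor.end_sim()
    for table in tables:
        table.finished()


log = []
run_one_simulation([StubTable(log)], [StubMonitor(log)], n_windows=2)
print(log)
```

With one monitor, one table, and two portions of the simulation period, the log shows Monitor.monitor preceding Table.add_rows in each portion, which is what lets a table read values that a monitor has just calculated.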