Scenario Settings#

The Getting Started guide includes several example scenarios. This page describes every setting that may be defined in a scenario file. Scenarios are defined in TOML files; TOML is a simple, easy-to-read configuration file format, similar to JSON and YAML.

There are three kinds of settings:

Those that have default values (marked default) and only need to be defined if a non-default value is desired;
Those that are automatically defined (marked automatically defined) and cannot be overridden; and
Settings for which a value must be provided.

Core components#

Each of the following components must be defined for each scenario.

components.model: The simulation model; see pypfilt.model.Model.
components.time: The simulation time scale; see pypfilt.time for available time scales.
components.sampler: The prior distribution sampler; see pypfilt.sampler for available samplers.
components.summary: The component that saves summary statistics and other outputs; see pypfilt.summary.HDF5.

Optional pseudo-random number generators (PRNGs) can also be defined:

components.random.NAME: Defines a PRNG called 'NAME'.
components.random.NAME.seed default: filter.prng_seed: Defines the seed used to initialise the PRNG called 'NAME'.

Input and output files#

Input and output files can be stored in separate directories, and cache files can either be retained for future forecasts or deleted.

files.input_directory: The directory that contains input data files, such as lookup tables and observations.
files.output_directory: The directory in which the simulation results will be saved.
files.temp_directory default: tempfile.gettempdir(): The directory in which temporary files, such as cache files, will be saved.
files.cache_file default: None: The file used to store particle filter states between forecast simulations. See Caching particle states for details.
files.delete_cache_file_before_forecast default: False: Whether to delete cached particle filter states (if any) before running forecast simulations; this only affects pypfilt.forecast().
files.delete_cache_file_after_forecast default: False: Whether to delete cached particle filter states (if any) after running forecast simulations; this only affects pypfilt.forecast().

Particle filter settings#

The following settings affect the behaviour of the particle filter itself.

filter.particles: The number of particles (i.e., model simulations).
filter.prng_seed: The seed for the various pseudo-random number generators, which are used to draw samples from the model prior distribution, to resample the particle ensemble, for use by stochastic simulation models, etc.
filter.history_window: The minimum number of time units that must be stored in the particle history matrix. For example, if an observation model describes the change in model state over two time units, the history window should be at least 2. Set this to -1 to store the entire simulation period in the particle history matrix.
filter.minimal_estimation_run default: True: When generating forecasts with pypfilt.forecast(), whether to end the estimation pass at the final forecasting date, rather than continuing to the end of the simulation period.
filter.results.save_history default: False: Whether to return the particle history matrix for each estimation and forecasting pass.

Note

Setting filter.results.save_history to True can substantially increase memory usage when performing large numbers of estimation and/or forecasting passes.
filter.reweight_or_fail default: False: Whether to raise an exception and terminate the simulation if the updated particle weights sum to zero. By default, when the updated particle weights sum to zero, their previous weights will be used instead, effectively ignoring problematic observations.
filter.reweight.enabled default: True: Whether to update particle weights in response to observations.
filter.reweight.exponent automatically defined: An exponent to apply to observation likelihoods; should only take values between 0 and 1 (inclusive).
filter.resample.enabled default: True: Whether to resample particles.
filter.resample.method default: 'deterministic': The method to use when resampling the particle ensemble; see pypfilt.resample.resample().
filter.resample.threshold default: 0.25: The resampling threshold; when the effective number of particles (normalised to the unit interval) drops below this threshold, the particles will be resampled.
filter.resample.before_forecasting default: False: Whether to resample the particles before each forecasting pass. Note that if the particle weights are already uniform, they will not be resampled.
filter.regularisation.enabled default: False: Whether to use the post-regularised particle filter; see pypfilt.resample.post_regularise().
filter.regularisation.bounds default: {}: The state vector fields to which post-regularisation will be applied. Lower and upper bounds must be provided for each field. See Regularisation for an example.
filter.regularisation.tolerance default: 1e-8: The minimum variation in a field’s value over the particle ensemble for it to be a candidate for post-regularisation. Fields whose range of values is smaller than this tolerance will be treated as constants, and will be ignored by the post-regularised particle filter.
filter.regularisation.bandwidth_scale default: 0.5: The bandwidth scaling factor for the regularisation kernel. This is defined relative to the optimal bandwidth for a Gaussian kernel (when the underlying density is Gaussian with unit covariance). See Improving Regularised Particle Filters for details.
filter.regularisation.regularise_or_fail default: False: Whether to raise an exception and terminate the simulation if the covariance matrix is not positive definite. By default, when the post-regularised particle filter encounters this situation, it prints a warning message and leaves the particle ensemble unchanged.
filter.adaptive_fit.enabled default: False: Whether to use an adaptive fitting method when generating forecasts with pypfilt.forecast().
filter.adaptive_fit.method default: None: The method to use when performing adaptive fitting; see pypfilt.adaptive_fit().
filter.adaptive_fit.target_effective_fraction default: 0.5: The target effective particle fraction for each estimation pass in the adaptive fitting process; used by the "fit_effective_fraction" method.
filter.adaptive_fit.exponent_tolerance default: 0.001: The minimum increase in exponent for each estimation pass in the adaptive fitting process; used by the "fit_effective_fraction" method.
filter.adaptive_fit.exponent_step default: None: The sequence of exponents to use in each estimation pass; used by the "fixed_exponents" method. This may be a sequence of exponents, or a single value that defines the step size.
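
As a rough illustration of how filter.resample.threshold is applied, the sketch below computes the effective number of particles, normalised to the unit interval, and compares it to the threshold. It assumes the common definition 1 / sum(w_i^2) for normalised weights; the function names are hypothetical and are not part of pypfilt.

```python
def effective_fraction(weights):
    """Return the effective number of particles, divided by the ensemble size.

    Assumes the common definition N_eff = 1 / sum(w_i^2) for normalised
    weights; this is an illustrative sketch, not pypfilt's own code.
    """
    total = sum(weights)
    normalised = [w / total for w in weights]
    n_eff = 1.0 / sum(w * w for w in normalised)
    return n_eff / len(normalised)


def needs_resampling(weights, threshold=0.25):
    """Return True if the effective fraction drops below the threshold."""
    return effective_fraction(weights) < threshold


# Uniform weights: every particle contributes equally.
uniform = [0.25, 0.25, 0.25, 0.25]
# Degenerate weights: a single particle carries all of the mass.
degenerate = [1.0] + [0.0] * 9

uniform_fraction = effective_fraction(uniform)
degenerate_fraction = effective_fraction(degenerate)
```

With uniform weights the effective fraction is 1, so no resampling is triggered; when one particle out of ten carries all of the mass, the fraction is 0.1, which is below the default threshold of 0.25.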

Particle ensemble partitions#

The particle ensemble can be divided into a number of non-overlapping partitions, each of which is assigned a fixed net probability mass. This allows the ensemble to maintain an invariant, such as a discrete distribution over some model parameter, over the estimation and forecasting passes. Partitions are defined by an array of tables:

filter.partition

An array of tables, each of which defines a partition; see the settings listed below.

filter.partition.particles

The number of particles contained in the partition. These must sum to the value of filter.particles.

filter.partition.weight

The net probability mass allocated to the partition. These must sum to 1.0.

filter.partition.reservoir default: False

Whether this partition acts as a “reservoir”. Particles in reservoir partitions are never reweighted or resampled. Reservoirs can be used for:

Preserving a representative sample of the model prior distribution, in which case the reservoir should have non-zero weight; and
Providing candidate particles when resampling other non-reservoir partitions, in which case the reservoir should have zero weight.

filter.partition.reservoir_partition default: not defined

Identifies the reservoir partition associated with this non-reservoir partition.

filter.partition.reservoir_fraction default: not defined

The fraction of particles that will be sampled from the associated reservoir when resampling this non-reservoir partition.

For example, the ensemble can be divided into two partitions of different sizes and weights:

[filter]
particles = 1000

[[filter.partition]]
particles = 600
weight = 0.75

[[filter.partition]]
particles = 400
weight = 0.25

The ensemble can also be divided into a small reservoir partition and a large non-reservoir partition, where every time the non-reservoir partition is resampled, 20% of particles are selected from the reservoir:

[filter]
particles = 1000

# Define partition #1.
[[filter.partition]]
particles = 100
weight = 0.0 # These particles have zero weight in the ensemble.
reservoir = true

# Define partition #2.
[[filter.partition]]
particles = 900
weight = 1.0 # This partition represents 100% of the ensemble.
reservoir_partition = 1 # Use partition #1 as a reservoir.
reservoir_fraction = 0.2 # Select 20% of particles from the reservoir.
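
The two constraints on partitions (particle counts must sum to filter.particles, and weights must sum to 1.0) can be checked before running any simulations. The following is a minimal sketch that operates on a plain dictionary with the same layout as the [filter] table above; check_partitions is a hypothetical helper, not a pypfilt function.

```python
import math


def check_partitions(filter_settings):
    """Check the partition constraints for a [filter] table.

    Hypothetical helper for illustration only; raises ValueError if the
    partition sizes or weights are inconsistent.
    """
    partitions = filter_settings.get('partition', [])
    if not partitions:
        # No partitions were defined, so there is nothing to check.
        return
    n_particles = sum(p['particles'] for p in partitions)
    if n_particles != filter_settings['particles']:
        raise ValueError(
            f"partitions contain {n_particles} particles, "
            f"expected {filter_settings['particles']}")
    net_weight = sum(p['weight'] for p in partitions)
    if not math.isclose(net_weight, 1.0):
        raise ValueError(f"partition weights sum to {net_weight}, expected 1.0")


# The reservoir example from above, written as a Python dictionary.
settings = {
    'particles': 1000,
    'partition': [
        {'particles': 100, 'weight': 0.0, 'reservoir': True},
        {'particles': 900, 'weight': 1.0,
         'reservoir_partition': 1, 'reservoir_fraction': 0.2},
    ],
}
check_partitions(settings)
```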

The following settings are automatically defined:

filter.partition.slice automatically defined: A slice object that can be used to select the particles included in this partition.
filter.partition.reservoir_ix automatically defined: The zero-based index of the reservoir associated with this partition, if any.

Note

Use the slice object for indexing into state vectors, because it will return a view. In contrast, using a mask array will return a copy. See Indexing on ndarrays for further details.
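
To make the automatically defined settings more concrete, the sketch below derives one slice per partition from the partition sizes. It assumes that partitions occupy contiguous blocks of the ensemble in the order they are defined, and that reservoir_partition is one-based (as the comments in the example above suggest); partition_slices is a hypothetical name, not a pypfilt function.

```python
def partition_slices(sizes):
    """Return one slice per partition, assuming contiguous blocks."""
    slices = []
    start = 0
    for size in sizes:
        slices.append(slice(start, start + size))
        start += size
    return slices


# Partition sizes from the reservoir example: 100 and 900 particles.
slices = partition_slices([100, 900])

# A stand-in for the particle ensemble; pypfilt stores NumPy arrays.
particles = list(range(1000))
reservoir = particles[slices[0]]

# Assumed conversion from a one-based partition number to a zero-based index.
reservoir_partition = 1
reservoir_ix = reservoir_partition - 1
```

Note that slicing a Python list, as done here, produces a copy; it is basic slicing of a NumPy array that returns a view.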

Lookup tables#

lookup_tables.TABLE_NAME: Defines a lookup table called 'TABLE_NAME'. The value can either be a string that identifies the input data file, or a table that identifies the input data file and whether to associate each particle with a different column from this table. See Provide observation models with lookup tables and Using lookup tables in your own models for examples.

Simulation model settings#

The model table can be used to define any model-specific settings used by the simulation model that are not included in the particle state vector.

For example, a simulation model might require a population_size parameter that is constant across all of the particles. This could be defined as a model parameter and included in the particle state vector, but it could also be defined as a model setting:

[model]
population_size = 100

The simulation model can then retrieve this value from the simulation context:

def get_population_size(ctx):
    return ctx.settings['model']['population_size']

Ordinary differential equation solvers#

For models that inherit from OdeModel, the integration method and custom solver options can be specified in the model table. For example:

[model]
ode_method = "RK23" # Explicit Runge-Kutta method of order 3(2).
ode_solver_options.rtol = 1e-2 # Reduce the relative tolerance.
ode_solver_options.atol = 1e-4 # Reduce the absolute tolerance.

See the scipy.integrate.solve_ivp() documentation for supported integration methods and solver options.

Simulation model prior distribution#

prior.PARAMETER: Defines the prior distribution for the model parameter 'PARAMETER'.

There are two ways in which prior samples for a model parameter can be defined:

As a distribution from which the sampler component will draw values.

The specification of a distribution will depend on the choice of sampler component. See pypfilt.sampler for details, and see Defining multiple scenarios for examples of defining the model prior distribution.
Note

When using the LatinHypercube sampler, the following specifies a uniform distribution over the interval [5, 10]:
```
[prior]
x = { name = "uniform", args.loc = 5.0, args.scale = 5.0}
```
As an external data source that contains the sample values. This allows you to use arbitrarily-complex model prior distributions, which can incorporate features such as correlations between model parameters.

Samples can be read from space-delimited text files, by specifying the file name and the column name:
```
[prior]
x = { external = true, table = "input-file.ssv", column = "x" }
```
Samples can also be read from HDF5 datasets, by specifying the file name, dataset path, and column name:
```
[prior]
x.external = true
x.hdf5 = "prior-samples.hdf5"
x.dataset = "data/prior-samples"
x.column = "x"
```
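
To illustrate how loc = 5.0 and scale = 5.0 describe the interval [5, 10], the following sketch draws stratified (Latin hypercube) samples from a uniform distribution on [loc, loc + scale] using only the standard library. It is a simplified stand-in for what a Latin hypercube sampler does for a single parameter, not pypfilt's implementation, and latin_hypercube_uniform is a hypothetical name.

```python
import random


def latin_hypercube_uniform(loc, scale, n, rng):
    """Draw n stratified samples from the uniform distribution on
    [loc, loc + scale], with one sample in each of n equal-width strata.
    """
    # Draw one point from each stratum of the unit interval.
    unit = [(i + rng.random()) / n for i in range(n)]
    # Shuffle so that the sample order is unrelated to the stratum order.
    rng.shuffle(unit)
    # Map the unit interval onto [loc, loc + scale].
    return [loc + scale * u for u in unit]


rng = random.Random(2024)
samples = latin_hypercube_uniform(loc=5.0, scale=5.0, n=10, rng=rng)
```

Every sample lies between 5 and 10, and each tenth of the interval contains exactly one sample.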

Observation models#

A scenario may include zero or more observation models. See Defining a scenario and Defining multiple scenarios for examples of defining multiple observation models for a single scenario.

observations.OBS_UNIT: Defines an observation model whose observation unit is 'OBS_UNIT'.
observations.OBS_UNIT.model: The observation model; see pypfilt.obs.Obs and pypfilt.obs.Univariate.
observations.OBS_UNIT.file default: None: The name of the input data file that contains observations for this observation model.
observations.OBS_UNIT.file_args default: {}: An optional table of named arguments to pass to the observation model’s from_file() method.

Note

These arguments are in addition to the input filename and the simulation time scale. For example, the Univariate class supports arbitrary column names via the time_col and value_col arguments.
observations.OBS_UNIT.parameters.NAME: The value for the observation model parameter 'NAME'.

Note

Observation model parameters are assumed to be scalar. If a list of values is provided, there will be a separate scenario instance for each value in this list. If lists of values are provided for multiple observation model parameters, there will be a separate scenario instance for each combination of parameter values (i.e., for each element in the Cartesian product of parameter values).
observations.OBS_UNIT.descriptor.name.NAME: The name used to identify the observation model parameter 'NAME' in output file names. The parameter name itself is a reasonable value to use, unless it contains special characters.
observations.OBS_UNIT.descriptor.format.NAME: The format string used to convert the value of the observation model parameter 'NAME' into a string, for use in output file names. See the format string syntax documentation for details.

Note

In addition to the above settings, the observations.OBS_UNIT table can be used to define other settings that are specific to the observation model. For example, see Provide observation models with lookup tables, where the observation model setting pr_obs_lookup is used to identify a lookup table.
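
The expansion of list-valued observation model parameters into scenario instances, described in the note above, can be sketched with itertools.product. The parameter names and values here are made up for illustration, and expand_parameters is a hypothetical helper rather than part of pypfilt.

```python
import itertools


def expand_parameters(parameters):
    """Return one dictionary of scalar parameter values per combination.

    Scalar values are treated as single-element lists, so they appear
    unchanged in every combination.
    """
    names = list(parameters)
    value_lists = [
        parameters[name] if isinstance(parameters[name], list)
        else [parameters[name]]
        for name in names
    ]
    return [
        dict(zip(names, combination))
        for combination in itertools.product(*value_lists)
    ]


# Hypothetical parameters: two lists and one scalar.
parameters = {'bg_obs': [1, 2], 'disp': [10, 100, 1000], 'pr_obs': 0.5}
instances = expand_parameters(parameters)
# There are 2 x 3 x 1 = 6 combinations, one per scenario instance.
```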

Summary statistics#

There are several settings that affect the summary statistics:

summary.init default: {}: An optional table of named arguments to pass to the summary component’s constructor.
summary.only_forecasts default: False: Whether to generate summary statistics only for forecast simulations, and ignore any estimation passes.
summary.save_history default: False: Whether to save the particle history matrix in the output file.
summary.save_backcast default: False: Whether to save the particle back-casts in the output file.
summary.metadata.minimal default: False: Whether to record only minimal metadata, ignoring details such as installed packages and the scenario settings. When set to True, only the Python version will be recorded. This can be useful when running many rounds of extremely rapid simulations, where metadata isn’t required for each round and repeated metadata collection may noticeably slow the simulations.

Note

In general, you should not need to change summary.metadata.minimal.
summary.metadata.packages default: []: The list of installed Python packages whose versions should be included in the output metadata. The version of pypfilt and all of its dependencies are automatically recorded and do not need to be specified here.

Each summary statistic is recorded by a summary table:

summary.tables.TABLE_NAME: Defines a summary table called 'TABLE_NAME'.
summary.tables.TABLE_NAME.component: Defines the component that will generate the summary table called 'TABLE_NAME'.

Note

Each summary table may require/support different settings. See pypfilt.summary for examples of using each of the summary tables provided by pypfilt.

Specific summary tables may require one or more summary monitors:

summary.monitors.MONITOR_NAME: Defines a summary monitor called 'MONITOR_NAME'.
summary.monitors.MONITOR_NAME.component: Defines the component that implements the summary monitor called 'MONITOR_NAME'.

Note

Each summary monitor may require/support different settings. See the corresponding monitor documentation for details.

Scenarios and scenario-specific settings#

Each scenario file should define one or more scenarios. See Defining multiple scenarios for an example.

scenario.SCENARIO_ID: Defines a scenario with the unique identifier 'SCENARIO_ID'.
scenario.SCENARIO_ID.name: A descriptive name for the scenario 'SCENARIO_ID'.

Each scenario inherits all of the settings defined in the scenario file, and these can be overridden with scenario-specific values. For example:

[components]
time = "pypfilt.Scalar"

[scenario.scalar_time]
name = "This scenario uses scalar time"

[scenario.date_time]
name = "This scenario uses datetime"
components.time = "pypfilt.Datetime"

Note

A scenario can override any setting in this way. This allows common settings to be defined only once, and for scenario-specific settings to be defined inside each scenario table.

Warning

It isn’t possible to remove a common setting by defining an empty scenario-specific setting. See override_dict() for details.
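
The inheritance behaviour can be pictured as a recursive merge of nested tables, in which scenario-specific values replace common values with the same key. The sketch below is an illustrative merge written for this page, and is not necessarily how override_dict() is implemented; merge_settings is a hypothetical name. It also shows why an empty scenario-specific table leaves the common setting in place.

```python
import copy


def merge_settings(common, overrides):
    """Return a copy of `common` with `overrides` applied recursively."""
    merged = copy.deepcopy(common)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested tables key by key, rather than replacing them.
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# The common settings and one scenario table from the example above.
common = {'components': {'time': 'pypfilt.Scalar'}}
date_time = {
    'name': 'This scenario uses datetime',
    'components': {'time': 'pypfilt.Datetime'},
}

merged = merge_settings(common, date_time)

# An empty scenario-specific table does not remove the common setting.
unchanged = merge_settings(common, {'components': {}})
```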
