5. Scenario Settings

The predation scenario definition provides an example of how to define forecast scenarios. This section describes every setting that may appear in a scenario file. Scenarios are defined in TOML files; TOML is a simple, easy-to-read configuration file format, similar to JSON and YAML.

There are three kinds of settings:

  • Those that have default values (marked default) and only need to be defined if a non-default value is desired;
  • Those that are automatically defined (marked automatically defined) and cannot be overridden; and
  • Settings for which a value must be provided.

5.1. Core components

Each of the following components must be defined for each scenario.

components.model
The simulation model; see pypfilt.model.Model.
components.time
The simulation time scale; see pypfilt.time for available time scales.
components.sampler
The prior distribution sampler; see pypfilt.sampler for available samplers.
components.summary
The component that saves summary statistics and other outputs; see pypfilt.summary.HDF5.
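
Scenario files identify components by dotted names, such as the "pypfilt.Scalar" time scale used in the example in the final section. The following is a minimal sketch of how a dotted name could be turned into a Python object; the helper resolve_component is hypothetical and is not part of pypfilt, which may resolve component names differently. Standard-library names are used so that the sketch runs without pypfilt installed.

```python
import importlib


def resolve_component(dotted_name):
    """Return the object identified by a dotted name such as 'package.Name'.

    This is a hypothetical helper for illustration only.
    """
    module_name, _, attr_name = dotted_name.rpartition('.')
    if not module_name:
        raise ValueError(f'Expected a dotted name, got {dotted_name!r}')
    # Import the module part, then look up the final attribute on it.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Resolve a standard-library class by name.
ordered_dict_class = resolve_component('collections.OrderedDict')
print(ordered_dict_class.__name__)
```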

5.3. Input and output files

Input and output files can be stored in separate directories, and cache files can either be retained for future forecasts or deleted.

files.input_directory
The directory that contains input data files, such as lookup tables and observations.
files.output_directory
The directory in which the simulation results will be saved.
files.temp_directory default: tempfile.gettempdir()
The directory in which temporary files, such as cache files, will be saved.
files.delete_cache_file_before_forecast default: False
Whether to delete cached particle filter states (if any) before running forecast simulations; this only affects pypfilt.forecast().
files.delete_cache_file_after_forecast default: False
Whether to delete cached particle filter states (if any) after running forecast simulations; this only affects pypfilt.forecast().

5.4. Particle filter settings

The following settings affect the behaviour of the particle filter itself.

filter.particles
The number of particles (i.e., model simulations).
filter.prng_seed
The seed for the pseudo-random number generators that are used to draw samples from the model prior distribution, to resample the particle ensemble, and to drive stochastic simulation models, among other purposes.
filter.history_window
The minimum number of time units that must be stored in the particle history matrix. For example, if an observation model describes the change in model state over two time units, the history window should be at least 2. Set this to -1 to store the entire simulation period in the particle history matrix.
filter.minimal_estimation_run default: True
When generating forecasts with pypfilt.forecast(), whether to end the estimation pass at the final forecasting date, rather than continuing to the end of the simulation period.
filter.resample.method default: 'deterministic'
The method to use when resampling the particle ensemble; see pypfilt.resample.resample().
filter.resample.threshold default: 0.25
The resampling threshold; when the effective number of particles (normalised to the unit interval) drops below this threshold, the particles will be resampled.
filter.regularisation.enabled default: False
Whether to use the post-regularised particle filter; see pypfilt.resample.post_regularise().
filter.regularisation.bounds default: {}
The state vector fields to which post-regularisation will be applied. Lower and upper bounds must be provided for each field. See the predation scenario definition for an example.
filter.regularisation.tolerance default: 1e-8
The minimum variation in a field’s value over the particle ensemble for it to be a candidate for post-regularisation. Fields whose range of values is smaller than this tolerance will be treated as constants, and will be ignored by the post-regularised particle filter.
filter.regularisation.regularise_or_fail default: False
Whether to raise an exception and terminate the simulation if the covariance matrix is not positive definite. By default, when the post-regularised particle filter encounters this situation, it prints a warning message and leaves the particle ensemble unchanged.
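
To illustrate the filter.resample.threshold setting described above, the following sketch computes the effective number of particles, normalised to the unit interval, and compares it to the threshold. It assumes the common definition of the effective number of particles as the reciprocal of the sum of squared normalised weights; the function names are hypothetical, and pypfilt's own implementation may differ in detail.

```python
def normalised_effective_particles(weights):
    """Return the effective number of particles divided by the ensemble size.

    Assumes the common definition N_eff = 1 / sum(w_i ** 2), where the
    weights w_i have been normalised to sum to one.
    """
    total = sum(weights)
    normalised = [w / total for w in weights]
    n_eff = 1.0 / sum(w * w for w in normalised)
    return n_eff / len(weights)


def should_resample(weights, threshold=0.25):
    """Return True when the normalised value drops below the threshold."""
    return normalised_effective_particles(weights) < threshold


# Equal weights: every particle contributes, so no resampling is needed.
print(should_resample([1.0] * 10))
# One dominant particle out of ten: the ensemble has collapsed.
print(should_resample([1.0] + [0.0] * 9))
```

With equal weights the normalised value is 1, and it approaches 1/N as a single particle comes to dominate an ensemble of N particles.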

5.5. Lookup tables

lookup_tables.TABLE_NAME
Defines a lookup table called 'TABLE_NAME'. The value can either be a string that identifies the input data file, or a table that identifies the input data file and whether to associate each particle with a different column from this table. See Provide observation models with lookup tables and Using lookup tables in your own models for examples.

5.6. Simulation model settings

The model table can be used to define any model-specific settings used by the simulation model that are not included in the particle state vector.

For example, a simulation model might require a population_size parameter that is constant across all of the particles. This could be defined as a model parameter and included in the particle state vector, but it could also be defined as a model setting:

[model]
population_size = 100

The simulation model can then retrieve this value from the simulation context:

def get_population_size(ctx):
    return ctx.settings['model']['population_size']
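
The function above can be tried without running a simulation by passing it a stand-in object. The sketch below assumes only what the function itself relies on, namely that the context has a settings attribute that behaves like a nested dictionary; the SimpleNamespace object is a hypothetical substitute for the real simulation context.

```python
from types import SimpleNamespace


def get_population_size(ctx):
    # Model settings live under the 'model' table of the scenario settings.
    return ctx.settings['model']['population_size']


# A hypothetical stand-in for the simulation context, holding the settings
# that the [model] table shown earlier would produce.
ctx = SimpleNamespace(settings={'model': {'population_size': 100}})
print(get_population_size(ctx))
```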

5.7. Simulation model prior distribution

prior.PARAMETER
Defines the prior distribution for the model parameter 'PARAMETER'.

There are two ways in which prior samples for a model parameter can be defined:

  1. As a distribution from which the sampler component will draw values.

    The specification of a distribution will depend on the choice of sampler component. See pypfilt.sampler for details, and see the predation scenario definition for an example of defining the model prior distribution.

    Note

    Below is an example of using the Independent sampler to specify a uniform distribution over the interval [5, 10]:

    [prior]
    x = { name = "uniform", args.low = 5.0, args.high = 10.0 }
    

    When using the LatinHypercube sampler, the equivalent definition is:

    [prior]
    x = { name = "uniform", args.loc = 5.0, args.scale = 5.0 }
    
  2. As an external data source that contains the sample values. This allows you to use arbitrarily complex model prior distributions, which can incorporate features such as correlations between model parameters.

    Samples can be read from space-delimited text files, by specifying the file name and the column name:

    [prior]
    x = { external = true, table = "input-file.ssv", column = "x" }
    

    Samples can also be read from HDF5 datasets, by specifying the file name, dataset path, and column name:

    [prior]
    x.external = true
    x.hdf5 = "prior-samples.hdf5"
    x.dataset = "data/prior-samples"
    x.column = "x"
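
For the uniform distribution in the first option above, the two samplers' arguments are related by loc = low and scale = high - low, as can be seen by comparing the two definitions for the interval [5, 10]. The following sketch converts between the two forms; the function names are hypothetical and are not part of pypfilt.

```python
def bounds_to_loc_scale(low, high):
    """Convert interval bounds into the loc/scale form of a uniform prior."""
    if high <= low:
        raise ValueError('high must be greater than low')
    return {'loc': low, 'scale': high - low}


def loc_scale_to_bounds(loc, scale):
    """Convert the loc/scale form back into interval bounds."""
    if scale <= 0:
        raise ValueError('scale must be positive')
    return {'low': loc, 'high': loc + scale}


# The interval [5, 10] used in the examples above.
print(bounds_to_loc_scale(5.0, 10.0))
print(loc_scale_to_bounds(5.0, 5.0))
```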
    

5.8. Observation models

A scenario may include zero or more observation models. See the predation scenario definition for an example of defining two observation models for a single scenario.

observations.OBS_UNIT
Defines an observation model whose observation unit is 'OBS_UNIT'.
observations.OBS_UNIT.model
The observation model; see pypfilt.obs.Obs and pypfilt.obs.Univariate.
observations.OBS_UNIT.file default: None
The name of the input data file that contains observations for this observation model.
observations.OBS_UNIT.file_args default: {}

An optional table of named arguments to pass to the observation model’s from_file() method.

Note

These arguments are in addition to the input filename and the simulation time scale. For example, the Univariate class supports arbitrary column names via the time_col and value_col arguments.

observations.OBS_UNIT.parameters.NAME

The value for the observation model parameter 'NAME'.

Note

Observation model parameters are assumed to be scalar. If a list of values is provided, there will be a separate scenario instance for each value in this list. If lists of values are provided for multiple observation model parameters, there will be a separate scenario instance for each combination of parameter values (i.e., for each element in the Cartesian product of parameter values).

observations.OBS_UNIT.descriptor.name.NAME
The name used to identify the observation model parameter 'NAME' in output file names. The parameter name itself is a reasonable value to use, unless it contains special characters.
observations.OBS_UNIT.descriptor.format.NAME
The format string used to convert the value of the observation model parameter 'NAME' into a string, for use in output file names. See the format string syntax documentation for details.

Note

In addition to the above settings, the observations.OBS_UNIT table can be used to define other settings that are specific to the observation model. For example, see Provide observation models with lookup tables, where the observation model setting pr_obs_lookup is used to identify a lookup table.
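
The expansion of list-valued observation model parameters into scenario instances, described in the note on observations.OBS_UNIT.parameters.NAME, can be sketched with itertools.product. This is an illustration of the Cartesian product only; the function name and the parameter names 'a', 'b', and 'c' are hypothetical, and the order in which pypfilt generates instances is not specified here.

```python
import itertools


def expand_parameter_lists(parameters):
    """Return one dictionary of scalar values per combination of parameters.

    Scalar values are treated as single-element lists, so they appear
    unchanged in every combination.
    """
    names = list(parameters)
    value_lists = [
        value if isinstance(value, list) else [value]
        for value in parameters.values()
    ]
    return [
        dict(zip(names, combination))
        for combination in itertools.product(*value_lists)
    ]


# Two values for 'a' and three for 'b' give 2 * 3 = 6 combinations;
# the scalar 'c' is shared by all of them.
instances = expand_parameter_lists({'a': [1, 2], 'b': [10, 20, 30], 'c': 0.5})
for instance in instances:
    print(instance)
```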

5.9. Summary statistics

There are several settings that affect the summary statistics:

summary.init default: {}
An optional table of named arguments to pass to the summary component’s constructor.
summary.only_forecasts default: False
Whether to generate summary statistics only for forecast simulations, and ignore any estimation passes.
summary.save_history default: False
Whether to save the particle history matrix in the output file.
summary.save_backcast default: False
Whether to save the particle back-casts in the output file.
summary.metadata.packages default: []
The list of installed Python packages whose versions should be included in the output metadata. The version of pypfilt and all of its dependencies are automatically recorded and do not need to be specified here.
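
As an illustration of what recording package versions involves, the sketch below looks up the installed version of each named package with the standard importlib.metadata module. The function package_versions is hypothetical, and this is not necessarily how pypfilt collects its metadata.

```python
from importlib import metadata


def package_versions(package_names):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# The names here are placeholders; the second is deliberately not installed.
print(package_versions(['numpy', 'a-package-that-is-not-installed']))
```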

Each summary statistic is recorded by a summary table:

summary.tables.TABLE_NAME
Defines a summary table called 'TABLE_NAME'.
summary.tables.TABLE_NAME.component

Defines the component that will generate the summary table called 'TABLE_NAME'.

Note

Each summary table may require/support different settings. See pypfilt.summary for examples of using each of the summary tables provided by pypfilt.

Specific summary tables may require one or more summary monitors:

summary.monitors.MONITOR_NAME
Defines a summary monitor called 'MONITOR_NAME'.
summary.monitors.MONITOR_NAME.component

Defines the component that implements the summary monitor called 'MONITOR_NAME'.

Note

Each summary monitor may require/support different settings. See the corresponding monitor documentation for details.

5.10. Scenarios and scenario-specific settings

Each scenario file should define one or more scenarios. See the predation scenario definition for an example.

scenario.SCENARIO_ID
Defines a scenario with the unique identifier 'SCENARIO_ID'.
scenario.SCENARIO_ID.name
A descriptive name for the scenario 'SCENARIO_ID'.

Each scenario inherits all of the settings defined in the scenario file, and these can be overridden with scenario-specific values. For example:

[components]
time = "pypfilt.Scalar"

[scenario.scalar_time]
name = "This scenario uses scalar time"

[scenario.date_time]
name = "This scenario uses datetime"
components.time = "pypfilt.Datetime"

Note

A scenario can override any setting in this way. This allows common settings to be defined only once, and for scenario-specific settings to be defined inside each scenario table.
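
The inheritance described above can be sketched as a recursive merge of nested tables, with scenario-specific values taking precedence over the common settings. The dictionaries below mirror the TOML example in this section; the function merge_settings is hypothetical, and pypfilt's own handling of scenario settings may differ.

```python
import copy


def merge_settings(base, overrides):
    """Return a copy of base in which values from overrides take precedence.

    Nested tables are merged key by key; any other value is replaced.
    """
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_settings(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Settings defined once for the whole scenario file.
common = {'components': {'time': 'pypfilt.Scalar'}}

# Scenario tables; the dotted key components.time becomes a nested table.
scenarios = {
    'scalar_time': {'name': 'This scenario uses scalar time'},
    'date_time': {
        'name': 'This scenario uses datetime',
        'components': {'time': 'pypfilt.Datetime'},
    },
}

resolved = {
    scenario_id: merge_settings(common, settings)
    for scenario_id, settings in scenarios.items()
}
for scenario_id, settings in resolved.items():
    print(scenario_id, settings['components']['time'])
```

Merging into a deep copy leaves the common settings untouched, so each scenario starts from the same shared values.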