5. Scenario Settings
The predation scenario definition provides an example of how to define forecast scenarios. This section describes every setting that may be defined in a scenario file. Scenarios are defined in TOML files. TOML is a simple, easy-to-read configuration file format, similar to JSON and YAML.
There are three kinds of settings:
- Those that have default values (marked default) and only need to be defined if a non-default value is desired;
- Those that are automatically defined (marked automatically defined) and cannot be overridden; and
- Settings for which a value must be provided.
5.1. Core components
Each of the following components must be defined for each scenario.
components.model
- The simulation model; see pypfilt.model.Model.
components.time
- The simulation time scale; see pypfilt.time for available time scales.
components.sampler
- The prior distribution sampler; see pypfilt.sampler for available samplers.
components.summary
- The component that saves summary statistics and other outputs; see pypfilt.summary.HDF5.
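As a rough illustration, the four required components can be pictured as one nested table, and a parsed scenario can be checked for missing entries before it is run. The sketch below is not part of pypfilt: `missing_components` is a hypothetical helper, `my_models.Predation` is a placeholder model name, and the sampler path is an assumption based on the `LatinHypercube` sampler mentioned under the prior distribution settings.

```python
# Hypothetical helper (not part of pypfilt): report which of the four
# required core components are absent from a parsed scenario.
REQUIRED_COMPONENTS = ('model', 'time', 'sampler', 'summary')


def missing_components(settings):
    """Return the names of required components that are not defined."""
    components = settings.get('components', {})
    return [name for name in REQUIRED_COMPONENTS if name not in components]


# Settings as they might look after the scenario file has been parsed.
settings = {
    'components': {
        'model': 'my_models.Predation',  # placeholder model class
        'time': 'pypfilt.Scalar',
        'sampler': 'pypfilt.sampler.LatinHypercube',  # assumed module path
        'summary': 'pypfilt.summary.HDF5',
    }
}

print(missing_components(settings))
print(missing_components({'components': {'model': 'my_models.Predation'}}))
```

The second call lists the three components that would still need to be defined.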
5.3. Input and output files
Input and output files can be stored in separate directories, and cache files can either be retained for future forecasts or deleted.
files.input_directory
- The directory that contains input data files, such as lookup tables and observations.
files.output_directory
- The directory in which the simulation results will be saved.
files.temp_directory
default:tempfile.gettempdir()
- The directory in which temporary files, such as cache files, will be saved.
files.delete_cache_file_before_forecast
default:False
- Whether to delete cached particle filter states (if any) before running forecast simulations; this only affects pypfilt.forecast().
files.delete_cache_file_after_forecast
default:False
- Whether to delete cached particle filter states (if any) after running forecast simulations; this only affects pypfilt.forecast().
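The default for `files.temp_directory` is whatever `tempfile.gettempdir()` returns on the current machine. A minimal sketch of that fallback is shown below; `resolve_temp_directory` and the directory names are hypothetical and only illustrate the documented default, not how pypfilt resolves it internally.

```python
import tempfile


def resolve_temp_directory(files_settings):
    """Return the configured temporary directory, or the documented default."""
    return files_settings.get('temp_directory', tempfile.gettempdir())


# No temp_directory given: the platform's temporary directory is used.
print(resolve_temp_directory({'input_directory': 'data',
                              'output_directory': 'out'}))

# An explicit value takes precedence over the default.
print(resolve_temp_directory({'temp_directory': 'cache'}))
```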
5.4. Particle filter settings
The following settings affect the behaviour of the particle filter itself.
filter.particles
- The number of particles (i.e., model simulations).
filter.prng_seed
- The seed for the various pseudo-random number generators, which are used to draw samples from the model prior distribution, to resample the particle ensemble, for use by stochastic simulation models, etc.
filter.history_window
- The minimum number of time units that must be stored in the particle history matrix.
For example, if an observation model describes the change in model state over two time units, the history window should be at least 2. Set this to -1 to store the entire simulation period in the particle history matrix.
filter.minimal_estimation_run
default:True
- When generating forecasts with pypfilt.forecast(), whether to end the estimation pass at the final forecasting date, rather than continuing to the end of the simulation period.
filter.resample.method
default:'deterministic'
- The method to use when resampling the particle ensemble; see pypfilt.resample.resample().
filter.resample.threshold
default:0.25
- The resampling threshold; when the effective number of particles (normalised to the unit interval) drops below this threshold, the particles will be resampled.
filter.regularisation.enabled
default:False
- Whether to use the post-regularised particle filter; see pypfilt.resample.post_regularise().
filter.regularisation.bounds
default:{}
- The state vector fields to which post-regularisation will be applied. Lower and upper bounds must be provided for each field. See the predation scenario definition for an example.
filter.regularisation.tolerance
default:1e-8
- The minimum variation in a field’s value over the particle ensemble for it to be a candidate for post-regularisation. Fields whose range of values is smaller than this tolerance will be treated as constants and ignored by the post-regularised particle filter.
filter.regularisation.regularise_or_fail
default:False
- Whether to raise an exception and terminate the simulation if the covariance matrix is not positive definite. By default, when the post-regularised particle filter encounters this situation, it prints a warning message and leaves the particle ensemble unchanged.
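The resampling threshold described above can be illustrated with a small calculation. The sketch below assumes the usual definition of the effective number of particles, 1 / sum(w_i^2) for normalised weights w_i, divided by the number of particles to scale it to the unit interval. The function names are hypothetical, and this is not necessarily how pypfilt computes the value.

```python
def normalised_effective_particles(weights):
    """Return the effective number of particles, scaled to the unit interval."""
    total = sum(weights)
    normalised = [w / total for w in weights]
    effective = 1.0 / sum(w * w for w in normalised)
    return effective / len(weights)


def needs_resampling(weights, threshold=0.25):
    """Return True when the scaled effective number drops below the threshold."""
    return normalised_effective_particles(weights) < threshold


# Equal weights: every particle contributes, so no resampling is needed.
print(needs_resampling([0.25, 0.25, 0.25, 0.25]))

# One dominant particle out of ten: the ensemble has nearly collapsed.
print(needs_resampling([0.91] + [0.01] * 9))
```

With equal weights the scaled value is 1, its maximum; as the weight concentrates on a single particle it approaches 1/N, so a threshold of 0.25 only triggers once most of the weight sits on a small fraction of the ensemble.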
5.5. Lookup tables
lookup_tables.TABLE_NAME
- Defines a lookup table called 'TABLE_NAME'. The value can either be a string that identifies the input data file, or a table that identifies the input data file and whether to associate each particle with a different column from this table. See Provide observation models with lookup tables and Using lookup tables in your own models for examples.
5.6. Simulation model settings
The model table can be used to define any model-specific settings used by the simulation model that are not included in the particle state vector.
For example, a simulation model might require a population_size parameter that is constant across all of the particles.
This could be defined as a model parameter and included in the particle state vector, but it could also be defined as a model setting:
[model]
population_size = 100
The simulation model can then retrieve this value from the simulation context:
def get_population_size(ctx):
    # Model settings are stored under the 'model' table of the scenario settings.
    return ctx.settings['model']['population_size']
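To try the lookup without running a simulation, a stand-in context is enough. The sketch below assumes only what the function above relies on, namely that the context exposes a `settings` mapping; `SimpleNamespace` is used here as a hypothetical substitute for the real simulation context.

```python
from types import SimpleNamespace


def get_population_size(ctx):
    # Model settings are stored under the 'model' table of the scenario settings.
    return ctx.settings['model']['population_size']


# Hypothetical stand-in for the simulation context; it carries the settings
# that the [model] table above would produce once parsed.
ctx = SimpleNamespace(settings={'model': {'population_size': 100}})

print(get_population_size(ctx))  # prints 100
```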
5.7. Simulation model prior distribution
prior.PARAMETER
- Defines the prior distribution for the model parameter 'PARAMETER'.
There are two ways in which prior samples for a model parameter can be defined:
- As a distribution from which the sampler component will draw values.
  The specification of a distribution will depend on the choice of sampler component. See pypfilt.sampler for details, and see the predation scenario definition for an example of defining the model prior distribution.
  Note
  Below is an example of using the Independent sampler to specify a uniform distribution over the interval [5, 10]:
  [prior]
  x = { name = "uniform", args.low = 5.0, args.high = 10.0 }
  When using the LatinHypercube sampler, the equivalent definition is:
  [prior]
  x = { name = "uniform", args.loc = 5.0, args.scale = 5.0 }
- As an external data source that contains the sample values. This allows you to use arbitrarily-complex model prior distributions, which can incorporate features such as correlations between model parameters.
  Samples can be read from space-delimited text files, by specifying the file name and the column name:
  [prior]
  x = { external = true, table = "input-file.ssv", column = "x" }
  Samples can also be read from HDF5 datasets, by specifying the file name, dataset path, and column name:
  [prior]
  x.external = true
  x.hdf5 = "prior-samples.hdf5"
  x.dataset = "data/prior-samples"
  x.column = "x"
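The two uniform definitions above describe the same interval because a loc/scale parameterisation covers [loc, loc + scale]. The sketch below converts between the two forms, assuming the loc/scale convention used by `scipy.stats.uniform`; the helper names are hypothetical and not part of pypfilt.

```python
def uniform_low_high_to_loc_scale(low, high):
    """Convert interval bounds to loc/scale arguments for a uniform distribution."""
    if high <= low:
        raise ValueError('high must be greater than low')
    return {'loc': low, 'scale': high - low}


def uniform_loc_scale_to_low_high(loc, scale):
    """Convert loc/scale arguments back to interval bounds."""
    if scale <= 0:
        raise ValueError('scale must be positive')
    return {'low': loc, 'high': loc + scale}


# The interval [5, 10] from the example above.
print(uniform_low_high_to_loc_scale(5.0, 10.0))
print(uniform_loc_scale_to_low_high(5.0, 5.0))
```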
5.8. Observation models
A scenario may include zero or more observation models. See the predation scenario definition for an example of defining two observation models for a single scenario.
observations.OBS_UNIT
- Defines an observation model whose observation unit is 'OBS_UNIT'.
observations.OBS_UNIT.model
- The observation model; see pypfilt.obs.Obs and pypfilt.obs.Univariate.
observations.OBS_UNIT.file
default:None
- The name of the input data file that contains observations for this observation model.
observations.OBS_UNIT.file_args
default:{}
- An optional table of named arguments to pass to the observation model’s from_file() method.
Note
These arguments are in addition to the input filename and the simulation time scale. For example, the Univariate class supports arbitrary column names via the time_col and value_col arguments.
observations.OBS_UNIT.parameters.NAME
- The value for the observation model parameter 'NAME'.
Note
Observation model parameters are assumed to be scalar. If a list of values is provided, there will be a separate scenario instance for each value in this list. If lists of values are provided for multiple observation model parameters, there will be a separate scenario instance for each combination of parameter values (i.e., for each element in the Cartesian product of parameter values).
observations.OBS_UNIT.descriptor.name.NAME
- The name used to identify the observation model parameter 'NAME' in output file names. The parameter name itself is a reasonable value to use, unless it contains special characters.
observations.OBS_UNIT.descriptor.format.NAME
- The format string used to convert the value of the observation model parameter 'NAME' into a string, for use in output file names. See the format string syntax documentation for details.
Note
In addition to the above settings, the observations.OBS_UNIT table can be used to define other settings that are specific to the observation model.
For example, see Provide observation models with lookup tables, where the observation model setting pr_obs_lookup is used to identify a lookup table.
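The expansion of list-valued observation model parameters into scenario instances, and the role of the descriptor settings, can be sketched as follows. Everything here is illustrative: the parameter names (`bg_obs`, `disp`, `pr_obs`), the helper functions, and the layout of the file-name fragment are hypothetical, and the format settings are assumed to be format specifications accepted by Python's built-in `format()`.

```python
from itertools import product


def expand_parameters(parameters):
    """Yield one dict of scalar values per combination of parameter values."""
    names = list(parameters)
    value_lists = [
        value if isinstance(value, list) else [value]
        for value in parameters.values()
    ]
    for combination in product(*value_lists):
        yield dict(zip(names, combination))


def describe(values, names, formats):
    """Build a file-name fragment from descriptor names and formatted values."""
    parts = []
    for param, value in values.items():
        parts.append(names[param])
        parts.append(format(value, formats[param]))
    return '-'.join(parts)


# Two list-valued parameters and one scalar: 2 x 2 x 1 = 4 instances.
parameters = {'bg_obs': [10, 20], 'disp': [0.5, 1.0], 'pr_obs': 0.1}
names = {'bg_obs': 'bg', 'disp': 'disp', 'pr_obs': 'pr'}
formats = {'bg_obs': '02d', 'disp': '0.1f', 'pr_obs': '0.2f'}

instances = list(expand_parameters(parameters))
for instance in instances:
    print(describe(instance, names, formats))
```

Each printed fragment identifies one element of the Cartesian product, which is why every parameter that takes multiple values benefits from a short descriptor name and a fixed-width format.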
5.9. Summary statistics
There are several settings that affect the summary statistics:
summary.init
default:{}
- An optional table of named arguments to pass to the summary component’s constructor.
summary.only_forecasts
default:False
- Whether to generate summary statistics only for forecast simulations, and ignore any estimation passes.
summary.save_history
default:False
- Whether to save the particle history matrix in the output file.
summary.save_backcast
default:False
- Whether to save the particle back-casts in the output file.
summary.metadata.packages
default:[]
- The list of installed Python packages whose versions should be included in the output metadata. The versions of pypfilt and all of its dependencies are automatically recorded and do not need to be specified here.
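To see what version information is available for a list of package names, the standard library's `importlib.metadata` can be queried directly. The sketch below is an assumption about how such a lookup might be done, not a description of how pypfilt collects its metadata; `package_versions` is a hypothetical helper.

```python
from importlib import metadata


def package_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages that are not installed are reported as None rather than raising.
print(package_versions(['numpy', 'a-package-that-is-not-installed']))
```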
Each summary statistic is recorded by a summary table:
summary.tables.TABLE_NAME
- Defines a summary table called 'TABLE_NAME'.
summary.tables.TABLE_NAME.component
- Defines the component that will generate the summary table called 'TABLE_NAME'.
Note
Each summary table may require/support different settings. See pypfilt.summary for examples of using each of the summary tables provided by pypfilt.
Specific summary tables may require one or more summary monitors:
summary.monitors.MONITOR_NAME
- Defines a summary monitor called 'MONITOR_NAME'.
summary.monitors.MONITOR_NAME.component
- Defines the component that implements the summary monitor called 'MONITOR_NAME'.
Note
Each summary monitor may require/support different settings. See the corresponding monitor documentation for details.
5.10. Scenarios and scenario-specific settings
Each scenario file should define one or more scenarios. See the predation scenario definition for an example.
scenario.SCENARIO_ID
- Defines a scenario with the unique identifier 'SCENARIO_ID'.
scenario.SCENARIO_ID.name
- A descriptive name for the scenario 'SCENARIO_ID'.
Each scenario inherits all of the settings defined in the scenario file, and these can be overridden with scenario-specific values. For example:
[components]
time = "pypfilt.Scalar"
[scenario.scalar_time]
name = "This scenario uses scalar time"
[scenario.date_time]
name = "This scenario uses datetime"
components.time = "pypfilt.Datetime"
Note
A scenario can override any setting in this way. This allows common settings to be defined only once, and for scenario-specific settings to be defined inside each scenario table.
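The inheritance rule can be pictured as a recursive merge of the common settings with each scenario table. The sketch below applies that idea to the example above, using plain dictionaries in place of the parsed TOML file; `deep_merge` and `scenario_settings` are hypothetical helpers and not pypfilt's implementation.

```python
import copy


def deep_merge(base, overrides):
    """Return a copy of base with overrides applied, merging nested tables."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def scenario_settings(parsed):
    """Build the complete settings for each scenario in a parsed scenario file."""
    common = {key: value for key, value in parsed.items() if key != 'scenario'}
    return {
        scenario_id: deep_merge(common, table)
        for scenario_id, table in parsed.get('scenario', {}).items()
    }


# The example scenario file above, as it would look once parsed.
parsed = {
    'components': {'time': 'pypfilt.Scalar'},
    'scenario': {
        'scalar_time': {'name': 'This scenario uses scalar time'},
        'date_time': {
            'name': 'This scenario uses datetime',
            'components': {'time': 'pypfilt.Datetime'},
        },
    },
}

resolved = scenario_settings(parsed)
for scenario_id, settings in resolved.items():
    print(scenario_id, settings['components']['time'])
```

Merging nested tables rather than replacing them means a scenario that overrides `components.time` would still inherit any other components defined at the top level.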