Scenario Settings#
The Getting Started guide includes several example scenarios. Here we describe all of the settings that may appear in a scenario file. Scenarios are defined in TOML files; TOML is a simple, human-readable configuration format, similar to JSON and YAML.
There are three kinds of settings:
Those that have default values (marked default) and only need to be defined if a non-default value is desired;
Those that are automatically defined (marked automatically defined) and cannot be overridden; and
Settings for which a value must be provided.
Core components#
Each of the following components must be defined for each scenario.
components.model
The simulation model; see pypfilt.model.Model.
components.time
The simulation time scale; see pypfilt.time for available time scales.
components.sampler
The prior distribution sampler; see pypfilt.sampler for available samplers.
components.summary
The component that saves summary statistics and other outputs; see pypfilt.summary.HDF5.
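For example, a minimal components table might look like the following. The model class my_model.SIR is hypothetical, and the sampler path is indicative; see the linked documentation for the exact component names.

```toml
[components]
model = "my_model.SIR"                      # hypothetical model class
time = "pypfilt.Scalar"                     # scalar simulation time
sampler = "pypfilt.sampler.LatinHypercube"  # Latin hypercube prior sampler
summary = "pypfilt.summary.HDF5"            # save outputs to a HDF5 file
```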
Input and output files#
Input and output files can be stored in separate directories, and cache files can either be retained for future forecasts or deleted.
files.input_directory
The directory that contains input data files, such as lookup tables and observations.
files.output_directory
The directory in which the simulation results will be saved.
files.temp_directory
default: tempfile.gettempdir()
The directory in which temporary files, such as cache files, will be saved.
files.cache_file
default: None
The file used to store particle filter states between forecast simulations. See Caching particle states for details.
files.delete_cache_file_before_forecast
default: False
Whether to delete cached particle filter states (if any) before running forecast simulations; this only affects pypfilt.forecast().
files.delete_cache_file_after_forecast
default: False
Whether to delete cached particle filter states (if any) after running forecast simulations; this only affects pypfilt.forecast().
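Taken together, the files table might be configured as follows (the directory and file names here are illustrative):

```toml
[files]
input_directory = "inputs"
output_directory = "outputs"
cache_file = "cache.hdf5"
delete_cache_file_after_forecast = true
```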
Particle filter settings#
The following settings affect the behaviour of the particle filter itself.
filter.particles
The number of particles (i.e., model simulations).
filter.prng_seed
The seed for the various pseudo-random number generators, which are used to draw samples from the model prior distribution, to resample the particle ensemble, for use by stochastic simulation models, etc.
filter.history_window
The minimum number of time units that must be stored in the particle history matrix. For example, if an observation model describes the change in model state over two time units, the history window should be at least 2. Set this to -1 to store the entire simulation period in the particle history matrix.
filter.minimal_estimation_run
default: True
When generating forecasts with pypfilt.forecast(), whether to end the estimation pass at the final forecasting date, rather than continuing to the end of the simulation period.
filter.results.save_history
default: False
Whether to return the particle history matrix for each estimation and forecasting pass.
Note
Setting this to True can substantially increase memory usage when performing large numbers of estimation and/or forecasting passes.
filter.reweight_or_fail
default: False
Whether to raise an exception and terminate the simulation if the updated particle weights sum to zero. By default, when the updated particle weights sum to zero, their previous weights will be used instead — effectively ignoring problematic observations.
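A minimal filter table covering the settings above might look like this (the values are illustrative):

```toml
[filter]
particles = 2000
prng_seed = 42
history_window = -1       # store the entire simulation period
reweight_or_fail = true   # fail loudly instead of ignoring observations
```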
filter.reweight.enabled
default: True
Whether to update particle weights in response to observations.
filter.reweight.exponent
automatically defined
An exponent to apply to observation likelihoods; should only take values between 0 and 1 (inclusive).
filter.resample.enabled
default: True
Whether to resample particles.
filter.resample.method
default: 'deterministic'
The method to use when resampling the particle ensemble; see pypfilt.resample.resample().
filter.resample.threshold
default: 0.25
The resampling threshold; when the effective number of particles (normalised to the unit interval) drops below this threshold, the particles will be resampled.
filter.resample.before_forecasting
default: False
Whether to resample the particles before each forecasting pass. Note that if the particle weights are already uniform, they will not be resampled.
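For example, to resample more aggressively and before each forecasting pass (values illustrative):

```toml
[filter.resample]
method = "deterministic"
threshold = 0.5            # resample when effective particles drop below 50%
before_forecasting = true
```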
filter.regularisation.enabled
default: False
Whether to use the post-regularised particle filter; see pypfilt.resample.post_regularise().
filter.regularisation.bounds
default: {}
The state vector fields to which post-regularisation will be applied. Lower and upper bounds must be provided for each field. See Regularisation for an example.
filter.regularisation.tolerance
default: 1e-8
The minimum variation in a field's value over the particle ensemble for it to be a candidate for post-regularisation. Fields whose range of values is smaller than this tolerance will be considered constants, and will be ignored by the post-regularised particle filter.
filter.regularisation.bandwidth_scale
default: 0.5
The bandwidth scaling factor for the regularisation kernel. This is defined relative to the optimal bandwidth for a Gaussian kernel (when the underlying density is Gaussian with unit covariance). See Improving Regularised Particle Filters for details.
filter.regularisation.regularise_or_fail
default: False
Whether to raise an exception and terminate the simulation if the covariance matrix is not positive definite. By default, when the post-regularised particle filter encounters this situation, it prints a warning message and leaves the particle ensemble unchanged.
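A sketch of enabling post-regularisation for a single hypothetical field R0. The min/max key names in the bounds table are assumptions here; see the Regularisation example for the exact form.

```toml
[filter.regularisation]
enabled = true
bandwidth_scale = 0.5

[filter.regularisation.bounds]
R0 = { min = 1.0, max = 2.0 }  # hypothetical field and bound keys
```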
filter.adaptive_fit.enabled
default: False
Whether to use an adaptive fitting method when generating forecasts with pypfilt.forecast().
filter.adaptive_fit.method
default: None
The method to use when performing adaptive fitting; see pypfilt.adaptive_fit().
filter.adaptive_fit.target_effective_fraction
default: 0.5
The target effective particle fraction for each estimation pass in the adaptive fitting process; used by the "fit_effective_fraction" method.
filter.adaptive_fit.exponent_tolerance
default: 0.001
The minimum increase in exponent for each estimation pass in the adaptive fitting process; used by the "fit_effective_fraction" method.
filter.adaptive_fit.exponent_step
default: None
The sequence of exponents to use in each estimation pass; used by the "fixed_exponents" method. This may be a sequence of exponents, or a single value that defines the step size.
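For example, adaptive fitting with the "fit_effective_fraction" method could be enabled with:

```toml
[filter.adaptive_fit]
enabled = true
method = "fit_effective_fraction"
target_effective_fraction = 0.5
exponent_tolerance = 0.001
```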
Particle ensemble partitions#
The particle ensemble can be divided into a number of non-overlapping partitions, each of which is assigned a fixed net probability mass. This allows the ensemble to maintain an invariant, such as a discrete distribution over some model parameter, over the estimation and forecasting passes. Partitions are defined by an array of tables:
filter.partition
An array of tables, each of which defines a partition; see the settings listed below.
filter.partition.particles
The number of particles contained in the partition. These must sum to the value of filter.particles.
filter.partition.weight
The net probability mass allocated to the partition. These must sum to 1.0.
filter.partition.reservoir
default: False
Whether this partition acts as a “reservoir”. Particles in reservoir partitions are never reweighted or resampled. Reservoirs can be used for:
Preserving a representative sample of the model prior distribution, in which case the reservoir should have non-zero weight; and
Providing candidate particles when resampling other non-reservoir partitions, in which case the reservoir should have zero weight.
filter.partition.reservoir_partition
default: not defined
Identifies the reservoir partition associated with this non-reservoir partition.
filter.partition.reservoir_fraction
default: not defined
The fraction of particles that will be sampled from the associated reservoir when resampling this non-reservoir partition.
For example, the ensemble can be divided into two partitions of different sizes and weights:
[filter]
particles = 1000
[[filter.partition]]
particles = 600
weight = 0.75
[[filter.partition]]
particles = 400
weight = 0.25
The ensemble can also be divided into a small reservoir partition and a large non-reservoir partition, where every time the non-reservoir partition is resampled, 20% of particles are selected from the reservoir:
[filter]
particles = 1000
# Define partition #1.
[[filter.partition]]
particles = 100
weight = 0.0 # These particles have zero weight in the ensemble.
reservoir = true
# Define partition #2.
[[filter.partition]]
particles = 900
weight = 1.0 # This partition represents 100% of the ensemble.
reservoir_partition = 1 # Use partition #1 as a reservoir.
reservoir_fraction = 0.2 # Select 20% of particles from the reservoir.
The following settings are automatically defined:
filter.partition.slice
automatically defined
A slice object that can be used to select the particles included in this partition.
filter.partition.reservoir_ix
automatically defined
The zero-based index of the reservoir associated with this partition, if any.
Note
Use the slice object for indexing into state vectors, because it will return a view. In contrast, using a mask array will return a copy. See Indexing on ndarrays for further details.
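The view-versus-copy distinction can be demonstrated with a minimal NumPy sketch (a toy array standing in for the particle history matrix, not pypfilt itself):

```python
import numpy as np

# A toy state-vector array with one row per particle and three fields.
state = np.zeros((10, 3))

# Indexing with a slice (as filter.partition.slice provides) returns a
# view, so writes propagate back to the original array.
partition = slice(0, 4)
view = state[partition]
view[:, 0] = 1.0
print(state[0, 0])  # 1.0

# Indexing with a boolean mask returns a copy, so writes are lost.
mask = np.zeros(10, dtype=bool)
mask[:4] = True
copy = state[mask]
copy[:, 1] = 2.0
print(state[0, 1])  # still 0.0
```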
Lookup tables#
lookup_tables.TABLE_NAME
Defines a lookup table called
'TABLE_NAME'
. The value can either be a string that identifies the input data file, or a table that identifies the input data file and whether to associate each particle with a different column from this table. See Provide observation models with lookup tables and Using lookup tables in your own models for examples.
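For example, the string form simply names the input data file (the table and file names here are illustrative):

```toml
[lookup_tables]
pr_obs = "pr-obs.ssv"  # hypothetical lookup table and input file
```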
Simulation model settings#
The model table can be used to define any model-specific settings used by the simulation model that are not included in the particle state vector. For example, a simulation model might require a population_size parameter that is constant across all of the particles. This could be defined as a model parameter and included in the particle state vector, but it could also be defined as a model setting:
[model]
population_size = 100
The simulation model can then retrieve this value from the simulation context:
def get_population_size(ctx):
    return ctx.settings['model']['population_size']
Ordinary differential equation solvers#
For models that inherit from OdeModel, the integration method and custom solver options can be specified in the model table.
For example:
[model]
ode_method = "RK23" # Explicit Runge-Kutta method of order 3(2).
ode_solver_options.rtol = 1e-2 # Reduce the relative tolerance.
ode_solver_options.atol = 1e-4 # Reduce the absolute tolerance.
See the scipy.integrate.solve_ivp() documentation for supported integration methods and solver options.
Simulation model prior distribution#
prior.PARAMETER
Defines the prior distribution for the model parameter
'PARAMETER'
.
There are two ways in which prior samples for a model parameter can be defined:
As a distribution from which the sampler component will draw values. The specification of a distribution will depend on the choice of sampler component. See pypfilt.sampler for details, and see Defining multiple scenarios for examples of defining the model prior distribution.
Note
When using the LatinHypercube sampler, the following specifies a uniform distribution over the interval [5, 10]:
[prior]
x = { name = "uniform", args.loc = 5.0, args.scale = 5.0 }
As an external data source that contains the sample values. This allows you to use arbitrarily-complex model prior distributions, which can incorporate features such as correlations between model parameters.
Samples can be read from space-delimited text files, by specifying the file name and the column name:
[prior]
x = { external = true, table = "input-file.ssv", column = "x" }
Samples can also be read from HDF5 datasets, by specifying the file name, dataset path, and column name:
[prior]
x.external = true
x.hdf5 = "prior-samples.hdf5"
x.dataset = "data/prior-samples"
x.column = "x"
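For example, correlated prior samples could be written to a space-delimited text file with a header row that names each column. This is a sketch using NumPy; the parameter names x and y, the covariance, and the file name are all illustrative:

```python
import numpy as np

# Draw correlated samples for two hypothetical parameters, x and y.
rng = np.random.default_rng(seed=20240101)
cov = [[1.0, 0.6], [0.6, 1.0]]
samples = rng.multivariate_normal([0.0, 0.0], cov, size=1000)

# Write a space-delimited text file whose header row names the columns.
np.savetxt(
    'prior-samples.ssv',
    samples,
    header='x y',
    comments='',  # avoid prefixing the header row with '#'
)
```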
Observation models#
A scenario may include zero or more observation models. See Defining a scenario and Defining multiple scenarios for examples of defining multiple observation models for a single scenario.
observations.OBS_UNIT
Defines an observation model whose observation unit is 'OBS_UNIT'.
observations.OBS_UNIT.model
The observation model; see pypfilt.obs.Obs and pypfilt.obs.Univariate.
observations.OBS_UNIT.file
default: None
The name of the input data file that contains observations for this observation model.
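For example, an observation model for a hypothetical observation unit "cases" might be defined as follows (the model class and file name are illustrative):

```toml
[observations.cases]
model = "my_model.PoissonCases"  # hypothetical observation model class
file = "case-counts.ssv"         # illustrative input data file
```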
observations.OBS_UNIT.file_args
default: {}
An optional table of named arguments to pass to the observation model's from_file() method.
Note
These arguments are in addition to the input filename and the simulation time scale. For example, the Univariate class supports arbitrary column names via the time_col and value_col arguments.
observations.OBS_UNIT.parameters.NAME
The value for the observation model parameter
'NAME'.
Note
Observation model parameters are assumed to be scalar. If a list of values is provided, there will be a separate scenario instance for each value in the list. If lists of values are provided for multiple observation model parameters, there will be a separate scenario instance for each combination of parameter values (i.e., for each element in the Cartesian product of the parameter values).
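For example, providing lists of values for two parameters of a hypothetical "cases" observation model would yield 3 × 2 = 6 scenario instances:

```toml
[observations.cases.parameters]
dispersion = [10, 100, 1000]  # hypothetical parameter
bg_obs = [1.0, 2.0]           # hypothetical parameter
```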
observations.OBS_UNIT.descriptor.name.NAME
The name used to identify the observation model parameter 'NAME' in output file names. The parameter name itself is a reasonable value to use, unless it contains special characters.
observations.OBS_UNIT.descriptor.format.NAME
The format string used to convert the value of the observation model parameter
'NAME'
into a string, for use in output file names. See the format string syntax documentation for details.
Note
In addition to the above settings, the observations.OBS_UNIT table can be used to define other settings that are specific to the observation model. For example, see Provide observation models with lookup tables, where the observation model setting pr_obs_lookup is used to identify a lookup table.
Summary statistics#
There are several settings that affect the summary statistics:
summary.init
default: {}
An optional table of named arguments to pass to the summary component's constructor.
summary.only_forecasts
default: False
Whether to generate summary statistics only for forecast simulations, and ignore any estimation passes.
summary.save_history
default: False
Whether to save the particle history matrix in the output file.
summary.save_backcast
default: False
Whether to save the particle back-casts in the output file.
summary.metadata.packages
default: []
The list of installed Python packages whose versions should be included in the output metadata. The version of pypfilt and all of its dependencies are automatically recorded and do not need to be specified here.
Each summary statistic is recorded by a summary table:
summary.tables.TABLE_NAME
Defines a summary table called 'TABLE_NAME'.
summary.tables.TABLE_NAME.component
Defines the component that will generate the summary table called 'TABLE_NAME'.
Note
Each summary table may require/support different settings. See pypfilt.summary for examples of using each of the summary tables provided by pypfilt.
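For example (the table name is arbitrary, and the component shown is indicative only; see pypfilt.summary for the tables that pypfilt actually provides):

```toml
[summary.tables]
model_cints.component = "pypfilt.summary.ModelCIs"
```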
Specific summary tables may require one or more summary monitors:
summary.monitors.MONITOR_NAME
Defines a summary monitor called 'MONITOR_NAME'.
summary.monitors.MONITOR_NAME.component
Defines the component that implements the summary monitor called 'MONITOR_NAME'.
Note
Each summary monitor may require/support different settings. See the corresponding monitor documentation for details.
Scenarios and scenario-specific settings#
Each scenario file should define one or more scenarios. See Defining multiple scenarios for an example.
scenario.SCENARIO_ID
Defines a scenario with the unique identifier 'SCENARIO_ID'.
scenario.SCENARIO_ID.name
A descriptive name for the scenario 'SCENARIO_ID'.
Each scenario inherits all of the settings defined in the scenario file, and these can be overridden with scenario-specific values. For example:
[components]
time = "pypfilt.Scalar"
[scenario.scalar_time]
name = "This scenario uses scalar time"
[scenario.date_time]
name = "This scenario uses datetime"
components.time = "pypfilt.Datetime"
Note
A scenario can override any setting in this way. This allows common settings to be defined only once, and for scenario-specific settings to be defined inside each scenario table.
Warning
It isn’t possible to remove a common setting by defining an empty scenario-specific setting.
See override_dict() for details.