Working with data frames#
We have already shown how to run forecasts with pypfilt.forecast() and inspect the results returned by this function.
For a more robust workflow, we should also save the forecast results to an output (HDF5) file:
pypfilt.forecast(context, forecast_times, filename='output_file.hdf5')
Note
HDF5 is a file format that allows you to store lots of data tables and related metadata in a single file. You can explore the contents of an HDF5 file with the h5py package.
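As a minimal sketch of exploring an HDF5 file with h5py, the example below writes a tiny stand-in file and then lists every group and dataset it contains. The group name `tables`, the dataset name `example`, and the `note` attribute are made up for illustration; they are not the layout that pypfilt produces. To inspect a real output file, open it in read mode and call `visititems()` in the same way.

```python
import os
import tempfile

import h5py
import numpy as np

# Write a tiny stand-in file. The names used here are hypothetical and are
# NOT the layout that pypfilt writes to its output files.
path = os.path.join(tempfile.mkdtemp(), 'example.hdf5')
with h5py.File(path, 'w') as f:
    table = np.array(
        [(0.0, 1.5), (1.0, 2.5)],
        dtype=[('time', 'f8'), ('value', 'f8')],
    )
    # Intermediate groups ('tables') are created automatically.
    f.create_dataset('tables/example', data=table)
    f['tables'].attrs['note'] = 'hypothetical metadata'

# Walk the file and record the path and kind of every group and dataset.
found = []


def describe(name, obj):
    kind = 'dataset' if isinstance(obj, h5py.Dataset) else 'group'
    found.append((name, kind))


with h5py.File(path, 'r') as f:
    f.visititems(describe)

for name, kind in found:
    print(f'{kind}: {name}')
```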
Loading data frames#
The pypfilt.io module provides functions for loading results as Pandas or Polars data frames, including:
load_observations() for loading input observations; and
load_summary_tables() for loading summary statistic tables.
This offers greater flexibility in analysing and visualising the results of your simulations.
Summary table structure#
Recall that in Plotting the results we had to combine the credible intervals from the estimation pass and the forecast from \(t = 20\):
# Collect credible intervals for the recent backcast and the forecast.
fit = results.estimation.tables['forecasts']
forecast = results.forecasts[forecast_time].tables['forecasts']
credible_intervals = np.concatenate(
    (fit[fit['time'] >= backcast_time], forecast)
)
When saving results to an HDF5 file, the summary tables for the estimation pass and each forecasting pass are combined into a single table.
For example, the 'forecasts' tables from the estimation pass and the forecast at \(t = 20\) (as shown in the code block above) are combined into a single 'forecasts' table that contains both sets of results.
These combined tables contain an additional 'fs_time' column:
For each forecast, the 'fs_time' column will contain the forecast time. For example, if we run a forecast from \(t = 20\), the 'fs_time' column will contain the value 20 for the rows associated with this forecast.
For the estimation pass, the 'fs_time' column will contain the end of the simulation period. For example, the simulation period for each scenario in this tutorial starts at \(t = 0\) and ends at \(t = 25\), and so the 'fs_time' column will contain the value 25 for the rows associated with the estimation pass.
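The 'fs_time' convention above can be illustrated with a small sketch that separates the estimation-pass rows from the forecast rows of a combined table. The table here is a toy NumPy structured array with made-up values and a reduced set of columns ('fs_time', 'time', 'ymin', 'ymax'); it stands in for a combined 'forecasts' table and is not loaded from a real pypfilt output file.

```python
import numpy as np

# Toy stand-in for a combined 'forecasts' table; the values are invented.
dtype = [('fs_time', 'f8'), ('time', 'f8'), ('ymin', 'f8'), ('ymax', 'f8')]
combined = np.array(
    [
        # Estimation pass: 'fs_time' is the end of the simulation period.
        (25.0, 18.0, 1.0, 2.0),
        (25.0, 19.0, 1.1, 2.1),
        (25.0, 20.0, 1.2, 2.2),
        # Forecast from t = 20: 'fs_time' is the forecast time.
        (20.0, 21.0, 1.0, 2.6),
        (20.0, 22.0, 0.9, 2.9),
    ],
    dtype=dtype,
)

end_time = 25.0       # End of the simulation period.
forecast_time = 20.0  # Time at which the forecast was run.
backcast_time = 19.0  # Earliest estimation-pass time to keep.

# Select each set of rows by its 'fs_time' value.
fit = combined[combined['fs_time'] == end_time]
forecast = combined[combined['fs_time'] == forecast_time]

# Rebuild the "recent backcast plus forecast" table from the combined table.
credible_intervals = np.concatenate(
    (fit[fit['time'] >= backcast_time], forecast)
)
print(credible_intervals['time'])  # [19. 20. 21. 22.]
```

The same boolean-mask selection applies to a Pandas data frame, where the 'fs_time' column can be compared against the forecast time in the same way.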
Reproducibility#
Saving the results to an HDF5 file also has several advantages for reproducibility:
The file collects all of the input observations and summary tables in one place;
The file also contains the scenario settings, and the version numbers of pypfilt and related packages; and
If you are working in a git repository, the file will also include the current git commit, the branch name, and a list of modified files.