pypfilt.io#

The pypfilt.io module provides functions for reading tabular data from data files, creating lookup tables, and saving and loading summary tables.

Reading tabular data#

Use read_fields() to read tabular data (such as observations) from plain-text files.

pypfilt.io.read_fields(time_scale, path, fields, comment='#', encoding='utf-8')#

Read data from a space-delimited text file with column headers defined in the first non-comment line.

This is wrapper for read_table() that ensures time columns are identifiable.

Note

Use time_field() to identify columns that contain time values. See the example below.

Warning

This does not handle string columns. See read_table() for a potential solution.

Parameters:

time_scale – The simulation time scale.
path – The path to the data file.
fields – The columns to read from the data file, represented as a sequence of (name, type) tuples, where type must be a NumPy scalar.
comment – The characters, or list of characters, that indicate the start of a single-line comment.
encoding – The name of the encoding used to decode the file content.

Raises:

ValueError – if columns contains a string column.

Example:

The following function reads a time series of floating-point values.

import numpy as np
import pypfilt.io


def load_time_series(self, filename, time_scale):
    fields = [pypfilt.io.time_field('time'), ('value', np.float64)]
    return pypfilt.io.read_fields(time_scale, filename, fields)

pypfilt.io.string_field(name)#

Return a (name, type) tuple that identifies a field as containing string values.

Use this function to define summary table fields that contain string values.

Examples:

>>> import numpy as np
>>> from pypfilt.io import string_field
>>> fields = [string_field('parameter_name'), ('value', np.float64)]

pypfilt.io.time_field(name)#

Return a (name, type) tuple that identifies a field as containing time values.

Use this function to define summary table fields that contain time values.

Examples:

>>> import numpy as np
>>> from pypfilt.io import time_field
>>> fields = [time_field('time'), ('value', np.float64)]

pypfilt.io.read_table(path, columns, comment='#', encoding='utf-8')#

Read data from a space-delimited text file with column headers defined in the first non-comment line.

Warning

This does not handle string columns. To load tabular data that includes string values, you should use numpy.genfromtxt and then change the array data type:

import numpy as np
import pypfilt
from pypfilt.io import time_field, string_field, fields_dtype

# Load data from a text file.
filename = 'input.ssv'
table = np.genfromtxt(filename, dtype=None)

# Define the table fields.
fields = [time_field('time'), string_field('location')]
time_scale = pypfilt.Datetime()
dtype = fields_dtype(time_scale, fields)

# Change the array data type.
table = table.asdtype(dtype)

Parameters:

path – The path to the data file.
columns – The columns to read from the data file, represented as a sequence of (name, type) tuples where type must be a NumPy scalar, or (name, type, converter) tuples where converter is a function that converts the column string into the desired value.
comment – The characters, or list of characters, that indicate the start of a single-line comment.
encoding – The name of the encoding used to decode the file content.

Raises:

ValueError – if columns contains a string column.

Examples:

>>> from pypfilt.io import date_column, read_table
>>> import numpy as np
>>> import datetime
>>> path = 'input_data.ssv'
>>> with open(path, 'w') as f:
...     _ = f.write('time value\n')
...     _ = f.write('2020-01-01 1\n')
...     _ = f.write('2020-01-02 3\n')
...     _ = f.write('2020-01-03 5\n')
>>> columns = [date_column('time'), ('value', np.int64)]
>>> data = read_table(path, columns)
>>> isinstance(data['time'][0], datetime.datetime)
True
>>> observations = [
...     {'time': row['time'], 'value': row['value']} for row in data
... ]
>>> # Remove the input file when it is no longer needed.
>>> import os
>>> os.remove(path)

pypfilt.io.date_column(name, fmt='%Y-%m-%d')#

Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.

Note

Where dates are used for observation times, they should be represented as datetime.datetime values, not as datetime.date values. This is why this function returns a converter that returns datetime.datetime values.

Parameters:

name (str) – The column name in the data file.
fmt (str) – The date format used to parse the column values.

pypfilt.io.datetime_column(name, fmt='%Y-%m-%dT%H:%M:%S')#

Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.

Parameters:

name (str) – The column name in the data file.
fmt (str) – The datetime format used to parse the column values.

Writing tabular data#

Use write_table() to save summary tables to plain-text files.

pypfilt.io.write_table(path, table, time_scale=None, columns=None, encoding='utf-8')#

Write a data table to a space-delimited text file with column headers.

Parameters:

path – The path to the output file.
table – The data table.
time_scale (pypfilt.time.Time) – The simulation time scale. If this is not provided, time values will be converted to strings using str().
columns – The subset of table columns to write to the output file.
encoding – The name of the encoding used to decode the file content.

Warning

This does not check whether string columns and time values contain whitespace.

Lookup tables#

The pypfilt.io module also provides lookup tables, which are used to retrieve time-indexed values (e.g., time-varying model inputs).

Note

Define one or more lookup tables in the scenario definition (see Lookup table settings). You can then retrieve the corresponding Lookup component from the simulation context:

table_name = 'some_name'
lookup_time = 1.0
table = context.component['lookup'][table_name]
values = table.lookup(lookup_time)

class pypfilt.io.Lookup(lookup_table)#

Lookup tables provide a means of retrieving time-indexed quantities, which can be used to incorporate time-varying effects into simulation models and observation models.

Parameters:: lookup_table – A data table, typically loaded with read_lookup_table().

value_count()#: Return the number of value columns in the lookup table.

lookup(when)#: Return the value(s) associated with a specific time.

times()#: Return the array of times for which values are defined.

start()#: Return the first time for which values are defined.

end()#: Return the final time for which values are defined.

pypfilt.io.read_lookup_table(path, time, dtype='f8', comment='#', encoding='utf-8')#

Read time-indexed data from a space-delimited text file with column headers defined in the first non-comment line.

Parameters:

path – The path to the data file.
time (pypfilt.time.Time) – The time scale.
dtype – The type of the lookup values.
comment – The characters, or list of characters, that indicate the start of a single-line comment.
encoding – The name of the encoding used to decode the file content.

Examples:

>>> from pypfilt.io import read_lookup_table, lookup
>>> from pypfilt.time import Datetime
>>> import datetime
>>> path = 'input_data.ssv'
>>> with open(path, 'w') as f:
...     _ = f.write('time value1 value2 value3\n')
...     _ = f.write('2020-01-01 1.0 1.5 2.0\n')
>>> time = Datetime()
>>> table = read_lookup_table(path, time)
>>> isinstance(table['time'][0], datetime.datetime)
True
>>> when = datetime.datetime(2020, 1, 1)
>>> values = lookup(table, when)
>>> len(values.shape) == 1
True
>>> all(isinstance(value, float) for value in values)
True
>>> # Remove the input file when it is no longer needed.
>>> import os
>>> os.remove(path)

pypfilt.io.lookup_values_count(lookup_table)#: Return the number of value columns in a lookup table.

pypfilt.io.lookup_times(lookup_table)#: Return the times for which the lookup table contains values.

pypfilt.io.lookup(lookup_table, when)#: Return the values associated with a specific time.

Summary tables#

The pypfilt.io module provides convenience functions for preserving time values when saving and loading summary tables.

Note

When implementing a summary table, simply ensure that the field_types() method uses pypfilt.io.time_field() to identify each field that will contain time values, and pypfilt.io.string_field() to identify each field that will contain string values.

You can then use load_summary_tables(), load_summary_table(), and load_dataset() to retrieve saved tables and ensure that all time values are in the expected format.

pypfilt.io.load_observations(data_file, obs_units=None, time_scale=None, fmt=None)#

Return a dictionary that contains each set of observations in the data file, using observation units as keys.

Parameters:

data_file – The path to a HDF5 data file.
obs_units – An optional list of observation units for which observations should be loaded.
time_scale – The time scale that was used to encode time values, or a scenario instance or simulation context with a valid time scale. If set to None, the time scale will be identified by calling detect_timescale().
fmt – The desired dataset format; supported formats are NumPy structured arrays (None, default), Pandas data frames ("pandas"), and Polars data frames ("polars").

Raises:

ValueError – if an observation unit included in obs_units does not exist, or if fmt is invalid.

pypfilt.io.load_summary_tables(data_file, table_names=None, time_scale=None, fmt=None)#

Return a dictionary that contains each summary table in the data file, using table names as keys.

Parameters:

data_file – The path to a HDF5 data file.
table_names – An optional list of table names that should be loaded. When provided, any table whose name does not appear in table_names will be ignored.
time_scale – The time scale that was used to encode time values, or a scenario instance or simulation context with a valid time scale. If set to None, the time scale will be identified by calling detect_timescale().
fmt – The desired dataset format; supported formats are NumPy structured arrays (None, default), Pandas data frames ("pandas"), and Polars data frames ("polars").

Raises:

ValueError – if a table included in table_names does not exist, or if fmt is invalid.

pypfilt.io.load_summary_table(time_scale, data_file, dataset_path, subset=(), fmt=None)#

Load a summary table from a HDF5 dataset, converting stored types into native types as necessary.

Note

If you are loading multiple tables from the same data file, you may want to open the data file yourself and use load_dataset() to load each table.

Parameters:

time_scale – The time scale that was used to encode time values, or a scenario instance or simulation context with a valid time scale. If set to None, the time scale will be identified by calling detect_timescale().
data_file – The path to a HDF5 data file.
dataset_path – The path to the HDF5 dataset.
subset – A slicing specification used to load a subset of the data. By default, the entire dataset is loaded. Accepted values include indices, slices, and field names. See the h5py documentation for details.
fmt – The desired dataset format; supported formats are NumPy structured arrays (None, default), Pandas data frames ("pandas"), and Polars data frames ("polars").

Raises:

ValueError – if fmt is invalid.

Examples:

import pypfilt
from pypfilt.io import load_summary_table

data_file = 'output.hdf5'
dataset_path = '/path/to/my/dataset'
time_scale = pypfilt.Datetime()

# Load the entire dataset.
table = load_summary_table(time_scale, data_file, dataset_path)

# Load every 10th row.
subset = slice(None, None, 10)
table = load_summary_table(time_scale, data_file, dataset_path, subset)

pypfilt.io.load_dataset(time_scale, dataset, subset=(), fmt=None)#

Load a structured array from a HDF5 dataset, converting stored types into native types as necessary.

Parameters:

time_scale – The time scale that was used to encode time values, or a scenario instance or simulation context with a valid time scale.
dataset – The HDF5 dataset.
subset – A slicing specification used to load a subset of the data. By default, the entire dataset is loaded. Accepted values include indices, slices, and field names. See the h5py documentation for details.
fmt – The desired dataset format; supported formats are NumPy structured arrays (None, default), Pandas data frames ("pandas"), and Polars data frames ("polars").

Raises:

ValueError – if fmt is invalid.

Examples:

import h5py
import pypfilt

data_file = 'output.hdf5'
dataset_path = '/path/to/my/dataset'
time_scale = pypfilt.Datetime()

# Load the entire dataset.
with h5py.File(data_file, 'r') as f:
    dataset = f[dataset_path]
    table = pypfilt.io.load_dataset(time_scale, dataset)

# Load every 10th row.
subset = slice(None, None, 10)
with h5py.File(data_file, 'r') as f:
    dataset = f[dataset_path]
    table = pypfilt.io.load_dataset(time_scale, dataset, subset)

Note

When

The pypfilt.summary.HDF5 class ensures that all summary tables are constructed with the appropriate data type and are saved with the necessary metadata, so you should never need to use the following functions.

pypfilt.io.fields_dtype(time_scale, fields)#

Return a NumPy data type (dtype) object that describes the provided data fields, and identifies fields that contain time values and string values.

Parameters:: time_scale – The simulation time scale, or a scenario instance or simulation context with a valid time scale.

pypfilt.io.save_dataset(time_scale, group, name, table, **kwargs)#

Save a structured array as a HDF5 dataset, converting native types into stored types as necessary.

Returns:: The HDF5 dataset.

pypfilt.io.detect_timescale(data_file)#

Return an instance of the time scale that was used to encode time values in the provided data file, or None if the time scale could not be identified.

Parameters:: data_file – The path to a HDF5 data file.