pypfilt.io#
The pypfilt.io module provides functions for reading tabular data from data files.
Reading tabular data#
Use read_fields() to read tabular data (such as observations) from plain-text files.
- pypfilt.io.read_fields(time_scale, path, fields, comment='#', encoding='utf-8')#
Read data from a space-delimited text file with column headers defined in the first non-comment line.
This is a wrapper for read_table() that ensures time columns are identifiable.
Note
Use time_field() to identify columns that contain time values. See the example below.
Warning
This does not handle string columns. See read_table() for a potential solution.
- Parameters:
time_scale – The simulation time scale.
path – The path to the data file.
fields – The columns to read from the data file, represented as a sequence of (name, type) tuples, where type must be a NumPy scalar type.
comment – The characters, or list of characters, that indicate the start of a single-line comment.
encoding – The name of the encoding used to decode the file content.
- Raises:
ValueError – if fields contains a string column.
- Example:
The following function reads a time series of floating-point values.
import numpy as np
import pypfilt.io

def load_time_series(self, filename, time_scale):
    fields = [pypfilt.io.time_field('time'), ('value', np.float64)]
    return pypfilt.io.read_fields(time_scale, filename, fields)
- pypfilt.io.string_field(name)#
Return a (name, type) tuple that identifies a field as containing string values.
Use this function to define summary table fields that contain string values.
- Examples:
>>> import numpy as np
>>> from pypfilt.io import string_field
>>> fields = [string_field('parameter_name'), ('value', np.float64)]
- pypfilt.io.time_field(name)#
Return a (name, type) tuple that identifies a field as containing time values.
Use this function to define summary table fields that contain time values.
- Examples:
>>> import numpy as np
>>> from pypfilt.io import time_field
>>> fields = [time_field('time'), ('value', np.float64)]
- pypfilt.io.read_table(path, columns, comment='#', encoding='utf-8')#
Read data from a space-delimited text file with column headers defined in the first non-comment line.
Warning
This does not handle string columns. To load tabular data that includes string values, you should use numpy.genfromtxt and then change the array data type:
import numpy as np
import pypfilt
from pypfilt.io import time_field, string_field, fields_dtype

# Load data from a text file.
filename = 'input.ssv'
table = np.genfromtxt(filename, dtype=None)

# Define the table fields.
fields = [time_field('time'), string_field('location')]
time_scale = pypfilt.Datetime()
dtype = fields_dtype(time_scale, fields)

# Change the array data type (NumPy arrays provide astype() for this).
table = table.astype(dtype)
- Parameters:
path – The path to the data file.
columns – The columns to read from the data file, represented as a sequence of (name, type) tuples where type must be a NumPy scalar type, or (name, type, converter) tuples where converter is a function that converts the column string into the desired value.
comment – The characters, or list of characters, that indicate the start of a single-line comment.
encoding – The name of the encoding used to decode the file content.
- Raises:
ValueError – if columns contains a string column.
- Examples:
>>> from pypfilt.io import date_column, read_table
>>> import numpy as np
>>> import datetime
>>> path = 'input_data.ssv'
>>> with open(path, 'w') as f:
...     _ = f.write('time value\n')
...     _ = f.write('2020-01-01 1\n')
...     _ = f.write('2020-01-02 3\n')
...     _ = f.write('2020-01-03 5\n')
>>> columns = [date_column('time'), ('value', np.int64)]
>>> data = read_table(path, columns)
>>> isinstance(data['time'][0], datetime.datetime)
True
>>> observations = [
...     {'time': row['time'], 'value': row['value']} for row in data
... ]
>>> # Remove the input file when it is no longer needed.
>>> import os
>>> os.remove(path)
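To make the file format concrete, here is a minimal standard-library sketch of parsing a space-delimited table whose header is the first non-comment line. It is not pypfilt's implementation: parse_table and parse_date are hypothetical names, columns are simplified to (name, converter) pairs, and only a single comment string is supported.

```python
import datetime
import io


def parse_table(lines, columns, comment='#'):
    """Parse space-delimited rows; the first non-comment line is the header."""
    converters = dict(columns)
    header = None
    rows = []
    for line in lines:
        # Discard everything from the comment marker onwards.
        line = line.split(comment, 1)[0].strip()
        if not line:
            continue
        values = line.split()
        if header is None:
            # The first non-comment line defines the column names.
            header = values
            continue
        raw = dict(zip(header, values))
        rows.append({name: convert(raw[name])
                     for name, convert in converters.items()})
    return rows


def parse_date(text):
    # Hypothetical converter: turn a date string into a datetime.datetime.
    return datetime.datetime.strptime(text, '%Y-%m-%d')


text = io.StringIO(
    '# Daily counts\n'
    'time value\n'
    '2020-01-01 1\n'
    '2020-01-02 3\n'
)
rows = parse_table(text, [('time', parse_date), ('value', int)])
```

Each row is returned as a dictionary here for readability; read_table() itself returns a data table that is indexed by column name, as in the example above.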
- pypfilt.io.date_column(name, fmt='%Y-%m-%d')#
Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.
Note
Where dates are used for observation times, they should be represented as datetime.datetime values, not as datetime.date values. This is why this function returns a converter that returns datetime.datetime values.
- Parameters:
name (str) – The column name in the data file.
fmt (str) – The date format used to parse the column values.
- pypfilt.io.datetime_column(name, fmt='%Y-%m-%dT%H:%M:%S')#
Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.
- Parameters:
name (str) – The column name in the data file.
fmt (str) – The datetime format used to parse the column values.
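The converters described for date_column() and datetime_column() can be pictured with the following standard-library sketch. It assumes that a converter simply applies datetime.datetime.strptime with the given format; make_converter is a hypothetical helper, not part of pypfilt.

```python
import datetime


def make_converter(fmt):
    """Return a function that parses text into a datetime.datetime value."""
    def convert(text):
        return datetime.datetime.strptime(text, fmt)
    return convert


# The two default formats listed in the signatures above.
parse_day = make_converter('%Y-%m-%d')
parse_moment = make_converter('%Y-%m-%dT%H:%M:%S')

# A date-only string still yields a datetime.datetime (at midnight),
# rather than a datetime.date.
day = parse_day('2020-01-01')
moment = parse_moment('2020-01-01T12:30:00')
```

Parsing date-only strings into datetime.datetime values keeps both kinds of column directly comparable.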
Writing tabular data#
Use write_table() to save summary tables to plain-text files.
- pypfilt.io.write_table(path, table, time_scale=None, columns=None, encoding='utf-8')#
Write a data table to a space-delimited text file with column headers.
- Parameters:
path – The path to the output file.
table – The data table.
time_scale (pypfilt.time.Time) – The simulation time scale. If this is not provided, time values will be converted to strings using str().
columns – The subset of table columns to write to the output file.
encoding – The name of the encoding used to encode the file content.
Warning
This does not check whether string columns and time values contain whitespace.
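Because of this, it may be worth checking a table before writing it. The sketch below is one possible check, using plain dictionaries as a stand-in for table rows; fields_with_whitespace is a hypothetical helper and not part of pypfilt.

```python
import datetime


def fields_with_whitespace(rows):
    """Return the names of fields whose string form contains whitespace."""
    bad = set()
    for row in rows:
        for name, value in row.items():
            if any(ch.isspace() for ch in str(value)):
                bad.add(name)
    return sorted(bad)


rows = [
    {'time': datetime.datetime(2020, 1, 1), 'location': 'Melbourne'},
    {'time': datetime.datetime(2020, 1, 2), 'location': 'New York'},
]

# str() of a datetime.datetime contains a space ('2020-01-01 00:00:00'),
# as does the string 'New York'.
problems = fields_with_whitespace(rows)
```

Any field reported by such a check would produce extra columns in a space-delimited file.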
Lookup tables#
The pypfilt.io module also provides lookup tables, which are used to retrieve time-indexed values (e.g., time-varying model inputs).
Note
Define one or more lookup tables in the scenario definition (see Lookup table settings).
You can then retrieve the corresponding Lookup component from the simulation context:
table_name = 'some_name'
lookup_time = 1.0
table = context.component['lookup'][table_name]
values = table.lookup(lookup_time)
- class pypfilt.io.Lookup(lookup_table)#
Lookup tables provide a means of retrieving time-indexed quantities, which can be used to incorporate time-varying effects into simulation models and observation models.
- Parameters:
lookup_table – A data table, typically loaded with read_lookup_table().
- value_count()#
Return the number of value columns in the lookup table.
- lookup(when)#
Return the value(s) associated with a specific time.
- times()#
Return the array of times for which values are defined.
- start()#
Return the first time for which values are defined.
- end()#
Return the final time for which values are defined.
- pypfilt.io.read_lookup_table(path, time, dtype='f8', comment='#', encoding='utf-8')#
Read time-indexed data from a space-delimited text file with column headers defined in the first non-comment line.
- Parameters:
path – The path to the data file.
time (pypfilt.time.Time) – The time scale.
dtype – The type of the lookup values.
comment – The characters, or list of characters, that indicate the start of a single-line comment.
encoding – The name of the encoding used to decode the file content.
- Examples:
>>> from pypfilt.io import read_lookup_table, lookup
>>> from pypfilt.time import Datetime
>>> import datetime
>>> path = 'input_data.ssv'
>>> with open(path, 'w') as f:
...     _ = f.write('time value1 value2 value3\n')
...     _ = f.write('2020-01-01 1.0 1.5 2.0\n')
>>> time = Datetime()
>>> table = read_lookup_table(path, time)
>>> isinstance(table['time'][0], datetime.datetime)
True
>>> when = datetime.datetime(2020, 1, 1)
>>> values = lookup(table, when)
>>> len(values.shape) == 1
True
>>> all(isinstance(value, float) for value in values)
True
>>> # Remove the input file when it is no longer needed.
>>> import os
>>> os.remove(path)
- pypfilt.io.lookup_values_count(lookup_table)#
Return the number of value columns in a lookup table.
- pypfilt.io.lookup_times(lookup_table)#
Return the times for which the lookup table contains values.
- pypfilt.io.lookup(lookup_table, when)#
Return the values associated with a specific time.
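The idea of a time-indexed lookup can be sketched with the standard library alone. The documentation above does not say how times between table rows are resolved, so this sketch assumes a step function: it returns the values of the latest row at or before the requested time, and falls back to the first row for earlier times. StepLookup is a hypothetical stand-in, not the pypfilt Lookup class.

```python
import bisect


class StepLookup:
    """Hypothetical stand-in for a time-indexed lookup table."""

    def __init__(self, rows):
        # rows is an iterable of (time, values) pairs.
        ordered = sorted(rows, key=lambda row: row[0])
        self._times = [time for time, _ in ordered]
        self._values = [values for _, values in ordered]

    def start(self):
        return self._times[0]

    def end(self):
        return self._times[-1]

    def lookup(self, when):
        # Assumption: use the latest row at or before `when`; for times
        # before the first row, use the first row.
        index = bisect.bisect_right(self._times, when) - 1
        return self._values[max(index, 0)]


table = StepLookup([(0.0, (1.0, 1.5)), (5.0, (2.0, 2.5))])
early = table.lookup(1.0)
late = table.lookup(7.0)
```

Check the behaviour of the real lookup() function before relying on any particular rule for times that fall between rows.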
Summary tables#
The pypfilt.io module provides convenience functions for preserving time values when saving and loading summary tables.
Note
When implementing a summary table, simply ensure that the field_types() method uses pypfilt.io.time_field() to identify each field that will contain time values, and pypfilt.io.string_field() to identify each field that will contain string values.
You can then use load_summary_table() and pypfilt.io.load_dataset() to retrieve saved tables and ensure that all time values are in the expected format.
- pypfilt.io.load_summary_table(time_scale, data_file, dataset_path, subset=())#
Load a summary table from an HDF5 dataset, converting stored types into native types as necessary.
Note
If you are loading multiple tables from the same data file, you may want to open the data file yourself and use load_dataset() to load each table.
- Parameters:
time_scale – The time scale that was used to encode time values, or a scenario instance or simulation context with a valid time scale.
data_file – The path to an HDF5 data file.
dataset_path – The path to the HDF5 dataset.
subset – A slicing specification used to load a subset of the data. By default, the entire dataset is loaded. Accepted values include indices, slices, and field names. See the h5py documentation for details.
- Examples:
import pypfilt
from pypfilt.io import load_summary_table

data_file = 'output.hdf5'
dataset_path = '/path/to/my/dataset'
time_scale = pypfilt.Datetime()

# Load the entire dataset.
table = load_summary_table(time_scale, data_file, dataset_path)

# Load every 10th row.
subset = slice(None, None, 10)
table = load_summary_table(time_scale, data_file, dataset_path, subset)
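For readers unfamiliar with slice objects, the following sketch shows what a slicing specification selects, using a plain Python list as a stand-in for the table rows. It does not touch HDF5 at all; the row count and variable names are illustrative only.

```python
# Stand-in for a table with 35 rows, numbered from 0.
rows = list(range(35))

every_tenth = slice(None, None, 10)  # equivalent to rows[::10]
first_five = slice(0, 5)             # equivalent to rows[:5]

selected = rows[every_tenth]
head = rows[first_five]
```

A slice object selects the same rows as the equivalent bracket syntax, which is why it can be passed around as an argument such as subset.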
- pypfilt.io.load_dataset(time_scale, dataset, subset=())#
Load a structured array from an HDF5 dataset, converting stored types into native types as necessary.
- Parameters:
time_scale – The time scale that was used to encode time values, or a scenario instance or simulation context with a valid time scale.
dataset – The HDF5 dataset.
subset – A slicing specification used to load a subset of the data. By default, the entire dataset is loaded. Accepted values include indices, slices, and field names. See the h5py documentation for details.
- Examples:
import h5py
import pypfilt

data_file = 'output.hdf5'
dataset_path = '/path/to/my/dataset'
time_scale = pypfilt.Datetime()

# Load the entire dataset.
with h5py.File(data_file, 'r') as f:
    dataset = f[dataset_path]
    table = pypfilt.io.load_dataset(time_scale, dataset)

# Load every 10th row.
subset = slice(None, None, 10)
with h5py.File(data_file, 'r') as f:
    dataset = f[dataset_path]
    table = pypfilt.io.load_dataset(time_scale, dataset, subset)
The pypfilt.summary.HDF5 class ensures that all summary tables are constructed with the appropriate data type and are saved with the necessary metadata, so you should never need to use the following functions.
- pypfilt.io.fields_dtype(time_scale, fields)#
Return a NumPy data type (dtype) object that describes the provided data fields, and identifies fields that contain time values and string values.
- Parameters:
time_scale – The simulation time scale, or a simulation context.
- pypfilt.io.save_dataset(time_scale, group, name, table, **kwargs)#
Save a structured array as an HDF5 dataset, converting native types into stored types as necessary.
- Returns:
The HDF5 dataset.