6.8. pypfilt.io
The pypfilt.io module provides functions for reading tabular data from data files.
6.8.1. Reading tabular data
Use read_fields() to read tabular data (such as observations) from plain-text files.
pypfilt.io.read_fields(time_scale, path, fields, comment='#', encoding='utf-8')
Read data from a space-delimited text file with column headers defined in the first non-comment line.
This is a wrapper for read_table() that ensures time columns are identifiable.
Note
Use time_field() to identify columns that contain time values. See the example below.
Warning
This does not handle string columns. See read_table() for a potential solution.
Parameters:
- time_scale – The simulation time scale.
- path – The path to the data file.
- fields – The columns to read from the data file, represented as a sequence of (name, type) tuples, where type must be a NumPy scalar type.
- comment – The characters, or list of characters, that indicate the start of a single-line comment.
- encoding – The name of the encoding used to decode the file content.
Raises: ValueError – if fields contains a string column.
Example: The following function reads a time series of floating-point values.
    import numpy as np
    import pypfilt.io

    def load_time_series(self, filename, time_scale):
        fields = [pypfilt.io.time_field('date'), ('value', np.float_)]
        return pypfilt.io.read_fields(time_scale, filename, fields)
pypfilt.io.string_field(name)
Return a (name, type) tuple that identifies a field as containing string values.
Use this function to define summary table fields that contain string values.
Examples:
    >>> import numpy as np
    >>> from pypfilt.io import string_field
    >>> fields = [string_field('parameter_name'), ('value', np.float_)]
pypfilt.io.time_field(name)
Return a (name, type) tuple that identifies a field as containing time values.
Use this function to define summary table fields that contain time values.
Examples:
    >>> import numpy as np
    >>> from pypfilt.io import time_field
    >>> fields = [time_field('time'), ('value', np.float_)]
pypfilt.io.read_table(path, columns, comment='#', encoding='utf-8')
Read data from a space-delimited text file with column headers defined in the first non-comment line.
Warning
This does not handle string columns. To load tabular data that includes string values, you should use numpy.genfromtxt and then change the array data type:
    import numpy as np
    import pypfilt
    from pypfilt.io import time_field, string_field, fields_dtype

    # Load data from a text file.
    filename = 'input.ssv'
    table = np.genfromtxt(filename, dtype=None)

    # Define the table fields.
    fields = [time_field('time'), string_field('location')]
    time_scale = pypfilt.Datetime()
    dtype = fields_dtype(time_scale, fields)

    # Change the array data type.
    table = table.astype(dtype)
Parameters:
- path – The path to the data file.
- columns – The columns to read from the data file, represented as a sequence of (name, type) tuples where type must be a NumPy scalar type, or (name, type, converter) tuples where converter is a function that converts the column string into the desired value.
- comment – The characters, or list of characters, that indicate the start of a single-line comment.
- encoding – The name of the encoding used to decode the file content.
Raises: ValueError – if columns contains a string column.
Examples:
    >>> from pypfilt.io import date_column, read_table
    >>> import numpy as np
    >>> import datetime
    >>> path = "input_data.ssv"
    >>> with open(path, 'w') as f:
    ...     _ = f.write('date value\n')
    ...     _ = f.write('2020-01-01 1\n')
    ...     _ = f.write('2020-01-02 3\n')
    ...     _ = f.write('2020-01-03 5\n')
    >>> columns = [date_column('date'), ('value', np.int_)]
    >>> data = read_table(path, columns)
    >>> isinstance(data['date'][0], datetime.datetime)
    True
    >>> observations = [{'date': row['date'], 'value': row['value']}
    ...                 for row in data]
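To make the expected file layout concrete, the following pure-Python sketch parses the same kind of input: comment text is discarded, the first remaining line supplies the column names, and each later line is split on whitespace and converted column by column. It is an illustration only and is not how read_table() is implemented; the names read_space_delimited and parse_date are hypothetical, the columns argument is simplified to (name, converter) pairs, and only a single comment string is supported.

```python
import datetime
import io


def read_space_delimited(lines, columns, comment='#'):
    """Parse rows from an iterable of text lines.

    `columns` is a list of (name, converter) pairs. This is a hypothetical,
    simplified stand-in for the (name, type, converter) tuples used above.
    """
    header = None
    rows = []
    for line in lines:
        # Discard everything from the comment marker onwards.
        text = line.split(comment, 1)[0].strip()
        if not text:
            continue
        values = text.split()
        if header is None:
            # The first non-comment line defines the column names.
            header = values
            continue
        record = dict(zip(header, values))
        rows.append({name: convert(record[name]) for name, convert in columns})
    return rows


def parse_date(text):
    # Dates are returned as datetime.datetime values, not datetime.date.
    return datetime.datetime.strptime(text, '%Y-%m-%d')


sample = io.StringIO(
    '# Daily observations\n'
    'date value\n'
    '2020-01-01 1\n'
    '2020-01-02 3  # trailing comment\n'
)
rows = read_space_delimited(sample, [('date', parse_date), ('value', int)])
```

Each element of rows is a dictionary keyed by column name, which mirrors the observations list built in the example above.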
pypfilt.io.date_column(name, fmt='%Y-%m-%d')
Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.
Note
Where dates are used for observation times, they should be represented as datetime.datetime values, not as datetime.date values. This is why this function returns a converter that returns datetime.datetime values.
Parameters:
- name (str) – The column name in the data file.
- fmt (str) – The date format used to parse the column values.
pypfilt.io.datetime_column(name, fmt='%Y-%m-%dT%H:%M:%S')
Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.
Parameters:
- name (str) – The column name in the data file.
- fmt (str) – The datetime format used to parse the column values.
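The default fmt strings of date_column() and datetime_column() are standard strptime format codes. The sketch below shows what a converter built from each format would produce, assuming the converter receives the column text as a str; make_converter is a hypothetical helper and not part of pypfilt.io.

```python
import datetime


def make_converter(fmt):
    """Return a function that parses text into datetime.datetime using fmt."""
    def convert(text):
        return datetime.datetime.strptime(text, fmt)
    return convert


# The two default formats documented above.
to_date = make_converter('%Y-%m-%d')
to_datetime = make_converter('%Y-%m-%dT%H:%M:%S')

# A date-only value becomes a datetime.datetime at midnight.
day = to_date('2020-01-02')
moment = to_datetime('2020-01-02T06:30:00')
```

Because both converters return datetime.datetime values, times parsed from either kind of column can be compared with each other directly.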
6.8.2. Lookup tables
The pypfilt.io module also provides lookup tables, which are used to retrieve time-indexed values (e.g., time-varying model inputs).
pypfilt.io.read_lookup_table(path, time, dtype='f8', comment='#', encoding='utf-8')
Read time-indexed data from a space-delimited text file with column headers defined in the first non-comment line.
Parameters:
- path – The path to the data file.
- time (pypfilt.time.Time) – The time scale.
- dtype – The type of the lookup values.
- comment – The characters, or list of characters, that indicate the start of a single-line comment.
- encoding – The name of the encoding used to decode the file content.
Examples:
    >>> from pypfilt.io import read_lookup_table, lookup
    >>> from pypfilt.time import Datetime
    >>> import datetime
    >>> path = "input_data.ssv"
    >>> with open(path, 'w') as f:
    ...     _ = f.write('date value1 value2 value3\n')
    ...     _ = f.write('2020-01-01 1.0 1.5 2.0\n')
    >>> time = Datetime()
    >>> table = read_lookup_table(path, time)
    >>> isinstance(table['date'][0], datetime.datetime)
    True
    >>> when = datetime.datetime(2020, 1, 1)
    >>> values = lookup(table, when)
    >>> len(values.shape) == 1
    True
    >>> all(isinstance(value, float) for value in values)
    True
pypfilt.io.lookup_values_count(lookup_table)
Return the number of value columns in a lookup table.
pypfilt.io.lookup(lookup_table, when)
Return the values associated with a specific time.
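The description of lookup() does not state how times that fall between two table rows are handled. The sketch below shows one plausible interpretation, in which the most recent row at or before the requested time is returned. This behaviour is an assumption, as is raising an error for times before the first row; lookup_sketch is a hypothetical function that works on plain Python lists rather than on the structured array returned by read_lookup_table().

```python
import bisect
import datetime


def lookup_sketch(times, values, when):
    """Return the values for the latest time that is not after `when`.

    Assumes `times` is sorted in ascending order and that `values[i]`
    holds the value columns for `times[i]`.
    """
    ix = bisect.bisect_right(times, when) - 1
    if ix < 0:
        # Assumed behaviour: no row applies before the first time.
        raise ValueError('no values at or before {}'.format(when))
    return values[ix]


times = [datetime.datetime(2020, 1, 1), datetime.datetime(2020, 1, 8)]
values = [(1.0, 1.5, 2.0), (2.0, 2.5, 3.0)]

# A time between the two rows falls back to the earlier row.
first = lookup_sketch(times, values, datetime.datetime(2020, 1, 3))
# A time that matches a row exactly returns that row.
second = lookup_sketch(times, values, datetime.datetime(2020, 1, 8))
# The analogue of lookup_values_count() for this layout.
value_count = len(values[0])
```

Using bisect keeps each lookup logarithmic in the number of rows, which matters when a model input is queried at every time step.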
6.8.3. Summary tables
The pypfilt.io module provides convenience functions for preserving time values when saving and loading summary tables.
Note
When implementing a summary table, simply ensure that the field_types()
method uses pypfilt.io.time_field()
to identify each field that will contain time values, and pypfilt.io.string_field()
to identify each field that will contain string values.
You can then use pypfilt.io.load_dataset()
to retrieve saved tables and ensure that all time values are in the expected format.
pypfilt.io.load_dataset(time_scale, dataset)
Load a structured array from an HDF5 dataset, converting stored types into native types as necessary.
Examples:
    import h5py
    import pypfilt

    dataset_path = '/path/to/my/dataset'
    time_scale = pypfilt.Datetime()
    with h5py.File('output.hdf5', 'r') as f:
        dataset = f[dataset_path]
        table = pypfilt.io.load_dataset(time_scale, dataset)
The pypfilt.summary.HDF5
class ensures that all summary tables are constructed with the appropriate data type and are saved with the necessary metadata, so you should never need to use the following functions.
pypfilt.io.fields_dtype(time_scale, fields)
Return a NumPy data type (dtype) object that describes the provided data fields, and identifies fields that contain time values and string values.
Parameters:
- time_scale – The simulation time scale, or a simulation context.
pypfilt.io.save_dataset(time_scale, group, name, table, **kwargs)
Save a structured array as an HDF5 dataset, converting native types into stored types as necessary.
Returns: The HDF5 dataset.