6.8. pypfilt.io

The pypfilt.io module provides functions for reading tabular data from data files.

6.8.1. Reading tabular data

Use read_fields() to read tabular data (such as observations) from plain-text files.

pypfilt.io.read_fields(time_scale, path, fields, comment='#', encoding='utf-8')

Read data from a space-delimited text file with column headers defined in the first non-comment line.

This is a wrapper for read_table() that ensures time columns are identifiable.

Note

Use time_field() to identify columns that contain time values. See the example below.

Warning

This does not handle string columns. See read_table() for a potential solution.

Parameters:
  • time_scale – The simulation time scale.
  • path – The path to the data file.
  • fields – The columns to read from the data file, represented as a sequence of (name, type) tuples, where type must be a NumPy scalar.
  • comment – The characters, or list of characters, that indicate the start of a single-line comment.
  • encoding – The name of the encoding used to decode the file content.
Raises:

ValueError – if fields contains a string column.

Example:

The following function reads a time series of floating-point values.

import numpy as np
import pypfilt.io

def load_time_series(filename, time_scale):
    # Identify the 'date' column as containing time values.
    fields = [pypfilt.io.time_field('date'), ('value', np.float64)]
    return pypfilt.io.read_fields(time_scale, filename, fields)

pypfilt.io.string_field(name)

Return a (name, type) tuple that identifies a field as containing string values.

Use this function to define summary table fields that contain string values.

Examples:
>>> import numpy as np
>>> from pypfilt.io import string_field
>>> fields = [string_field('parameter_name'), ('value', np.float64)]

pypfilt.io.time_field(name)

Return a (name, type) tuple that identifies a field as containing time values.

Use this function to define summary table fields that contain time values.

Examples:
>>> import numpy as np
>>> from pypfilt.io import time_field
>>> fields = [time_field('time'), ('value', np.float64)]

pypfilt.io.read_table(path, columns, comment='#', encoding='utf-8')

Read data from a space-delimited text file with column headers defined in the first non-comment line.

Warning

This does not handle string columns. To load tabular data that includes string values, you should use numpy.genfromtxt and then change the array data type:

import numpy as np
import pypfilt
from pypfilt.io import time_field, string_field, fields_dtype

# Load data from a text file, reading the column names from the header line.
filename = 'input.ssv'
table = np.genfromtxt(filename, dtype=None, names=True, encoding='utf-8')

# Define the table fields.
fields = [time_field('time'), string_field('location')]
time_scale = pypfilt.Datetime()
dtype = fields_dtype(time_scale, fields)

# Change the array data type.
table = table.astype(dtype)

Parameters:
  • path – The path to the data file.
  • columns – The columns to read from the data file, represented as a sequence of (name, type) tuples where type must be a NumPy scalar, or (name, type, converter) tuples where converter is a function that converts the column string into the desired value.
  • comment – The characters, or list of characters, that indicate the start of a single-line comment.
  • encoding – The name of the encoding used to decode the file content.
Raises:

ValueError – if columns contains a string column.
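The parsing rules described above (comment lines skipped, headers taken from the first non-comment line, and each column converted either by its type or by an optional converter) can be illustrated with a minimal pure-Python sketch. This is not pypfilt's implementation: the name read_table_sketch is hypothetical, it returns a list of dicts rather than a NumPy structured array, and it assumes that a comment marker anywhere on a line ends that line's data.

```python
import datetime
import io


def read_table_sketch(lines, columns, comment='#'):
    """Parse space-delimited text whose first non-comment line holds headers.

    ``columns`` is a sequence of (name, type) or (name, type, converter)
    tuples. Without a converter, ``type`` itself is called on each string.
    Returns a list of dicts (pypfilt returns a NumPy structured array).
    """
    markers = [comment] if isinstance(comment, str) else list(comment)

    def strip_comment(line):
        # Truncate the line at the earliest comment marker, if any.
        cut = min((line.find(m) for m in markers if m in line),
                  default=len(line))
        return line[:cut].strip()

    # Drop comment-only and blank lines, then split on whitespace.
    rows = [text.split() for text in map(strip_comment, lines) if text]
    header, body = rows[0], rows[1:]

    # Locate each requested column by name in the header line.
    positions = {col[0]: header.index(col[0]) for col in columns}
    table = []
    for fields in body:
        record = {}
        for col in columns:
            name, kind = col[0], col[1]
            convert = col[2] if len(col) == 3 else kind
            record[name] = convert(fields[positions[name]])
        table.append(record)
    return table


def parse_date(text):
    # Hypothetical converter: return datetime.datetime, not datetime.date.
    return datetime.datetime.strptime(text, '%Y-%m-%d')


text = io.StringIO('# Example observations\n'
                   'date value\n'
                   '2020-01-01 1\n'
                   '2020-01-02 3  # trailing comment\n')
columns = [('date', object, parse_date), ('value', int)]
data = read_table_sketch(text, columns)
print(data[1])  # {'date': datetime.datetime(2020, 1, 2, 0, 0), 'value': 3}
```

The three-element column tuples play the same role as the (name, type, converter) tuples returned by date_column() and datetime_column(); the object type in the sketch is only a placeholder.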

Examples:
>>> from pypfilt.io import date_column, read_table
>>> import numpy as np
>>> import datetime
>>> path = "input_data.ssv"
>>> with open(path, 'w') as f:
...    _ = f.write('date value\n')
...    _ = f.write('2020-01-01 1\n')
...    _ = f.write('2020-01-02 3\n')
...    _ = f.write('2020-01-03 5\n')
>>> columns = [date_column('date'), ('value', np.int_)]
>>> data = read_table(path, columns)
>>> isinstance(data['date'][0], datetime.datetime)
True
>>> observations = [{'date': row['date'], 'value': row['value']}
...                 for row in data]

pypfilt.io.date_column(name, fmt='%Y-%m-%d')

Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.

Note

Where dates are used for observation times, they should be represented as datetime.datetime values, not as datetime.date values. This is why this function returns a converter that returns datetime.datetime values.

Parameters:
  • name (str) – The column name in the data file.
  • fmt (str) – The date format used to parse the column values.
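The point of the note above can be checked with a short sketch. It assumes, without confirmation from this page, that the converter parses each value with datetime.datetime.strptime(); the function date_column_sketch is hypothetical and only mimics the (name, type, converter) shape.

```python
import datetime


def date_column_sketch(name, fmt='%Y-%m-%d'):
    """Hypothetical stand-in for date_column(): returns (name, type, converter)."""
    def convert(text):
        # strptime() returns a datetime.datetime (with the time set to
        # midnight), never a plain datetime.date.
        return datetime.datetime.strptime(text, fmt)
    return (name, object, convert)  # ``object`` is a placeholder type


name, _, convert = date_column_sketch('date')
value = convert('2020-01-31')
print(type(value).__name__, value)  # datetime 2020-01-31 00:00:00

# A custom format, e.g. day/month/year:
_, _, convert_dmy = date_column_sketch('date', fmt='%d/%m/%Y')
print(convert_dmy('31/01/2020') == value)  # True
```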
pypfilt.io.datetime_column(name, fmt='%Y-%m-%dT%H:%M:%S')

Return a (name, type, converter) tuple that can be used with read_table() to convert a column into datetime.datetime values.

Parameters:
  • name (str) – The column name in the data file.
  • fmt (str) – The datetime format used to parse the column values.
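Assuming the converter relies on datetime.datetime.strptime() (not confirmed here), values must match fmt exactly. The stdlib-only sketch below shows that the default format accepts plain timestamps but rejects fractional seconds, which need a format that includes %f.

```python
import datetime

DEFAULT_FMT = '%Y-%m-%dT%H:%M:%S'

# A value that matches the default format exactly.
ok = datetime.datetime.strptime('2020-01-01T12:30:00', DEFAULT_FMT)
print(ok)  # 2020-01-01 12:30:00

# Fractional seconds are not covered by the default format...
rejected = False
try:
    datetime.datetime.strptime('2020-01-01T12:30:00.5', DEFAULT_FMT)
except ValueError as exc:
    rejected = True
    print('rejected:', exc)

# ...but can be parsed by supplying a format that includes %f.
frac = datetime.datetime.strptime('2020-01-01T12:30:00.5',
                                  '%Y-%m-%dT%H:%M:%S.%f')
print(frac.microsecond)  # 500000
```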

6.8.2. Lookup tables

The pypfilt.io module also provides lookup tables, which are used to retrieve time-indexed values (e.g., time-varying model inputs).

pypfilt.io.read_lookup_table(path, time, dtype='f8', comment='#', encoding='utf-8')

Read time-indexed data from a space-delimited text file with column headers defined in the first non-comment line.

Parameters:
  • path – The path to the data file.
  • time (pypfilt.time.Time) – The time scale.
  • dtype – The type of the lookup values.
  • comment – The characters, or list of characters, that indicate the start of a single-line comment.
  • encoding – The name of the encoding used to decode the file content.
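The layout of a lookup table can be sketched in plain Python. The sketch assumes, as in the example below, that the first column holds dates and every remaining column holds one floating-point value. The parser read_lookup_sketch is hypothetical and returns plain lists, whereas read_lookup_table() returns a structured array.

```python
import datetime
import io


def read_lookup_sketch(lines, comment='#'):
    """Hypothetical parser: first column is a date, the rest are floats."""
    rows = []
    for line in lines:
        # Discard anything after the comment marker, then skip empty lines.
        line = line.split(comment, 1)[0].strip()
        if line:
            rows.append(line.split())
    header, body = rows[0], rows[1:]
    times = [datetime.datetime.strptime(r[0], '%Y-%m-%d') for r in body]
    values = [[float(x) for x in r[1:]] for r in body]
    return header, times, values


text = io.StringIO('date value1 value2 value3\n'
                   '2020-01-01 1.0 1.5 2.0\n'
                   '2020-01-08 2.0 2.5 3.0\n')
header, times, values = read_lookup_sketch(text)

# The number of value columns, analogous to lookup_values_count():
n_values = len(header) - 1
print(n_values)  # 3
```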
Examples:
>>> from pypfilt.io import read_lookup_table, lookup
>>> from pypfilt.time import Datetime
>>> import datetime
>>> path = "input_data.ssv"
>>> with open(path, 'w') as f:
...    _ = f.write('date value1 value2 value3\n')
...    _ = f.write('2020-01-01 1.0 1.5 2.0\n')
>>> time = Datetime()
>>> table = read_lookup_table(path, time)
>>> isinstance(table['date'][0], datetime.datetime)
True
>>> when = datetime.datetime(2020, 1, 1)
>>> values = lookup(table, when)
>>> len(values.shape) == 1
True
>>> all(isinstance(value, float) for value in values)
True

pypfilt.io.lookup_values_count(lookup_table)

Return the number of value columns in a lookup table.

pypfilt.io.lookup(lookup_table, when)

Return the values associated with a specific time.
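This page does not say how lookup() treats a time that falls between two rows of the table. The sketch below assumes a step-function rule, returning the row for the most recent time at or before when, which is one common choice for time-varying inputs. The function lookup_sketch and its fallback for early times are hypothetical.

```python
import bisect
import datetime


def lookup_sketch(times, rows, when):
    """Return the row for the most recent time <= ``when``.

    Assumes ``times`` is sorted in ascending order. Times before the first
    entry fall back to the first row (an assumption, not documented behaviour).
    """
    ix = bisect.bisect_right(times, when) - 1
    return rows[max(ix, 0)]


times = [datetime.datetime(2020, 1, 1), datetime.datetime(2020, 1, 8)]
rows = [[1.0, 1.5, 2.0], [2.0, 2.5, 3.0]]

print(lookup_sketch(times, rows, datetime.datetime(2020, 1, 1)))  # [1.0, 1.5, 2.0]
print(lookup_sketch(times, rows, datetime.datetime(2020, 1, 5)))  # [1.0, 1.5, 2.0]
print(lookup_sketch(times, rows, datetime.datetime(2020, 1, 9)))  # [2.0, 2.5, 3.0]
```

Using bisect keeps each lookup logarithmic in the number of rows, provided the times are sorted.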

6.8.3. Summary tables

The pypfilt.io module provides convenience functions for preserving time values when saving and loading summary tables.

Note

When implementing a summary table, ensure that the field_types() method uses pypfilt.io.time_field() to identify each field that will contain time values, and pypfilt.io.string_field() to identify each field that will contain string values.

You can then use pypfilt.io.load_dataset() to retrieve saved tables and ensure that all time values are in the expected format.

pypfilt.io.load_dataset(time_scale, dataset)

Load a structured array from an HDF5 dataset, converting stored types into native types as necessary.

Examples:
import h5py
import pypfilt.io

dataset_path = '/path/to/my/dataset'
time_scale = pypfilt.Datetime()

with h5py.File('output.hdf5', 'r') as f:
    dataset = f[dataset_path]
    table = pypfilt.io.load_dataset(time_scale, dataset)

The pypfilt.summary.HDF5 class ensures that all summary tables are constructed with the appropriate data type and are saved with the necessary metadata, so you should never need to use the following functions.

pypfilt.io.fields_dtype(time_scale, fields)

Return a NumPy data type (dtype) object that describes the provided data fields, and identifies fields that contain time values and string values.

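Parameters:
  • time_scale – The simulation time scale, or a simulation context.
  • fields – The data fields, represented as a sequence of (name, type) tuples, such as those returned by time_field() and string_field().
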
pypfilt.io.save_dataset(time_scale, group, name, table, **kwargs)

Save a structured array as an HDF5 dataset, converting native types into stored types as necessary.

Returns: The HDF5 dataset.
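The idea behind converting between native and stored types can be illustrated with a stdlib-only round trip. The stored representation here is an assumption: it encodes each time value as a UTF-8 ISO 8601 byte string, which may not match what pypfilt actually writes. The helpers to_stored, from_stored and convert_rows are hypothetical.

```python
import datetime


def to_stored(value):
    """Hypothetical native -> stored conversion for one time value."""
    return value.isoformat().encode('utf-8')


def from_stored(raw):
    """Hypothetical stored -> native conversion for one time value."""
    return datetime.datetime.fromisoformat(raw.decode('utf-8'))


def convert_rows(rows, time_fields, func):
    # Apply ``func`` only to the fields flagged as containing time values.
    return [{key: func(val) if key in time_fields else val
             for key, val in row.items()} for row in rows]


native_rows = [{'time': datetime.datetime(2020, 1, 1, 12, 30), 'value': 1.5}]
stored_rows = convert_rows(native_rows, {'time'}, to_stored)
print(stored_rows)  # [{'time': b'2020-01-01T12:30:00', 'value': 1.5}]
loaded_rows = convert_rows(stored_rows, {'time'}, from_stored)
print(loaded_rows == native_rows)  # True
```

Only fields flagged as time fields are converted. This is why the fields passed to fields_dtype() must identify them.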