Time Series Harmonization

A collection of functions to harmonize time series.

harmonize_shift2grid

harmonize_shift2Grid(freq, method='nshift')
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| freq | offset string | | Frequency of the target grid |
| method | method string | "nshift" | Method used to shift values and flags |

The function "harmonizes" a time series to an equidistant frequency grid by shifting data points to multiples of freq.

This process includes:

  1. All missing values in the data set get flagged. These values will be excluded from the shifting process.
  2. Depending on the method, the data points and the associated flags will be assigned to a timestamp in the target grid

NOTE:

  • The data will be projected to a regular grid ranging from the first to the last timestamp of the original time series
  • Because of the above, the size of the harmonized time series is likely to differ from the size of the original series
  • Data from the original time series might be dropped (e.g. if there are multiple candidates for a shift, only one is used), but can be restored by deharmonize
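
The shifting steps and the notes above can be illustrated with a minimal plain-Python sketch. It is not the library's implementation: the helper names `regular_grid` and `shift_nearest`, the epoch used to define "multiples of freq", and the half-step tolerance are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # reference point for "multiples of freq" (assumption)


def regular_grid(first, last, freq):
    """Return all multiples of freq from floor(first) to ceil(last)."""
    point = first - (first - EPOCH) % freq   # floor the first timestamp to the grid
    end = last + (EPOCH - last) % freq       # ceil the last timestamp to the grid
    grid = []
    while point <= end:
        grid.append(point)
        point += freq
    return grid


def shift_nearest(series, freq):
    """Shift a time-sorted list of (timestamp, value) pairs to a regular grid."""
    valid = [(t, v) for t, v in series if v is not None]  # missing values are excluded
    shifted = []
    for g in regular_grid(series[0][0], series[-1][0], freq):
        # Candidates lie within half a grid step of g (assumed tolerance).
        candidates = [(abs(t - g), v) for t, v in valid if abs(t - g) <= freq / 2]
        closest = min(candidates, key=lambda c: c[0])[1] if candidates else None
        shifted.append((g, closest))
    return shifted


day = datetime(2020, 1, 1)
observations = [(day + timedelta(minutes=m), v)
                for m, v in [(2, 1.0), (11, 2.0), (14, 3.0), (18, None), (29, 4.0)]]
harmonized = shift_nearest(observations, timedelta(minutes=10))
for timestamp, value in harmonized:
    print(timestamp.time(), value)
```

In this sketch the five observations become four grid values: the observation at 00:14 is dropped because 00:11 is closer to the grid point 00:10, and the grid point 00:20 stays empty because no valid observation lies within the assumed tolerance.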

harmonize_aggregate2grid

harmonize_aggregate2Grid(freq, value_func, flag_func="max", method='nagg')
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| freq | offset string | | Frequency of the target grid |
| value_func | function string | | Function used for data aggregation |
| flag_func | function string | "max" | Function used for flags aggregation |
| method | method string | "nagg" | Method used to assign values to the target grid |

The function "harmonizes" a time series to an equidistant frequency grid by aggregating data points to multiples of freq using the method.

This process includes:

  1. All missing values in the data set get flagged. These values will be excluded from the aggregation process
  2. Values and flags will be aggregated by value_func and flag_func respectively
  3. Depending on the method, the aggregation results will be assigned to a timestamp in the target grid

NOTE:

  • The data will be projected to a regular grid ranging from the first to the last timestamp of the original time series
  • Because of the above, the size of the harmonized time series is likely to differ from the size of the original series
  • Newly introduced intervals not covering any data in the original dataset will be treated as missing data
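
The aggregation steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the function name `aggregate_nearest`, the integer flag scale (higher means worse), the epoch-based grid, and the tie-breaking rule (a point exactly between two grid points goes to the later one) are all hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

EPOCH = datetime(1970, 1, 1)  # reference point for "multiples of freq" (assumption)


def aggregate_nearest(series, freq, value_func=mean, flag_func=max):
    """Aggregate time-sorted (timestamp, value, flag) triples to the closest grid point."""
    buckets = {}
    for t, value, flag in series:
        if value is None:                      # step 1: missing values are excluded
            continue
        q, r = divmod(t - EPOCH, freq)
        k = q + (1 if 2 * r >= freq else 0)    # index of the closest grid point
        buckets.setdefault(k, []).append((value, flag))
    first = (series[0][0] - EPOCH) // freq     # floor of the first timestamp
    last = -((EPOCH - series[-1][0]) // freq)  # ceil of the last timestamp
    result = []
    for k in range(first, last + 1):
        members = buckets.get(k)
        if members:                            # step 2: aggregate values and flags
            values, flags = zip(*members)
            result.append((EPOCH + k * freq, value_func(values), flag_func(flags)))
        else:                                  # empty interval: treated as missing
            result.append((EPOCH + k * freq, None, None))
    return result


day = datetime(2020, 1, 1)
raw = [(day + timedelta(minutes=m), v, f)
       for m, v, f in [(2, 1.0, 0), (4, 3.0, 1), (11, 5.0, 0), (28, None, 0), (29, 7.0, 0)]]
aggregated = aggregate_nearest(raw, timedelta(minutes=10))
for timestamp, value, flag in aggregated:
    print(timestamp.time(), value, flag)
```

Here the two observations near 00:00 are averaged and receive the worse of their two flags, while the grid point 00:20 covers no valid data and is therefore left as missing.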

harmonize_linear2grid

harmonize_linear2Grid(freq, method='nagg', func="max")
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| freq | offset string | | Frequency of the target grid |
| method | shift/aggregation method string | "nagg" | Method used to propagate flags to the target grid |
| func | function string | "max" | Function used for flags aggregation |

The function "harmonizes" a time series to an equidistant frequency grid by linear interpolation of data points to multiples of freq.

This process includes:

  1. All missing values in the data set get flagged. These values will be excluded from the interpolation process
  2. Linear interpolation. This is not a gap-filling algorithm: only target grid points that are surrounded by valid data points of the original data set within a range of freq will be calculated.
  3. Depending on the method, the original flags get shifted or aggregated with func to the target grid

NOTE:

  • Newly introduced intervals not covering any data in the original dataset will be treated as missing data
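
The "no gap filling" rule from step 2 can be made concrete with a short plain-Python sketch. It assumes that "surrounded within a range of freq" means one valid observation at most freq before the grid point and one at most freq after it; the function name `linear_to_grid` is hypothetical and the code is not taken from the library.

```python
from datetime import datetime, timedelta


def linear_to_grid(series, grid, freq):
    """Linearly interpolate time-sorted (timestamp, value) pairs onto grid points."""
    valid = [(t, v) for t, v in series if v is not None]  # missing values are excluded
    result = []
    for g in grid:
        before = [(t, v) for t, v in valid if g - freq <= t <= g]
        after = [(t, v) for t, v in valid if g <= t <= g + freq]
        if not before or not after:
            result.append((g, None))       # not surrounded by data: stays missing
            continue
        (t0, v0), (t1, v1) = before[-1], after[0]
        if t0 == t1:                       # an observation sits exactly on the grid
            result.append((g, v0))
        else:
            weight = (g - t0) / (t1 - t0)  # relative position of g between t0 and t1
            result.append((g, v0 + weight * (v1 - v0)))
    return result


day = datetime(2020, 1, 1)
freq = timedelta(minutes=10)
measured = [(day + timedelta(minutes=m), v) for m, v in [(5, 10.0), (15, 20.0), (45, 50.0)]]
grid = [day + i * freq for i in range(6)]  # 00:00, 00:10, ..., 00:50
interpolated = linear_to_grid(measured, grid, freq)
```

Only the grid point 00:10 receives a value (15.0), because it is the only one with valid neighbours on both sides within the assumed range. The 30-minute gap between 00:15 and 00:45 is not filled.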

harmonize_interpolate2grid

harmonize_interpolate2Grid(freq,
                           method, order=1,
                           flag_method='nagg', flag_func="max")
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| freq | offset string | | Frequency of the target grid |
| method | interpolation method string | | Interpolation method |
| order | integer | 1 | Order of the interpolation, only relevant if applicable in the method |
| flag_method | shift/aggregation method string | "nagg" | Method used to propagate flags to the target grid |
| flag_func | function string | "max" | Function used for flags aggregation |

The function "harmonizes" a time series to an equidistant frequency grid by interpolation of data points to multiples of freq.

This process includes:

  1. All missing values in the data set get flagged. These values will be excluded from the interpolation process
  2. Interpolation with method. This is not a gap-filling algorithm: only target grid points that are surrounded by valid data points of the original data set within a range of freq will be calculated.
  3. Depending on flag_method, the original flags get shifted or aggregated with flag_func to the target grid

NOTE:

  • Newly introduced intervals not covering any data in the original dataset will be treated as missing data
  • We recommend harmonize_shift2Grid over the methods nearest and pad

harmonize_downsample

harmonize_downsample(sample_freq, agg_freq,
                     sample_func="mean", agg_func="mean",
                     max_invalid=None)
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| sample_freq | offset string | | Frequency of the source grid |
| agg_freq | offset string | | Frequency of the target grid |
| sample_func | function string | "mean" | Function used to aggregate data to sample_freq. If None, the data is expected to have a frequency of sample_freq |
| agg_func | function string | "mean" | Function used to aggregate data from sample_freq to agg_freq |
| max_invalid | integer | None | If the number of invalid data points (missing/flagged) within an aggregation interval exceeds max_invalid, the interval is set to NAN |

The function downsamples a time series from its sample_freq to the lower sampling rate agg_freq, by aggregation with agg_func.

If a sample_func is given, the data will be aggregated to sample_freq before downsampling.

NOTE:

  • Although the function is a wrapper around harmonize, the deharmonization of "true" downsamples (sample_freq < agg_freq) is not supported yet.
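
The effect of max_invalid can be sketched in plain Python. The example assumes the data already sits on sample_freq (the sample_func=None case), labels each aggregation interval by its start, and treats None as an invalid point. The function name `downsample` and these conventions are illustrative assumptions, not the library's implementation.

```python
from datetime import datetime, timedelta
from statistics import mean

EPOCH = datetime(1970, 1, 1)  # reference point for interval boundaries (assumption)


def downsample(series, agg_freq, agg_func=mean, max_invalid=None):
    """Aggregate (timestamp, value) pairs into intervals of length agg_freq."""
    intervals = {}
    for t, v in series:
        k = (t - EPOCH) // agg_freq            # index of the interval containing t
        intervals.setdefault(k, []).append(v)
    result = []
    for k in sorted(intervals):
        values = intervals[k]
        valid = [v for v in values if v is not None]
        n_invalid = len(values) - len(valid)
        rejected = max_invalid is not None and n_invalid > max_invalid
        aggregate = None if rejected or not valid else agg_func(valid)
        result.append((EPOCH + k * agg_freq, aggregate))  # labelled by interval start
    return result


day = datetime(2020, 1, 1)
values = [1.0, 2.0, None, 3.0, 2.0, 2.0,     # 00:00 to 00:50, one invalid point
          None, None, None, 4.0, 4.0, 4.0]   # 01:00 to 01:50, three invalid points
samples = [(day + timedelta(minutes=10 * i), v) for i, v in enumerate(values)]

strict = downsample(samples, timedelta(hours=1), max_invalid=2)
lenient = downsample(samples, timedelta(hours=1))
```

With max_invalid=2 the second hour is rejected because it contains three invalid points. Without a limit, both hours are aggregated from whatever valid points they contain.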

harmonize

harmonize(freq, inter_method, reshape_method, inter_agg="mean", inter_order=1,
          inter_downcast=False, reshape_agg="max", reshape_missing_flag=None,
          reshape_shift_comment=True, data_missing_value=np.nan)
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| freq | offset string | | Frequency of the target grid |
| inter_method | shift/aggregation/interpolation method string | | Method used to project values to the target grid |
| reshape_method | shift/aggregation method string | | Method used to project flags to the target grid |
| inter_agg | aggregation function string | "mean" | Function used for aggregation, if inter_method is an aggregation method |
| inter_order | int | 1 | The order of interpolation applied, if inter_method is an interpolation method |
| inter_downcast | boolean | False | True: Decrease the interpolation order for data chunks that are too short to be interpolated with order inter_order. False: Project those data chunks to NAN. Option only relevant if inter_method supports an inter_order |
| reshape_agg | aggregation function string | "max" | Function used for the aggregation of flags. By default ("max") the worst/highest flag is assigned |
| reshape_missing_flag | string | None | Valid flag value that will be used for empty harmonization intervals. By default (None) such intervals are set to BAD |
| reshape_shift_comment | boolean | True | True: Shifted flags will be reset, other fields associated with a flag might get lost. False: Shifted flags will not be reset. Only relevant for multi-column flagger and a given inter_method |
| data_missing_value | Any | np.nan | The value indicating missing data |

The function "harmonizes" a time series to an equidistant frequency grid. In general, this includes projection and/or interpolation of the data to timestamps that are multiples of freq.

This process includes:

  1. All missing values equal to data_missing_value in the data set get flagged. These values will be excluded from the aggregation process
  2. Values will be calculated according to the given inter_method
  3. Flags will be calculated according to the given reshape_method

NOTE:

  • The data will be projected to a regular grid ranging from the first to the last timestamp of the original time series
  • Because of the above, the size of the harmonized time series is likely to differ from the size of the original series
  • Newly introduced intervals not covering any data in the original dataset will be set to data_missing_value and reshape_missing_flag respectively
  • Data from the original time series might be dropped, but can be restored by deharmonize
  • Flags calculated on the new harmonized data set can be projected to the original grid by deharmonize

deharmonize

deharmonize(co_flagging)
| parameter | data type | default value | description |
| --- | --- | --- | --- |
| co_flagging | boolean | | False: Depending on the harmonization method applied, only the immediately preceding, the first succeeding or the nearest flag to a harmonized flag is overwritten. True: Depending on the harmonization method applied, all the values covered by the succeeding or preceding sampling interval, or all the values in the range of a harmonized flag's timestamp, are overwritten. |

After having calculated flags on an equidistant frequency grid generated by a call to a harmonization function, you may want to project those new flags onto the original data index, or just restore the original data shape. A call to deharmonize does exactly that.

deharmonize will check for harmonization information for the variable it is applied to (automatically generated by any call to a harmonization function on that variable) and then:

  1. Overwrite the harmonized data series with the original data series and its timestamps.
  2. Project the calculated flags onto the original index by inverting the flag projection method used for harmonization, meaning that:
    • if the flags got shifted or aggregated forward, either the flag associated with the original timestamp immediately preceding the harmonized flag (co_flagging=False), or all the flags covered by the sampling interval preceding the harmonized flag (co_flagging=True), get overwritten with the harmonized flag if they are "better" than this harmonized flag (according to the flagging order of the current flagger).
    • if the flags got shifted or aggregated backwards, either the flag associated with the first original timestamp succeeding the harmonized flag (co_flagging=False), or all the flags covered by the sampling interval succeeding the harmonized flag (co_flagging=True), get overwritten with the harmonized flag if they are "better" than this harmonized flag (according to the flagging order of the current flagger).
    • if the flags got shifted or aggregated to the nearest harmonic index, either the flag associated with the original timestamp nearest to the harmonized flag (co_flagging=False), or all the flags covered by the range of the harmonized flag (co_flagging=True), get overwritten with the harmonized flag if they are "better" than this harmonized flag (according to the flagging order of the current flagger).
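
For the nearest case, the back-projection of flags can be sketched in plain Python. The sketch assumes integer flags where a higher number is worse, and it takes the "range" of a harmonized flag to be half a grid step on either side. Both conventions and the function name `deharmonize_nearest` are hypothetical and only serve to illustrate the effect of co_flagging.

```python
from datetime import datetime, timedelta


def deharmonize_nearest(original_flags, grid_flags, freq, co_flagging=False):
    """Project flags from grid timestamps back onto the original timestamps."""
    result = dict(original_flags)
    for g, flag in grid_flags:
        # Original timestamps inside the (assumed) range of the grid point.
        in_range = [t for t, _ in original_flags if abs(t - g) <= freq / 2]
        if not in_range:
            continue
        if co_flagging:
            targets = in_range                                     # everything in range
        else:
            targets = [min(in_range, key=lambda t: abs(t - g))]    # only the nearest
        for t in targets:
            if result[t] < flag:   # overwrite only flags that are "better"
                result[t] = flag
    return sorted(result.items())


day = datetime(2020, 1, 1)
freq = timedelta(minutes=10)
original_flags = [(day + timedelta(minutes=m), f)
                  for m, f in [(2, 0), (11, 0), (14, 0), (29, 2)]]
grid_flags = [(day + timedelta(minutes=10), 1), (day + timedelta(minutes=30), 1)]

single = deharmonize_nearest(original_flags, grid_flags, freq, co_flagging=False)
covered = deharmonize_nearest(original_flags, grid_flags, freq, co_flagging=True)
```

With co_flagging=False only the observation at 00:11 inherits the flag set at 00:10. With co_flagging=True the observation at 00:14 inherits it as well. The flag at 00:29 is never touched because it is already worse than the harmonized flag.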

Parameter Descriptions

Aggregation Functions

| keyword | description |
| --- | --- |
| "sum" | sum of the values |
| "mean" | arithmetic mean of the values |
| "min" | minimum value |
| "max" | maximum value |
| "median" | median of the values |
| "first" | first value |
| "last" | last value |

Aggregation Methods

| keyword | description |
| --- | --- |
| "fagg" | aggregation result is propagated to the next target grid point |
| "bagg" | aggregation result is propagated to the previous target grid point |
| "nagg" | aggregation result is propagated to the closest target grid point |

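The difference between the three aggregation methods comes down to which grid point a timestamp is assigned to. A minimal plain-Python sketch follows. The function name `target_grid_point`, the epoch-based grid, and the handling of edge cases (a timestamp already on the grid stays where it is, a timestamp exactly between two grid points goes to the later one) are assumptions made for illustration.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # reference point for "multiples of freq" (assumption)


def target_grid_point(t, freq, method):
    """Return the grid point that timestamp t is aggregated to."""
    q, r = divmod(t - EPOCH, freq)   # q full grid steps plus a remainder r
    if method == "bagg":             # previous grid point
        k = q
    elif method == "fagg":           # next grid point
        k = q if r == timedelta(0) else q + 1
    elif method == "nagg":           # closest grid point
        k = q + (1 if 2 * r >= freq else 0)
    else:
        raise ValueError(f"unknown aggregation method: {method}")
    return EPOCH + k * freq


day = datetime(2020, 1, 1)
freq = timedelta(minutes=10)
t = day + timedelta(minutes=13)
targets = {m: target_grid_point(t, freq, m) for m in ("fagg", "bagg", "nagg")}
```

For the timestamp 00:13 and a 10-minute grid, "bagg" and "nagg" both yield 00:10, while "fagg" yields 00:20.
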
Shift Methods

| keyword | description |
| --- | --- |
| "fshift" | propagate the last valid value/flag to the grid point or fill with BAD/NAN |
| "bshift" | propagate the next valid value/flag to the grid point or fill with BAD/NAN |
| "nshift" | propagate the closest value/flag to the grid point or fill with BAD/NAN |

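The three shift methods can be compared side by side with a small plain-Python sketch. It is an illustration rather than the library's code: the function name `shift`, the search windows (one grid step for "fshift" and "bshift", half a grid step for "nshift"), and the use of None as the fill value are assumptions.

```python
from datetime import datetime, timedelta


def shift(series, grid, freq, method):
    """Return one value per grid point, chosen from (timestamp, value) pairs."""
    valid = [(t, v) for t, v in series if v is not None]
    zero = timedelta(0)
    shifted = []
    for g in grid:
        if method == "fshift":    # last valid observation at or before g
            candidates = [(g - t, v) for t, v in valid if zero <= g - t < freq]
        elif method == "bshift":  # next valid observation at or after g
            candidates = [(t - g, v) for t, v in valid if zero <= t - g < freq]
        elif method == "nshift":  # closest valid observation on either side
            candidates = [(abs(t - g), v) for t, v in valid if abs(t - g) <= freq / 2]
        else:
            raise ValueError(f"unknown shift method: {method}")
        # Pick the candidate with the smallest distance, or fill with None.
        shifted.append(min(candidates, key=lambda c: c[0])[1] if candidates else None)
    return shifted


day = datetime(2020, 1, 1)
freq = timedelta(minutes=10)
points = [(day + timedelta(minutes=m), v)
          for m, v in [(2, 1.0), (11, 2.0), (14, 3.0), (29, 4.0)]]
grid = [day + i * freq for i in range(4)]  # 00:00, 00:10, 00:20, 00:30

forward = shift(points, grid, freq, "fshift")
backward = shift(points, grid, freq, "bshift")
nearest = shift(points, grid, freq, "nshift")
```

On this data the forward shift leaves the first grid point empty, the backward shift leaves the last one empty, and the nearest shift leaves 00:20 empty because no observation lies within half a grid step of it.
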
Interpolation Methods

  • All the pandas.Series interpolation methods are supported
  • Available interpolations:
    • "linear"
    • "time"
    • "nearest"
    • "zero"
    • "slinear"
    • "quadratic"
    • "cubic"
    • "spline"
    • "barycentric"
    • "polynomial"
    • "krogh"
    • "piecewise_polynomial"
    • "pchip"
    • "akima"