David Schäfer authored 82e0559e
Time Series Harmonization
A collection of functions to harmonize time series.
Index
- harmonize_shift2Grid
- harmonize_aggregate2Grid
- harmonize_linear2Grid
- harmonize_interpolate2Grid
- harmonize_downsample
- harmonize
- deharmonize
harmonize_shift2Grid
harmonize_shift2Grid(freq, method='nshift')
parameter | data type | default value | description |
---|---|---|---|
freq | offset string | | Frequency of the target grid |
method | method string | "nshift" | Method used to shift values and flags |
The function "harmonizes" a time series to an equidistant frequency grid by shifting data points to multiples of `freq`.
This process includes:
- All missing values in the data set get flagged. These values will be excluded from the shifting process.
- Depending on the `method`, the data points and the associated flags will be assigned to a timestamp in the target grid
NOTE:
- The data will be projected to a regular grid ranging from the first to the last timestamp of the original time series
- Because of the above, the size of the harmonized time series is likely to differ from the size of the original series
- Data from the original time series might be dropped (e.g. if there are multiple candidates for a shift, only one is used), but can be restored by deharmonize
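The shifting step can be illustrated with plain pandas. This is only a sketch of the idea, not the library's implementation: the sample data, the variable names, the floor/ceil handling of the grid boundaries and the tolerance of half a grid step are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical, irregularly sampled series.
data = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(
        [
            "2020-01-01 00:01",
            "2020-01-01 00:09",
            "2020-01-01 00:21",
            "2020-01-01 00:44",
        ]
    ),
)

freq = "10min"
step = pd.Timedelta(freq)

# Regular grid covering the first to the last original timestamp
# (boundaries rounded outwards to multiples of freq, an assumption).
grid = pd.date_range(data.index[0].floor(freq), data.index[-1].ceil(freq), freq=freq)

# Nearest shift: each grid point takes the closest original value,
# but only if it lies within half a grid step (assumed tolerance).
shifted = data.reindex(grid, method="nearest", tolerance=step / 2)

print(shifted)
```

The result holds six grid points although the input had four values, and the grid points at 00:30 and 00:50 stay empty because no original value lies close enough.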
harmonize_aggregate2Grid
harmonize_aggregate2Grid(freq, value_func, flag_func="max", method='nagg')
parameter | data type | default value | description |
---|---|---|---|
freq | offset string | | Frequency of the target grid |
value_func | function string | | Function used for data aggregation |
flag_func | function string | "max" | Function used for flags aggregation |
method | method string | "nagg" | Method used to assign values to the target grid |
The function "harmonizes" a time series to an equidistant frequency grid by aggregating data points to multiples of `freq` using the `method`.
This process includes:
- All missing values in the data set get flagged. These values will be excluded from the aggregation process
- Values and flags will be aggregated by `value_func` and `flag_func` respectively
- Depending on the `method`, the aggregation results will be assigned to a timestamp in the target grid
NOTE:
- The data will be projected to a regular grid ranging from the first to the last timestamp of the original time series
- Because of the above, the size of the harmonized time series is likely to differ from the size of the original series
- Newly introduced intervals not covering any data in the original dataset will be treated as missing data
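The aggregation step resembles a pandas `resample`. The following sketch uses made-up data and integer flags (higher meaning worse, an assumption) to show a backward and a forward aggregation. Which side of each interval is closed is also an assumption here and may differ from the actual implementation.

```python
import pandas as pd

index = pd.to_datetime(
    ["2020-01-01 00:01", "2020-01-01 00:04", "2020-01-01 00:12", "2020-01-01 00:38"]
)
values = pd.Series([1.0, 3.0, 5.0, 7.0], index=index)
flags = pd.Series([0, 2, 0, 1], index=index)  # assumed: higher number = worse flag

freq = "10min"

# Backward aggregation: every interval [t, t + freq) is assigned to t.
values_bagg = values.resample(freq, closed="left", label="left").mean()
flags_bagg = flags.resample(freq, closed="left", label="left").max()

# Forward aggregation: every interval (t - freq, t] is assigned to t.
values_fagg = values.resample(freq, closed="right", label="right").mean()

print(values_bagg)
print(flags_bagg)
print(values_fagg)
```

The interval starting at 00:20 covers no original data, so its aggregated value is missing.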
harmonize_linear2Grid
harmonize_linear2Grid(freq, method='nagg', func="max")
parameter | data type | default value | description |
---|---|---|---|
freq | offset string | | Frequency of the target grid |
method | shift/aggregation method string | "nagg" | Method used to propagate flags to the target grid |
func | function string | "max" | Function used for flags aggregation |
The function "harmonizes" a time series to an equidistant frequency grid by linear interpolation of data points to multiples of `freq`.
This process includes:
- All missing values in the data set get flagged. These values will be excluded from the aggregation process
- Linear interpolation. This is not a gap filling algorithm; only target grid points that are surrounded by valid data points in the original data set within a range of `freq` will be calculated.
- Depending on the `method`, the original flags get shifted or aggregated with `func` to the target grid
NOTE:
- Newly introduced intervals not covering any data in the original dataset will be treated as missing data
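The "no gap filling" rule can be sketched with pandas as follows. The helper name `linear_to_grid`, the sample data and the reading of "surrounded within a range of `freq`" (one valid observation at most one step before and one at most one step after the grid point) are assumptions for illustration.

```python
import pandas as pd


def linear_to_grid(series, freq):
    """Interpolate linearly (in time) onto a regular grid without filling gaps."""
    step = pd.Timedelta(freq)
    valid = series.dropna()
    grid = pd.date_range(valid.index[0].ceil(freq), valid.index[-1].floor(freq), freq=freq)

    # Interpolate on the union of original and grid timestamps, then keep the grid.
    union = valid.index.union(grid)
    on_grid = valid.reindex(union).interpolate(method="time").reindex(grid)

    # A grid point counts as "surrounded" if a valid observation exists
    # within one step before it and within one step after it.
    stamps = valid.index.to_series()
    has_prev = stamps.reindex(grid, method="ffill", tolerance=step).notna()
    has_next = stamps.reindex(grid, method="bfill", tolerance=step).notna()
    return on_grid.where(has_prev & has_next)


# Hypothetical data with a gap between 00:13 and 00:38.
data = pd.Series(
    [3.0, 13.0, 38.0, 47.0],
    index=pd.to_datetime(
        ["2020-01-01 00:03", "2020-01-01 00:13", "2020-01-01 00:38", "2020-01-01 00:47"]
    ),
)

result = linear_to_grid(data, "10min")
print(result)
```

Only the grid points at 00:10 and 00:40 receive a value. The points at 00:20 and 00:30 fall into the gap and stay missing.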
harmonize_interpolate2Grid
harmonize_interpolate2Grid(freq,
method, order=1,
flag_method='nagg', flag_func="max")
parameter | data type | default value | description |
---|---|---|---|
freq | offset string | | Frequency of the target grid |
method | interpolation method string | | Interpolation method |
order | integer | 1 | Order of the interpolation, only relevant if applicable in the method |
flag_method | shift/aggregation method string | "nagg" | Method used to propagate flags to the target grid |
flag_func | function string | "max" | Function used for flags aggregation |
The function "harmonizes" a time series to an equidistant frequency grid by interpolation of data points to multiples of `freq`.
This process includes:
- All missing values in the data set get flagged. These values will be excluded from the aggregation process
- Interpolation with `method`. This is not a gap filling algorithm; only target grid points that are surrounded by valid data points in the original data set within a range of `freq` will be calculated.
- Depending on the `flag_method`, the original flags get shifted or aggregated with `flag_func` to the target grid
NOTE:
- Newly introduced intervals not covering any data in the original dataset will be treated as missing data
- We recommend `harmonize_shift2Grid` over the `method`s `nearest` and `pad`
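Because the supported interpolation methods are those of `pandas.Series.interpolate`, the choice of `method` matters on irregular timestamps. The pandas-only sketch below, with made-up data, compares `"linear"` and `"time"`. Both work without SciPy, whereas several of the other methods in pandas (for example `"spline"` or `"polynomial"`) require SciPy and an `order`.

```python
import numpy as np
import pandas as pd

data = pd.Series(
    [0.0, np.nan, 40.0],
    index=pd.to_datetime(
        ["2020-01-01 00:00", "2020-01-01 00:10", "2020-01-01 00:40"]
    ),
)

# "linear" treats the values as equally spaced and ignores the index.
by_position = data.interpolate(method="linear")
# "time" weights by the actual time difference between the timestamps.
by_time = data.interpolate(method="time")

print(by_position)
print(by_time)
```

For the missing value at 00:10, `"linear"` yields the midpoint of its neighbours (20), while `"time"` yields a value of about 10, reflecting that 00:10 lies a quarter of the way between 00:00 and 00:40.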
harmonize_downsample
harmonize_downsample(sample_freq, agg_freq,
sample_func="mean", agg_func="mean",
max_invalid=None)
parameter | data type | default value | description |
---|---|---|---|
sample_freq | offset string | | Frequency of the source grid |
agg_freq | offset string | | Frequency of the target grid |
sample_func | function string | "mean" | Function used to aggregate data to `sample_freq`. If `None`, the data is expected to have a frequency of `sample_freq` |
agg_func | function string | "mean" | Function used to aggregate data from `sample_freq` to `agg_freq` |
max_invalid | integer | None | If the number of invalid data points (missing/flagged) within an aggregation interval exceeds `max_invalid`, the interval is set to `NAN` |
The function downsamples a time series from its `sample_freq` to the lower sampling rate `agg_freq` by aggregation with `agg_func`.
If a `sample_func` is given, the data will be aggregated to `sample_freq` before downsampling.
NOTE:
- Although the function is a wrapper around `harmonize`, the deharmonization of "true" downsamples (`sample_freq` < `agg_freq`) is not supported yet.
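The two-step aggregation and the `max_invalid` rule can be approximated with pandas. The helper `downsample_sketch` below is hypothetical and simplified: it only counts missing values as invalid (flags are ignored), and it is not the library's implementation.

```python
import numpy as np
import pandas as pd


def downsample_sketch(series, sample_freq, agg_freq,
                      sample_func="mean", agg_func="mean", max_invalid=None):
    """Two-step aggregation: first to sample_freq, then to agg_freq."""
    if sample_func is not None:
        series = series.resample(sample_freq).agg(sample_func)
    result = series.resample(agg_freq).agg(agg_func)
    if max_invalid is not None:
        # Count missing values per target interval and blank out intervals
        # that contain more than max_invalid of them.
        n_invalid = series.isna().resample(agg_freq).sum()
        result = result.where(n_invalid <= max_invalid)
    return result


index = pd.date_range("2020-01-01 00:00", periods=12, freq="10min")
data = pd.Series(np.arange(12, dtype=float), index=index)
data.iloc[[1, 2]] = np.nan  # two missing values in the first hour

lenient = downsample_sketch(data, "10min", "60min")
strict = downsample_sketch(data, "10min", "60min", max_invalid=1)

print(lenient)
print(strict)
```

Without `max_invalid`, the first hour is averaged over its four remaining values. With `max_invalid=1`, the two missing values exceed the limit and the first hour becomes missing.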
harmonize
harmonize(freq, inter_method, reshape_method, inter_agg="mean", inter_order=1,
inter_downcast=False, reshape_agg="max", reshape_missing_flag=None,
reshape_shift_comment=True, data_missing_value=np.nan)
parameter | data type | default value | description |
---|---|---|---|
freq | offset string | | Frequency of the target grid |
inter_method | shift/aggregation/interpolation method string | | Method used to project values to the target grid |
reshape_method | shift/aggregation method string | | Method used to project flags to the target grid |
inter_agg | aggregation function string | "mean" | Function used for aggregation, if `inter_method` is an aggregation method |
inter_order | int | 1 | The order of interpolation applied, if `inter_method` is an interpolation method |
inter_downcast | bool | False | `True`: Decrease the interpolation order for data chunks that are too short to be interpolated with order `inter_order`. `False`: Project those data chunks to `NAN`. Option only relevant if `inter_method` supports an `inter_order` |
reshape_agg | aggregation function string | "max" | Function used for the aggregation of flags. By default (`"max"`) the worst/highest flag is assigned |
reshape_missing_flag | string | None | Valid flag value that will be used for empty harmonization intervals. By default (`None`) such intervals are set to `BAD` |
reshape_shift_comment | bool | True | `True`: Shifted flags will be reset, other fields associated with a flag might get lost. `False`: Shifted flags will not be reset. Only relevant for multi-column flagger and a given `inter_method` |
data_missing_value | Any | np.nan | The value indicating missing data |
The function "harmonizes" a time series to an equidistant frequency grid.
In general this includes projection and/or interpolation of the data to timestamps that are multiples of `freq`.
This process includes:
- All missing values equal to `data_missing_value` in the data set get flagged. These values will be excluded from the aggregation process
- Values will be calculated according to the given `inter_method`
- Flags will be calculated according to the given `reshape_method`
NOTE:
- The data will be projected to a regular grid ranging from the first to the last timestamp of the original time series
- Because of the above, the size of the harmonized time series is likely to differ from the size of the original series
- Newly introduced intervals not covering any data in the original dataset will be set to `data_missing_value` and `reshape_missing_flag` respectively
- Data from the original time series might be dropped, but can be restored by `deharmonize`
- Flags calculated on the new harmonized data set can be projected to the original grid by deharmonize
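The first step of the process, flagging values equal to `data_missing_value` and excluding them from the projection, might look roughly like this. The helper `flag_missing`, the sentinel value `-9999.0` and the flag label `"UNFLAGGED"` are assumptions for this sketch; only `BAD` is taken from the description above.

```python
import numpy as np
import pandas as pd


def flag_missing(data, flags, missing_value=np.nan, bad_flag="BAD"):
    """Return updated flags and the boolean mask of missing data points."""
    if pd.isna(missing_value):
        # NaN never compares equal to itself, so it needs a dedicated check.
        is_missing = data.isna()
    else:
        is_missing = data == missing_value
    return flags.mask(is_missing, bad_flag), is_missing


data = pd.Series([1.0, -9999.0, 3.0])
flags = pd.Series(["UNFLAGGED", "UNFLAGGED", "UNFLAGGED"])

new_flags, is_missing = flag_missing(data, flags, missing_value=-9999.0)
usable = data[~is_missing]  # only these values would enter the projection

print(new_flags)
print(usable)
```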
deharmonize
deharmonize(co_flagging)
parameter | data type | default value | description |
---|---|---|---|
co_flagging | boolean | | Control the behaviour of the flag reprojection, see description |
This function projects harmonized datasets back to their original timestamps and thereby restores the original data shape.
A combination of calls to one of the `harmonize*` functions and `deharmonize` makes it possible to leverage information from data sets with differing timestamps/frequencies and bring the generated information back to the original dataset.
`deharmonize` will implicitly revert the methods and functions applied during harmonization, i.e.:
- The harmonized time series will be dropped in favor of the original one
- Flags are projected to the original timestamps if they are 'worse'/higher than the original. The direction of this projection is inverse to the shift/aggregation direction in `harmonize*`, i.e. a forward shift in `harmonize*` will result in a backward shift in `deharmonize` and vice versa.
- The projection behavior is controlled by the value of `co_flagging`:
  - `False`: Project a flag from the harmonized time series to a single flag in the deharmonized data set
  - `True`: Project a flag in the harmonized time series to all flags in the respective projection interval.

  Let's say during harmonization a dataset was aggregated to a lower frequency (e.g. a time series with a frequency of 10 minutes was resampled to one with a frequency of 1 hour) and needs to be deharmonized. If `co_flagging` is `True`, the flags from the harmonized dataset will be projected to all six values within the aggregation period; if `co_flagging` is `False`, only the next/last/nearest value in the deharmonized dataset will inherit the flag from the harmonized time series.
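The 10-minute/1-hour scenario can be played through with a toy back-projection. The function `project_flags_back` is hypothetical. It assumes integer flags where higher means worse, a backward aggregation during harmonization (each hour `[t, t + 1h)` assigned to `t`), and, for `co_flagging=False`, that the nearest original timestamp inherits the flag.

```python
import pandas as pd


def project_flags_back(orig_flags, harm_flags, co_flagging):
    """Toy back-projection; flags are integers and higher means worse (assumed)."""
    if co_flagging:
        # Every original timestamp in [t, next grid point) inherits the flag of t.
        projected = harm_flags.reindex(orig_flags.index, method="ffill")
    else:
        # Only the original timestamp closest to t inherits the flag of t.
        positions = orig_flags.index.get_indexer(harm_flags.index, method="nearest")
        projected = pd.Series(float("nan"), index=orig_flags.index)
        projected.iloc[positions] = harm_flags.to_numpy()
    # Keep the worse of the original and the projected flag.
    return pd.concat([orig_flags, projected], axis=1).max(axis=1).astype(int)


orig_index = pd.date_range("2020-01-01 00:00", periods=12, freq="10min")
orig_flags = pd.Series(0, index=orig_index)
orig_flags.iloc[7] = 2  # one value was already flagged before harmonization

harm_index = pd.date_range("2020-01-01 00:00", periods=2, freq="60min")
harm_flags = pd.Series([1, 0], index=harm_index)

flags_all = project_flags_back(orig_flags, harm_flags, co_flagging=True)
flags_single = project_flags_back(orig_flags, harm_flags, co_flagging=False)

print(flags_all.tolist())
print(flags_single.tolist())
```

With `co_flagging=True`, all six values of the first hour inherit the flag `1`; with `co_flagging=False`, only the value at 00:00 does. In both cases the original flag `2` at 01:10 is kept because it is worse than the projected one.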
Parameter Descriptions
Aggregation Functions
keyword | description |
---|---|
"sum" | sum of the values |
"mean" | arithmetic mean of the values |
"min" | minimum value |
"max" | maximum value |
"median" | median of the values |
"first" | first value |
"last" | last value |
Aggregation Methods
keyword | description |
---|---|
"fagg" | aggregation result is propagated to the next target grid point |
"bagg" | aggregation result is propagated to the last target grid point |
"nagg" | aggregation result is propagated to the closest target grid point |
Shift Methods
keyword | description |
---|---|
"fshift" | propagate the last valid value/flag to the grid point or fill with `BAD`/`NAN` |
"bshift" | propagate the next valid value/flag to the grid point or fill with `BAD`/`NAN` |
"nshift" | propagate the closest value/flag to the grid point or fill with `BAD`/`NAN` |
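The three shift directions map naturally onto the fill methods of `pandas.Series.reindex`. The following sketch uses made-up data; the mapping itself and the tolerances (one grid step for the forward and backward shift, half a step for the nearest shift) are assumptions and not taken from the implementation.

```python
import pandas as pd

data = pd.Series(
    [1.0, 2.0],
    index=pd.to_datetime(["2020-01-01 00:03", "2020-01-01 00:18"]),
)
grid = pd.date_range("2020-01-01 00:00", "2020-01-01 00:20", freq="10min")
step = pd.Timedelta("10min")

# Keyword -> (pandas fill method, assumed tolerance).
shift_methods = {
    "fshift": ("ffill", step),
    "bshift": ("bfill", step),
    "nshift": ("nearest", step / 2),
}

# One column per shift method, one row per grid point.
shifted = pd.DataFrame(
    {
        name: data.reindex(grid, method=fill, tolerance=tolerance)
        for name, (fill, tolerance) in shift_methods.items()
    }
)

print(shifted)
```

Grid points without a suitable candidate stay `NaN`: 00:00 has no earlier value for the forward shift, 00:20 has no later value for the backward shift, and 00:10 has no value within half a step for the nearest shift.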
Interpolation Methods
- All the `pandas.Series` interpolation methods are supported
- Available interpolations:
  - "linear"
  - "time"
  - "nearest"
  - "zero"
  - "slinear"
  - "quadratic"
  - "cubic"
  - "spline"
  - "barycentric"
  - "polynomial"
  - "krogh"
  - "piecewise_polynomial"
  - "pchip"
  - "akima"