# Time Series Harmonization

A collection of functions to harmonize time series.

## Index

- [harmonize_shift2Grid](#harmonize_shift2grid)
- [harmonize_aggregate2Grid](#harmonize_aggregate2grid)
- [harmonize_linear2Grid](#harmonize_linear2grid)
- [harmonize_interpolate2Grid](#harmonize_interpolate2grid)
- [harmonize_downsample](#harmonize_downsample)
- [harmonize](#harmonize)
- [deharmonize](#deharmonize)


## harmonize_shift2Grid

```
harmonize_shift2Grid(freq, method='nshift')
```
| parameter | data type                                                     | default value | description                           |
|-----------|---------------------------------------------------------------|---------------|---------------------------------------|
| freq      | [offset string](docs/ParameterDescriptions.md#offset-strings) |               | Frequency of the target grid          |
| method    | [method string](#shift-methods)                               | `"nshift"`    | Method used to shift values and flags |


The function "harmonizes" a time series to an equidistant frequency
grid by shifting data points to multiples of `freq`.

This process includes:

1. All missing values in the data set get [flagged](docs/funcs/Miscellaneous-md#missing). 
   These values will be excluded from the shifting process.
2. Depending on the `method`, the data points and the associated
   flags will be assigned to a timestamp in the target grid.
   
NOTE:
- The data will be projected to a regular grid ranging from
  the first to the last timestamp of the original time series
- Because of the above, the size of the harmonized time series
  is likely to differ from the size of the original series
- Data from the original time series might be dropped 
  (e.g. if there are multiple candidates for a shift, only 
  one is used), but can be restored by [deharmonize](#deharmonize)
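The notes above can be illustrated with plain pandas. The following is only a sketch of a nearest shift, not the library's implementation: the helper name `nshift_sketch`, the column names `value` and `flag`, the string flags, and the choice to round the grid outwards to multiples of `freq` are all assumptions made for the example.

```python
import pandas as pd

def nshift_sketch(frame: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Illustrative nearest shift of values and flags (hypothetical helper)."""
    # Step 1: rows with missing values take no part in the shift.
    valid = frame.dropna(subset=["value"])
    # Assumption: the grid spans the data, rounded outwards to multiples of freq.
    grid = pd.date_range(frame.index.min().floor(freq),
                         frame.index.max().ceil(freq), freq=freq)
    # Step 2: every grid point takes the closest row within half a period.
    shifted = valid.reindex(index=grid, method="nearest",
                            tolerance=pd.Timedelta(freq) / 2)
    # Grid points without a candidate keep NaN as value and are marked BAD.
    shifted["flag"] = shifted["flag"].fillna("BAD")
    return shifted

raw = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0], "flag": ["GOOD", "BAD", "GOOD", "GOOD"]},
    index=pd.to_datetime(["2020-01-01 00:01", "2020-01-01 00:09",
                          "2020-01-01 00:12", "2020-01-01 00:31"]),
)
shifted = nshift_sketch(raw, "10min")
```

In this example the four original rows become five grid rows. The points at 00:09 and 00:12 both compete for the grid point 00:10, so the value `3.0` does not appear in the result.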

## harmonize_aggregate2Grid

```
harmonize_aggregate2Grid(freq, value_func, flag_func="max", method='nagg')
```
| parameter  | data type                                                     | default value | description                                     |
|------------|---------------------------------------------------------------|---------------|-------------------------------------------------|
| freq       | [offset string](docs/ParameterDescriptions.md#offset-strings) |               | Frequency of the target grid                    |
| value_func | [function string](#aggregation-functions)                     |               | Function used for data aggregation              |
| flag_func  | [function string](#aggregation-functions)                     | `"max"`       | Function used for flags aggregation             |
| method     | [method string](#aggregation-methods)                         | `"nagg"`      | Method used to assign values to the target grid |


The function "harmonizes" a time series to an equidistant frequency grid
by aggregating data points to multiples of `freq` using the `method`.

This process includes:

1. All missing values in the data set get [flagged](docs/funcs/Miscellaneous-md#missing). 
   These values will be excluded from the aggregation process.
2. Values and flags will be aggregated by `value_func` and `flag_func` respectively.
3. Depending on the `method`, the aggregation results will be assigned to a timestamp
   in the target grid.

NOTE:
- The data will be projected to a regular grid ranging from
  the first to the last timestamp of the original time series
- Because of the above, the size of the harmonized time series
  is likely to differ from the size of the original series
- Newly introduced intervals not covering any data in the original
  dataset will be treated as missing data


## harmonize_linear2Grid

```
harmonize_linear2Grid(freq, method='nagg', func="max")
```

| parameter | data type                                                                 | default value | description                                       |
|-----------|---------------------------------------------------------------------------|---------------|---------------------------------------------------|
| freq      | [offset string](docs/ParameterDescriptions.md#offset-strings)             |               | Frequency of the target grid                      |
| method    | [shift](#shift-methods)/[aggregation](#aggregation-methods) method string | `"nagg"`      | Method used to propagate flags to the target grid |
| func      | [function string](#aggregation-functions)                                 | `"max"`       | Function used for flags aggregation               |


The function "harmonizes" a time series to an equidistant frequency grid
by linear interpolation of data points to multiples of `freq`.

This process includes:

1. All missing values in the data set get [flagged](docs/funcs/Miscellaneous-md#missing). 
   These values will be excluded from the interpolation process.
2. Linear interpolation. This is not a gap-filling algorithm: only target grid points
   that are surrounded by valid data points in the original data set within a range
   of `freq` will be calculated.
3. Depending on the `method`, the original flags get shifted
   or aggregated with `func` to the target grid.


NOTE:
- Newly introduced intervals not covering any data in the original
  dataset will be treated as missing data
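The "no gap filling" rule can be sketched with plain pandas. This is a minimal illustration under assumptions, not the library's code: the helper name `linear_sketch` is hypothetical, and "within a range of `freq`" is read here as "a valid point exists at most `freq` before and at most `freq` after the grid point".

```python
import pandas as pd

def linear_sketch(series: pd.Series, freq: str) -> pd.Series:
    """Illustrative linear interpolation onto a grid (hypothetical helper)."""
    valid = series.dropna()
    grid = pd.date_range(valid.index.min().floor(freq),
                         valid.index.max().ceil(freq), freq=freq)
    # Interpolate linearly in time on the union of original and grid stamps.
    combined = valid.reindex(valid.index.union(grid))
    interpolated = combined.interpolate(method="time", limit_area="inside")
    # Keep a grid point only if valid data lies within `freq` on both sides.
    step = pd.Timedelta(freq)
    prev_ok = valid.reindex(grid, method="ffill", tolerance=step).notna()
    next_ok = valid.reindex(grid, method="bfill", tolerance=step).notna()
    return interpolated.reindex(grid).where(prev_ok & next_ok)

raw = pd.Series(
    [2.0, 12.0, 48.0],
    index=pd.to_datetime(["2020-01-01 00:02", "2020-01-01 00:12",
                          "2020-01-01 00:48"]),
)
on_grid = linear_sketch(raw, "10min")
```

Only the grid point 00:10 has valid neighbours close enough on both sides, so it is the only one that receives a value. The grid points inside the long gap between 00:12 and 00:48 stay missing.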


## harmonize_interpolate2Grid

```
harmonize_interpolate2Grid(freq,
                           method, order=1,
                           flag_method='nagg', flag_func="max")
```
| parameter   | data type                                                                 | default value | description                                                             |
|-------------|---------------------------------------------------------------------------|---------------|-------------------------------------------------------------------------|
| freq        | [offset string](docs/ParameterDescriptions.md#offset-strings)             |               | Frequency of the target grid                                            |
| method      | [interpolation method string](#interpolation-methods)                     |               | Interpolation method                                                    |
| order       | integer                                                                   | `1`           | Order of the interpolation, only relevant if applicable in the `method` |
| flag_method | [shift](#shift-methods)/[aggregation](#aggregation-methods) method string | `"nagg"`      | Method used to propagate flags to the target grid                       |
| flag_func   | [function string](#aggregation-functions)                                 | `"max"`       | Function used for flags aggregation                                     |


The function "harmonizes" a time series to an equidistant frequency grid
by interpolation of data points to multiples of `freq`.

This process includes:

1. All missing values in the data set get [flagged](docs/funcs/Miscellaneous-md#missing). 
   These values will be excluded from the interpolation process.
2. Interpolation with `method`. This is not a gap-filling algorithm:
   only target grid points that are surrounded by valid data points in the original
   data set within a range of `freq` will be calculated.
3. Depending on the `flag_method`, the original flags get shifted
   or aggregated with `flag_func` to the target grid.

NOTE:
- Newly introduced intervals not covering any data in the original
  dataset will be treated as missing data
- We recommend `harmonize_shift2Grid` over the `method`s
  `nearest` and `pad`


## harmonize_downsample

```
harmonize_downsample(sample_freq, agg_freq,
                     sample_func="mean", agg_func="mean",
                     max_invalid=None)
```
| parameter   | data type                                                     | default value | description                                                                                                                    |
|-------------|---------------------------------------------------------------|---------------|--------------------------------------------------------------------------------------------------------------------------------|
| sample_freq | [offset string](docs/ParameterDescriptions.md#offset-strings) |               | Frequency of the source grid                                                                                                   |
| agg_freq    | [offset string](docs/ParameterDescriptions.md#offset-strings) |               | Frequency of the target grid                                                                                                   |
| sample_func | [function string](#aggregation-functions)                     | `"mean"`      | Function used to aggregate data to `sample_freq`. If `None`, the data is expected to have a frequency of `sample_freq`         |
| agg_func    | [function string](#aggregation-functions)                     | `"mean"`      | Function used to aggregate data from `sample_freq` to `agg_freq`                                                               |
| max_invalid | integer                                                       | `None`        | If the number of invalid data points (missing/flagged) within an aggregation interval exceeds `max_invalid` it is set to `NAN` |

The function downsamples a time series from its `sample_freq` to the lower
sampling rate `agg_freq`, by aggregation with `agg_func`.

If a `sample_func` is given, the data will be aggregated to `sample_freq`
before downsampling.

NOTE:
- Although the function is a wrapper around `harmonize`, the deharmonization of "true"
  downsamples (`sample_freq` < `agg_freq`) is not supported yet.
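The two-stage aggregation and the role of `max_invalid` can be sketched with pandas resampling. This is a hedged illustration, not the library's implementation: `downsample_sketch` is a hypothetical helper, and it only counts missing values as invalid, whereas the function described above also counts flagged values.

```python
import numpy as np
import pandas as pd

def downsample_sketch(series, sample_freq, agg_freq,
                      sample_func="mean", agg_func="mean", max_invalid=None):
    """Illustrative two-stage downsampling (hypothetical helper)."""
    # Stage 1: optionally bring the data onto the source grid first.
    if sample_func is not None:
        series = series.resample(sample_freq).agg(sample_func)
    # Stage 2: aggregate from the source grid to the coarser target grid.
    result = series.resample(agg_freq).agg(agg_func)
    if max_invalid is not None:
        # Blank out target intervals holding too many missing points.
        n_invalid = series.isna().resample(agg_freq).sum()
        result = result.where(n_invalid <= max_invalid)
    return result

index = pd.date_range("2020-01-01 00:00", periods=6, freq="5min")
raw = pd.Series([1.0, 2.0, 3.0, np.nan, np.nan, 6.0], index=index)
lenient = downsample_sketch(raw, "5min", "15min")
strict = downsample_sketch(raw, "5min", "15min", max_invalid=1)
```

Without a limit, the second 15 minute interval is averaged from its single valid point. With `max_invalid=1` the same interval contains two missing points and is therefore set to `NaN`.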


## harmonize

```
harmonize(freq, inter_method, reshape_method, inter_agg="mean", inter_order=1,
          inter_downcast=False, reshape_agg="max", reshape_missing_flag=None,
          reshape_shift_comment=True, data_missing_value=np.nan)
```

| parameter             | data type                                                                                                         | default value | description                                                                                                                                                                                                                                   |
|-----------------------|-------------------------------------------------------------------------------------------------------------------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| freq                  | [offset string](docs/ParameterDescriptions.md#offset-strings)                                                     |               | Frequency of the target grid                                                                                                                                                                                                                  |
| inter_method          | [shift](#shift-methods)/[aggregation](#aggregation-methods)/[interpolation](#interpolation-methods) method string |               | Method used to project values to the target grid                                                                                                                                                                                              |
| reshape_method        | [shift](#shift-methods)/[aggregation](#aggregation-methods) method string                                         |               | Method used to project flags to the target grid                                                                                                                                                                                               |
| inter_agg             | [aggregation function string](#aggregation-functions)                                                             | `"mean"`      | Function used for aggregation, if `inter_method` is an aggregation method                                                                                                                                                                     |
| inter_order           | int                                                                                                               | `1`           | The order of interpolation applied, if `inter_method` is an interpolation method                                                                                                                                                              |
| inter_downcast        | boolean                                                                                                           | `False`       | `True`: Decrease interpolation order for data chunks too short to be interpolated with order `inter_order`. <br/> `False`: Project those data chunks to `NAN`. <br/> Option only relevant if `inter_method` supports an `inter_order`         |
| reshape_agg           | [aggregation function string](#aggregation-functions)                                                             | `"max"`       | Function used for the aggregation of flags. By default (`"max"`) the worst/highest flag is assigned                                                                                                                                           |
| reshape_missing_flag  | string                                                                                                            | `None`        | Valid flag value, that will be used for empty harmonization intervals. By default (`None`) such intervals are set to `BAD`                                                                                                                    |
| reshape_shift_comment | boolean                                                                                                           | `True`        | `True`: Shifted flags will be reset, other fields associated with a flag might get lost. <br/> `False`: Shifted flags will not be reset. <br/> <br/> Only relevant for multi-column flagger and a given `inter_method`                        |
| data_missing_value    | Any                                                                                                               | `np.nan`      | The value, indicating missing data                                                                                                                                                                                                            |


The function "harmonizes" a time series to an equidistant frequency grid.
In general this includes projection and/or interpolation of the data to
timestamps, that are multiples of `freq`.

This process includes:

1. All missing values equal to `data_missing_value` in the data set
   get [flagged](docs/funcs/Miscellaneous-md#missing). 
   These values will be excluded from the aggregation process.
2. Values will be calculated according to the given `inter_method`.
3. Flags will be calculated according to the given `reshape_method`.

NOTE:
- The data will be projected to a regular grid ranging from
  the first to the last timestamp of the original time series
- Because of the above, the size of the harmonized time series
  is likely to differ from the size of the original series
- Newly introduced intervals not covering any data in the original
  dataset will be set to `data_missing_value` and `reshape_missing_flag`
  respectively
- Data from the original time series might be dropped, but can
  be restored by [deharmonize](#deharmonize)
- Flags calculated on the new harmonized data set can be projected
  to the original grid by [deharmonize](#deharmonize)
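How `data_missing_value` and `reshape_missing_flag` interact with empty intervals can be sketched as follows. Everything here is an assumption for illustration: the helper `harmonize_sketch`, the three-level flag scheme in `FLAG_ORDER`, and the fixed choice of a mean for values and the worst flag for flags. It is not the library's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical flag scheme, ordered from best to worst.
FLAG_ORDER = ["UNFLAGGED", "GOOD", "BAD"]

def harmonize_sketch(values, flags, freq, data_missing_value=np.nan,
                     reshape_missing_flag=None):
    """Illustrative value/flag projection (hypothetical helper)."""
    # Step 1: locate missing values; they take no part in the projection.
    if pd.isna(data_missing_value):
        missing = values.isna()
    else:
        missing = values == data_missing_value
    ranks = flags[~missing].map(FLAG_ORDER.index)
    # Steps 2 and 3: project values (mean) and flags (worst flag) to the grid.
    new_values = values[~missing].resample(freq).mean()
    new_ranks = ranks.resample(freq).max()
    # Empty intervals get the missing-value marker and the missing flag.
    if not pd.isna(data_missing_value):
        new_values = new_values.fillna(data_missing_value)
    fill = "BAD" if reshape_missing_flag is None else reshape_missing_flag
    new_flags = new_ranks.map(
        lambda r: fill if pd.isna(r) else FLAG_ORDER[int(r)])
    return new_values, new_flags

stamps = pd.to_datetime(["2020-01-01 00:02", "2020-01-01 00:07",
                         "2020-01-01 00:11", "2020-01-01 00:33"])
values = pd.Series([1.0, -9999.0, 3.0, 5.0], index=stamps)
flags = pd.Series(["GOOD", "GOOD", "UNFLAGGED", "GOOD"], index=stamps)
new_values, new_flags = harmonize_sketch(values, flags, "10min",
                                         data_missing_value=-9999.0)
```

In this example the interval starting at 00:20 covers no original data, so it receives `-9999.0` as its value and `BAD` as its flag.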


## deharmonize

```
deharmonize(co_flagging)
```

| parameter   | data type | default value | description                                                                                                                                                                                                                                                                                                                                                           |
|-------------|-----------|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| co_flagging | boolean   |               | `False`: Depending on the harmonization method applied, only overwrite the directly preceding, first succeeding or nearest flag to a harmonized flag. <br/> `True`: Depending on the harmonization method applied, overwrite all the values covered by the succeeding or preceding sampling interval, or all the values in the range of a harmonized flag's timestamp. |


After having calculated flags on an equidistant frequency grid, generated by
a call to a harmonization function, you may want to project
those new flags onto the original data index, or just restore the
original data shape. A call to `deharmonize` does exactly that.

`deharmonize` will check for harmonization information for the variable it is
applied to (automatically generated by any call to a harmonization function on that variable)
and then:

1. Overwrite the harmonized data series with the original data series and its timestamps.
2. Project the calculated flags onto the original index by inverting the
  flag projection method used for harmonization, meaning that:
    * if the flags got shifted or aggregated forward, either the flag associated with the
      original timestamp directly preceding the harmonized flag (`co_flagging`=`False`),
      or all the flags covered by the harmonized flag's preceding sampling interval (`co_flagging`=`True`)
      get overwritten with the harmonized flag - if they are "better" than this harmonized flag
      (according to the flagging order of the current flagger).
    * if the flags got shifted or aggregated backwards, either the flag associated with the
      first original timestamp succeeding the harmonized flag (`co_flagging`=`False`),
      or all the flags covered by the harmonized flag's succeeding sampling interval (`co_flagging`=`True`)
      get overwritten with the harmonized flag - if they are "better" than this harmonized flag
      (according to the flagging order of the current flagger).
    * if the flags got shifted or aggregated to the nearest harmonic index,
      either the flag associated with the original timestamp nearest to the harmonized flag (`co_flagging`=`False`),
      or all the flags covered by the harmonized flag's range (`co_flagging`=`True`)
      get overwritten with the harmonized flag - if they are "better" than this harmonized flag
      (according to the flagging order of the current flagger).
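The back-projection for the "nearest" case can be sketched like this. It is a simplified illustration under assumptions, not the library's code: flags are plain integers where a higher number is worse, the helper `deharmonize_nearest_sketch` is hypothetical, and the range of a harmonized flag is taken to be half a period on either side of its timestamp.

```python
import numpy as np
import pandas as pd

def deharmonize_nearest_sketch(orig_flags, harm_flags, freq, co_flagging):
    """Illustrative flag back-projection (hypothetical helper)."""
    result = orig_flags.copy()
    half = pd.Timedelta(freq).total_seconds() / 2
    for stamp, flag in harm_flags.items():
        # Distance in seconds from every original stamp to this grid stamp.
        dist = np.abs((orig_flags.index - stamp).total_seconds().to_numpy())
        in_range = dist <= half
        if co_flagging:
            targets = in_range                        # everything in range
        else:
            targets = in_range & (dist == dist.min())  # nearest stamp only
        # Only flags that are "better" (lower) than the new flag change.
        better = result.to_numpy() < flag
        result[targets & better] = flag
    return result

orig_index = pd.to_datetime(["2020-01-01 00:01", "2020-01-01 00:09",
                             "2020-01-01 00:12", "2020-01-01 00:31"])
orig_flags = pd.Series([0, 0, 0, 2], index=orig_index)
grid = pd.date_range("2020-01-01 00:00", periods=5, freq="10min")
harm_flags = pd.Series([0, 1, 0, 1, 0], index=grid)

single = deharmonize_nearest_sketch(orig_flags, harm_flags, "10min", False)
covered = deharmonize_nearest_sketch(orig_flags, harm_flags, "10min", True)
```

With `co_flagging=False` only the original point at 00:09 inherits the flag set at 00:10. With `co_flagging=True` the point at 00:12 inherits it as well. The point at 00:31 already carries a worse flag than the one set at 00:30 and is left unchanged in both cases.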


## Parameter Descriptions

### Aggregation Functions

| keyword    | description                    |
|------------|--------------------------------|
| `"sum"`    | sum of the values              |
| `"mean"`   | arithmetic mean of the values  |
| `"min"`    | minimum value                  |
| `"max"`    | maximum value                  |
| `"median"` | median of the values           |
| `"first"`  | first value                    |
| `"last"`   | last value                     |

### Aggregation Methods

| keyword  | description                                                       |
|----------|-------------------------------------------------------------------|
| `"fagg"` | aggregation result is propagated to the next target grid point    |
| `"bagg"` | aggregation result is propagated to the last target grid point    |
| `"nagg"` | aggregation result is propagated to the closest target grid point |
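The three keywords differ only in which grid point an aggregation interval is assigned to. The sketch below shows one possible reading using pandas. The helper `aggregate_sketch` and the exact interval boundaries are assumptions, not the library's implementation.

```python
import pandas as pd

def aggregate_sketch(series, freq, func="mean", method="nagg"):
    """Illustrative grid assignment for aggregates (hypothetical helper)."""
    if method == "bagg":
        # [t, t + freq) is assigned backwards to its left edge t.
        return series.resample(freq, closed="left", label="left").agg(func)
    if method == "fagg":
        # (t - freq, t] is assigned forwards to its right edge t.
        return series.resample(freq, closed="right", label="right").agg(func)
    if method == "nagg":
        # Every point is grouped under its nearest grid point.
        nearest = series.index.round(freq)
        grouped = series.groupby(nearest).agg(func)
        grid = pd.date_range(nearest.min(), nearest.max(), freq=freq)
        return grouped.reindex(grid)
    raise ValueError(f"unknown method: {method}")

raw = pd.Series(
    [1.0, 3.0, 5.0, 7.0],
    index=pd.to_datetime(["2020-01-01 00:02", "2020-01-01 00:07",
                          "2020-01-01 00:11", "2020-01-01 00:33"]),
)
backward = aggregate_sketch(raw, "10min", method="bagg")
forward = aggregate_sketch(raw, "10min", method="fagg")
nearest = aggregate_sketch(raw, "10min", method="nagg")
```

For this input the points at 00:02 and 00:07 are averaged together by both `"bagg"` (result at 00:00) and `"fagg"` (result at 00:10). Under `"nagg"` they are split, because 00:07 is closer to 00:10 and is averaged with the point at 00:11 instead.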


### Shift Methods

| keyword    | description                                                                    |
|------------|--------------------------------------------------------------------------------|
| `"fshift"` | propagate the last valid value/flag to the grid point or fill with `BAD`/`NAN` |
| `"bshift"` | propagate the next valid value/flag to the grid point or fill with `BAD`/`NAN` |
| `"nshift"` | propagate the closest value/flag to the grid point or fill with `BAD`/`NAN`    |
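The direction of each shift keyword can be illustrated by mapping it to a pandas `reindex` fill method. This mapping, the helper `shift_sketch` and the tolerances (one period for forward and backward shifts, half a period for the nearest shift) are assumptions for the example. Boundary handling is simplified compared to a real implementation.

```python
import pandas as pd

# Hypothetical mapping from the shift keywords to pandas fill methods.
REINDEX_METHOD = {"fshift": "ffill", "bshift": "bfill", "nshift": "nearest"}

def shift_sketch(series, freq, method="nshift"):
    """Illustrative shift of values onto a grid (hypothetical helper)."""
    valid = series.dropna()
    grid = pd.date_range(valid.index.min().floor(freq),
                         valid.index.max().ceil(freq), freq=freq)
    step = pd.Timedelta(freq)
    # Assumption: a value may travel at most one period (half for nshift).
    tolerance = step / 2 if method == "nshift" else step
    return valid.reindex(grid, method=REINDEX_METHOD[method],
                         tolerance=tolerance)

raw = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.to_datetime(["2020-01-01 00:03", "2020-01-01 00:08",
                          "2020-01-01 00:26"]),
)
forward = shift_sketch(raw, "10min", "fshift")
backward = shift_sketch(raw, "10min", "bshift")
nearest = shift_sketch(raw, "10min", "nshift")
```

On the grid 00:00, 00:10, 00:20, 00:30 the forward shift fills 00:10 and 00:30, the backward shift fills 00:00 and 00:20, and the nearest shift fills every grid point except 00:20, whose closest value is more than half a period away.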


### Interpolation Methods

- All the `pandas.Series` [interpolation methods](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.interpolate.html)
  are supported
- Available interpolations:
  + `"linear"`
  + `"time"`
  + `"nearest"`
  + `"zero"`
  + `"slinear"`
  + `"quadratic"`
  + `"cubic"`
  + `"spline"`
  + `"barycentric"`
  + `"polynomial"`
  + `"krogh"`
  + `"piecewise_polynomial"`
  + `"pchip"`
  + `"akima"`