Integration of harmonization results
At the moment we don't integrate the harmonized data back. This means that a shifted, aggregated, or interpolated dataset is lost before we invoke the next test function. Schemes like 1. harmonize a timeseries and 2. pass the harmonized timeseries to, for example, a spike test therefore don't work. But I think they definitely should.
As the `harmonize` function correctly returns an altered dataset and the corresponding `flagger`, the problem is how to integrate these results back. Several options come to mind:
- Simply join the data back. That would make it necessary to rename the harmonized variable and would potentially blow up the dataset. We would need to find a way to force the user to assign the harmonization result to a new variable within their configuration.
- Merge the datasets, similar to what we currently do with the flags. Old and harmonized values end up in the same `DataFrame` column, and the latter overwrite the former wherever they share the same keys. This approach seems to be broken, however: after the harmonization, the timeseries holds both the unharmonized and the harmonized values and still does not have a real inferable frequency, which some tests expect.
- Store the harmonization results within another data structure (e.g. a `dict`). While this is easy at first and allows us to use the same variable names for the harmonized and the original datasets, we run into consistency challenges when the harmonization result is referenced later on: Which dataset should be used (harmonized vs. unharmonized)? What should happen when the harmonized dataset does not cover the entire test period (pad it with unharmonized data or not)?
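
To make the trade-offs between the three options a bit more concrete, here is a minimal pandas sketch. It is not our actual code: the data is made up, `resample("10min").mean()` only stands in for whatever `harmonize` really does, and the names `x`, `x_harmonized` and `store` are purely illustrative.

```python
import pandas as pd

# Hypothetical, irregularly sampled raw timeseries.
raw = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(
        [
            "2020-01-01 00:00",
            "2020-01-01 00:09",
            "2020-01-01 00:21",
            "2020-01-01 00:30",
        ]
    ),
)

# Stand-in for the harmonization (assumption): aggregate onto a 10 minute grid.
harmonized = raw.resample("10min").mean()

# Option 1: join back under a new variable name.
# The index becomes the union of both indexes, so the dataset grows.
joined = pd.DataFrame({"x": raw, "x_harmonized": harmonized})

# Option 2: merge into the same column; harmonized values win on shared keys.
merged = harmonized.combine_first(raw)

# Option 3: keep both versions in a separate structure under explicit keys.
store = {"raw": raw, "harmonized": harmonized}

# The harmonized series has a regular grid, the merged one does not.
harmonized_freq = pd.infer_freq(harmonized.index)
merged_freq = pd.infer_freq(merged.index)
```

In this sketch `merged_freq` comes out as `None`, which is exactly the problem described for the merge option: a test that expects a regular frequency cannot work on the merged column. The joined frame keeps both versions usable but has six rows instead of four, most of them partially empty.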