user story: "I want to interpolate missing values" implies inconsistent/overcomplex narrations
I am currently implementing an interpolation method that replaces missing (= NaN) values in the data passed to it.
It targets the probably not-so-uncommon case where a user wants to fill data gaps so that data at certain timestamps becomes comparable/referable, even though the data is missing there, as long as the interpolation is not too speculative.
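To make "not too speculative" concrete, here is a minimal sketch that fills only short gaps and leaves longer ones untouched. It is not the actual implementation: the function name `interpolate_small_gaps` and the parameter `max_gap` are hypothetical, and it assumes regularly sampled data held in a pandas Series.

```python
import numpy as np
import pandas as pd


def interpolate_small_gaps(series: pd.Series, max_gap: int = 2) -> pd.Series:
    """Linearly interpolate NaN runs of at most `max_gap` consecutive values.

    Hypothetical helper for illustration only; assumes regular sampling.
    """
    is_nan = series.isna()
    # Give every run of consecutive NaN / non-NaN values its own id.
    run_id = (is_nan != is_nan.shift()).cumsum()
    # Length of the run each element belongs to.
    run_len = run_id.map(run_id.value_counts())
    # Only NaNs inside sufficiently short runs are candidates for filling.
    fillable = is_nan & (run_len <= max_gap)
    # limit_area="inside" avoids extrapolating leading/trailing gaps.
    interpolated = series.interpolate(method="linear", limit_area="inside")
    # Keep the original values everywhere except at the fillable positions.
    return series.where(~fillable, interpolated)


s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0])
print(interpolate_small_gaps(s, max_gap=2))
```

In this example the single missing value between 1.0 and 3.0 gets filled, while the run of three missing values stays NaN because it exceeds `max_gap`.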
The basic idea is to have a module of test functions akin to the harmonization module, with the difference that there will be no possibility of backtracking flags or back-calculating data, and thus no generation of heaps. (Other data processing functions, such as upsampling, downsampling and general aggregation, would also go here.)
The problem is that this causes some complexity at the level of flags, which should be untangled by policy decisions:
- If the data was flagged by other tests prior to interpolation, the flagged values will be passed as `nans` to the interpolation method and thus may get interpolated. I am not sure whether that is desired behavior.
- An alternative would be to exclude the interpolation method from the "get passed BADS as nans" rule.
- On the other hand, values already flagged BAD will also be passed to other tests as `nans` after interpolation, so interpolating them makes no difference.
- But this would imply retaining the flags by default, as they arrive in the interpolation method. Consider the common case of a user who has already flagged the missing values, or the case where the missing values already come with BAD flags: the interpolated values won't get passed to any further tests unless their flags are reset to "unflagged", either by default or via a switch.
- So it will be inevitable to at least offer the possibility of "unflagging" interpolated values.
- But this may cause some confusion, since values that have already been flagged get overwritten by interpolated ones, and the information that this timestamp originally held an invalid value gets lost. It therefore cannot be written back into the database/source after the saqc run without some memorizing workaround.
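The policy options above could be sketched roughly as follows. This is only an illustration under stated assumptions, not saqc's API: the function `interpolate_with_flags`, the `unflag` switch, the string labels `UNFLAGGED`/`BAD`, and the returned `originally_bad` mask (one possible "memorizing workaround") are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical flag labels, used only for this illustration.
UNFLAGGED, BAD = "UNFLAGGED", "BAD"


def interpolate_with_flags(data: pd.Series, flags: pd.Series, unflag: bool = True):
    """Interpolate missing and BAD-flagged values, optionally resetting flags.

    Returns the filled data, the updated flags, and a boolean mask that
    remembers which interpolated timestamps originally carried a BAD flag.
    """
    # Mirror the "BADs get passed as nans" rule: hide flagged values.
    masked = data.mask(flags == BAD)
    filled = masked.interpolate(method="linear", limit_area="inside")
    # Positions that were NaN on input and now hold an interpolated value.
    was_interpolated = masked.isna() & filled.notna()
    # Keep the information that would otherwise be lost by unflagging.
    originally_bad = (flags == BAD) & was_interpolated
    new_flags = flags.copy()
    if unflag:
        # Reset flags so later tests actually see the interpolated values.
        new_flags[was_interpolated] = UNFLAGGED
    return filled, new_flags, originally_bad


data = pd.Series([1.0, 99.0, 3.0, np.nan, 5.0])
flags = pd.Series([UNFLAGGED, BAD, UNFLAGGED, UNFLAGGED, UNFLAGGED])
filled, new_flags, originally_bad = interpolate_with_flags(data, flags)
print(filled, new_flags, originally_bad, sep="\n")
```

With `unflag=False` the incoming flags are retained, so the interpolated value at the BAD position would still be hidden from later tests. With `unflag=True` later tests see it, and `originally_bad` is the only remaining record that the source value was invalid.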