Masking flagged data fails silently
Masking flagged data with NaNs fails silently if a test changes the data points (i.e. alters the index).
Example:
>>> data = pd.DataFrame(dict(a=range(6), b=range(6)), index=[1.1,2.2,3.3,4.4,5.5,6.6])
>>> data
a b
1.1 0 0
2.2 1 1
3.3 2 2
4.4 3 3
5.5 4 4
6.6 5 5
>>> mask = data > 2
>>> mask
a b
1.1 False False
2.2 False False
3.3 False False
4.4 True True
5.5 True True
6.6 True True
We keep track of all flagged data by creating a mask. Here the mask effectively stands in for the already-flagged data.
Next, a test is run that alters the index of the data, i.e. anything that adds, removes, or shifts indices, for example a harmonization or an interpolation.
>>> pseudo_harmo = lambda idx: idx//1
>>> newdata = data.copy()
>>> newdata.index = pseudo_harmo(data.index)
>>> newdata
a b
1.0 0 0
2.0 1 1
3.0 2 2
4.0 3 3
5.0 4 4
6.0 5 5
Now we are done and want to re-apply the old data with newdata[mask] = data[mask]
but we shouldn't, because of this:
>>> newdata[mask]
a b
1.0 NaN NaN
2.0 NaN NaN
3.0 NaN NaN
4.0 NaN NaN
5.0 NaN NaN
6.0 NaN NaN
Bam! Every value is NaN, because none of the mask's index labels exist in the new index.
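To see why nothing is selected, here is a small self-contained reproduction. It assumes the same toy data as above and uses a plain floor division as a stand-in for a real harmonization; `shared` and `selected` are names made up for this sketch.

```python
import pandas as pd

data = pd.DataFrame(dict(a=range(6), b=range(6)),
                    index=[1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
mask = data > 2

newdata = data.copy()
newdata.index = data.index // 1  # stand-in for a harmonization

# None of the old labels (1.1, 2.2, ...) exist in the new index (1.0, 2.0, ...).
shared = mask.index.intersection(newdata.index)
print(len(shared))

# Indexing with a boolean DataFrame aligns the mask on labels first.
# Labels missing from the mask count as "not selected", so every cell is NaN.
selected = newdata[mask]
print(selected.isna().all().all())
```

No exception is raised at any point, which is why the problem goes unnoticed.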
This bug has existed for quite a long time, I guess, but never came to the surface because it does not trigger any exception. Also, normally only a few data points are affected, not all of them as in the example.
This is not fixed by dios; the very same example could be written with dios.
So in the end we need a different approach to keep track of the flagged information.
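Not a solution to the tracking problem, but as a rough sketch of how the failure could at least be made explicit: check that the labels still match before re-applying. `reapply_masked` is a hypothetical helper invented for this illustration, not something that exists in the code base or in dios, and it uses `DataFrame.where` rather than the assignment shown above.

```python
import pandas as pd

def reapply_masked(newdata, olddata, mask):
    """Hypothetical helper: restore the masked values of olddata in newdata.

    Raises instead of silently mis-aligning when the labels have changed.
    """
    same_index = newdata.index.equals(mask.index)
    same_columns = newdata.columns.equals(mask.columns)
    if not (same_index and same_columns):
        raise ValueError("index or columns changed since the mask was created")
    # Keep newdata where the mask is False, take olddata where it is True.
    return newdata.where(~mask, olddata)

data = pd.DataFrame(dict(a=range(6), b=range(6)),
                    index=[1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
mask = data > 2

newdata = data.copy()
newdata.index = data.index // 1  # the index-altering "test" from above

try:
    reapply_masked(newdata, data, mask)
except ValueError as err:
    print("refused:", err)
```

This only turns the silent failure into a loud one; keeping the flagged information in sync with a changed index still needs a different mechanism.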
b.