Masking flagged data fails silently

Masking flagged data with NaNs fails silently if a test changes the data points, i.e. alters the index.

Example:

>>> data = pd.DataFrame(dict(a=range(6), b=range(6)), index=[1.1,2.2,3.3,4.4,5.5,6.6])
>>> data
     a  b
1.1  0  0
2.2  1  1
3.3  2  2
4.4  3  3
5.5  4  4
6.6  5  5

>>> mask = data > 2
>>> mask
         a      b
1.1  False  False
2.2  False  False
3.3  False  False
4.4   True   True
5.5   True   True
6.6   True   True

We keep track of all flagged data by creating a mask. Here, this mask effectively stands in for the already-flagged data.
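
For reference, a minimal sketch of what "masking with NaNs" could look like on this example. It assumes the masking is done with plain pandas (DataFrame.mask); the actual implementation may differ, and the name `masked` is only illustrative.

```python
import pandas as pd

data = pd.DataFrame(dict(a=range(6), b=range(6)), index=[1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
mask = data > 2

# Hide the flagged values from the test by replacing them with NaN.
# DataFrame.mask returns a new frame; the original values stay in `data`.
masked = data.mask(mask)
```

The test would then run on `masked`, and the original values would have to be written back afterwards.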

Next, a test is run that alters the index of the data, i.e. anything that adds, removes or shifts indices, for example a harmonization or an interpolation.

>>> pseudo_harmo = lambda idx: idx//1
>>> newdata = data.copy()
>>> newdata.index = pseudo_harmo(data.index)
>>> newdata
     a  b
1.0  0  0
2.0  1  1
3.0  2  2
4.0  3  3
5.0  4  4
6.0  5  5

Now we are done and want to re-apply the old data with newdata[mask] = data[mask], but we shouldn't, because of this:

>>> newdata[mask]
      a   b
1.0 NaN NaN
2.0 NaN NaN
3.0 NaN NaN
4.0 NaN NaN
5.0 NaN NaN
6.0 NaN NaN

B A M M 🔥... Now we are in trouble, because the mask no longer lines up with the new index, and most probably we want to keep the mask information...
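
Why everything turns into NaN can be checked directly: pandas aligns the boolean mask to the frame by index label, and none of the old labels survive the index change. A small sketch that rebuilds the example from above (the names `common` and `aligned` are only illustrative):

```python
import pandas as pd

data = pd.DataFrame(dict(a=range(6), b=range(6)), index=[1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
mask = data > 2

newdata = data.copy()
newdata.index = data.index // 1  # same pseudo-harmonization as above

# No label of the old index exists in the new one ...
common = mask.index.intersection(newdata.index)

# ... so aligning the mask to the new index leaves only missing values.
aligned = mask.reindex(newdata.index)
```

Here `common` is empty and `aligned` contains no valid entry at all. With real data only some labels would go missing, which makes the problem much harder to spot.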

This bug has existed for quite a long time, I guess, but never came to the surface, because it does not trigger any exception. Also, normally only a few data points are affected, not all of them as in the example...
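
Since nothing raises, one possible stopgap is to make the mismatch loud. The following is only a sketch under the assumption that data and mask are plain pandas DataFrames; `restore_masked` is a hypothetical helper, not an existing function, and it does not solve the underlying problem of tracking flags across index changes.

```python
import pandas as pd


def restore_masked(newdata, olddata, mask):
    """Write the old values back where mask is True (hypothetical helper).

    Raises instead of silently doing the wrong thing if the index or
    columns of the data no longer match those of the mask.
    """
    if not (newdata.index.equals(mask.index) and newdata.columns.equals(mask.columns)):
        raise ValueError("mask is not aligned with the data, cannot restore flagged values")
    # Keep newdata where the mask is False, take olddata where it is True.
    return newdata.where(~mask, olddata)


data = pd.DataFrame(dict(a=range(6), b=range(6)), index=[1.1, 2.2, 3.3, 4.4, 5.5, 6.6])
mask = data > 2

# Index untouched: the flagged values come back.
restored = restore_masked(data.mask(mask), data, mask)

# Index changed by a test: the helper refuses to continue.
shifted = data.mask(mask)
shifted.index = data.index // 1
try:
    restore_masked(shifted, data, mask)
    raised = False
except ValueError:
    raised = True
```

This only turns the silent failure into an error; the flagged information is still lost as soon as a test changes the index.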

This is not fixed by dios; the very same example could be written with dios.

So in the end we need a different approach to keep track of the flagged information.

b.