masking="none" functions forcefully overwrite existing flags

The machinery handling functions annotated with masking="none" has the undesirable side effect mentioned in the title. I will try to explain the issue with a few code examples, using the PositionalTranslator for demonstration purposes.


Let's start with a short PositionalTranslator recap:

The translator generates flags that start with the digit 9, followed by one digit for each test that was run on a given field. The digits take the values 2: BAD, 1: DOUBTFUL and 0: GOOD | UNFLAGGED. Masking is set to DOUBTFUL + 1, i.e. all values with flags larger than DOUBTFUL will be masked between successive function calls. A flag like 91020 therefore indicates that we called four functions: the first set the flag DOUBTFUL, the third set the flag BAD, and function calls 2 and 4 did not set a specific flag (or set GOOD, for that matter).
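To make the encoding concrete, here is a minimal sketch that decodes such a positional flag. The helper `decode_positional_flag` and the `POSITIONAL_LABELS` table are hypothetical names made up for this illustration; they are not part of SaQC and only mirror the digit meanings described above.

```python
# Hypothetical helper, NOT part of SaQC: decodes a positional flag as
# described in the recap above (leading 9, then one digit per function call).
POSITIONAL_LABELS = {"0": "GOOD | UNFLAGGED", "1": "DOUBTFUL", "2": "BAD"}


def decode_positional_flag(flag):
    """Return one label per function call encoded in a positional flag."""
    text = str(flag)
    if not text.startswith("9"):
        raise ValueError(f"positional flags start with the digit 9, got {text!r}")
    digits = text[1:]
    unknown = set(digits) - set(POSITIONAL_LABELS)
    if unknown:
        raise ValueError(f"unexpected digits in flag {text!r}: {sorted(unknown)}")
    return [POSITIONAL_LABELS[digit] for digit in digits]


# Print the label set by each of the four function calls encoded in 91020.
for call_number, label in enumerate(decode_positional_flag(91020), start=1):
    print(call_number, label)
# 1 DOUBTFUL
# 2 GOOD | UNFLAGGED
# 3 BAD
# 4 GOOD | UNFLAGGED
```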


Now consider the following test functions:

@flagging(masking="field")
def flagAll(data, field, flags, *args, **kwargs):
    flags[:, field] = BAD
    return data, flags


@flagging(masking="field")
def flagNone(data, field, flags, *args, **kwargs):
    flags[np.zeros_like(data[field], dtype=bool), field] = BAD  #  all-False mask, no flags will be set
    return data, flags
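The all-False mask used in flagNone can be checked in isolation. The sketch below uses a plain NumPy array as a stand-in for one flags column, so it says nothing about the SaQC Flags object itself; the value `BAD = 255.0` is an assumption taken from the history output shown in this report, and NaN is assumed to mean "no flag set".

```python
import numpy as np

BAD = 255.0  # assumption: numeric value of BAD, as seen in the history output

values = np.arange(2)                     # stand-in for data["x"]
flags = np.full(values.shape, np.nan)     # stand-in flags column, NaN = no flag

# Same mask construction as in flagNone: every entry is False.
mask = np.zeros_like(values, dtype=bool)

# Assigning through an all-False mask selects nothing, so nothing changes.
flags[mask] = BAD
untouched = bool(np.isnan(flags).all())

# For contrast, flagAll's assignment touches every entry.
all_flags = np.full(values.shape, np.nan)
all_flags[:] = BAD

print(untouched)   # True
print(all_flags)   # [255. 255.]
```

Seen in isolation, the function body of flagNone is therefore a no-op on the flags it is handed.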

The baseline

Both functions are annotated with masking="field". The following code:

data = pd.DataFrame({"x": np.arange(2)})
saqc = (
    SaQC(data, scheme=PositionalTranslator())
    .flagAll(field="x")
    .flagNone(field="x")
)

leads to the expected results:

>>> saqc.flags
      x
0  9200
1  9200

>>> saqc._flags.history["x"]
        0      1      2
0   255.0    nan    nan 
1   255.0    nan    nan 

The problem

If we now change the annotation to:

@flagging(masking="none")
def flagNone(data, field, flags, *args, **kwargs):
    flags[np.zeros_like(data[field], dtype=bool), field] = BAD  #  all-False mask, no flags will be set
    return data, flags

the above code yields the following results:

>>> saqc.flags
      x
0  9222
1  9222

>>> saqc._flags.history["x"]
        0        1        2
0  (255.0)  (255.0)   255.0 
1  (255.0)  (255.0)   255.0 

The explanation

As the code is not straightforward and the (inter-)dependencies between Flags, History and HistAccess are hard to grasp (at least for me), it took me quite a while to figure out why the two results differ so vastly.

The main differences between the two results are that, with masking="none":

  1. flagNone produced BAD flags
  2. all existing flags are forcefully overwritten

While the first difference is sort of expected and I did find hacky workarounds (i.e. modifying the function), the second is more problematic. The implicit forcing makes the masking="none" and masking="field" workflows inconsistent, is likely to break the masking behavior, and has no easy workaround (at least I didn't find any).