masking="none" functions forcefully overwrite existing flags
The machinery handling functions annotated with masking="none" has the undesirable side effect mentioned in the title. I'll try to explain the issue with a few code examples, using the PositionalTranslator for demonstration purposes.
Let's start with a short PositionalTranslator recap:
The translator generates flags that start with the digit 9, followed by one digit per test that was run on a given field. Each digit takes one of the values 2: BAD, 1: DOUBTFUL and 0: GOOD | UNFLAGGED. Masking is set to DOUBTFUL + 1, i.e. all values with flags larger than DOUBTFUL will be masked between successive function calls. A flag like 91020 therefore indicates that we called four functions: the first set the flag DOUBTFUL, the third set the flag BAD, and function calls 2 and 4 did not set a specific flag (or set GOOD, for that matter).
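To make the encoding concrete, here is a minimal sketch that decodes such a flag digit by digit. The helper decode_positional and the LABELS table are hypothetical names made up for this illustration; they are not part of SaQC and only follow the description above.

```python
# Hypothetical helper, not part of SaQC: decode a positional flag
# according to the recap above (leading 9, then one digit per test).
LABELS = {"0": "GOOD | UNFLAGGED", "1": "DOUBTFUL", "2": "BAD"}


def decode_positional(flag):
    """Return the label set by each test, in call order."""
    digits = str(flag)
    if not digits.startswith("9"):
        raise ValueError(f"positional flags start with 9, got {flag!r}")
    # Every digit after the leading 9 belongs to one function call.
    return [LABELS[d] for d in digits[1:]]


decoded = decode_positional(91020)
for call, label in enumerate(decoded, start=1):
    print(f"call {call}: {label}")
```

Read this way, calls 1 and 3 of 91020 set DOUBTFUL and BAD, while calls 2 and 4 left the value untouched.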
Now consider the following test functions:
@flagging(masking="field")
def flagAll(data, field, flags, *args, **kwargs):
    flags[:, field] = BAD
    return data, flags

@flagging(masking="field")
def flagNone(data, field, flags, *args, **kwargs):
    # all-False mask, no flags will be set
    flags[np.zeros_like(data[field], dtype=bool), field] = BAD
    return data, flags
The baseline
Both functions are annotated with masking="field". The following code:
data = pd.DataFrame({"x": np.arange(2)})
saqc = (
    SaQC(data, scheme=PositionalTranslator())
    .flagAll(field="x")
    .flagNone(field="x")
)
leads to the expected results:
>>> saqc.flags
      x
0  9200
1  9200
>>> saqc._flags.history["x"]
       0    1    2
0  255.0  nan  nan
1  255.0  nan  nan
The problem
If we now change the annotation to:
@flagging(masking="none")
def flagNone(data, field, flags, *args, **kwargs):
    # all-False mask, no flags will be set
    flags[np.zeros_like(data[field], dtype=bool), field] = BAD
    return data, flags
the above code yields the following results:
>>> saqc.flags
      x
0  9222
1  9222
>>> saqc._flags.history["x"]
         0        1      2
0  (255.0)  (255.0)  255.0
1  (255.0)  (255.0)  255.0
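The link between the two history dumps and the resulting flags can be sketched as follows. This is a toy re-derivation inferred only from the outputs shown above (255.0 maps to digit 2, nan maps to digit 0); positional_from_history is a hypothetical helper, it is not SaQC's actual translation code, and it ignores the parentheses around the overwritten entries.

```python
import math

BAD = 255.0  # numeric value shown in the history dumps above


def positional_from_history(row):
    """Hypothetical helper: build a positional flag from one history row."""
    digits = []
    for value in row:
        if math.isnan(value):
            digits.append("0")  # no flag was set in this column
        elif value == BAD:
            digits.append("2")  # BAD was set in this column
        else:
            raise ValueError(f"value not covered by this sketch: {value!r}")
    return int("9" + "".join(digits))


# History rows as printed for masking="field" and masking="none"
baseline = positional_from_history([255.0, math.nan, math.nan])
forced = positional_from_history([255.0, 255.0, 255.0])
print(baseline, forced)
```

Under this reading, the flag 9222 follows directly from BAD appearing in every history column, although flagNone was called with an all-False mask.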
The explanation
As the code is not straightforward and the (inter-)dependencies between Flags, History and HistAccess are hard to grasp (at least for me), it took me quite a while to figure out why the two results differ so vastly.
The main differences between the two results are that, with masking="none":
- flagNone produced BAD flags
- all existing flags are forcefully overwritten
While the first difference is somewhat expected, and I did find hacky workarounds (i.e. modifying the function), the second is more problematic. The implicit forcing makes the masking="none" and masking="field" workflows inconsistent and is likely to break the masking behavior, and there are no easy workarounds (at least I didn't find any).