Judgements made under the influence of baddies
This is the good old question of how to deal with BAD-ish flagged values that happen to become the basis for further flagging decisions.
From the discussions on that topic, I got the impression that we tend not to allow any "judgements made under the influence of baddies" (to give it an iconic name).
Nevertheless, we never made that decision explicit, with all its consequences.
When going through the test functions, I realized that the behavior towards BAD-flagged values is inconsistent and doesn't follow any apparent logic. I guess in some cases the topic got forgotten, in some it was decided in favor of the baddies, and in some others against them.
I think the situation will pile up confusion as the number of test functions grows, if we don't settle on a consistent policy.
I am in favor of generally sticking to the habit of never relying on baddies, because it follows this logic:
test base contains BAD-flagged values
-> test not applicable
-> no test result
-> no flag (= no change to the flag)

which appears intuitive to me!
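To make that policy concrete, here is a minimal sketch in plain Python/numpy (the names `apply_test` and `window_test` are hypothetical, not anything from our code base) of a window-based test wrapper that yields no result wherever the test base contains a baddie:

```python
import numpy as np

BAD = True  # toy flag representation: True = BAD, False = UNFLAGGED

def apply_test(values, flags, window_test, window=5):
    """Run a window-based test, skipping every window whose test base
    contains a BAD-flagged value: not applicable -> no result -> no flag."""
    new_flags = flags.copy()
    for i in range(len(values) - window + 1):
        base = slice(i, i + window)
        if flags[base].any():
            continue  # judgement would be made under the influence of a baddie
        if window_test(values[base]):
            new_flags[base] = BAD
    return new_flags

values = np.array([1.0, 1.1, 9.9, 1.0, 1.2, 1.1, 1.0])
flags = np.array([False, False, True, False, False, False, False])
print(apply_test(values, flags, lambda v: np.ptp(v) > 5))
# unchanged: every window of 5 spans the baddie at index 2, so no test result
```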
On the level of test functions, strict exclusion of BAD-flagged values from any flag calculation has the advantage that test results somehow "propagate" through the flagging process:
- BAD-flagged value exclusion improves harmonization,
- BAD-flagged value exclusion makes the results of tests that rely on statistical indicators such as variance and mean more reliable (see the sketch after this list),
- from the statistical perspective, we are assigning fewer BAD flags in total, meaning that the probability that a BAD-flagged value actually is a baddie increases, because the tests are less strict.
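To illustrate the second point with made-up numbers: a value that is already flagged BAD would otherwise inflate the mean and standard deviation of its neighbourhood and hide a second, smaller baddie from a simple z-score test:

```python
import numpy as np

values = np.array([10.0, 10.2, 9.9, 500.0, 10.1, 25.0, 10.0])
flags = np.array([False, False, False, True, False, False, False])  # 500.0 is already BAD

def zscore_outliers(vals, thresh=2.0):
    mu, sigma = vals.mean(), vals.std()
    return np.abs(vals - mu) > thresh * sigma

print(zscore_outliers(values))          # only 500.0 sticks out; 25.0 slips through
print(zscore_outliers(values[~flags]))  # on the baddie-free base, 25.0 is caught
```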
The drawbacks of the exclude-all-baddies policy are:
- The probability that an unflagged value is actually a baddie increases as well. But that is a common feature of (statistical) tests.
- The flagging result is not invariant under the order in which the tests are applied. In some cases this is intended; in others it produces counterintuitive results.
Consider a constants test that flags value courses remaining constant for longer than 5 values. If a value in the center of a constant value course of 7 values gets flagged BAD by another test, this value course is not flagged by the constants test, because every candidate test base contains the baddie (see the sketch below).
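Here is a sketch of exactly that case, with made-up data and a naive window implementation of the constants test that follows the no-baddies policy from above:

```python
import numpy as np

# indices 1..7 form a constant course of 7 values; index 4 is already BAD
values = np.array([1.0, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 2.0])
flags = np.array([False, False, False, False, True, False, False, False, False])

def constants_test(vals, flgs, window=6):
    """Flag every window of `window` identical values, skipping windows
    whose test base contains a BAD-flagged value."""
    out = flgs.copy()
    for i in range(len(vals) - window + 1):
        base = slice(i, i + window)
        if flgs[base].any():
            continue  # no judgement under the influence of baddies
        if np.all(vals[base] == vals[i]):
            out[base] = True
    return out

print(constants_test(values, flags))
# every window of 6 identical values spans the central baddie,
# so the 7-value constant course stays completely unflagged
```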
So, in some cases, there seems to be a need for exceptions to the never-rely-on-baddies rule.
But I think it is possible to translate all candidates for exceptions into formulations where gaps in the calculation basis are allowed. For example, flagging value courses that stay constant for 5 minutes and have gaps of 2 minutes at most would tackle the above example: it would flag a 7-minute constant value course although it has a gap in it, resulting from an excluded baddie (see the sketch below).
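A sketch of that gap-tolerant reformulation (pandas, assumed 1-minute sampling; the function name and the thresholds are made up for illustration):

```python
import pandas as pd

idx = pd.date_range("2021-01-01", periods=9, freq="1min")
values = pd.Series([1.0, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 4.2, 2.0], index=idx)
flags = pd.Series([False] * 9, index=idx)
flags.iloc[4] = True  # centre of the constant course, flagged BAD by another test

def gap_tolerant_constants(vals, flgs, min_duration="5min", max_gap="2min"):
    """Flag runs of identical values lasting at least `min_duration`,
    tolerating gaps (e.g. from excluded baddies) of up to `max_gap`."""
    min_duration, max_gap = pd.Timedelta(min_duration), pd.Timedelta(max_gap)
    out = flgs.copy()
    good = vals[~flgs]  # excluded baddies simply become gaps
    ts = good.index
    start = 0
    for i in range(1, len(good) + 1):
        # a run breaks at the end of the series, on a value change,
        # or on a gap wider than max_gap
        if (i == len(good)
                or good.iloc[i] != good.iloc[start]
                or ts[i] - ts[i - 1] > max_gap):
            if ts[i - 1] - ts[start] >= min_duration:
                out.loc[ts[start]:ts[i - 1]] = True
            start = i
    return out

print(gap_tolerant_constants(values, flags))
# the whole constant course from minute 1 to minute 7 gets flagged,
# despite the 2-minute gap left by the excluded baddie
```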
So, whichever policy we decide on, I think we should make that decision explicit and carefully design the flagging tests according to it.