
New clean flagger

Bert Palm requested to merge new_cean_flagger into develop

A new clean Flagger

Despite the need for a unified flagger with a backtracking feature, and its possibly complex implementation, I'd like to suggest a new interface for the flagger that makes the way we use it much cleaner and nicer.

The only thing I did not (want to) implement is the flag_after and flag_before keywords. In my opinion they are not needed anymore once we have the backtracking feature. We can then simply write dedicated functions with the same functionality (e.g. saqc.flagRange('A', min=23).flagAfter('A', '1d')), which will also get rid of some of the more general, function-cluttering keywords.
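To make the idea of a dedicated function concrete, here is a minimal sketch of what a flagAfter-style step could do on a single boolean "is flagged" series. The helper name flag_after, its signature, and the choice to include both ends of the window are assumptions for illustration only, not part of saqc.

```python
import pandas as pd


def flag_after(flagged, window):
    """Hypothetical helper: extend every flagged timestamp forward by `window`.

    `flagged` is a boolean Series with a sorted DatetimeIndex.
    Both ends of the window are included (an assumption of this sketch).
    """
    out = flagged.copy()
    delta = pd.Timedelta(window)
    # Walk over the timestamps that are currently flagged.
    for t in flagged.index[flagged.to_numpy()]:
        out.loc[t : t + delta] = True
    return out


idx = pd.date_range("2000-01-01", periods=5, freq="D")
flagged = pd.Series([False, True, False, False, False], index=idx)
extended = flag_after(flagged, "1d")
```

With daily data and a window of one day, the value following the flagged one gets flagged as well; everything else stays untouched.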

What follows is a simple explanation of how the new clean flagger should work. Note that the described functionality is already implemented (!).

The best part about the implementation is that it is very straightforward and not very complicated. It is mostly just checking, because all the tricky magic was already implemented in dios.DictOfSeries. So for the most part this code is just a wrapper around dios.

Although I have already implemented this, it is not tested yet. But I'm very positive that it will work like a charm.

create, former initFlags()

Flagger() creates a new instance, and you can pass it around, pass it to an SaQC object etc. At some point, you may want to initialize the flagger with some real existing flags (maybe from another flagger) or just from some plain data (DictOfSeries, DataFrame, ..), so simply call

# start fresh 
from_data = Flagger(data=data)
from_flags = Flagger(flags=flags)

# or from existing flagger
flagger = Flagger()
from_data = flagger.constructor(data=data)
from_flags = flagger.constructor(flags=flags)

getting flags, former getFlags()

To retrieve flags from the flagger use the __getitem__ aka [].

# single variable
A = flagger['A']

# passing list-likes for multiple variables
some = flagger[['A', 'B']]

Row selection in the style of loc also works, but for simplicity only if both a row mask (or slice) and a column key are passed:

# boolean masks 
mask = [True, False, True, True]
trimmedA = flagger[mask, 'A']

# slices also work
sl = slice('2000/01/01', '2000/01/05')
trimmedA = flagger[sl, 'A']

Anything that pandas would accept as a boolean indexer works as a boolean mask.
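The lookup rules above can be sketched with a small stand-in class. ToyFlags is a hypothetical name and uses a plain dict of pandas.Series instead of dios.DictOfSeries; it only illustrates how one __getitem__ could dispatch on a single key, a list of keys, or a (rows, column) pair, and is not the code from this merge request.

```python
import pandas as pd


class ToyFlags:
    """Hypothetical stand-in for the proposed Flagger (illustration only)."""

    def __init__(self, flags):
        # One flags Series per variable name.
        self._flags = dict(flags)

    def __getitem__(self, key):
        if isinstance(key, tuple):
            # flagger[rows, 'A']: rows may be a boolean mask or a slice.
            rows, col = key
            return self._flags[col].loc[rows]
        if isinstance(key, list):
            # flagger[['A', 'B']]: several variables at once.
            return {col: self._flags[col] for col in key}
        # flagger['A']: a single variable.
        return self._flags[key]


idx = pd.date_range("2000-01-01", periods=4, freq="D")
toy = ToyFlags({
    "A": pd.Series([0, 2, 0, 1], index=idx),
    "B": pd.Series([0, 0, 2, 0], index=idx),
})

single = toy["A"]
several = toy[["A", "B"]]
masked = toy[[True, False, True, True], "A"]
sliced = toy[slice("2000-01-02", "2000-01-03"), "A"]
```

Delegating the row selection to Series.loc is what makes any pandas-compatible boolean indexer or label slice work without extra code.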

setting/forcing flags, former setFlags()

One can simply add a new flags column by passing a new pandas.Series to a new(!) variable name:

flagger['new'] = pd.Series([0,0,0,9,0], dtype=int)

Alternatively, one can call flagger.insert('Foo', pd.Series(..)).

Setting flags works nearly the same as getting flags above, except that the new flags are assigned to the selected subset.

# by mask
mask = [True, False, True, True]
flagger[mask, 'A'] = BAD

# set all
flagger[:, 'B'] = BAD

# set multiple
flagger[:, ['A', 'B']] = VERY_BAD

But bear in mind that this is not a straightforward assignment of values: the order of the flags is respected. Only flags that are worse than the existing flags are set! If one needs to force flags for any reason, one can use the force indexer. It works exactly as described above.

flagger.force[mask, 'A'] = BAD
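The difference between a regular and a forced assignment can be sketched on a single flags series. The function set_flags and the integer flag levels below are assumptions made for illustration (higher number means worse flag); the actual flagger may order and store its flags differently.

```python
import pandas as pd

# Hypothetical flag levels; a higher number means a worse flag.
UNFLAGGED, DOUBTFUL, BAD = 0, 1, 2


def set_flags(flags, rows, value, force=False):
    """Return a copy of `flags` with `value` written to the selected rows.

    Without `force`, only rows whose current flag is better than `value`
    are overwritten, so existing worse flags survive.
    """
    out = flags.copy()
    # Turn any pandas-compatible row selection into a boolean Series.
    selected = pd.Series(False, index=flags.index)
    selected.loc[rows] = True
    if not force:
        selected = selected & (flags < value)
    out.loc[selected] = value
    return out


flags = pd.Series([UNFLAGGED, BAD, DOUBTFUL, UNFLAGGED])
mask = [True, True, True, False]

regular = set_flags(flags, mask, DOUBTFUL)
forced = set_flags(flags, mask, DOUBTFUL, force=True)
```

In the regular case the existing BAD in the second row is kept, because DOUBTFUL is not worse than BAD; the forced variant overwrites it.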

get flagged info, former isFlagged()

To find out which flags are actually set, one can use the comparison operators ==,!=,<,<=,>,>=, which return a boolean pd.Series or a boolean DictOfSeries respectively.

flagger > UNFLAGGED
flagger['A'] < BAD
flagger[['A', 'B']] == BAD_AS_HELL

With this, some very nice flag magic is possible:

# set all flags that are DOUBTFUL or worse (all columns, all rows) to BAD
flagger[flagger >= DOUBTFUL] = BAD

# set all TOO_BADs to BAD in column A and B
flagger.force[flagger[['A', 'B']] == TOO_BAD] = BAD 
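How the comparison operators could produce boolean masks is sketched below. Again, ToyFlags, the integer flag levels, and the plain dict return value (standing in for a boolean DictOfSeries) are assumptions for illustration, not the implementation from this merge request.

```python
import operator

import pandas as pd

# Hypothetical flag levels; a higher number means a worse flag.
UNFLAGGED, DOUBTFUL, BAD = 0, 1, 2


class ToyFlags:
    """Hypothetical stand-in for the proposed Flagger (illustration only)."""

    def __init__(self, flags):
        self._flags = dict(flags)

    def _compare(self, op, level):
        # Apply the comparison column by column; each result is a boolean Series.
        return {name: op(series, level) for name, series in self._flags.items()}

    def __ge__(self, level):
        return self._compare(operator.ge, level)

    def __eq__(self, level):
        return self._compare(operator.eq, level)


toy = ToyFlags({
    "A": pd.Series([UNFLAGGED, BAD, UNFLAGGED, DOUBTFUL]),
    "B": pd.Series([UNFLAGGED, UNFLAGGED, BAD, UNFLAGGED]),
})

doubtful_or_worse = toy >= DOUBTFUL
exactly_bad = toy == BAD
```

The resulting per-column masks have the same shape as the flags themselves, which is what allows them to be fed back in as an indexer for setting flags.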

other operations, former slice() and merge()

To get rid of a column, one can now call drop() or simply use del.

del flagger['A']
flagger.drop('A')

We still have our beloved copy().

other = flagger.copy()

And in the future a merge(), which needs some rethinking IMO.

Yes ?

Edited by Bert Palm
