New clean flagger
A new clean Flagger
Despite the need for the unified flagger, with its backtracking feature and possibly complex implementation, I would like to suggest a new interface for the flagger, which makes the way we use our flagger much cleaner and nicer.
The only thing I did not (want to) implement is the flag_after and flag_before KWs. In my opinion they are not needed anymore once we have the backtracking feature. Then we can simply write our own functions with the same functionality (e.g. saqc.flagRange('A', min=23).flagAfter('A', '1d')), which will also get rid of some of the more general, function-cluttering KWs.
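To make the idea of a separate function more concrete, here is a rough, stdlib-only sketch of what a standalone "flag after" step could do. Everything in it is an assumption for illustration: the function name flag_after, the numeric flag constants, and the use of plain lists instead of pandas objects. It is not the proposed saqc implementation and does not use the backtracking feature.

```python
from datetime import datetime, timedelta

UNFLAGGED, BAD = 0, 2  # assumed flag values, larger means worse


def flag_after(timestamps, flags, window, flag=BAD):
    """Hypothetical helper: flag every unflagged row that lies within
    `window` after an already flagged row. Timestamps must be sorted."""
    out = list(flags)
    last_flagged = None
    for i, (ts, cur) in enumerate(zip(timestamps, flags)):
        if cur > UNFLAGGED:
            # remember the most recent originally flagged timestamp
            last_flagged = ts
        elif last_flagged is not None and ts - last_flagged <= window:
            out[i] = flag
    return out


timestamps = [
    datetime(2000, 1, 1, 0),
    datetime(2000, 1, 1, 12),
    datetime(2000, 1, 2, 0),
    datetime(2000, 1, 3, 0),
]
flags = [BAD, UNFLAGGED, UNFLAGGED, UNFLAGGED]
extended = flag_after(timestamps, flags, timedelta(days=1))
```

Only the originally flagged rows start a window here, so newly set flags do not cascade further.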
What follows is a simple explanation of how the new clean flagger should work. Note that the described functionality is already implemented (!).
The best part about the implementation is that it is very straightforward and not very complicated.
It is mostly just checking, because all the tricky magic was already implemented in dios.DictOfSeries. So for the most part this code is just a wrapper around dios.
But although I have already implemented this, it is not tested. Still, I'm very positive that it will work like a charm.
create, former initFlags()
Flagger() creates a new instance, and you can pass it around, pass it to an SaQC object etc.
At some point, you may want to initialize the flagger with some real existing flags (maybe from another flagger)
or just from some plain data (DictOfSeries, DataFrame, ..). To do so, simply call:
# start fresh
from_data = Flagger(data=data)
from_flags = Flagger(flags=flags)
# or from existing flagger
flagger = Flagger()
from_data = flagger.constructor(data=data)
from_flags = flagger.constructor(flags=flags)
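As a rough illustration of the two construction paths, the following toy class mimics the calls above with plain dicts and lists. It assumes that initializing from data yields one UNFLAGGED entry per data value and that initializing from flags copies them; both are guesses about the behavior, and ToyFlagger is a hypothetical stand-in, not the real Flagger.

```python
UNFLAGGED = 0  # assumed "no flag" value


class ToyFlagger:
    """Hypothetical stand-in: {column: list} replaces DictOfSeries."""

    def __init__(self, data=None, flags=None):
        if data is not None and flags is not None:
            raise ValueError("pass either data or flags, not both")
        if data is not None:
            # assumption: fresh flags have the same shape as the data
            self._flags = {c: [UNFLAGGED] * len(v) for c, v in data.items()}
        elif flags is not None:
            # copy so the caller's flags are not modified later
            self._flags = {c: list(v) for c, v in flags.items()}
        else:
            self._flags = {}

    def constructor(self, data=None, flags=None):
        # build a new instance of the same class as this one
        return type(self)(data=data, flags=flags)

    def __getitem__(self, col):
        return list(self._flags[col])


data = {"A": [1.5, 2.5, 99.0]}
from_data = ToyFlagger(data=data)
from_flags = ToyFlagger().constructor(flags={"A": [0, 0, 2]})
```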
getting flags, former getFlags()
To retrieve flags from the flagger, use __getitem__, aka [].
# single variable
A = flagger['A']
# passing list-likes for multiple variables
some = flagger[['A', 'B']]
loc-style indexing also works, but for simplicity only if both a row mask and a column key are passed:
# boolean masks
mask = [True, False, True, True]
trimmedA = flagger[mask, 'A']
# slices also work
sl = slice('2000/01/01', '2000/01/05')
trimmedA = flagger[sl, 'A']
Anything that pandas would accept as a boolean indexer works as a boolean mask.
setting/forcing flags, former setFlags()
One can simply add a new flags column by assigning a new pandas.Series to a new(!) variable name:
flagger['new'] = pd.Series([0,0,0,9,0], dtype=int)
Alternatively, one can call flagger.insert('Foo', pd.Series(..)).
Setting flags works nearly the same as getting flags above, but now the new flags are assigned to the selected subset.
# by mask
mask = [True, False, True, True]
flagger[mask, 'A'] = BAD
# set all
flagger[:, 'B'] = BAD
# set multiple
flagger[:, ['A', 'B']] = VERY_BAD
But bear in mind that this is not a straightforward assignment of values: the order of the flags is respected. Only flags that are worse than the existing flags are set! If one needs to force flags for any reason, one can use the force indexer. It works exactly as described above.
flagger.force[mask, 'A'] = BAD
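The difference between the ordered assignment and the force indexer can be sketched with a small toy class. This is only an illustration under assumptions: flags are plain integers where a larger value means worse, lists stand in for pandas.Series, and ToyFlagger and _Forcer are hypothetical names, not the actual implementation.

```python
UNFLAGGED, DOUBTFUL, BAD = 0, 1, 2  # assumed ordering: larger means worse


class ToyFlagger:
    def __init__(self, flags):
        # flags: {column name: list of flag values}
        self._flags = {col: list(vals) for col, vals in flags.items()}
        self.force = _Forcer(self)

    def __getitem__(self, col):
        return list(self._flags[col])

    def __setitem__(self, key, value):
        mask, col = key
        self._assign(mask, col, value, force=False)

    def _assign(self, mask, col, value, force):
        old = self._flags[col]
        if isinstance(mask, slice):
            # mimic `flagger[:, col]`: a slice selects rows by position
            rows = set(range(*mask.indices(len(old))))
            mask = [i in rows for i in range(len(old))]
        if len(mask) != len(old):
            raise ValueError("mask length must match column length")
        # without force, a selected row only changes if the flag gets worse
        self._flags[col] = [
            value if hit and (force or value > cur) else cur
            for hit, cur in zip(mask, old)
        ]


class _Forcer:
    """Indexer that overwrites flags regardless of their order."""

    def __init__(self, flagger):
        self._flagger = flagger

    def __setitem__(self, key, value):
        mask, col = key
        self._flagger._assign(mask, col, value, force=True)


flagger = ToyFlagger({"A": [UNFLAGGED, BAD, DOUBTFUL, UNFLAGGED]})
flagger[[True, True, True, False], "A"] = DOUBTFUL
after_set = flagger["A"]    # the BAD entry is kept, it is already worse
flagger.force[[True, True, False, False], "A"] = UNFLAGGED
after_force = flagger["A"]  # force overwrites, even with a better flag
```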
get flagged info, former isFlagged()
To find out which flags are actually set, one can use the comparison operators ==, !=, <, <=, >, >=, which return a boolean pd.Series or a boolean DictOfSeries, respectively.
flagger > UNFLAGGED
flagger['A'] < BAD
flagger[['A', 'B']] == BAD_AS_HELL
With this, some very nice flag magic is possible:
# set all flags that are DOUBTFUL or worse (all columns, all rows) to BAD
flagger[flagger >= DOUBTFUL] = BAD
# set all TOO_BADs to BAD in column A and B
flagger.force[flagger[['A', 'B']] == TOO_BAD] = BAD
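How comparisons and mask-based assignment could fit together is sketched below with a toy class. The assumptions are the same as before: integer flags where larger means worse, dicts of lists instead of DictOfSeries, and a hypothetical ToyFlagger that only supports comparison with a scalar and assignment through a per-column boolean mask.

```python
import operator

UNFLAGGED, DOUBTFUL, BAD = 0, 1, 2  # assumed ordering: larger means worse


class ToyFlagger:
    def __init__(self, flags):
        self._flags = {col: list(vals) for col, vals in flags.items()}

    def _compare(self, op, other):
        # compare every flag with a scalar; the result keeps the layout
        return {col: [op(v, other) for v in vals]
                for col, vals in self._flags.items()}

    def __eq__(self, other):
        return self._compare(operator.eq, other)

    def __ne__(self, other):
        return self._compare(operator.ne, other)

    def __lt__(self, other):
        return self._compare(operator.lt, other)

    def __le__(self, other):
        return self._compare(operator.le, other)

    def __gt__(self, other):
        return self._compare(operator.gt, other)

    def __ge__(self, other):
        return self._compare(operator.ge, other)

    def __setitem__(self, masks, value):
        # respect the flag order: a row only changes if its flag gets worse
        for col, mask in masks.items():
            self._flags[col] = [
                value if hit and value > cur else cur
                for hit, cur in zip(mask, self._flags[col])
            ]

    def to_dict(self):
        return {col: list(vals) for col, vals in self._flags.items()}


flagger = ToyFlagger({
    "A": [UNFLAGGED, DOUBTFUL, BAD],
    "B": [BAD, UNFLAGGED, DOUBTFUL],
})
flagged = flagger > UNFLAGGED        # boolean mask per column
flagger[flagger >= DOUBTFUL] = BAD   # raise everything DOUBTFUL or worse
result = flagger.to_dict()
```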
other operations, former slice() and merge()
To get rid of a column, one can now call drop() or simply use del.
del flagger['A']
flagger.drop('A')
We still have our beloved copy().
other = flagger.copy()
And in the future a merge(), which needs some rethinking IMO.
Yes ?