
Draft: Saqc dunder operators

Peter Lünenschloß requested to merge saqc_arithmetic_dunder_symmetric into develop

dunder operator Mixin for SaQC class.

The idea is to just dispatch dunder calls to flagGeneric or processGeneric, and strictly constrain arithmetic and logic operators to either be data (arithmetic) or flags (logic) operators.

Arithmetic operators (+, *, -, /, %, **)

Arithmetic dunder operators are confined to operate on data and thus are wrapping processGeneric. Consistency for the arithmetic operators is fairly easily achieved by defining:

qc[field] + t

with t being some scalar (float, int), is dispatched to:

qc.processGeneric(field, target=target, func=lambda x: x + t)[target]

(target is always selected so that it is a new variable)

qc[field] + A

with A being some array-like, is dispatched to:

qc.processGeneric(field, target=target, func=lambda x: x + A)[target]

qc1[field1] + qc2[field2]

In case both operands are SaQC objects, the solution remains fairly straightforward:

qc.processGeneric([field1, field2], target=target, func=lambda x,y: x+y)[target]

(with qc being a join of qc1 and qc2)

-qc[field]

Unary operations are also just passed to processGeneric:

qc.processGeneric(field, target=target, func=lambda x: -x)[target]
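The arithmetic dispatch described above can be sketched with a small toy stand-in. Everything here is hypothetical and only mimics the shape of the proposal: `ToyQC`, its `process_generic` method, and the `_tmp<i>` target naming are illustration names, not the SaQC implementation. Only scalars and other `ToyQC` objects are handled, and flags are ignored.

```python
import itertools


class ToyQC:
    """Hypothetical stand-in: maps field names to lists of values (no flags)."""

    def __init__(self, data):
        self.data = dict(data)

    def __getitem__(self, field):
        # Selecting a field yields a new object holding only that field.
        return ToyQC({field: self.data[field]})

    def _single_field(self):
        # The operators below are only defined for single-field objects.
        (field,) = self.data
        return field

    def _new_target(self):
        # Pick a target name that is not present yet (assumed naming scheme).
        return next(
            name
            for name in (f"_tmp{i}" for i in itertools.count())
            if name not in self.data
        )

    def process_generic(self, fields, target, func):
        # Apply func row by row to the given fields and store the result in target.
        rows = zip(*(self.data[f] for f in fields))
        return ToyQC({**self.data, target: [func(*row) for row in rows]})

    def __add__(self, other):
        if isinstance(other, ToyQC):
            # Both operands are objects: join them first, then combine the two fields.
            joined = ToyQC({**self.data, **other.data})
            fields = [self._single_field(), other._single_field()]
            target = joined._new_target()
            return joined.process_generic(fields, target, lambda x, y: x + y)[target]
        # Scalar operand: wrap it in a one-argument function.
        target = self._new_target()
        return self.process_generic(
            [self._single_field()], target, lambda x: x + other
        )[target]

    def __neg__(self):
        target = self._new_target()
        return self.process_generic([self._single_field()], target, lambda x: -x)[target]


qc = ToyQC({"d1": [1, 2, 3], "d2": [10, 20, 30]})
shifted = qc["d1"] + 5
summed = qc["d1"] + qc["d2"]
negated = -qc["d1"]
print(shifted.data, summed.data, negated.data)
```

The remaining arithmetic operators would follow the same pattern, each swapping in its own lambda.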

Binary operators (&, |, ~)

Binary dunders are confined to wrap flagGeneric and only operate on the flags. The idea is to dispatch the actual logic down to an isflagged call in flagGeneric.

qc[field] & A

with A being some boolean type Array-like

qc.flagGeneric(field, target=field, func=lambda x: isflagged(x) & A)[field]

qc1[field1] & qc2[field2]

To avoid the ambiguity of what happens with the data if both operands are SaQC objects, all data is returned as NaN in this case. Flags are calculated with respect to the operator (& in this case). The call is dispatched to:

qc.flagGeneric([field1,field2], target=target, func=lambda x,y: isflagged(x) & isflagged(y))[target]

(with qc being a join of qc1 and qc2, and target being chosen so that it is not present in qc)

~qc[field]

Negation is dispatched via:

qc.flagGeneric(field, target=target, func=lambda x: ~isflagged(x))[target]
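A minimal sketch of the flags-only dispatch, under stated assumptions: `ToyFlags` and `_flag_generic` are hypothetical names, the flag levels `-inf` (unflagged) and `255.0` (BAD) are taken from the examples in this description, and `isflagged` is assumed to mean "above the unflagged level". The all-NaN data part is not modelled.

```python
import math

UNFLAGGED = -math.inf  # levels as they appear in the example output of this description
BAD = 255.0


def isflagged(flag):
    # Assumption for this sketch: anything above UNFLAGGED counts as flagged.
    return flag > UNFLAGGED


class ToyFlags:
    """Hypothetical stand-in holding only the flags of one field."""

    def __init__(self, flags):
        self.flags = list(flags)

    def _flag_generic(self, func, *others):
        # Mimics the shape of flagGeneric: func yields a boolean per row,
        # rows evaluating to True are set to BAD, all others to UNFLAGGED.
        rows = zip(self.flags, *(o.flags for o in others))
        return ToyFlags(BAD if func(*row) else UNFLAGGED for row in rows)

    def __and__(self, other):
        return self._flag_generic(lambda x, y: isflagged(x) & isflagged(y), other)

    def __or__(self, other):
        return self._flag_generic(lambda x, y: isflagged(x) | isflagged(y), other)

    def __invert__(self):
        # On plain Python bools `~` is integer inversion, so `not` is used here;
        # `~` negates element-wise only on boolean arrays.
        return self._flag_generic(lambda x: not isflagged(x))


d1 = ToyFlags([100.0, 100.0, UNFLAGGED])
d2 = ToyFlags([UNFLAGGED, UNFLAGGED, BAD])
print((d1 & d2).flags, (d1 | d2).flags, (~d2).flags)
```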

Assignment Operators (where(), |=)

Unfortunately, to assign flags resulting from binary dunder application to a variable, additional operators are needed: .where() and |=

qc1[field1].where(qc2[field2])

This flags field1 in qc1 according to field2 in qc2. Internally, this is dispatched to:

qc.flagGeneric(field2, target=field1, func=lambda x: isflagged(x))[field1]

(with field1 keeping the data)

qc1[field1].where(qc2[field2], flag=F)

where() can be provided with a flag value to control which flag is set. The parameter gets forwarded to the underlying flagGeneric call:

qc.flagGeneric(field2, target=field1, func=lambda x: isflagged(x), flag=F)[field1]

While where() returns the flagged SaQC object, |= operates in place:

qc1[field1] |= qc2[field2]

equals qc1[field1] = qc1[field1].where(qc2[field2]), and

qc1[field1] |= qc2[field2], F

equals qc1[field1] = qc1[field1].where(qc2[field2], flag=F)
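How `|=` can carry an optional flag value follows from Python's own rules: `a[k] |= b, F` evaluates `a[k]`, calls `__ior__` with the tuple `(b, F)`, and writes the result back through `__setitem__`. The sketch below illustrates that mechanism with hypothetical toy classes (`ToyField`, `ToyQC`). Overwriting an already existing flag with the new one is a simplification of this sketch, not a statement about flagGeneric.

```python
import math

UNFLAGGED = -math.inf  # levels as they appear in the example output of this description
BAD = 255.0


class ToyField:
    """Hypothetical stand-in: data and flags of a single field."""

    def __init__(self, data, flags):
        self.data, self.flags = list(data), list(flags)

    def where(self, other, flag=BAD):
        # Keep the own data; set `flag` wherever `other` is flagged,
        # keep the own flag everywhere else.
        flags = [
            flag if theirs > UNFLAGGED else own
            for own, theirs in zip(self.flags, other.flags)
        ]
        return ToyField(self.data, flags)

    def __ior__(self, other):
        # `field |= other, F` arrives here as the tuple (other, F).
        if isinstance(other, tuple):
            other, flag = other
            return self.where(other, flag=flag)
        return self.where(other)


class ToyQC:
    """Hypothetical container; item assignment stores the returned field."""

    def __init__(self):
        self.fields = {}

    def __getitem__(self, name):
        return self.fields[name]

    def __setitem__(self, name, value):
        self.fields[name] = value


qc = ToyQC()
qc["d1"] = ToyField([1, 2, 3], [UNFLAGGED] * 3)
qc["d2"] = ToyField([1, 2, 3], [UNFLAGGED, UNFLAGGED, BAD])

qc["d1"] |= qc["d2"], 100.0  # same as qc["d1"] = qc["d1"].where(qc["d2"], flag=100.0)
print(qc["d1"].flags)
```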

Comparative Operators (<, <=, >, >=)

Comparative operators also only operate on the flags part of the SaQC input. The idea is to have a shorthand for filtering input data by their flag level.

qc[field] > 0

This will return a new SaQC object, where values are flagged BAD, if their flag level in qc was above 0, and UNFLAGGED otherwise.

(qc[field] > "DOUBTFUL") + 5

This will yield an SaQC object where the data is qc[field] + 5 if the field value was flagged better than "DOUBTFUL", and np.nan otherwise.
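The interplay of comparison and arithmetic can be sketched as a two-step filter: the comparison rewrites the flags, and the arithmetic then masks every flagged value. `ToyField` is a hypothetical stand-in, only numeric flag levels are handled (no named levels such as "DOUBTFUL"), and masking flagged values as NaN is an assumption about how the underlying processGeneric call treats them.

```python
import math

UNFLAGGED = -math.inf  # levels as they appear in the example output of this description
BAD = 255.0


class ToyField:
    """Hypothetical stand-in: data and flags of a single field."""

    def __init__(self, data, flags):
        self.data, self.flags = list(data), list(flags)

    def __gt__(self, level):
        # Values whose flag level is above `level` become BAD, all others UNFLAGGED.
        flags = [BAD if f > level else UNFLAGGED for f in self.flags]
        return ToyField(self.data, flags)

    def __add__(self, scalar):
        # Assumption: flagged values are masked (NaN) before the arithmetic is applied.
        data = [
            math.nan if f > UNFLAGGED else x + scalar
            for x, f in zip(self.data, self.flags)
        ]
        # Flags are carried over unchanged; the flags of the result are not modelled.
        return ToyField(data, self.flags)


d1 = ToyField([1, 2, 3], [100.0, 100.0, UNFLAGGED])

strict = (d1 > 50) + 5    # the two values flagged at level 100 are masked
lenient = (d1 > 200) + 5  # no flag exceeds 200, so all values pass
print(strict.data, lenient.data)
```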

Examples

Some data:

>> import saqc
>> import pandas as pd
>> dat= pd.Series([1,2,3], index=pd.date_range('2000', periods=3, freq='30min'), name='d1')
>> qc=saqc.SaQC(dat)
>> qc.data
                    d1 | 
====================== | 
2000-01-01 00:00:00  1 | 
2000-01-01 00:30:00  2 | 
2000-01-01 01:00:00  3 | 

Multiply data:

>> new_qc = qc * 13
>> new_qc.data
                  d1*13 | 
======================= | 
2000-01-01 00:00:00  13 | 
2000-01-01 00:30:00  26 | 
2000-01-01 01:00:00  39 | 

We can chain and mix operations:

>> new_qc = (qc * 13) + 5
>> new_qc.data
               (d1*13)+5 | 
======================= | 
2000-01-01 00:00:00  18 | 
2000-01-01 00:30:00  31 | 
2000-01-01 01:00:00  44 |

>> qc['arithmetic_juggling'] = new_qc / qc
>> qc['arithmetic_juggling'].data
                   d1 |            arithmetic_juggling | 
====================== | ============================== | 
2000-01-01 00:00:00  1 | 2000-01-01 00:00:00  18.000000 | 
2000-01-01 00:30:00  2 | 2000-01-01 00:30:00  15.500000 | 
2000-01-01 01:00:00  3 | 2000-01-01 01:00:00  14.666667 | 

Make some flags:

>> qc['d2'] = qc.flagRange('d1', max=2)['d1']
>> qc.flags
                      d1 |                         d2 | 
======================== | ========================== | 
2000-01-01 00:00:00 -inf | 2000-01-01 00:00:00   -inf | 
2000-01-01 00:30:00 -inf | 2000-01-01 00:30:00   -inf | 
2000-01-01 01:00:00 -inf | 2000-01-01 01:00:00  255.0 |

Flag d1 with value 100 where d2 is not flagged:

>> qc['d1'] = qc['d1'].where(~qc['d2'], flag=100)
>> qc.flags['d1']
                        d1 | 
========================== | 
2000-01-01 00:00:00  100.0 | 
2000-01-01 00:30:00  100.0 | 
2000-01-01 01:00:00   -inf | 

Adding d1 and d2 yields all-NaN data:

>> (qc['d1'] + qc['d2']).data

                  d1+d2 | 
======================= | 
2000-01-01 00:00:00 NaN | 
2000-01-01 00:30:00 NaN | 
2000-01-01 01:00:00 NaN | 

because either d1 or d2 has a flag at every timestamp.

>> qc['d2'].flags
                        d2 | 
========================== | 
2000-01-01 00:00:00   -inf | 
2000-01-01 00:30:00   -inf | 
2000-01-01 01:00:00  255.0 | 

>> qc['d1'].flags                        

                        d1 | 
========================== | 
2000-01-01 00:00:00  100.0 | 
2000-01-01 00:30:00  100.0 | 
2000-01-01 01:00:00   -inf | 

By including all the values of 'd1' that are flagged "better" than 200, we can change the arithmetic operator's input:

>> ((qc['d1'] > 200) + qc['d2']).data

             (d1<200)+d2 | 
======================== | 
2000-01-01 00:00:00  2.0 | 
2000-01-01 00:30:00  4.0 | 
2000-01-01 01:00:00  NaN | 

To include the last value, which is flagged at level 255 in 'd2', also change the right operand's input:

>> qc_ = (qc['d1'] > 200) + (qc['d2'] > 300)
>> qc_.data

     (d1<200)+(d2<300) | 
====================== | 
2000-01-01 00:00:00  2 | 
2000-01-01 00:30:00  4 | 
2000-01-01 01:00:00  6 | 

Flags where either 'd1' is flagged above 150 or 'd2' is flagged below 150:

>> qc_ = (qc['d1'] > 150) | (qc['d2'] < 150)
>> qc_.flags

         (d1>150)|(d2<150) | 
========================== | 
2000-01-01 00:00:00  255.0 | 
2000-01-01 00:30:00  255.0 | 
2000-01-01 01:00:00   -inf | 

Note that the qc_ field holds only NaN in the data:

>> qc_.data

      (d1>150)|(d2<150) | 
======================= | 
2000-01-01 00:00:00 NaN | 
2000-01-01 00:30:00 NaN | 
2000-01-01 01:00:00 NaN | 

To combine flags with existing data, use where().

>> qc_=qc['d1'].where((qc['d1'] > 150) | (qc['d2'] > 250))
>> qc_.data
                     d1 | 
======================== | 
2000-01-01 00:00:00  1.0 | 
2000-01-01 00:30:00  2.0 | 
2000-01-01 01:00:00  3.0 | 

>> qc_.flags

                       d1 | 
========================== | 
2000-01-01 00:00:00  100.0 | 
2000-01-01 00:30:00  100.0 | 
2000-01-01 01:00:00  255.0 | 
Edited by Peter Lünenschloß
