New flagged value masking behaviour
The following changes to the masking of flagged values are implemented:
- we only reinject the masked values at positions that are flagged before and after the call to (test-)function
- if a (test-)function changed the index of a
data
-column (either its shape or any number of timestamps), the reinjection of masked values is skipped
Not the most elegant solution, but a (hopefully) a working one. Any suggestions are welcome @luenensc and @palmb .
This should close the following issues: #66 (closed), #61 (closed), #58 (closed)
Merge request reports
Activity
- Resolved by Bert Palm
- Resolved by David Schäfer
added 1 commit
- 02be38b0 - Apply suggestion to saqc/core/evaluator/evaluator.py
i would really like that to be in a function, but i dont se a simple way, because its split code. And a context manager seems a bit to much :/.. any ideas, how to manage this elegant ?
[EDIT] a function independent of the calling part
local_env, code = compileExpression(expr, data_in, field, flagger, nodata)
but maybe its not so important right now...
Edited by Bert PalmI was also trying to separate that out and even considered a context manager. But I couldn't find satisfying solution. I was thinking of something like:
@contextmanager def maskData(data, flagger): data_masked = data.copy() mask = flagger.getFlags() data_masked[mask] = np.nan yield data_mask
But then? In order to get reinject the masked values we need the result from
evalCode
and I don't see an easy way to get this into the context manager.Then I was also considering a pre- and a post processing function like:
def maskData(data, flagger): data_masked = data.copy() mask = flagger.getFlags() data_masked[mask] = np.nan return data_masked def unmaskData(data_old, flagger_old, data_new, flagger_new): mask_old = flagger.getFlags() mask_new = DictOfSeries() for col, left in data_new.indexes.iteritems(): right = mask_old[col].index # NOTE: ignore columns with changed indices (assumption: harmonization) if left.equals(right): # NOTE: don't overwrite flags that where unset within the called function mask_new[col] = mask_old[col] & flagger_new.isFlagged(field=col) data_new.aloc[mask_new] = data_old[mask_new] return data_new def evalExpression(expr, data, field, flagger, nodata=np.nan): data_in = maskData(data, flagger) local_env, code = compileExpression(expr, data_in, field, flagger, nodata) data_result, flagger_result = evalCode(code, FUNC_MAP, local_env) data_out = unmaskData(data, flagger, data_result, flagger_result) return data_out, flagger_result
But IDK, I wasn't really convinced from that either... It cleans up
evalExpression
a bit, but also sort of rips those integrated steps into unrelated pieces. But as I see it here now, it is maybe not that bad after all?Edited by David Schäferlike i suggested before, we could move this to register.. see new Snippet @ $49
but this time with complete evaluation at runtime (@schaefed)
give a list:
flagRange(data, field, flagger, min=10, max=50, to_mask=[flagger.BAD, 'DOUBTFUL'])
or a single flag
flagRange(data, field, flagger, min=10, max=50, to_mask=flagger.BAD)
or empty for no masking
flagRange(data, field, flagger, min=10, max=50, to_mask=None)
flagRange(data, field, flagger, min=10, max=50, to_mask=[])
feedback wellcome!
Edited by Bert Palm- Resolved by Bert Palm
I am not entirely convinced, that moving the crucial part of evaluation into the module
funcs
makes it easier or more obvious to find. But the idea to make flags-to-mask a user settable options sort of requires, that the masking is part of the function wrapping.So, long story short, I think, that
register.py
should eventually move into thecore
, as it is going to become an integral part of the evaluation machinery.On more thing:
getMask
in snippet $49 is essentially aflagger.isFlagged
on steroids, so I really think, we should implement the option to pass a (possibly empty) sequence of flags toflagger.isFlagged
and reduce the snippet to:from lib.tools import toSequence def register(): def outer(func): name = func.__name__ func = Partial(func, func_name=name) FUNC_MAP[name] = func def inner(data, field, flagger, *args, **kwargs): # NOTE: would be nice to make to_mask a 'real' argument to_mask = toSequence(kwargs.get('to_mask', flagger.BAD)) data_in = maskData(data, flagger, to_mask) data_res, flagger_res = func(data_in, field, flagger, *args, **kwargs) data_out = unmaskData(data, flagger, data_res, flagger_res, to_mask) return data_out, flagger_res return inner return outer def maskData(data, flagger, to_mask): data_masked = data.copy() data_masked[flagger.isFlagged(flag=to_mask)] = np.nan return data_masked def unmaskData(data_old, flagger_old, data_new, flagger_new, to_mask): mask_old = flagger_old.isFlagged(flag=to_mask) mask_new = flagger_new.isFlagged(flag=to_mask) mask_combined = DictOfSeries() for col, left in data_new.indexes.iteritems(): right = mask_old[col].index # NOTE: ignore columns with changed indices (assumption: harmonization) if left.equals(right): # NOTE: don't overwrite flags that where unset within the called function mask_combined[col] = mask_old[col] & mask_new[col] data_new.aloc[mask_combined] = data_old[mask_combined] return data_new
mentioned in issue #69 (closed)
mentioned in issue #70 (closed)
mentioned in merge request !38 (merged)
mentioned in merge request !52 (merged)
mentioned in commit 96f0ed29
closing this. its outdated and was rewritten in !52 (merged). The tests for masking was added manual in commit 96f0ed29