New flagged value masking behaviour

I'm ok with it

added 1 commit

02be38b0 - Apply suggestion to saqc/core/evaluator/evaluator.py

added 1 commit

fd55563f - Update evaluator.py

i would really like that to be in a function, but i dont se a simple way, because its split code. And a context manager seems a bit to much :/.. any ideas, how to manage this elegant ?

[EDIT] a function independent of the calling part

local_env, code = compileExpression(expr, data_in, field, flagger, nodata)

but maybe its not so important right now...

I was also trying to separate that out and even considered a context manager. But I couldn't find satisfying solution. I was thinking of something like:

@contextmanager
def maskData(data, flagger):
    data_masked = data.copy()
    mask = flagger.getFlags()
    data_masked[mask] = np.nan
    yield data_mask

But then? In order to get reinject the masked values we need the result from evalCode and I don't see an easy way to get this into the context manager.

Then I was also considering a pre- and a post processing function like:

def maskData(data, flagger):
    data_masked = data.copy()
    mask = flagger.getFlags()
    data_masked[mask] = np.nan
    return data_masked


def unmaskData(data_old, flagger_old, data_new, flagger_new):
    mask_old = flagger.getFlags()
    mask_new = DictOfSeries()
    for col, left in data_new.indexes.iteritems():
        right = mask_old[col].index
        # NOTE: ignore columns with changed indices (assumption: harmonization)
        if left.equals(right):
            # NOTE: don't overwrite flags that where unset within the called function
            mask_new[col] = mask_old[col] & flagger_new.isFlagged(field=col)
    data_new.aloc[mask_new] = data_old[mask_new]
    return data_new


def evalExpression(expr, data, field, flagger, nodata=np.nan):
   
    data_in = maskData(data, flagger)

    local_env, code = compileExpression(expr, data_in, field, flagger, nodata)
    data_result, flagger_result = evalCode(code, FUNC_MAP, local_env)

    data_out = unmaskData(data, flagger, data_result, flagger_result)

    return data_out, flagger_result

But IDK, I wasn't really convinced from that either... It cleans up evalExpression a bit, but also sort of rips those integrated steps into unrelated pieces. But as I see it here now, it is maybe not that bad after all?

resolved all threads

like i suggested before, we could move this to register.. see new Snippet @ $49

but this time with complete evaluation at runtime (@schaefed)

give a list:

flagRange(data, field, flagger, min=10, max=50, to_mask=[flagger.BAD, 'DOUBTFUL'])

or a single flag

flagRange(data, field, flagger, min=10, max=50, to_mask=flagger.BAD)

or empty for no masking

flagRange(data, field, flagger, min=10, max=50, to_mask=None)

flagRange(data, field, flagger, min=10, max=50, to_mask=[])

feedback wellcome!

I am not entirely convinced, that moving the crucial part of evaluation into the module funcs makes it easier or more obvious to find. But the idea to make flags-to-mask a user settable options sort of requires, that the masking is part of the function wrapping.

So, long story short, I think, that register.py should eventually move into the core, as it is going to become an integral part of the evaluation machinery.

On more thing: getMask in snippet $49 is essentially a flagger.isFlagged on steroids, so I really think, we should implement the option to pass a (possibly empty) sequence of flags to flagger.isFlagged and reduce the snippet to:

from lib.tools import toSequence

def register():
    def outer(func):
        name = func.__name__
        func = Partial(func, func_name=name)
        FUNC_MAP[name] = func

        def inner(data, field, flagger, *args, **kwargs):
            # NOTE: would be nice to make to_mask a 'real' argument
            to_mask = toSequence(kwargs.get('to_mask', flagger.BAD))
            
            data_in = maskData(data, flagger, to_mask)
            data_res, flagger_res = func(data_in, field, flagger, *args, **kwargs)
            data_out = unmaskData(data, flagger, data_res, flagger_res, to_mask)

            return data_out, flagger_res

        return inner

    return outer


def maskData(data, flagger, to_mask):
    data_masked = data.copy()
    data_masked[flagger.isFlagged(flag=to_mask)] = np.nan
    return data_masked


def unmaskData(data_old, flagger_old, data_new, flagger_new, to_mask):
    mask_old = flagger_old.isFlagged(flag=to_mask)
    mask_new = flagger_new.isFlagged(flag=to_mask)
    mask_combined = DictOfSeries()
    for col, left in data_new.indexes.iteritems():
        right = mask_old[col].index
        # NOTE: ignore columns with changed indices (assumption: harmonization)
        if left.equals(right):
            # NOTE: don't overwrite flags that where unset within the called function
            mask_combined[col] = mask_old[col] & mask_new[col]
    data_new.aloc[mask_combined] = data_old[mask_combined]
    return data_new

mentioned in issue #69 (closed)

mentioned in project snippet $49

resolved all threads

mentioned in issue #70 (closed)

mentioned in merge request !38 (merged)

mentioned in merge request !52 (merged)

mentioned in commit 96f0ed29

closing this. its outdated and was rewritten in !52 (merged). The tests for masking was added manual in commit 96f0ed29

closed

New flagged value masking behaviour

Merge request reports

Activity