Skip to content
Snippets Groups Projects

New flagged value masking behaviour

Closed David Schäfer requested to merge masking into develop
All threads resolved!

The following changes to the masking of flagged values are implemented:

  • we only reinject the masked values at positions that are flagged before and after the call to (test-)function
  • if a (test-)function changed the index of a data-column (either its shape or any number of timestamps), the reinjection of masked values is skipped

Not the most elegant solution, but a (hopefully) a working one. Any suggestions are welcome @luenensc and @palmb .

This should close the following issues: #66 (closed), #61 (closed), #58 (closed)

Merge request reports

Loading
Loading

Activity

Filter activity
  • Approvals
  • Assignees & reviewers
  • Comments (from bots)
  • Comments (from users)
  • Commits & branches
  • Edits
  • Labels
  • Lock status
  • Mentions
  • Merge request status
  • Tracking
  • Bert Palm
  • I'm ok with it

  • David Schäfer added 1 commit

    added 1 commit

    • 02be38b0 - Apply suggestion to saqc/core/evaluator/evaluator.py

    Compare with previous version

  • David Schäfer added 1 commit

    added 1 commit

    Compare with previous version

  • i would really like that to be in a function, but i dont se a simple way, because its split code. And a context manager seems a bit to much :/.. any ideas, how to manage this elegant ?

    [EDIT] a function independent of the calling part

    local_env, code = compileExpression(expr, data_in, field, flagger, nodata)

    but maybe its not so important right now...

    Edited by Bert Palm
  • I was also trying to separate that out and even considered a context manager. But I couldn't find satisfying solution. I was thinking of something like:

    @contextmanager
    def maskData(data, flagger):
        data_masked = data.copy()
        mask = flagger.getFlags()
        data_masked[mask] = np.nan
        yield data_mask

    But then? In order to get reinject the masked values we need the result from evalCode and I don't see an easy way to get this into the context manager.

    Then I was also considering a pre- and a post processing function like:

    def maskData(data, flagger):
        data_masked = data.copy()
        mask = flagger.getFlags()
        data_masked[mask] = np.nan
        return data_masked
    
    
    def unmaskData(data_old, flagger_old, data_new, flagger_new):
        mask_old = flagger.getFlags()
        mask_new = DictOfSeries()
        for col, left in data_new.indexes.iteritems():
            right = mask_old[col].index
            # NOTE: ignore columns with changed indices (assumption: harmonization)
            if left.equals(right):
                # NOTE: don't overwrite flags that where unset within the called function
                mask_new[col] = mask_old[col] & flagger_new.isFlagged(field=col)
        data_new.aloc[mask_new] = data_old[mask_new]
        return data_new
    
    
    def evalExpression(expr, data, field, flagger, nodata=np.nan):
       
        data_in = maskData(data, flagger)
    
        local_env, code = compileExpression(expr, data_in, field, flagger, nodata)
        data_result, flagger_result = evalCode(code, FUNC_MAP, local_env)
    
        data_out = unmaskData(data, flagger, data_result, flagger_result)
    
        return data_out, flagger_result

    But IDK, I wasn't really convinced from that either... It cleans up evalExpression a bit, but also sort of rips those integrated steps into unrelated pieces. But as I see it here now, it is maybe not that bad after all?

    Edited by David Schäfer
  • Bert Palm resolved all threads

    resolved all threads

  • like i suggested before, we could move this to register.. see new Snippet @ $49

    but this time with complete evaluation at runtime (@schaefed)

    give a list:

    flagRange(data, field, flagger, min=10, max=50, to_mask=[flagger.BAD, 'DOUBTFUL'])

    or a single flag

    flagRange(data, field, flagger, min=10, max=50, to_mask=flagger.BAD)

    or empty for no masking

    flagRange(data, field, flagger, min=10, max=50, to_mask=None)
    flagRange(data, field, flagger, min=10, max=50, to_mask=[])

    feedback wellcome!

    Edited by Bert Palm
    • Resolved by Bert Palm

      I am not entirely convinced, that moving the crucial part of evaluation into the module funcs makes it easier or more obvious to find. But the idea to make flags-to-mask a user settable options sort of requires, that the masking is part of the function wrapping.

      So, long story short, I think, that register.py should eventually move into the core, as it is going to become an integral part of the evaluation machinery.

      On more thing: getMask in snippet $49 is essentially a flagger.isFlagged on steroids, so I really think, we should implement the option to pass a (possibly empty) sequence of flags to flagger.isFlagged and reduce the snippet to:

      from lib.tools import toSequence
      
      def register():
          def outer(func):
              name = func.__name__
              func = Partial(func, func_name=name)
              FUNC_MAP[name] = func
      
              def inner(data, field, flagger, *args, **kwargs):
                  # NOTE: would be nice to make to_mask a 'real' argument
                  to_mask = toSequence(kwargs.get('to_mask', flagger.BAD))
                  
                  data_in = maskData(data, flagger, to_mask)
                  data_res, flagger_res = func(data_in, field, flagger, *args, **kwargs)
                  data_out = unmaskData(data, flagger, data_res, flagger_res, to_mask)
      
                  return data_out, flagger_res
      
              return inner
      
          return outer
      
      
      def maskData(data, flagger, to_mask):
          data_masked = data.copy()
          data_masked[flagger.isFlagged(flag=to_mask)] = np.nan
          return data_masked
      
      
      def unmaskData(data_old, flagger_old, data_new, flagger_new, to_mask):
          mask_old = flagger_old.isFlagged(flag=to_mask)
          mask_new = flagger_new.isFlagged(flag=to_mask)
          mask_combined = DictOfSeries()
          for col, left in data_new.indexes.iteritems():
              right = mask_old[col].index
              # NOTE: ignore columns with changed indices (assumption: harmonization)
              if left.equals(right):
                  # NOTE: don't overwrite flags that where unset within the called function
                  mask_combined[col] = mask_old[col] & mask_new[col]
          data_new.aloc[mask_combined] = data_old[mask_combined]
          return data_new
      
  • Bert Palm mentioned in issue #69 (closed)

    mentioned in issue #69 (closed)

  • Bert Palm mentioned in project snippet $49

    mentioned in project snippet $49

  • Bert Palm resolved all threads

    resolved all threads

  • Bert Palm mentioned in issue #70 (closed)

    mentioned in issue #70 (closed)

  • David Schäfer mentioned in merge request !38 (merged)

    mentioned in merge request !38 (merged)

  • Bert Palm mentioned in merge request !52 (merged)

    mentioned in merge request !52 (merged)

  • Bert Palm mentioned in commit 96f0ed29

    mentioned in commit 96f0ed29

  • closing this. its outdated and was rewritten in !52 (merged). The tests for masking was added manual in commit 96f0ed29

  • closed

  • Please register or sign in to reply
    Loading