New source-target workflow/semantics

This issue provides a summary of our discussion from 2021-11-02 around #234 (closed).

Our current source-target semantics are as follows: if a function is called with a target (e.g. saqc.flagUni(field="a", target="x")), we do the following:

  1. call saqc.copyVariable(source="a", target="x")
  2. call saqc.flagUni(field="x")
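The two steps above can be emulated with a small wrapper. This is a minimal, self-contained sketch, not SaQC code: `ToyQC`, `implicit_target`, the toy `flagUni` test and the `UNFLAGGED`/`BAD` values are hypothetical placeholders that only mimic the copy-then-flag order.

```python
from functools import wraps

UNFLAGGED = float("-inf")  # placeholder for "not flagged" in this sketch
BAD = 255.0  # placeholder flag value in this sketch


class ToyQC:
    """Hypothetical stand-in for an SaQC object: one data list and one flags list per variable."""

    def __init__(self, data):
        self.data = {name: list(values) for name, values in data.items()}
        self.flags = {name: [UNFLAGGED] * len(values) for name, values in data.items()}

    def copyVariable(self, source, target):
        # Copy data and flags so that later changes to `target` leave `source` untouched
        self.data[target] = list(self.data[source])
        self.flags[target] = list(self.flags[source])
        return self


def implicit_target(func):
    """Emulate the current semantics: given a target, copy field -> target, then run on target."""

    @wraps(func)
    def wrapper(qc, field, target=None, **kwargs):
        if target is not None and target != field:
            qc.copyVariable(source=field, target=target)  # step 1: implicit copy
            field = target
        return func(qc, field, **kwargs)  # step 2: run the test on the copy

    return wrapper


@implicit_target
def flagUni(qc, field, threshold=10.0):
    # Hypothetical univariate test: flag every value above `threshold`
    qc.flags[field] = [
        BAD if value > threshold else old
        for value, old in zip(qc.data[field], qc.flags[field])
    ]
    return qc


qc = ToyQC({"a": [1.0, 20.0, 3.0]})
flagUni(qc, field="a", target="x")
print(qc.flags["x"])  # [-inf, 255.0, -inf]
print(qc.flags["a"])  # [-inf, -inf, -inf]
```

The original variable a stays untouched; only the implicit copy x carries the new flags.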

We had difficulties transferring our current source-target semantics from the univariate to the multivariate use case (e.g. saqc.flagMulti(field=["a", "b"], target="x")), as we couldn't find good answers to the following challenges:

  1. Which variable should we copy: a, b, or an aggregate of a and b (for both data and flags)?
  2. As there seems to be no obvious answer to 1., we decided not to copy any variable at all and instead to generate an empty variable x (i.e. all-NaN data, all-UNFLAGGED flags, empty History). This is non-trivial in itself, as the index of x cannot be known upfront (the union or intersection of the indices of a and b, only the index of a or of b, etc.).
  3. With 2. in place, the workflows for univariate and multivariate functions differ considerably (automatic copy and History update in the former, empty data/flags in the latter).
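To make the index problem from point 2 concrete, here is a small sketch that builds an empty target under different candidate index policies. `empty_target`, its `how` values, the dict layout and the `UNFLAGGED` placeholder are all hypothetical; each policy is equally plausible, which is exactly why no default is obvious.

```python
import math

UNFLAGGED = float("-inf")  # placeholder for "not flagged" in this sketch


def empty_target(indices, how="union"):
    """Hypothetical helper: an empty variable (all-NaN data, all-UNFLAGGED flags, empty history).

    `indices` holds the index of every input field; `how` selects one candidate policy.
    """
    sets = [set(index) for index in indices]
    if how == "union":
        index = sorted(set().union(*sets))
    elif how == "intersection":
        index = sorted(set.intersection(*sets))
    elif how == "first":
        index = sorted(sets[0])  # only the index of the first field
    else:
        raise ValueError(f"unknown policy: {how!r}")
    return {
        "index": index,
        "data": [math.nan] * len(index),
        "flags": [UNFLAGGED] * len(index),
        "history": [],
    }


a_index = [0, 1, 2, 3]
b_index = [2, 3, 4]
for how in ("union", "intersection", "first"):
    print(how, empty_target([a_index, b_index], how)["index"])
# union [0, 1, 2, 3, 4]
# intersection [2, 3]
# first [0, 1, 2, 3]
```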

We decided to solve the issue by removing target from the 'main' function interface. All functions that need an explicit target have to add it to their respective parameter list and handle it themselves. This effectively removes the existing 'implicit-copy' workflow, which seems reasonable because:

  • for most 'normal' flagging functions the target workflow is merely syntactic sugar for an explicit call to saqc.copyVariable
  • if we find a solution to the consistency problem outlined above, we can reintroduce the current behavior without invalidating existing configurations/setups, but not the other way round.
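Under the new approach, a multivariate function that needs a target declares it in its own parameter list and decides itself how the target is built. The following is a sketch only: `flagMulti` here is a hypothetical toy test (flag timestamps where all fields exceed a threshold), the `{timestamp: value}` dict layout and the `UNFLAGGED`/`BAD` values are placeholders, and the intersection policy is just one possible choice.

```python
import math

UNFLAGGED = float("-inf")  # placeholder for "not flagged" in this sketch
BAD = 255.0  # placeholder flag value in this sketch


def flagMulti(data, flags, field, target, threshold=10.0):
    """Hypothetical multivariate test that declares and handles `target` itself.

    `data` and `flags` map variable names to {timestamp: value} dicts.
    The function, not a generic wrapper, decides how `target` is built.
    """
    if target in data:
        raise ValueError(f"target {target!r} already exists")
    # Policy chosen by this function: only timestamps shared by all fields
    shared = sorted(set.intersection(*(set(data[name]) for name in field)))
    data[target] = {t: math.nan for t in shared}  # nothing is copied into the target
    flags[target] = {
        t: BAD if all(data[name][t] > threshold for name in field) else UNFLAGGED
        for t in shared
    }
    return data, flags


data = {"a": {0: 1.0, 1: 12.0, 2: 15.0}, "b": {1: 11.0, 2: 2.0, 3: 30.0}}
flags = {name: {t: UNFLAGGED for t in values} for name, values in data.items()}
flagMulti(data, flags, field=["a", "b"], target="x")
print(flags["x"])  # {1: 255.0, 2: -inf}
```

For univariate functions, the old target="x" call simply becomes an explicit saqc.copyVariable(source="a", target="x") followed by the function call on field="x".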

This solution comes with pros and cons.

  • Pros:
  • Cons:
    • Configurations might get lengthy because of potentially many calls to saqc.copyVariable. We could mitigate this by equipping saqc.copyVariable with an option to copy multiple variables at once (e.g. saqc.copyVariable(field=["a", "b", "c"], target=["x", "y", "z"])).
    • If all multivariate functions implement their own target handling, it might become tricky to enforce (somewhat) consistent behavior in the long run (starting with the name of the parameter target).
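The multi-variable copy proposed in the first bullet could look roughly like this. It is a hypothetical sketch operating on plain dicts, not the actual saqc.copyVariable; `_as_list` and the argument checks are illustrative assumptions.

```python
UNFLAGGED = float("-inf")  # placeholder for "not flagged" in this sketch


def _as_list(names):
    # Accept a single variable name or a sequence of names
    return [names] if isinstance(names, str) else list(names)


def copyVariable(data, flags, field, target):
    """Hypothetical multi-copy: copies field[i] to target[i] for every i."""
    fields, targets = _as_list(field), _as_list(target)
    if len(fields) != len(targets):
        raise ValueError("field and target must have the same length")
    if len(set(targets)) != len(targets):
        raise ValueError("target names must be unique")
    clashes = [name for name in targets if name in data]
    if clashes:
        raise ValueError(f"targets already exist: {clashes}")
    for source, dest in zip(fields, targets):
        data[dest] = list(data[source])
        flags[dest] = list(flags[source])
    return data, flags


data = {"a": [1.0, 2.0], "b": [3.0], "c": [4.0, 5.0, 6.0]}
flags = {name: [UNFLAGGED] * len(values) for name, values in data.items()}
copyVariable(data, flags, field=["a", "b", "c"], target=["x", "y", "z"])
print(sorted(data))  # ['a', 'b', 'c', 'x', 'y', 'z']
```

Argument checks like these could possibly live in one shared helper that every function with its own target parameter reuses, which might ease part of the consistency concern in the second bullet.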

The following (implementation) implications are currently unclear:

  • How should the merging of History objects within the register machinery handle functions with an explicit target? Is the current workflow compatible, or do we need to rework this central but also error-prone part of SaQC?

I hope I summarized our discussion correctly and in enough detail, @palmb and @luenensc. Please feel free to comment on and/or edit the text above.

Edited by David Schäfer