New source-target workflow/semantics
This issue provides a summary of our discussion from 2021-11-02 around #234 (closed).
Our current source-target semantics is as follows. If a function specifies a `target` (e.g. `saqc.flagUni(field="a", target="x")`), we do the following:
- call `saqc.copyVariable(source="a", target="x")`
- call `saqc.flagUni(field="x")`
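The implicit-copy semantics above can be sketched as a decorator around a univariate test. This is a toy model, not SaQC's code: `ToyQC`, `implicit_target`, the `threshold` parameter and the flag values `UNFLAGGED`/`BAD` are all hypothetical, and each variable is just a mapping from index to value.

```python
import copy
from functools import wraps

# Hypothetical flag values for this toy model (not imported from SaQC)
UNFLAGGED = float("-inf")
BAD = 255.0


def implicit_target(func):
    """Wrap a univariate test with the implicit-copy `target` semantics."""

    @wraps(func)
    def wrapper(qc, field, target=None, **kwargs):
        if target is not None and target != field:
            # Step 1: copy the source variable (data and flags) to `target`
            qc.copyVariable(source=field, target=target)
            field = target
        # Step 2: run the actual test on the (possibly new) variable
        return func(qc, field, **kwargs)

    return wrapper


class ToyQC:
    """Hypothetical stand-in for a SaQC object: per variable, a data and a
    flags mapping from index to value. Not the real SaQC class."""

    def __init__(self, data):
        self.data = {name: dict(values) for name, values in data.items()}
        self.flags = {name: {i: UNFLAGGED for i in values} for name, values in data.items()}

    def copyVariable(self, source, target):
        # Deep copies, so that flagging `target` never touches `source`
        self.data[target] = copy.deepcopy(self.data[source])
        self.flags[target] = copy.deepcopy(self.flags[source])
        return self

    @implicit_target
    def flagUni(self, field, threshold):
        # Toy univariate test: flag every value above `threshold`
        for i, value in self.data[field].items():
            if value > threshold:
                self.flags[field][i] = BAD
        return self
```

With `qc = ToyQC({"a": {0: 1.0, 1: 5.0, 2: 2.0}})`, calling `qc.flagUni(field="a", target="x", threshold=3.0)` creates `x` as a copy of `a`, flags index `1` of `x` and leaves the flags of `a` untouched.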
We had difficulties transferring our current source-target semantics from the univariate to the multivariate use case (e.g. `saqc.flagMulti(field=["a", "b"], target="x")`), as we couldn't find good answers to the following set of challenges:
1. Which variable should we copy, `a` or `b`, or should we try to generate an aggregate from `a` and `b` (for both `data` and `flags`)?
2. As there seems to be no obvious answer to 1., we decided that we shouldn't copy any variable at all and should instead generate an empty variable `x` (i.e. all-`NaN` `data`, all-`UNFLAGGED` `flags`, empty `History`). This is non-trivial in itself, as the index of `x` cannot be known upfront (union or intersection of the indices of `a` and `b`, only `a`'s or `b`'s index, etc.).
3. With 2. in place, the workflows for univariate and multivariate functions differ considerably (i.e. automatic copy and history update in the former, empty data/flags in the latter).
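To make the index ambiguity in challenge 2 concrete, here is a small sketch of the candidate strategies for building an empty target. Indices are plain lists here, and `empty_target_index`, `empty_variable` and the `how` option are hypothetical helpers, not part of SaQC.

```python
import math

# Hypothetical flag value for "not flagged yet" (not imported from SaQC)
UNFLAGGED = float("-inf")


def empty_target_index(indices, how):
    """Candidate index for a new, empty target built from several sources.

    `how` is a hypothetical option: "union", "intersection", or the
    position (int) of the single source whose index should be reused.
    """
    if how == "union":
        return sorted(set().union(*indices))
    if how == "intersection":
        return sorted(set(indices[0]).intersection(*indices[1:]))
    if isinstance(how, int):
        return sorted(indices[how])
    raise ValueError(f"unknown strategy: {how!r}")


def empty_variable(index):
    # All-NaN data, all-UNFLAGGED flags and an empty history
    data = {i: math.nan for i in index}
    flags = {i: UNFLAGGED for i in index}
    history = []
    return data, flags, history
```

For source indices `[1, 2, 3]` and `[2, 3, 4]`, the union gives `[1, 2, 3, 4]`, the intersection `[2, 3]`, and reusing either source yields its own index. Each choice is defensible, which is why no single default suggests itself.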
We decided to solve the issue by removing `target` from the 'main' function interface. All functions that need an explicit target have to include it in their respective parameter lists and handle it themselves. This basically kills the existing 'implicit-copy' workflow, which seems reasonable because:
- for most 'normal' flagging functions, the `target` workflow is merely syntactic sugar for an explicit call to `saqc.copyVariable`
- if we find a solution to the consistency problems outlined above, we can reintroduce the current behavior without invalidating existing configurations/setups, but not the other way round.
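Under the proposed scheme, a multivariate function owns its `target` handling. The sketch below shows one possible shape; `ToyQC`, the `threshold` parameter, the flag values and the intersection-based index are hypothetical choices made inside the function, not a decision taken in this discussion.

```python
import math

# Hypothetical flag values for this toy model (not imported from SaQC)
UNFLAGGED = float("-inf")
BAD = 255.0


class ToyQC:
    """Hypothetical stand-in for a SaQC object (not the real class)."""

    def __init__(self, data):
        self.data = {name: dict(values) for name, values in data.items()}
        self.flags = {name: {i: UNFLAGGED for i in values} for name, values in data.items()}

    def flagMulti(self, field, target, threshold):
        """Toy multivariate test with an explicit `target` parameter.

        The function itself decides how `target` is built: its index is the
        intersection of all field indices, its data is all-NaN, and an index
        is flagged BAD if any field exceeds `threshold` there.
        """
        first, *rest = field
        index = sorted(set(self.data[first]).intersection(*(self.data[f] for f in rest)))
        # Compute flags before writing, in case `target` is one of the fields
        new_flags = {
            i: BAD if any(self.data[f][i] > threshold for f in field) else UNFLAGGED
            for i in index
        }
        self.data[target] = {i: math.nan for i in index}
        self.flags[target] = new_flags
        return self
```

Univariate tests need no such code: `qc.copyVariable(...)` followed by the test on the copy reproduces the old behavior explicitly.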
This solution comes with pros and cons.
- Pros:
  - We don't have hard-to-explain consistency issues.
  - The `core` will be simplified (from a code point of view).
  - Explicit behavior is favored over implicit behavior.
- Cons:
  - Configurations might get lengthy because of potentially many calls to `saqc.copyVariable`. We could mitigate this by equipping `saqc.copyVariable` with an option to copy multiple variables at once (e.g. `saqc.copyVariable(field=["a", "b", "c"], target=["x", "y", "z"])`).
  - If all multivariate functions implement their own `target` handling, it might become tricky to enforce (somewhat) consistent behavior in the long run (starting with the name of the parameter `target`).
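The multi-variable copy suggested under the cons could look roughly like this. It is a sketch on a hypothetical `ToyQC` class; the list-valued `field`/`target` signature follows the example call in the text, but the length check and the snapshot step are our own assumptions.

```python
import copy

# Hypothetical flag value for this toy model (not imported from SaQC)
UNFLAGGED = float("-inf")


class ToyQC:
    """Hypothetical stand-in for a SaQC object (not the real class)."""

    def __init__(self, data):
        self.data = {name: dict(values) for name, values in data.items()}
        self.flags = {name: {i: UNFLAGGED for i in values} for name, values in data.items()}

    def copyVariable(self, field, target):
        """Copy one or several variables; `field` and `target` are either
        single names or lists of equal length, paired element-wise."""
        fields = [field] if isinstance(field, str) else list(field)
        targets = [target] if isinstance(target, str) else list(target)
        if len(fields) != len(targets):
            raise ValueError("field and target must have the same length")
        # Snapshot all sources first, so overlapping names (e.g. swapping
        # "a" and "b") copy the original values rather than partial results
        snapshots = [
            (copy.deepcopy(self.data[f]), copy.deepcopy(self.flags[f])) for f in fields
        ]
        for dst, (data, flags) in zip(targets, snapshots):
            self.data[dst] = data
            self.flags[dst] = flags
        return self
```

Taking all snapshots before writing keeps the call order-independent, so `copyVariable(field=["a", "b"], target=["b", "a"])` swaps the two variables instead of duplicating one of them.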
The following (implementation) implications are currently unclear:
- How should the merging of `History`s within the register machinery handle functions with an explicit `target`? Is the current workflow compatible, or do we need a rework of this central but also error-prone part of `SaQC`?
I hope I summarized our discussion correctly and in enough detail, @palmb and @luenensc. Please feel free to comment on and/or make changes to the text above.