How to source-target multivariate functions?
While working on !320 (closed), I realized that our source-target concept for multivariate functions is not yet sound enough.
Consider a univariate function call like:
saqc.flagUnivariate(field=["a", "b"], target=["x", "y"])
Here the semantics is reasonable and (as a reminder) as follows:
- Copy a -> x and b -> y. This includes data, Flags and History
- Call the function as
saqc.flagUnivariate(field="x") ; saqc.flagUnivariate(field="y")
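To make the two steps concrete, here is a minimal sketch of the copy-then-call semantics. It does not use the real saqc API: ToyQC, copy_variable, apply_univariate and flag_negative are hypothetical names, and data, flags and history are modelled as plain dicts of lists.

```python
import copy


class ToyQC:
    """Toy stand-in for a SaQC-like container; NOT the real saqc API."""

    def __init__(self, data, flags, history):
        self.data = data        # variable name -> list of values
        self.flags = flags      # variable name -> list of boolean flags
        self.history = history  # variable name -> list of applied test names

    def copy_variable(self, source, target):
        # Step 1: the target inherits data, flags and history of its source.
        self.data[target] = copy.deepcopy(self.data[source])
        self.flags[target] = copy.deepcopy(self.flags[source])
        self.history[target] = copy.deepcopy(self.history[source])


def apply_univariate(qc, func, field, target):
    """Copy each field to its target, then call func once per target."""
    for source, dest in zip(field, target):
        qc.copy_variable(source, dest)
    for dest in target:
        func(qc, dest)  # Step 2: one independent call per target
    return qc


def flag_negative(qc, field):
    # Hypothetical univariate test: flag all values below zero.
    qc.flags[field] = [f or v < 0 for f, v in zip(qc.flags[field], qc.data[field])]
    qc.history[field].append("flag_negative")


qc = ToyQC(
    data={"a": [1, -2, 3], "b": [-1, 5, 6]},
    flags={"a": [False, False, False], "b": [False, False, False]},
    history={"a": ["range_check"], "b": []},
)
apply_univariate(qc, flag_negative, field=["a", "b"], target=["x", "y"])
```

In this toy model the sources a and b stay untouched, while x and y each carry the history of their own source plus the new test.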
When we go into multivariate terrain, a similar function call behaves differently. An example:
saqc.flagMultivariate(field=["a", "b"], target=["x", "y"])
Here we could still follow similar semantics:
- Copy a -> x and b -> y. This includes data, Flags and History
- Call the function as
saqc.flagMultivariate(field=["a", "b"], target=["x", "y"])
But as flagMultivariate gets both field and target and is free to do what it wants to do, things already start to get a bit messy... The following questions could come to mind:
- Does it generally make sense to map a to x and b to y, or should they be 'fresh' and empty variables?
- Is it sensible to have two separate histories for x and y, where each variable carries only the legacy of its source variable? Or should x and y reflect the 'merge' of the histories of a and b?
If we move to the more likely (?) multivariate use case, where a single target is generated from multiple fields, we are on completely undefined terrain:
saqc.flagMultivariate(field=["a", "b"], target="x")
Now, how should we generate x? From a -> x or from b -> x? As an 'empty' variable? While the latter would make sense to me, we would also lose the histories of a and b (which we shouldn't, IMO).
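The three candidates can be put side by side in a small sketch. The data model here is an assumption made only for illustration (index -> value dicts, histories as lists of test names) and is not how saqc stores anything.

```python
import math

# Toy series as index -> value mappings; hypothetical, not saqc structures.
a = {0: 1.0, 1: 2.0, 2: 3.0}
b = {0: 10.0, 1: 20.0, 2: 30.0}
history = {"a": ["test_1"], "b": ["test_2", "test_3"]}

# Option 1: x starts as a copy of a and inherits only the history of a.
x_from_a, hist_from_a = dict(a), list(history["a"])

# Option 2: x starts as a copy of b and inherits only the history of b.
x_from_b, hist_from_b = dict(b), list(history["b"])

# Option 3: x starts empty on the shared index and has no history at all.
x_empty, hist_empty = {i: math.nan for i in a}, []
```

Options 1 and 2 each drop one of the two histories, and option 3 drops both, which is the loss described above.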
Currently (but this might rapidly change), I lean towards the following for multivariate functions:
- Initialize all targets as empty variables, i.e. data[t].isna() for t in target (restriction: all field variables need the same index)
- All targets get the same History and Flags, which are the product of all Historys and Flags from all field variables. What such a History merge could look like, however, is still not clear to me.
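As a starting point for discussion, the two points above could be sketched as follows. This is only one possible reading: init_targets is a hypothetical helper, the data model (index -> value dicts, histories as lists) is a toy, and the 'merge' is assumed to be a plain concatenation of the field histories in field order, with each entry tagged by its source variable. Flags are left out, since how they should be combined is just as open.

```python
import math


def init_targets(data, history, field, target):
    """Create empty targets on the shared index, each with a merged history."""
    index = list(data[field[0]])
    for f in field:
        # Restriction from the proposal: all field variables need the same index.
        if list(data[f]) != index:
            raise ValueError(f"index of {f!r} differs from index of {field[0]!r}")

    # ASSUMPTION: 'merge' means concatenating the field histories in field
    # order and tagging each entry with the variable it came from.
    merged = [(f, entry) for f in field for entry in history[f]]

    for t in target:
        data[t] = {i: math.nan for i in index}  # empty variable
        history[t] = list(merged)               # each target gets its own copy
    return data, history


data = {"a": {0: 1.0, 1: 2.0}, "b": {0: 5.0, 1: 6.0}}
history = {"a": ["test_1"], "b": ["test_2"]}
init_targets(data, history, field=["a", "b"], target=["x", "y"])
```

Tagging each entry with its source keeps the legacy of a and b distinguishable inside the merged history, but whether concatenation is the right merge at all is exactly the open question.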