transform: applies a function on non-overlapping chunks
That one really is nothing more than a chunkwise generic. But calling it through pandas by generics func argument is a little laborious, since it is a grouper method. But introducing an additional frequency-chunk keyword in to generic might be a solution to merge the methods. I would like that, since otherwise transfomrs field of application is quite narrow
rolling: basic rolling window application
I thought we wanted to remove our own rolling-implementation from saqc anyways and just wrap pandas implementation. Of course, one could empower the generics even more, and also merge it into there, via a rolling keyword. (although, the additional keyword arguments of rolling, like min_periods and center are commonly used and would also need to be exposed, so the signaure would grow quite a lot.
Maybe it would be possible to unify implementation by having something like rollingGeneric that wraps around an empowered (and signature rich) underlying _generic?
So...If we would pack everything into processGeneric we gain three additional parameters:
freq or window -> transform and rolling
center and min_periods only rolling
As the signature of processGeneric is relatively slim, this maybe would be that bad... But we probably would also need anothe parameter to distinguish rolling from grouping, right? So the final signature would be something like:
I am currently not sure, what I prefer, @palmb, what do you think about this? If we keep the three different functions, I think, we should at least streamline their implementations and maybe rename transform to, let's say grouping. Additionally, what to do you think about renaming processGeneric to process?
i don't like the idea of parameters that are only used with specific other parameters. There might be use-cases where this makes sense, but in this partikular case, i think we should keep them separate. IMO it makes the partikular functionality clearer at first glance, IOW its separate the concerns and also support more specific typehints and in general is less error-prone due to less cases to handle (in each function).
i prefer a setup like these (including new names):
apply: the old processGeneric
rolling a feature complete wrapper around pandas.rolling (with maybe some additional stuff, if we need it)
groupby a feature complete wrapper around pandas.groupby (with maybe some additional stuff, if we need it)
I like @palmb suggestion. Id also just go for copying the pandas interface and having those common operators functionally seperated.
But naming wise, i would add some suffix/prefix, that comprises those functions under the apply-operator-in-some-mannor notion. So i would, for example, take the extra verbosity and actually call the functions rollingApply, groupApply or so. Also, this way, we can avoid the confusion, that our groupBy function is actually not a function that groups stuff, but a function that applies a custom function on groups and returns the result.
I was thinking to start working on a groupby function but realized that there are a few points we need to consider.
groupby a feature complete wrapper around pandas.groupby (with maybe some additional stuff, if we need it)
Feature complete is IMO not a goal we should try to achieve. Considering the general idea of DataFrame.groupy which is to group the entire dataset based on a (possibly compound) key to apply a function on these groups doesn't work well with our heterogeneous DictOfSeries implementation and also breaks the idea, that the columns a saqc function works on is given by field.
So I think we should aim for Series.groupby instead, which either groups by the index or by the values itself. As we are almost exclusively working with datetime indices and I don't really see the use case of value grouping (but maybe you disagree), I think our SaQC.grouby function is more related to Series.resample instead. BTW, this also offers a possibility to keep the current transformation behavior, if we wrap something like the following:
The thing is however, that we already have resampling operations, so we don't need that anymore... So I am pretty unsure, if we actually need a groupby operation, that goes beyond SaQC.transform and maybe we don't need that at all? What do you think @palmb and @luenensc ?