them rolling pitfalls...
We do? Why? Couldn't we provide that in `reindex`?
... yes of course! But it kind of goes against the spirit of the `reindex` base function as an independent thing, since that is now the base for forward reindexing (like resampling) and for inversion of reindexing operations. But I think that's ok. Only thing is, in this case the method would have to be named "invert_last", and not "auto" - since it isn't deducible from the general purpose of the function anymore what it does "automatically" (since `reindex` is no inversion-specific function, like `concatFlags`).
I would do so, unless you are not yet convinced that this combo approach will work out in the long run...
hmmm. I mean, some extra care/caution becomes necessary when parameterising the `reindex`-`transferFlags` combo for harmonisation/deharmonisation. For example, when to use `broadcast` and when to deactivate the `dfilter` (see examples in the docstring)... so maybe some smaller convenience wrappers that still do separate `reindex` and `transferFlags` would come in handy - but definitely, we can now get rid of the `concatFlags` monolith.
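A minimal sketch of the two-step combo in plain pandas (NOT the saqc API; the values and the UNFLAGGED/BAD encoding are illustrative assumptions): first reindex the data onto a regular grid, then transfer the flags along the same index mapping.

```python
import numpy as np
import pandas as pd

# Irregularly sampled data with per-value flags
raw = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(
        ["2021-01-01 00:10", "2021-01-01 01:50", "2021-01-01 04:05"]
    ),
)
# 255.0 standing in for BAD, -inf for UNFLAGGED (illustrative values)
flags = pd.Series([255.0, -np.inf, 255.0], index=raw.index)

grid = pd.date_range("2021-01-01", periods=3, freq="2h")
data_h = raw.reindex(grid, method="nearest")     # step 1: reindex the data
flags_h = flags.reindex(grid, method="nearest")  # step 2: transfer the flags
```

Both steps use the same index mapping, which is what makes the combo a drop-in replacement for a one-shot harmonisation call.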
hm.. yes, from a practical perspective those are the reindexing methods always and only used together with the linear interpolation and its inversion. But the "linear" interpolation is a way of aggregating data points/calculating new data points in dependence on old ones, whereas the associated reindexing method (so, the method that selects the indices for the linear interpolation) is independent of what actually happens with those values. It is a "reindex" method - it would be confusing to have a data aggregation shortcut among those reindexers.
I think, as part of the demystifying of the harmonisation/deharmonisation, it would be good to keep those concepts separated - so, to have `method="fromPillar"` and `data_aggregation="linear"` as the parameterisation for "linear harmonisation" with `reindex` - although there is no other `data_aggregation` method allowed together with `fromPillar`/`fromLastNext`...
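The separation can be sketched in plain pandas (the names "fromPillar" and `data_aggregation="linear"` are only mirrored conceptually here, this is not saqc code): the reindexer selects which source timestamps surround each target timestamp, while the aggregation decides what value is computed from them.

```python
import pandas as pd

# Two source "pillars" two hours apart
s = pd.Series(
    [0.0, 10.0],
    index=pd.to_datetime(["2021-01-01 00:00", "2021-01-01 02:00"]),
)
target = pd.DatetimeIndex(["2021-01-01 01:00"])

# "fromPillar"-style index selection: place the target between its pillars
union = s.reindex(s.index.union(target))
# "linear" aggregation, computed on the selected pillars only afterwards
harmonised = union.interpolate(method="time").reindex(target)
```

Swapping the second step (e.g. for a nearest-value fill) changes the aggregation without touching the index selection, which is exactly why keeping the two concepts apart is less confusing.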
Peter Lünenschloß (9b61b9a1) at 28 Mar 18:44
fixed bug in isValid
Peter Lünenschloß (f069bba2) at 28 Mar 18:40
Found that a huge partition of docstrings are messed up.
MR only contains doc reformattings and no content or code changes.
Fixing the following formatting errors:
`docurator`, but don't know.)
Peter Lünenschloß (a9abd264) at 28 Mar 18:40
Merge branch 'docMaintenance' into 'develop'
... and 1 more commit
Peter Lünenschloß (f069bba2) at 28 Mar 18:35
fixing 'unexpected unindent'
Peter Lünenschloß (a5e87711) at 28 Mar 18:24
Peter Lünenschloß (a2c04a58) at 28 Mar 18:24
Merge branch 'deprecateClutter' into 'develop'
... and 1 more commit
addresses #431
deprecating some inferior/unused (outlier detection) functions to emphasize actually used/helpful functions:
Uhm, yes, `Grubbs` is a well-known notion, kind of. But I actually haven't ever encountered anybody deciding specifically for the Grubbs test. People might know it, but usually they cannot tell what's the actual difference to the regular `ZScoring`. And this difference gets quite meaningless in real-world data sets.
So:

- `ZScoring` (X-Sigma-rule) and the `Grubbs` test both rely on the data being normally distributed!
- The Grubbs test builds on the `Zscore` of the value "under test" - that's every value's difference from the mean of the sample, divided by the sample's standard deviation.
- Using the `median` as central moment ("MAD") is a much more easily parametrisable and understandable method...

In general, it's not a widely used test, and it is also a little opaque when it comes to intentionally modifying it, due to that Student's t-distribution calculation. So it does not really recommend itself as a built-in outlier detection method that we kind of curate, while it can easily be added through a generic application. Also, we actually have no real experience with its performance and application, and we won't ever recommend using it. It just sits there in the library, and there are dozens of other outlier detection methods, also with popular names - so it's a bit random to keep the Grubbs test of all the possibilities.
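The difference between the two scorings can be shown in a few lines of numpy (a sketch, not saqc code; the sample values are made up): with a single gross outlier, the plain z-score can stay below the usual 3-sigma cut, because the outlier inflates the mean and standard deviation itself, while the MAD-based score exposes it clearly.

```python
import numpy as np

x = np.array([1.0, 1.1, 0.9, 1.0, 1.05, 10.0])  # one obvious outlier

# Classic z-score (X-sigma rule): mean/std based
z = (x - x.mean()) / x.std()

# Robust score: median as central moment, MAD as scale
med = np.median(x)
mad = np.median(np.abs(x - med))
robust_z = 0.6745 * (x - med) / mad  # 0.6745 rescales MAD to sigma under normality
```

Here `z[5]` lands below 3 while `robust_z[5]` is far above it, which is the "masking" effect that makes the median/MAD variant the easier method to parameterise.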
Are there any reasons other than "`flagUniLOF` works better"?
Yes, kind of, but `flagUniLOF` also just works better ;)
I did not expect to get such magic as suggested in my interpretation of idea 2 into `saqc`, since it sounds a little too good to actually work out.
Regarding the question of your suggestion being good enough, I am the wrong person to ask, since I do not really have a problem with how things are right now. But I also had something like that in mind, to address the need of circumventing adding temporary data to the `SaQC` object.
But I really don't know how efficient a fix that is, if one is really concerned about blowing up and deflating the `SaQC` object in the process of a function application...
First, by assigning a new `SaQC` object via `tmp = qc.resample("f", target="f'", freq="4h")`, don't we duplicate/copy all the other fields present in `qc`? Or does it spawn a `SaQC` object only consisting of the variable `target`? Not actually sure, but I don't think so...
Second, for example, in `resample`, a new field `temp_field` is generated with a `reindex` application. And one could do:
`qc_tmp = self.reindex(field, new_target, some_method, validate_function)`
as suggested. And then one does the actual reindexing on `self`:
`self = self.reindex(field, actual, reindex, operation)`
But in the next step, both the results from the first and the second reindex operation are needed in one `SaQC` object, for the `processGeneric` that assigns NaN to `field` where `new_target` is False...
So, actually, all that stuff would have to happen in a separate/temporary `SaQC` object, and finally the result would have to be assigned to `self`...
... I mean, that is a doable policy, but aren't we copying/duplicating a lot of variables then, in the process of every function application?
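Whether that policy is expensive depends on how the fields are copied. A toy sketch (NOT saqc internals, just the standard shallow-copy pattern): if spawning the temporary object only shallow-copies the field *mapping*, the untouched Series are shared between the two objects rather than duplicated.

```python
import pandas as pd

class Container:
    """Toy stand-in for an immutable-style field container."""

    def __init__(self, fields):
        self.fields = fields

    def with_field(self, name, series):
        new = dict(self.fields)  # shallow copy: existing Series are shared
        new[name] = series
        return Container(new)

qc = Container({"a": pd.Series([1.0, 2.0])})
tmp = qc.with_field("b", pd.Series([3.0]))
```

Under this scheme, spawning and later discarding a temporary object costs one dict copy per call, not a copy of every variable's data.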
For example, `flagDummy` is rendered as:
Function does nothing but returning data and flags.
:param : :type : param str | list[str] field: Variable to process. :param : :type : param str | list[str]? target: Variable name to which the results are written. target will be created if it does not exist. Defaults to field. :param : :type : param Any? dfilter: Defines which observations will be masked based on the already existing flags. Any data point with a flag equal or worse to this threshold will be passed as NaN to the function. Defaults to the DFILTER_ALL value of the translation scheme. :param : :type : param Any? flag: The flag value the function uses to mark observations. Defaults to the BAD value of the translation scheme. :param : :type : returns saqc.SaQC: the updated SaQC object
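For contrast, a reconstruction of what that parameter list was presumably meant to render from (a sketch in reST field-list style, pieced together from the garbled output above; the exact markup in the repository may differ):

```rst
Function does nothing but returning data and flags.

:param field: Variable to process.
:type field: str | list[str]
:param target: Variable name to which the results are written. ``target``
   will be created if it does not exist. Defaults to ``field``.
:type target: str | list[str], optional
:param dfilter: Defines which observations will be masked based on the
   already existing flags. Any data point with a flag equal or worse to
   this threshold will be passed as NaN to the function. Defaults to the
   ``DFILTER_ALL`` value of the translation scheme.
:type dfilter: Any, optional
:param flag: The flag value the function uses to mark observations.
   Defaults to the ``BAD`` value of the translation scheme.
:type flag: Any, optional
:returns: the updated SaQC object
:rtype: saqc.SaQC
```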
Fixed by !833 (merged)
David Schäfer (a9b44ba9) at 28 Mar 15:49
David Schäfer (4b3bb016) at 28 Mar 15:49
Merge branch 'docurator-fix' into 'develop'
... and 1 more commit
David Schäfer (a9b44ba9) at 28 Mar 15:44
fix docurator for functions with empty parameter list