Commit d1c14115 authored by Juliane Geller

docu for flagDriftScale

parent fb03a2b9
......@@ -1036,6 +1036,63 @@ def flagDriftScale(data, field, flagger, fields_scale1, fields_scale2, segment_f
metric=lambda x, y: scipy.spatial.distance.pdist(np.array([x, y]),
metric='cityblock')/len(x),
linkage_method='single', **kwargs):
"""
The function transforms variables of different scales to one common scale and then flags value courses that
significantly deviate from the group of normal value courses. The scaling transformation is performed via linear
regression. The remaining steps are performed analogously to flagDriftFromNorm, whose documentation gives a more
detailed presentation of them.
Parameters
----------
data : dios.DictOfSeries
A dictionary of pandas.Series, holding all the data.
field : str
A dummy parameter.
flagger : saqc.flagger
A flagger object, holding flags and additional information related to `data`.
fields_scale1 : list of str
List of fieldnames in data to be included in the flagging process, which are scaled according to scaling
scheme 1.
fields_scale2 : list of str
List of fieldnames in data to be included in the flagging process, which are scaled according to scaling
scheme 2.
segment_freq : str
An offset string, determining the size of the separate data chunks on which the algorithm is applied
piecewise.
norm_spread : float
A parameter limiting the maximum "spread" of the time series allowed in the "normal" group. See the Notes
section of flagDriftFromNorm for more details.
norm_frac : float, default 0.5
Has to be in [0,1]. Determines the minimum fraction of variables the "normal" group has to comprise in order
to actually be considered the normal group. The higher the value, the more stable the algorithm is with
respect to false positives. The behaviour for values below 0.5 is undefined.
metric : Callable[(numpy.array, numpy.array), float]
A distance function. It should take two 1-dimensional arrays and return a float scalar value, which is
interpreted as the distance of the two input arrays. The default is the averaged Manhattan metric.
See the Notes section of flagDriftFromNorm to get an idea of why this could be a good choice.
linkage_method : {"single", "complete", "average", "weighted", "centroid", "median", "ward"}, default "single"
The linkage method used for hierarchical (agglomerative) clustering of the time series.
The keyword gets passed on to scipy.cluster.hierarchy.linkage. See its documentation to learn more about the
different keywords (References [1]).
See Wikipedia for an introduction to hierarchical clustering (References [2]).
kwargs
Returns
-------
data : dios.DictOfSeries
A dictionary of pandas.Series, holding all the data.
flagger : saqc.flagger
The flagger object, holding flags and additional information related to `data`.
Flag values may have changed relative to the input flagger.
References
----------
Documentation of the underlying hierarchical clustering algorithm:
[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html
Introduction to Hierarchical clustering:
[2] https://en.wikipedia.org/wiki/Hierarchical_clustering
"""
fields = fields_scale1 + fields_scale2
data_to_flag = data[fields].to_df()
......
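The default metric in the signature above is the averaged Manhattan (city-block) distance. The following is a minimal, self-contained sketch of what that lambda computes for two equally long series; the example values are made up purely for illustration and are not part of the commit:

import numpy as np
import scipy.spatial.distance

# The default metric from the signature above: the city-block distance of the
# two series, averaged over their length. pdist on the 2 x n array returns a
# one-element array holding that single pairwise distance.
metric = lambda x, y: scipy.spatial.distance.pdist(np.array([x, y]), metric='cityblock') / len(x)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 2.5, 2.0, 4.0])
print(metric(x, y))  # [0.5]  ->  (0.5 + 0.5 + 1.0 + 0.0) / 4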