Commit c73f22d9 authored by David Schäfer

updated FunctionDescriptions.md

parent f4186b5c

@@ -10,10 +10,10 @@ Main documentation of the implemented functions, their purpose and parameters an
- [seasonalRange](#seasonalrange)
- [clear](#clear)
- [force](#force)
- [spikes_basic](#spikes_basic)
- [spikes_simpleMad](#spikes_simplemad)
- [spikes_slidingZscore](#spikes_slidingzscore)
- [spikes_spektrumBased](#spikes_spektrumbased)
- [constant](#constant)
- [constants_varianceBased](#constants_variancebased)
- [soilMoisture_plateaus](#soilmoisture_plateaus)

@@ -141,52 +141,37 @@ force()
Force flags to a flag-value.
## spikes_basic
```
spikes_basic(thresh, tolerance, window_size)
```
| parameter | data type | default value | description |
| ------ | ------ | ------ | ---- |
| thresh | float | | Minimum jump margin for spikes. See condition (1). |
| tolerance | float | | Range of the area containing all "valid return values". See condition (2). |
| window_size | string | | An offset string, denoting the maximal length of "spikish" value courses. See condition (3). |
A basic outlier test designed to work on harmonized as well as raw (non-harmonized) data.
The values $`x_{n}, x_{n+1}, ..., x_{n+k}`$ of a passed timeseries $`x`$ are considered spikes, if:
1. $`|x_{n-1} - x_{n + s}| > `$ `thresh`, $` s \in \{0,1,2,...,k\} `$
2. $`|x_{n-1} - x_{n+k+1}| < `$ `tolerance`
3. $` |y_{n-1} - y_{n+k+1}| < `$ `window_size`, with $`y`$ denoting the series of timestamps associated with $`x`$.
By this definition, spikes are values that, after a jump of margin `thresh` (1), keep the new value level they jumped to for a timespan smaller than `window_size` (3), and then return to the initial value level within a tolerance margin of `tolerance` (2).
Note that this characterization of a "spike" not only covers one-value outliers, but also plateau-ish value courses.
The implementation is a time-window based version of an outlier test from the UFZ Python library, which can be found [here](https://git.ufz.de/chs/python/blob/master/ufz/level1/spike.py).
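To make conditions (1)-(3) concrete, here is a minimal, illustrative sketch of how they could be checked on a datetime-indexed pandas Series. It is not the function's actual implementation, and the helper name `flag_basic_spikes` is made up for this example.
```python
import pandas as pd


def flag_basic_spikes(x: pd.Series, thresh: float, tolerance: float, window_size: str) -> pd.Series:
    """Illustrative check of conditions (1)-(3) on a datetime-indexed series."""
    flags = pd.Series(False, index=x.index)
    max_len = pd.Timedelta(window_size)
    values, stamps = x.to_numpy(), x.index
    n = 1
    while n < len(x) - 1:
        # condition (1), s=0: the value jumps away from x[n-1] by more than `thresh`
        if abs(values[n - 1] - values[n]) <= thresh:
            n += 1
            continue
        # extend the candidate course x[n], ..., x[n+k] while condition (1) still holds
        k = n
        while k + 1 < len(x) and abs(values[n - 1] - values[k + 1]) > thresh:
            k += 1
        if k + 1 >= len(x):
            break
        returns = abs(values[n - 1] - values[k + 1]) < tolerance  # condition (2)
        short = (stamps[k + 1] - stamps[n - 1]) < max_len         # condition (3)
        if returns and short:
            flags.iloc[n:k + 1] = True
        n = k + 1
    return flags
```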
## spikes_simpleMad

@@ -215,60 +200,70 @@ Note: This function should only be applied to normalised data.
See also:
[1] https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm
## spikes_slidingZscore
Detect outliers/spikes with a given method in a sliding window.
```
spikes_slidingZscore(winsz="1h", dx="1h", count=1, deg=1, z=3.5, method="modZ")
```
| parameter | data type | default value | description |
| --------- | ----------- | ---- | ----------- |
| winsz | offset-string/integer | `"1h"` | size of the sliding window that the *method* is applied to |
| dx | offset-string/integer | `"1h"` | step size by which the sliding window is advanced after each calculation |
| count | integer | `1` | minimal number of detections a possible outlier needs in order to be flagged |
| deg | integer | `1` | degree of the polynomial fit used to calculate the residual |
| z | float | `3.5` | z-parameter for the *method* (see description) |
| method | string | `"modZ"` | the method outliers are detected with |
Parameter notes:
- `winsz` and `dx` must be of the same type; mixing offset strings and integers is not supported and will fail.
- offset strings only work with datetime-indexed data.
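This mirrors how pandas treats rolling windows; a short illustration (not library code) of why offset windows need a datetime index:
```python
import numpy as np
import pandas as pd

t = pd.date_range("2020-01-01", periods=6, freq="30min")
s = pd.Series(np.arange(6.0), index=t)

s.rolling("1h").mean()  # offset window: requires a datetime-like index
s.rolling(2).mean()     # integer window: counts observations, works on any index

# on a plain integer index only the integer variant is available
s.reset_index(drop=True).rolling(2).mean()
```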
The algorithm works as follows:
1. a window of size `winsz` is cut from the data
2. normalisation: the data is fitted with a polynomial of the given degree `deg`, which is then subtracted from the data
3. the outlier detection `method` is applied to the residual, and possible outliers are marked
4. the window is advanced by `dx` over the data to the next slot
5. start over from 1. until the end of the data is reached
6. all potential outliers that were detected at least `count` times are flagged as outliers
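The loop below sketches steps 1-6 for integer-typed `winsz`/`dx`; `score_outliers` stands for the *method* of step 3 (the *zscore*/*modZ* variants described next), and all names are placeholders chosen for this illustration rather than the library's code.
```python
import numpy as np


def sliding_outlier_counts(values, winsz, dx, deg, score_outliers):
    """Count, for every value, how often it was scored as an outlier (steps 1-5)."""
    counts = np.zeros(len(values), dtype=int)
    for start in range(0, len(values) - winsz + 1, dx):                # steps 1, 4, 5: slide the window
        window = values[start:start + winsz]
        xs = np.arange(winsz)
        trend = np.polyval(np.polyfit(xs, window, deg), xs)            # step 2: polynomial fit ...
        counts[start:start + winsz] += score_outliers(window - trend)  # ... step 3: score the residual
    return counts


# step 6: flag everything that was detected at least `count` times, e.g.
# flags = sliding_outlier_counts(data, winsz=24, dx=6, deg=1, score_outliers=modz_outliers) >= count
```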
The possible outlier detection methods are *zscore* and *modZ*.
In the following description, the residual (calculated from a slice by the sliding window) is referred to as *data*.
The **zscore** (Z-score) [1] marks every value as a possible outlier that fulfills:
```math
|r - m| > s * z
```
with $` r, m, s, z `$: data, data mean, data standard deviation, `z`.
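As a one-liner over the residual *data* (an illustrative numpy sketch, not library code):
```python
import numpy as np

def zscore_outliers(r, z=3.5):
    """Boolean mask of possible outliers: |r - m| > s * z."""
    return np.abs(r - r.mean()) > r.std() * z
```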
The **modZ** (modified Z-score) [1] marks every value as a possible outlier that fulfills:
```math
0.6745 * |r - M| > mad * z > 0
```
with $` r, M, mad, z `$: data, data median, data median absolute deviation, `z`.
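And the corresponding modified Z-score check (again an illustrative sketch):
```python
import numpy as np

def modz_outliers(r, z=3.5):
    """Boolean mask of possible outliers: 0.6745 * |r - M| > mad * z > 0."""
    M = np.median(r)
    mad = np.median(np.abs(r - M))
    return (0.6745 * np.abs(r - M) > mad * z) & (mad * z > 0)
```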
See also:
[1] https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm
## spikes_spektrumBased
```
spikes_spektrumBased(raise_factor=0.15, dev_cont_factor=0.2,
                     noise_barrier=1, noise_window_size="12h", noise_statistic="CoVar",
                     smooth_poly_order=2, filter_window_size=None)
```
| parameter | data type | default value | description |
| ------ | ------ | ------ | ---- |
| raise_factor | float | `0.15` | Minimum change margin for a datapoint to become a candidate for a spike. See condition (1). |
| dev_cont_factor | float | `0.2` | See condition (2). |
| noise_barrier | float | `1` | Upper bound for the noisiness of the data surrounding potential spikes. See condition (3). |
| noise_window_size | string | `"12h"` | Any offset string. Determines the range of the time window of the "surrounding" data of a potential spike. See condition (3). |
| noise_statistic | string | `"CoVar"` | Operator to calculate the noisiness of the data surrounding a potential spike. Either `"CoVar"` (= coefficient of variation) or `"rvar"` (= relative variance). |
| smooth_poly_order | integer | `2` | Order of the polynomial fit applied for smoothing. |
| filter_window_size | Nonetype or string | `None` | Options: <br/> - `None` <br/> - any offset string <br/><br/> Controls the range of the smoothing window applied with the Savitzky-Golay filter. If `None` is passed (default), the window size will be two times the sampling rate (thus covering 3 values). If you do not know exactly what you are doing, do not change this value - broader window sizes caused unexpected results during the testing phase. |
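To make the smoothing and noise parameters concrete, here is a small numpy/scipy illustration of the building blocks the table refers to. The `rvar` definition (variance divided by the mean) is an assumption, and none of this is the function's actual code.
```python
import numpy as np
from scipy.signal import savgol_filter

x = np.random.default_rng(0).normal(loc=30.0, scale=0.5, size=200)  # stand-in for a sensor series

# Savitzky-Golay smoothing with the defaults described above:
# a window of 3 values and a polynomial of order `smooth_poly_order` = 2
smoothed = savgol_filter(x, window_length=3, polyorder=2)

# noise statistic over the data surrounding a potential spike
window = x[:48]                             # e.g. a "12h" window of 15min data
co_var = window.std() / window.mean()       # "CoVar": coefficient of variation
r_var = window.var() / window.mean()        # "rvar": relative variance (assumed definition)

too_noisy = abs(co_var) > 1                 # compared against `noise_barrier`
```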
The function detects and flags spikes in input data series by evaluating the

@@ -302,7 +297,6 @@ Dorigo, W. et al: Global Automated Quality Control of In Situ Soil Moisture
Data from the international Soil Moisture Network. 2013. Vadoze Zone J.
doi:10.2136/vzj2012.0097.
## constant