Update FunctionDescriptions.md (spikes_spectrumBased)

e4f17056 · Peter Lünenschloß · c111c372 · e4f17056
Commit e4f17056 authored 5 years ago by Peter Lünenschloß
--- a/docs/FunctionDescriptions.md
+++ b/docs/FunctionDescriptions.md
@@ -15,8 +15,8 @@ missing(nodata=NaN)
 ```
 ### Description
 The Function flags those values in the the passed data series, that are 
-associated with "missing" data. The missing value indicator (np.nan by default),
-can be altered to any other value by passing this new value to the 
+associated with "missing" data. The missing data indicator (`np.nan` by default)
+can be changed to any other value by passing that value to the 
 parameter `nodata`.
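
Reviewer sketch (not part of the commit): the description above can be illustrated with a few lines of plain Python. The function name `flag_missing` and the list-based interface are assumptions made for this illustration and do not reflect the library's actual API. The point it shows is that a NaN indicator needs a dedicated check, because NaN never compares equal to itself.

```python
import math


def flag_missing(values, nodata=math.nan):
    """Return one boolean per value: True where the value equals `nodata`.

    Hypothetical helper for illustration only.
    """
    # NaN != NaN, so a NaN indicator must be tested with math.isnan.
    if isinstance(nodata, float) and math.isnan(nodata):
        return [isinstance(v, float) and math.isnan(v) for v in values]
    # Any other indicator can be compared directly.
    return [v == nodata for v in values]
```

With the default indicator, `flag_missing([1.0, math.nan, 3.0])` marks only the middle value; passing `nodata=-9999` flags the values equal to `-9999` instead.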

 ## sesonalRange
@@ -64,6 +64,29 @@ mad(length, z=3.5, freq=None)
 Spikes_Basic(thresh=7, tol=0, length="15min")
 ```
 ### Description
+A basic outlier test that is designed to work for harmonized as well as raw 
+(not-harmonized) data.
+
+The values x(n), x(n+1), ..., x(n+k) of a passed timeseries x are considered
+spikes if:
+
+1. |x(n-1) - x(n + s)| > `thresh`, for all integers s in {0,1,2,...,k}
+
+2. |x(n-1) - x(n+k+1)| < `tol`
+
+3. |x(n-1).index - x(n+k+1).index| < `length`
+
+By this definition, spikes are values that, after a jump of more than `thresh` (1), 
+stay at the new value level for a timespan shorter than 
+`length` (3), and then return to the initial value level 
+within a tolerance margin of `tol` (2).  
+Note that this characterization of a "spike" includes not only one-value 
+outliers, but also plateau-like value courses.
+
+The implementation is a time-window based version of an outlier test from the 
+UFZ Python library, which can be found here:
+
+https://git.ufz.de/chs/python/blob/master/ufz/level1/spike.py
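
Reviewer sketch (not part of the commit): conditions 1 to 3 above can be expressed in plain Python as follows. The function name `flag_basic_spikes`, the list-based interface and the use of `datetime`/`timedelta` objects are assumptions made for this illustration; this is not the library's implementation. The sketch also assumes `tol <= thresh`, in which case the first value that ends the run of "jumped" values is the only candidate for x(n+k+1).

```python
from datetime import datetime, timedelta


def flag_basic_spikes(times, values, thresh, tol, length):
    """Return one boolean per value, True where conditions 1-3 mark a spike.

    Hypothetical helper for illustration only.
    """
    flags = [False] * len(values)
    for ref in range(len(values) - 2):  # `ref` plays the role of n-1
        end = ref + 1
        # Condition 1: extend the run while values stay more than
        # `thresh` away from the reference value x(n-1).
        while end < len(values) and abs(values[ref] - values[end]) > thresh:
            end += 1
        if end == ref + 1 or end == len(values):
            continue  # no jump at all, or the series never comes back
        # `end` now plays the role of n+k+1.
        returned = abs(values[ref] - values[end]) < tol  # condition 2
        short = abs(times[end] - times[ref]) < length    # condition 3
        if returned and short:
            for i in range(ref + 1, end):
                flags[i] = True
    return flags


# Usage: a two-value plateau on a 5-minute grid.
start = datetime(2020, 1, 1)
times = [start + timedelta(minutes=5 * i) for i in range(6)]
values = [10, 10, 20, 20, 10, 10]
plateau_flags = flag_basic_spikes(
    times, values, thresh=7, tol=1, length=timedelta(minutes=20)
)
```

Because all three comparisons are strict, a plateau that spans exactly `length` is not flagged, and with `tol=0` condition 2 can never hold in this sketch.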


 ## Spikes_SpektrumBased
@@ -75,6 +98,34 @@ Spikes_SpektrumBased(filter_window_size="3h", raise_factor=0.15, dev_cont_factor
 ```
 ### Description

+The function detects and flags spikes in the input data series by evaluating 
+the timeseries' derivatives and applying some conditions to them. 
+
+NOTE that the data series to be flagged is expected to be harmonized to an 
+equidistant frequency grid.
+
+A datapoint x(k) of a dataseries x is considered a spike if:
+
+1. The quotient to its preceding datapoint exceeds a certain bound:
+    * x(k)/x(k-1) > 1 + `raise_factor`, or:
+    * x(k)/x(k-1) < 1 - `raise_factor`
+2. The quotient of the data's second derivative x'' at the preceding 
+   and subsequent timestamps is close enough to 1:
+    * (1 - `dev_cont_factor`) < | x''(k-1)/x''(k+1) |, and
+    * (1 + `dev_cont_factor`) > | x''(k-1)/x''(k+1) |   
+3. The dataset surrounding x(k), within `noise_window_size` range but excluding 
+   x(k), is not too noisy, where the noisiness is measured by 
+   `noise_statistic`: 
+    * `noise_statistic`(x.index(k-`noise_window_size`),...,
+      x.index(k+`noise_window_size`)) < `noise_barrier`
+
+
+This function is a generalization of the spectrum-based spike flagging 
+mechanism as presented in:
+
+Dorigo, W., ... Global Automated Quality Control of In Situ Soil Moisture 
+Data from the International Soil Moisture Network. 2013. Vadose Zone J. 
+doi:10.2136/vzj2012.0097.
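
Reviewer sketch (not part of the commit): the three conditions above can be checked for a single index as shown below. Everything here is an assumption made for illustration: the name `is_spectrum_spike`, the list-based interface, the central finite difference used for x'', the coefficient of variation as noise statistic, and a noise window given as a number of samples rather than a time offset. The sketch ignores `filter_window_size` entirely. Only the default `raise_factor=0.15` is taken from the signature shown in the diff; the remaining thresholds must be passed explicitly.

```python
from statistics import mean, pstdev


def second_derivative(x, k):
    """Central second difference at index k (grid spacing taken as 1)."""
    return x[k - 1] - 2 * x[k] + x[k + 1]


def coefficient_of_variation(values):
    """Illustrative noise statistic: population std divided by |mean|."""
    m = mean(values)
    return float("inf") if m == 0 else pstdev(values) / abs(m)


def is_spectrum_spike(x, k, *, dev_cont_factor, noise_window, noise_barrier,
                      raise_factor=0.15,
                      noise_statistic=coefficient_of_variation):
    """Check conditions 1-3 for x[k] on an equidistant series (hypothetical)."""
    # x[k-2] .. x[k+2] are needed for the two second differences.
    if k < 2 or k > len(x) - 3 or x[k - 1] == 0:
        return False
    # Condition 1: relative change to the preceding value.
    ratio = x[k] / x[k - 1]
    if 1 - raise_factor <= ratio <= 1 + raise_factor:
        return False
    # Condition 2: second derivatives before and after are of similar size.
    d2_prev = second_derivative(x, k - 1)
    d2_next = second_derivative(x, k + 1)
    if d2_next == 0:
        return False
    quotient = abs(d2_prev / d2_next)
    if not (1 - dev_cont_factor < quotient < 1 + dev_cont_factor):
        return False
    # Condition 3: the neighbourhood, excluding x[k], is not too noisy.
    lo = max(0, k - noise_window)
    neighbours = x[lo:k] + x[k + 1:k + 1 + noise_window]
    return noise_statistic(neighbours) < noise_barrier


# Usage: a single-value spike in an otherwise flat series.
series = [10, 10, 10, 10, 15, 10, 10, 10, 10]
spike_at_4 = is_spectrum_spike(
    series, 4, dev_cont_factor=0.2, noise_window=3, noise_barrier=0.1
)
```

In this example the value right after the spike also satisfies condition 1 (it drops by a third), but it fails condition 2, so only the spike itself is reported.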

 ## constant
 ### Signature