Commit 3cf8dda6 authored by Peter Lünenschloß

Update FunctionDescriptions.md (formula typesetting, parameter tabs)

parent da10e27d
… associated with "missing" data. The missing data indicator (`np.nan` by default)
can be altered to any other value by passing this new value to the
parameter `nodata`.

| parameter | description |
| ------ | ------ |
| nodata | Any value. (Default=`np.nan`). <br/> Value indicating missing data in the passed data series. |
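
Because the default indicator `np.nan` never compares equal to itself, a check for missing values cannot rely on `==` alone. The following is a minimal sketch, assuming a plain Python list instead of the data series the functions actually receive; `is_missing` and `valid_values` are hypothetical helpers for illustration, not part of the library.

```python
import math

NAN = float("nan")  # stands in for np.nan, which is also a Python float


def is_missing(value, nodata=NAN):
    """Return True if `value` equals the missing-data indicator `nodata`.

    NaN is never equal to itself, so a NaN indicator needs math.isnan.
    """
    if isinstance(nodata, float) and math.isnan(nodata):
        return isinstance(value, float) and math.isnan(value)
    return value == nodata


def valid_values(series, nodata=NAN):
    """Keep only the entries that do not mark missing data."""
    return [v for v in series if not is_missing(v, nodata)]


print(valid_values([1.0, NAN, 3.0]))                  # [1.0, 3.0]
print(valid_values([1.0, -9999, 3.0], nodata=-9999))  # [1.0, 3.0]
```

With a custom indicator such as `-9999`, a NaN in the data is no longer treated as missing, which mirrors replacing the default through `nodata`.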
## sesonalRange
### Signature
```
…
```

…

```
mad(length, z=3.5, freq=None)
```

…

## Spikes_Basic
### Signature
```
Spikes_Basic(thresh, tolerance, window_size)
```
### Description
A basic outlier test that is designed to work for harmonized as well as raw
(not harmonized) data.
The values $`x_{n}, x_{n+1}, \dots, x_{n+k}`$ of a passed time series $`x`$
are considered spikes if:
1. $`|x_{n-1} - x_{n+s}| >`$ `thresh` for all $`s \in \{0, 1, 2, \dots, k\}`$
2. $`|x_{n-1} - x_{n+k+1}| <`$ `tolerance`
3. $`|y_{n-1} - y_{n+k+1}| <`$ `window_size`, with $`y`$ denoting the series
of timestamps associated with $`x`$.
By this definition, spikes are values that, after a jump of margin `thresh` (1),
keep the new value level they jumped to for a timespan smaller than
`window_size` (3) and then return to the initial value level within a
tolerance margin of `tolerance` (2).
Note that this characterization of a "spike" includes not only single-value
outliers but also plateau-like value courses.
The implementation is a time-window-based version of an outlier test from the
UFZ Python library, which can be found [here](https://git.ufz.de/chs/python/blob/master/ufz/level1/spike.py).

| parameter | description |
| ------ | ------ |
| thresh | Float. <br/> Minimum jump margin for spikes. See condition (1). |
| tolerance | Float. <br/> Size of the range around the pre-jump value that contains all "valid return values". See condition (2). |
| window_size | Offset String. <br/> An offset string, denoting the maximal length of "spikish" value courses. See condition (3). |
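
Conditions (1) to (3) can be illustrated in plain Python. This is a minimal sketch under several assumptions, not the library's implementation: `flag_spikes_basic` is a hypothetical name, timestamps are sorted ascending `datetime` objects, `window_size` is a `datetime.timedelta` standing in for the offset string, and after a detected spike the return value is simply used as the next reference level.

```python
from datetime import datetime, timedelta


def flag_spikes_basic(times, values, thresh, tolerance, window_size):
    """Return the indices of values that satisfy spike conditions (1)-(3).

    Hypothetical helper: `times` must be sorted ascending datetimes and
    `window_size` is a timedelta (standing in for the offset string).
    """
    flagged = []
    n = 1
    while n < len(values):
        ref = values[n - 1]  # x_{n-1}, the level before the jump
        if abs(values[n] - ref) > thresh:  # condition (1) for s = 0
            # m plays the role of n+k+1, the candidate "return" value
            for m in range(n + 1, len(values)):
                if times[m] - times[n - 1] >= window_size:
                    break  # condition (3) fails for this and all later m
                if abs(values[m] - ref) < tolerance:  # condition (2)
                    flagged.extend(range(n, m))  # x_n, ..., x_{n+k}
                    n = m  # the return value becomes the next reference
                    break
                if abs(values[m] - ref) <= thresh:
                    break  # condition (1) fails for s = m - n
        n += 1
    return flagged


t0 = datetime(2020, 1, 1)
times = [t0 + timedelta(minutes=10 * i) for i in range(14)]
values = [1.0, 1.0, 10.0, 1.0, 1.0, 12.0, 12.0, 1.1, 1.0,
          20.0, 20.0, 20.0, 20.0, 1.0]
print(flag_spikes_basic(times, values, thresh=5, tolerance=0.5,
                        window_size=timedelta(minutes=45)))  # [2, 5, 6]
```

The single outlier at index 2 and the two-value plateau at indices 5 and 6 are flagged; the four-value plateau at indices 9 to 12 is not, since its return value lies 50 minutes after the last pre-jump value. With `window_size=timedelta(minutes=60)` that plateau is flagged as well.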
## Spikes_SpektrumBased
### Signature
```
Spikes_SpektrumBased(filter_window_size="3h", raise_factor=0.15, dev_cont_factor…
```

### Description
The function detects and flags spikes in input data series by evaluating the
time series' derivatives and applying the following conditions to them.
NOTE that the data series to be flagged is supposed to be harmonized to an
equidistant frequency grid.
A data point $`x_k`$ of a data series $`x`$ is considered a spike if:
1. The quotient to its preceding data point exceeds a certain bound:
   * $`|\frac{x_k}{x_{k-1}}| > 1 +`$ `raise_factor`, or:
   * $`|\frac{x_k}{x_{k-1}}| < 1 -`$ `raise_factor`
2. The quotient of the data's second derivative $`x''`$ at the preceding
   and subsequent timestamps is close enough to 1:
   * $`|\frac{x''_{k-1}}{x''_{k+1}}| > 1 -`$ `dev_cont_factor`, and
   * $`|\frac{x''_{k-1}}{x''_{k+1}}| < 1 +`$ `dev_cont_factor`
3. The dataset $`X_k`$ surrounding $`x_{k}`$ within `noise_window_size` range,
   but excluding $`x_{k}`$, is not too noisy, where the noisiness is measured
   by `noise_statistic`:
   * `noise_statistic`$`(X_k) <`$ `noise_barrier`
This function is a generalization of the spectrum-based spike flagging …

Dorigo, W., … Global Automated Quality Control of In Situ Soil Moisture
Data from the International Soil Moisture Network. 2013. Vadose Zone J.
doi:10.2136/vzj2012.0097.

All parameters default to the values given there.

| parameter | description |
| ------ | ------ |
| raise_factor | Float. (Default=0.15). <br/> Minimum change margin for a data point to become a candidate for a spike. See condition (1). |
| dev_cont_factor | Float. (Default=0.2). <br/> Maximum deviation of the second-derivative quotient from 1. See condition (2). |
| noise_barrier | Float. (Default=1). <br/> Upper bound for the noisiness of the data surrounding potential spikes. See condition (3). |
| noise_window_size | Offset String. (Default='12h'). <br/> Size of the time window of the "surrounding" data of a potential spike. See condition (3). |
| noise_statistic | String. (Default="CoVar"). <br/> Operator used to calculate the noisiness of the data surrounding a potential spike. Either "CoVar" (= coefficient of variation) or "rvar" (= relative variance). |
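
A single data point can be checked against conditions (1) to (3) with a short sketch. It rests on assumptions the description above leaves open and is not the library's implementation: `is_spectrum_spike`, `second_derivative` and `coefficient_of_variation` are hypothetical names, $`x''`$ is approximated by the central second difference on a unit-step grid, `noise_window_size` is replaced by a sample count `noise_half_width` on each side of $`x_k`$, only the "CoVar" statistic is sketched (population standard deviation over absolute mean), and `filter_window_size` is not modelled.

```python
import statistics


def second_derivative(x, k):
    """Central second difference at index k (equidistant grid, unit step assumed)."""
    return x[k + 1] - 2 * x[k] + x[k - 1]


def coefficient_of_variation(window):
    """Population standard deviation over absolute mean; inf if the mean is 0."""
    mean = statistics.fmean(window)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(window) / abs(mean)


def is_spectrum_spike(x, k, raise_factor=0.15, dev_cont_factor=0.2,
                      noise_barrier=1.0, noise_half_width=3):
    """Check conditions (1)-(3) for x[k]; hypothetical helper, not the library API."""
    # x''_{k-1} and x''_{k+1} need x[k-2] .. x[k+2]; condition (1) needs x[k-1] != 0
    if k < 2 or k > len(x) - 3 or x[k - 1] == 0:
        return False
    # condition (1): relative change against the preceding data point
    ratio = abs(x[k] / x[k - 1])
    if not (ratio > 1 + raise_factor or ratio < 1 - raise_factor):
        return False
    # condition (2): second derivatives before and after x[k] are of similar size
    after = second_derivative(x, k + 1)
    if after == 0:
        return False
    quotient = abs(second_derivative(x, k - 1) / after)
    if not (1 - dev_cont_factor < quotient < 1 + dev_cont_factor):
        return False
    # condition (3): X_k = neighbours within noise_half_width samples, excluding x[k]
    lo = max(0, k - noise_half_width)
    hi = min(len(x), k + noise_half_width + 1)
    surrounding = x[lo:k] + x[k + 1:hi]
    return coefficient_of_variation(surrounding) < noise_barrier


single_spike = [10, 10, 10, 10, 20, 10, 10, 10, 10]
level_shift = [10, 10, 10, 10, 20, 20, 20, 20, 20]
print(is_spectrum_spike(single_spike, 4))  # True
print(is_spectrum_spike(level_shift, 4))   # False, since x''_{k+1} = 0
```

Both series pass condition (1) at index 4, but only the isolated spike has matching second derivatives on either side, so condition (2) is what separates a spike from a level shift here.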
## constant
### Signature
```
…
```