Commit e4f17056 authored by Peter Lünenschloß

Update FunctionDescriptions.md (spikes_spectrumBased)

parent c111c372
```
@@ -15,8 +15,8 @@ missing(nodata=NaN)
```
### Description
The function flags those values in the passed data series that are
associated with "missing" data. The missing data indicator (`np.nan` by default)
can be altered to any other value by passing this new value to the
parameter `nodata`.
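
The behaviour described above can be illustrated with a minimal sketch. The helper name `flag_missing` and the boolean-mask return value are assumptions made for this example only; they are not the library's actual implementation or flagging scheme.

```python
import numpy as np
import pandas as pd


def flag_missing(data: pd.Series, nodata=np.nan) -> pd.Series:
    """Return a boolean mask that is True where `data` equals `nodata`.

    Hypothetical helper for illustration, not the library's function.
    """
    if pd.isna(nodata):
        # NaN never compares equal to itself, so test for it explicitly.
        return data.isna()
    return data == nodata


series = pd.Series([1.0, np.nan, -9999.0, 4.0])
default_mask = flag_missing(series)                  # NaN marks missing data
custom_mask = flag_missing(series, nodata=-9999.0)   # -9999.0 marks missing data
```

With the default indicator only the `NaN` entry is marked; with `nodata=-9999.0` only the `-9999.0` entry is marked.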
## sesonalRange
```
@@ -64,6 +64,29 @@ mad(length, z=3.5, freq=None)
Spikes_Basic(thresh=7, tol=0, length="15min")
```
### Description
A basic outlier test that is designed to work for harmonized as well as raw
(not-harmonized) data.

The values x(n), x(n+1), ..., x(n+k) of a passed timeseries x are considered
spikes if:

1. |x(n-1) - x(n+s)| > `thresh`, for all integers s in {0,1,2,...,k}
2. |x(n-1) - x(n+k+1)| < `tol`
3. |x(n-1).index - x(n+k+1).index| < `length`

By this definition, spikes are values that, after a jump of more than `thresh` (1),
stay at the new value level for a timespan shorter than `length` (3) and then
return to the initial value level, within a tolerance margin of `tol` (2).

Note that this characterization of a "spike" includes not only one-value
outliers but also plateau-like value courses.

The implementation is a time-window based version of an outlier test from the
UFZ Python library, which can be found here:
https://git.ufz.de/chs/python/blob/master/ufz/level1/spike.py
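
The three conditions can be sketched as a plain loop over a timestamp-indexed `pandas.Series`. This is only an illustration: the name `basic_spikes` and the boolean-mask output are assumptions, and the linked UFZ code may be organised quite differently. Note that, with the strict inequality in condition 2 as written, `tol` has to be greater than 0 for any value to be flagged, so the example passes `tol=1`.

```python
import pandas as pd


def basic_spikes(x: pd.Series, thresh=7, tol=0, length="15min") -> pd.Series:
    """Boolean mask marking runs x(n)..x(n+k) that satisfy conditions 1-3.

    Hypothetical helper for illustration; expects a DatetimeIndex.
    """
    window = pd.Timedelta(length)
    flags = pd.Series(False, index=x.index)
    values, times = x.to_numpy(), x.index
    for n in range(1, len(x) - 1):
        base = values[n - 1]                    # x(n-1), the level before the jump
        for end in range(n + 1, len(x)):        # `end` plays the role of n+k+1
            if times[end] - times[n - 1] >= window:
                break                           # condition 3 can no longer hold
            if abs(base - values[end - 1]) <= thresh:
                break                           # condition 1 fails for x(end-1)
            if abs(base - values[end]) < tol:   # condition 2: back at the old level
                flags.iloc[n:end] = True
                break
    return flags


index = pd.date_range("2020-01-01", periods=6, freq="5min")
x = pd.Series([10.0, 10.0, 30.0, 31.0, 10.0, 10.0], index=index)
spike_mask = basic_spikes(x, thresh=7, tol=1, length="20min")
```

In this example the two-value plateau (30, 31) is flagged, because the series returns to its initial level within 20 minutes.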
## Spikes_SpektrumBased
```
@@ -75,6 +98,34 @@ Spikes_SpektrumBased(filter_window_size="3h", raise_factor=0.15, dev_cont_factor
```
### Description
The function detects and flags spikes in the input data series by evaluating
the timeseries' derivatives and applying some conditions to them.

NOTE that the data series to be flagged is supposed to be harmonized to an
equidistant frequency grid.

A datapoint x(k) of a data series x is considered a spike if:

1. The quotient to its preceding datapoint exceeds a certain bound:
    * x(k)/x(k-1) > 1 + `raise_factor`, or:
    * x(k)/x(k-1) < 1 - `raise_factor`
2. The quotient of the data's second derivative x'' at the preceding
   and subsequent timestamps is close enough to 1:
    * (1 - `dev_cont_factor`) < |x''(k-1)/x''(k+1)|, and
    * (1 + `dev_cont_factor`) > |x''(k-1)/x''(k+1)|
3. The data surrounding x(k) within the `noise_window_size` range, but excluding
   x(k), is not too noisy, where the noisiness is measured by
   `noise_statistic`:
    * `noise_statistic`(x(k - `noise_window_size`), ..., x(k + `noise_window_size`)) < `noise_barrier`

This function is a generalization of the spectrum-based spike flagging
mechanism presented in:

Dorigo, W., .... Global Automated Quality Control of In Situ Soil Moisture
Data from the International Soil Moisture Network. 2013. Vadose Zone J.
doi:10.2136/vzj2012.0097.
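
A rough sketch of the three conditions on an equidistant series follows. Several simplifications are assumptions made for this example only: the name `spectrum_spikes`, a noise window counted in samples rather than as a time offset, a plain second difference as a stand-in for x'', the coefficient of variation as the noise statistic, and the default values for `dev_cont_factor` and `noise_barrier`. The smoothing implied by `filter_window_size` is left out entirely.

```python
import numpy as np


def spectrum_spikes(x, raise_factor=0.15, dev_cont_factor=0.2,
                    noise_barrier=1.0, noise_window=3):
    """Boolean mask of points meeting conditions 1-3 (illustrative only).

    Assumes positive, equidistantly sampled data; defaults other than
    `raise_factor` are assumed values, not taken from the library.
    """
    x = np.asarray(x, dtype=float)
    # Second difference as a simple approximation of x'' on a regular grid.
    d2 = np.full(x.shape, np.nan)
    d2[1:-1] = x[2:] - 2 * x[1:-1] + x[:-2]
    flags = np.zeros(x.shape, dtype=bool)
    for k in range(2, len(x) - 2):
        ratio = x[k] / x[k - 1]
        cond1 = ratio > 1 + raise_factor or ratio < 1 - raise_factor
        if not cond1 or d2[k + 1] == 0:
            continue  # no sufficient jump, or the quotient in condition 2 is undefined
        quotient = abs(d2[k - 1] / d2[k + 1])
        cond2 = 1 - dev_cont_factor < quotient < 1 + dev_cont_factor
        # Neighbourhood of x(k) without x(k) itself.
        lo = max(0, k - noise_window)
        neighbours = np.delete(x[lo:k + noise_window + 1], k - lo)
        cond3 = np.std(neighbours) / np.mean(neighbours) < noise_barrier
        flags[k] = cond2 and cond3
    return flags


values = [10, 10, 10, 10, 20, 10, 10, 10, 10]
spike_flags = spectrum_spikes(values)
```

For this input only the isolated value 20 is flagged. The drop back to 10 also satisfies condition 1, but the second difference after it is zero, so condition 2 cannot be evaluated there.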
## constant
### Signature