David Schäfer authored a76297d2
Implemented QC functions
Main documentation of the implemented functions: their purpose, their parameters and a description of each.
Index
- Miscellaneous
- Spike Detection
- Constant Detection
- Break Detection
- Time Series Harmonization
- Soil Moisture
- Machine Learning
Miscellaneous
range
range(min, max)
parameter | data type | default value | description |
---|---|---|---|
min | float | | lower bound for valid values |
max | float | | upper bound for valid values |
The function flags all values that lie outside the closed interval [`min`, `max`].
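The check can be sketched in a few lines of plain Python. This is only an illustration of the rule stated above, not the library's implementation; the function name `flag_out_of_range` and the parameter names `lower` and `upper` (standing in for `min` and `max`) are hypothetical.

```python
def flag_out_of_range(values, lower, upper):
    """Return one boolean per value: True where the value lies outside [lower, upper].

    Both bounds are inclusive, so values equal to a bound are not flagged.
    NaN compares False against both bounds and therefore stays unflagged
    in this sketch.
    """
    return [v < lower or v > upper for v in values]


flags = flag_out_of_range([-1.0, 0.0, 5.0, 10.0, 10.5], lower=0.0, upper=10.0)
print(flags)  # [True, False, False, False, True]
```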
seasonalRange
seasonalRange(min, max, startmonth=1, endmonth=12, startday=1, endday=31)
parameter | data type | default value | description |
---|---|---|---|
min | float | | lower bound for valid values |
max | float | | upper bound for valid values |
startmonth | integer | 1 | interval start month |
endmonth | integer | 12 | interval end month |
startday | integer | 1 | interval start day |
endday | integer | 31 | interval end day |
The function does the same as `range` (it flags all data that lie outside the interval [`min`, `max`]), but only if the timestamp of the data point lies in a time interval defined by day and month only.
The year is not used in the interval calculation.
The left interval boundary is defined by `startmonth` and `startday`, the right by `endmonth` and `endday`.
Both boundaries are inclusive.
If the left boundary occurs later in the year than the right boundary, the interval extends over the change of year
(e.g. the interval [01/12, 01/03], given as day/month, runs from 1 December to 1 March and so covers December, January and February).

Note: Only works for datetime-indexed data.
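The wrap-around over the change of year is the least obvious part of this rule. A minimal sketch in plain Python follows, assuming timestamps are `datetime` objects; the helper names `in_season` and `flag_seasonal_range` and the parameters `lower` and `upper` are hypothetical and not part of the documented API.

```python
from datetime import datetime


def in_season(ts, startmonth=1, endmonth=12, startday=1, endday=31):
    """True if (month, day) of ts lies in the inclusive seasonal interval."""
    start = (startmonth, startday)
    end = (endmonth, endday)
    current = (ts.month, ts.day)  # the year is deliberately ignored
    if start <= end:
        return start <= current <= end
    # The start lies later in the year than the end:
    # the interval wraps over the change of year.
    return current >= start or current <= end


def flag_seasonal_range(timestamps, values, lower, upper, **season):
    """Flag values outside [lower, upper], but only inside the season."""
    return [
        in_season(ts, **season) and (v < lower or v > upper)
        for ts, v in zip(timestamps, values)
    ]


timestamps = [
    datetime(2020, 1, 15),
    datetime(2020, 7, 15),
    datetime(2021, 12, 1),
    datetime(2021, 3, 1),
    datetime(2021, 3, 2),
]
values = [99.0] * 5  # every value is out of range
flags = flag_seasonal_range(
    timestamps, values, lower=0.0, upper=10.0,
    startmonth=12, startday=1, endmonth=3, endday=1,
)
print(flags)  # [True, False, True, True, False]
```

Comparing `(month, day)` tuples keeps the boundary logic to two comparisons and makes the inclusive boundaries explicit.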
isolated
isolated(window, group_size=1, continuation_range='1min')
parameter | data type | default value | description |
---|---|---|---|
window | offset string | | The range within which no valid values are allowed for a value group to be flagged as isolated. See conditions (1) and (2). |
group_size | integer | 1 | The upper bound for the size of a value group to be considered an isolated group. See condition (3). |
continuation_range | offset string | "1min" | The upper bound for the temporal extension of a value group to be considered an isolated group. See condition (4). Only relevant if group_size > 1. |
The function flags isolated values and value groups.
A value or value group is isolated if, within a range of `window`, it is surrounded only by already flagged or missing values.
By default, the function flags isolated single values only. The parameters also allow more complex definitions of isolation, including groups of isolated values.
A continuous group of timeseries values $x_{k}, x_{k+1}, \ldots, x_{k+n}$ is considered to be "isolated" if:

1. There are no values preceding $x_{k}$ within `window`, or all the preceding values within this range are flagged.
2. There are no values succeeding $x_{k+n}$ within `window`, or all the succeeding values within this range are flagged.
3. $n \leq$ `group_size`
4. $|y_{k} - y_{n+k}| <$ `continuation_range`, with $y$ denoting the series of timestamps associated with $x$.
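The four conditions can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's implementation: `window` and `continuation_range` are passed as `timedelta` objects instead of offset strings, a neighbour exactly `window` away counts as being within the window, and the name `flag_isolated` and the `valid` argument (False for missing or already flagged points) are hypothetical.

```python
from datetime import datetime, timedelta


def flag_isolated(timestamps, valid, window, group_size=1,
                  continuation_range=timedelta(minutes=1)):
    """Return one boolean per point: True where it belongs to an isolated group.

    timestamps: sorted datetimes; valid: booleans, False for points that are
    missing or already flagged (such points never count as neighbours).
    """
    idx = [i for i, ok in enumerate(valid) if ok]  # positions of valid points
    times = [timestamps[i] for i in idx]
    isolated = [False] * len(timestamps)
    for k in range(len(times)):
        for n in range(group_size + 1):  # condition (3): n <= group_size
            last = k + n
            if last >= len(times):
                break
            # condition (4): temporal extension of the candidate group
            if times[last] - times[k] >= continuation_range:
                break
            # condition (1): no valid value precedes the group within window
            before_ok = k == 0 or times[k] - times[k - 1] > window
            # condition (2): no valid value succeeds the group within window
            after_ok = (last == len(times) - 1
                        or times[last + 1] - times[last] > window)
            if before_ok and after_ok:
                for j in range(k, last + 1):
                    isolated[idx[j]] = True
    return isolated


base = datetime(2020, 1, 1)
minutes = [0, 1, 2, 60, 61, 120, 121]
timestamps = [base + timedelta(minutes=m) for m in minutes]
# The point at minute 61 is already flagged, so it does not count as a neighbour.
valid = [True, True, True, True, False, True, True]
flags = flag_isolated(timestamps, valid, window=timedelta(minutes=30))
print(flags)  # [False, False, False, True, False, False, False]
```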
missing
missing(nodata=NaN)
parameter | data type | default value | description |
---|---|---|---|
nodata | any | NaN | Value indicating missing values in the passed data. |
The function flags those values in the passed data series that are
associated with "missing" data. The missing data indicator (default: `NaN`) can
be changed to any other value by passing that value to the parameter `nodata`.
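One detail worth spelling out is that `NaN` never compares equal to itself, so the default indicator needs a dedicated check, while any other `nodata` value can be matched by equality. A minimal sketch in plain Python, with the hypothetical function name `flag_missing`:

```python
import math


def flag_missing(values, nodata=math.nan):
    """Return one boolean per value: True where the value equals the nodata marker."""
    if isinstance(nodata, float) and math.isnan(nodata):
        # NaN != NaN, so equality cannot be used for the default marker.
        return [isinstance(v, float) and math.isnan(v) for v in values]
    return [v == nodata for v in values]


print(flag_missing([1.0, float("nan"), 3.0]))             # [False, True, False]
print(flag_missing([1.0, -9999.0, 3.0], nodata=-9999.0))  # [False, True, False]
```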
clear
clear()
Remove all previously set flags.
force
force(flag)
parameter | data type | default value | description |
---|---|---|---|
flag | float/GOOD/BAD/UNFLAGGED | GOOD | flag to force |
Force flags to the given flag value.
Spike Detection
spikes_basic
spikes_basic(thresh, tolerance, window_size)
parameter | data type | default value | description |
---|---|---|---|
thresh | float | | Minimum jump margin for spikes. See condition (1). |
tolerance | float | | Range of the area containing all "valid return values". See condition (2). |
window_size | string | | An offset string denoting the maximum length of "spikish" value courses. See condition (3). |
A basic outlier test that is designed to work for harmonized as well as raw (not harmonized) data.
The values $x_{n}, x_{n+1}, \ldots, x_{n+k}$ of a passed timeseries $x$ are considered spikes if:

1. $|x_{n-1} - x_{n+s}| >$ `thresh`, $s \in \{0, 1, 2, \ldots, k\}$
2. $|x_{n-1} - x_{n+k+1}| <$ `tolerance`
3. $|y_{n-1} - y_{n+k+1}| <$ `window_size`, with $y$ denoting the series of timestamps associated with $x$.
By this definition, spikes are values that, after a jump of margin `thresh` (1), stay at the new value level for a timespan shorter than `window_size` (3) and then return to the initial value level, within a tolerance margin of `tolerance` (2).
Note that this characterization of a "spike" includes not only one-value outliers, but also plateau-like value courses.
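The three conditions can be illustrated with a greedy scan in plain Python. This is a sketch under stated assumptions rather than the library's implementation: `window_size` is passed as a `timedelta` instead of an offset string, `tolerance` is assumed to be no larger than `thresh`, and the name `flag_basic_spikes` is hypothetical.

```python
from datetime import datetime, timedelta


def flag_basic_spikes(timestamps, values, thresh, tolerance, window_size):
    """Return one boolean per value: True where the value is part of a spike."""
    flags = [False] * len(values)
    n = 1
    while n < len(values):
        base = values[n - 1]  # x_{n-1}, the level before the jump
        if abs(base - values[n]) <= thresh:
            n += 1  # no jump at position n
            continue
        # condition (1): extend the candidate while values stay off the base level
        end = n
        while end < len(values) and abs(base - values[end]) > thresh:
            end += 1
        # end now points at x_{n+k+1}, the first value back near the base (if any)
        if (end < len(values)
                and abs(base - values[end]) < tolerance              # condition (2)
                and timestamps[end] - timestamps[n - 1] < window_size):  # condition (3)
            for i in range(n, end):
                flags[i] = True
            n = end + 1
        else:
            n += 1  # a lasting level shift or a too-long plateau is not a spike
    return flags


base_time = datetime(2020, 1, 1)
values = [1.0, 1.1, 9.0, 9.2, 1.0, 1.2, 5.0, 5.1, 5.0]
timestamps = [base_time + timedelta(minutes=10 * i) for i in range(len(values))]
flags = flag_basic_spikes(timestamps, values, thresh=3.0, tolerance=1.0,
                          window_size=timedelta(hours=1))
print(flags)  # [False, False, True, True, False, False, False, False, False]
```

In the example the two-value plateau at 9.0 and 9.2 is flagged because the series returns to its initial level within the hour, while the lasting shift to about 5.0 is not.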
spikes_simpleMad
Flags outliers using a simple median absolute deviation (MAD) test.
spikes_simpleMad(winsz="1h", z=3.5)
parameter | data type | default value | description |
---|---|---|---|
winsz | offset-string or int | "1h" | size of the sliding window to which the modified Z-score is applied |
z | float | 3.5 | z-parameter of the modified Z-score |
The modified Z-score [1] is used to detect outliers. A value is flagged as an outlier if, in any slice of the sliding window, it fulfills:
0.6745 * |x - M| > mad * z > 0
with `x`, `M`, `mad` and `z` denoting the window data, the window median, the window median absolute deviation and the z-parameter, respectively.
The window is moved forward by one frequency step at a time.

Note: This function should only be applied to normalized data.
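The sliding-window criterion can be sketched in plain Python as below. The sketch assumes an integer window (a number of observations) rather than an offset string, and the name `flag_mad_outliers` is hypothetical; it is not the library's implementation.

```python
from statistics import median


def flag_mad_outliers(values, window=5, z=3.5):
    """Flag values that violate the modified Z-score criterion in any window."""
    flags = [False] * len(values)
    for start in range(len(values) - window + 1):  # move the window one step at a time
        chunk = values[start:start + window]
        m = median(chunk)                           # window median M
        mad = median([abs(v - m) for v in chunk])   # window median absolute deviation
        if mad == 0:
            continue  # the criterion requires mad * z > 0
        for offset, v in enumerate(chunk):
            if 0.6745 * abs(v - m) > mad * z:
                flags[start + offset] = True
    return flags


values = [1.0, 1.1, 0.9, 1.0, 8.0, 1.1, 0.9, 1.0, 1.2]
flags = flag_mad_outliers(values, window=5, z=3.5)
print(flags)  # [False, False, False, False, True, False, False, False, False]
```

Windows with a MAD of zero are skipped, which matches the `> 0` part of the inequality: a window of identical values cannot produce outliers under this test.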
See also: [1] https://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm