Skip to content
Snippets Groups Projects

Implemented QC functions

range

Signature

range(min, max)

Description

missing

Signature

missing(nodata=NaN)

Description

The Function flags those values in the the passed data series, that are associated with "missing" data. The missing data indicator (np.nan by default) , can be altered to any other value by passing this new value to the parameter nodata.

parameter data format description
nodata any Value. (Default = np.nan). Any value, that shall indicate missing data in the passed dataseries. (If value is not np.nan, evaluation will be performed by nodata == data)

sesonalRange

Signature

sesonalRange(min, max, startmonth=1, endmonth=12, startday=1, endday=31)

clear

Signature

clear()

Description

force

Signature

force()

Description

sliding_outlier

Signature

sliding_outlier(winsz="1h", dx="1h", count=1, deg=1, z=3.5, method="modZ")

Description

mad

Signature

mad(length, z=3.5, freq=None)

Description

Spikes_Basic

Signature

Spikes_Basic(thresh, tolerance, window_size)

Description

A basic outlier test, that is designed to work for harmonized, as well as raw (not-harmonized) data.

The values

xn,xn+1,....,xn+kx_{n}, x_{n+1}, .... , x_{n+k}
of a passed timeseries
xx
, are considered spikes, if:

  1. xn1xn+s>|x_{n-1} - x_{n + s}| >
    thresh,
    s{0,1,2,...,k}s \in \{0,1,2,...,k\}

  2. xn1xn+k+1<|x_{n-1} - x_{n+k+1}| <
    tolerance

  3. |y_{n-1} - y_{n+k+1}| <
    window_size, with
    y
    , denoting the series of timestamps associated with
    x
    .

By this definition, spikes are values, that, after a jump of margin thresh(1), are keeping that new value level they jumped to, for a timespan smaller than window_size (3), and do then return to the initial value level - within a tolerance margin of tolerance (2).

Note, that this characterization of a "spike", not only includes one-value outliers, but also plateau-ish value courses.

The implementation is a time-window based version of an outlier test from the UFZ Python library, that can be found here.

parameter data format description
thresh Float. Minimum jump margin for spikes. See condition (1).
tolerance Float. Range of area, containing al "valid return values". See condition (2).
window_size Offset String. An offset string, denoting the maximal length of "spikish" value courses. See condition (3).

Spikes_SpektrumBased

Signature

Spikes_SpektrumBased(raise_factor=0.15, dev_cont_factor=0.2,
                     noise_barrier=1, noise_window_size="12h", noise_statistic="CoVar",
                     smooth_poly_order=2, filter_window_size=None)

Description

The function detects and flags spikes in input data series by evaluating the the timeseries' derivatives and applying some conditions to them.

NOTE, that the dataseries-to-be flagged is supposed to be harmonized to an equadistant frequencie grid.

A datapoint

x_k
of a dataseries
x
, is considered a spike, if:

  1. The quotient to its preceeding datapoint exceeds a certain bound:
    • |\frac{x_k}{x_{k-1}}| > 1 +
      raise_factor, or:
    • |\frac{x_k}{x_{k-1}}| < 1 -
      raise_factor
  2. The quotient of the datas second derivate
    x''
    , at the preceeding and subsequent timestamps is close enough to 1:
    • |\frac{x''_{k-1}}{x''_{k+1}} | > 1 -
      dev_cont_factor, and
    • |\frac{x''_{k-1}}{x''_{k+1}} | < 1 +
      dev_cont_factor
  3. The dataset,
    X_k
    , surrounding
    x_{k}
    , within noise_window_range range, but excluding
    x_{k}
    , is not too noisy. Wheras the noisyness gets measured by noise_statistic:
    • noise_statistic
      (X_k) <
      noise_barrier

NOTE, that the derivative is calculated after applying a savitsky-golay filter to

x
.

This Function is a generalization of the Spectrum based Spike flagging mechanism as presented in:

Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.

All parameters default to the values given there.

parameter data format description
raise_factor Float. (0.15). Minimum change margin for a datapoint to become a candidate for a spike. See condition (1).
dev_cont_factor Float. (0.2). See condition (2).
noise_barrier Float. (1). Upper bound for noisyness of data surrounding potential spikes. See condition (3).
noise_window_range Offset String. ("12h"). Range of the timewindow of the "surrounding" data of a potential spike. See condition (3).
noise_statistic String. ("CoVar"). Operator to calculate noisyness of data, surrounding potential spike. Either "Covar" (=Coefficient od Variation) or "rvar" (=relative Variance).
smooth_poly_order Integer. Order of the polynomial fit, applied for smoothing
filter_window_size Offset String. (None) Range of the smoothing window. The default value (='None') results in a window range, equalling 3 times the sampling rate and thus including always 3 values in a smoothing window.

constant

Signature

constant(eps, length, thmin=None)

Description

constants_varianceBased

Signature

constants_varianceBased(plateau_window_min="12h", plateau_var_limit=0.0005,
                        var_total_nans=Inf, var_consec_nans=Inf)

Description

Function flags plateaus/series of constant values. Any set of consecutive values

x_k,..., x_{k+n}
of a timeseries
x
is flagged, if:

  1. n >
    plateau_window_min
  2. \sigma(x_k,..., x_{k+n})
    < plateau_var_limit

NOTE, that the dataseries-to-be flagged is supposed to be harmonized to an equadistant frequencie grid.

parameter data format (default) description
plateau_window_min Offset String. Minimum barrier for the duration, values have to be continouos to be plateau canditaes. See condition (1).
plateau_var_limit Float. (0.0005). Barrier, the variance of a group of values must not exceed to be flagged a plateau .See condition (2).
var_total_nans Integer (np.inf) Maximum number of nan values allowed, for a calculated variance to be valid.
var_consec_nans Integer (np.inf) Maximum number of consecutive nan values allowed, for a calculated variance to be valid.

SoilMoistureSpikes

Signature

SoilMoistureSpikes(filter_window_size="3h", raise_factor=0.15, dev_cont_factor=0.2,
                   noise_barrier=1, noise_window_size="12h", noise_statistic="CoVar")

Description

The Function is just a wrapper around flagSpikes_spektrumBased, from the spike detection library and performs a call to this function with a parameter set, referring to:

Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.

SoilMoistureBreaks

Signature

SoilMoistureBreaks(diff_method="raw", filter_window_size="3h",
                   rel_change_rate_min=0.1, abs_change_min=0.01, first_der_factor=10,
                   first_der_window_size="12h", scnd_der_ratio_margin_1=0.05,
                   scnd_der_ratio_margin_2=10, smooth_poly_order=2)

Description

The Function is just a wrapper around flagBreaks_spektrumBased, from the breaks detection library and performs a call to this function with a parameter set, referring to:

Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.

SoilMoistureByFrost

Signature

SoilMoistureByFrost(soil_temp_reference, tolerated_deviation="1h", frost_level=0)

Description

SoilMoistureByPrecipitation

Signature

SoilMoistureByPrecipitation(prec_reference, sensor_meas_depth=0,
                            sensor_accuracy=0, soil_porosity=0,
                            std_factor=2, std_factor_range="24h")

Description

Breaks_SpektrumBased

Signature

Breaks_SpektrumBased(diff_method="raw", filter_window_size="3h",
                     rel_change_min=0.1, abs_change_min=0.01, first_der_factor=10,
                     first_der_window_size="12h", scnd_der_ratio_margin_1=0.05,
                     scnd_der_ratio_margin_2=10, smooth_poly_order=2)

Description

The function flags breaks (jumps/drops) in input measurement series by evaluating its derivatives.

NOTE, that the dataseries-to-be flagged is supposed to be harmonized to an equadistant frequencie grid.

NOTE, that the derivative is calculated after applying a savitsky-golay filter to

x
.

A value

x_k
of a data series
x
, is flagged a break, if:

  1. x_k
    represents a sufficient absolute jump in the course of data values:
    • |x_k - x_{k-1}| >
      abs_change_min
  2. x_k
    represents a sufficient relative jump in the course of data values:
    • |\frac{x_k - x_{k-1}}{x_k}| >
      rel_change_min
  3. Let
    X_k
    be the set of all values that lie within a first_der_window_range range around
    x_k
    . Then, for its arithmetic mean
    \bar{X_k}
    , following equation has to hold:
    • |x'_k| >
      first_der_factor
      \times \bar{X_k}
  4. The second derivations quatients are "sufficiently equalling 1":
    • 1 -
      scnd_der_ratio_margin_1
      < \frac{|x''_k|}{|x_{k''+1}|} < 1 +
      scnd_der_ratio_margin_1
  5. The the succeeding second derivatives values quotient has to be sufficiently high:
    • |\frac{x''_{k+1}}{x''_{k+2}'}| >
      scnd_der_ratio_margin_2

This Function is a generalization of the Spectrum based Spike flagging mechanism as presented in:

Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.

All parameters default to the values given there.

parameter data format (default) description
rel_change_min cell cell
abs_change_min cell cell
first_der_factor cell cell
first_der_window_range cell cell
scnd_der_ratio_margin cell cell
cell cell cell
cell cell cell