Miscellaneous

A collection of unrelated quality check functions.

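Index

- flagRange
- flagSeasonalRange
- flagIsolated
- flagMissing
- flagDTW
- clearFlags
- forceFlags
- flagDummy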

flagRange

flagRange(min, max)
| parameter | data type | default value | description |
| --------- | --------- | ------------- | ----------- |
| min | float | | The lower bound for valid values |
| max | float | | The upper bound for valid values |

The function flags all values outside the closed interval [min, max].
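
The range check can be illustrated with a minimal sketch in plain Python. The helper name `flag_range` and the argument names `vmin` and `vmax` are hypothetical and not part of the library; the sketch also assumes that missing values (NaN) are left unflagged, which the description above does not specify.

```python
def flag_range(values, vmin, vmax):
    """Return True for every value outside the closed interval [vmin, vmax].

    Assumption: comparisons with NaN are always False, so NaN values
    are not flagged by this sketch.
    """
    return [(v < vmin) or (v > vmax) for v in values]


# The interval is closed, so the bounds themselves are not flagged.
flags = flag_range([-1.0, 0.0, 5.0, 10.0, 10.1], vmin=0.0, vmax=10.0)
```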

flagSeasonalRange

flagSeasonalRange(min, max, startmonth=1, endmonth=12, startday=1, endday=31)
| parameter | data type | default value | description |
| --------- | --------- | ------------- | ----------- |
| min | float | | The lower bound for valid values |
| max | float | | The upper bound for valid values |
| startmonth | integer | 1 | The interval start month |
| endmonth | integer | 12 | The interval end month |
| startday | integer | 1 | The interval start day |
| endday | integer | 31 | The interval end day |

The function does the same as flagRange, but only if the timestamp of the data point lies in a defined interval, which is built from days and months only. In particular, the year is not considered in the interval.

The left boundary is defined by startmonth and startday, the right boundary by endmonth and endday. Both boundaries are inclusive. If the left boundary occurs later in the year than the right boundary, the interval extends over the change of year (e.g. an interval of [01/12, 01/03], given as day/month, will flag values in December, January and February and on the 1st of March).

NOTE: Only works for time-series-like datasets.
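
The interval logic, including the wrap over the change of year, can be sketched as follows. The helper names `in_season` and `flag_seasonal_range` are hypothetical; the sketch only assumes that each value comes with a `datetime` timestamp and is not the library's implementation.

```python
from datetime import datetime


def in_season(ts, startmonth=1, endmonth=12, startday=1, endday=31):
    """Check whether (month, day) of ts lies in the inclusive interval."""
    start = (startmonth, startday)
    end = (endmonth, endday)
    current = (ts.month, ts.day)
    if start <= end:
        return start <= current <= end
    # The left boundary is later in the year than the right one:
    # the interval wraps around the change of year.
    return current >= start or current <= end


def flag_seasonal_range(timestamps, values, vmin, vmax, **season):
    """Apply the range check only to values whose timestamp is in season."""
    return [
        in_season(t, **season) and (v < vmin or v > vmax)
        for t, v in zip(timestamps, values)
    ]


# Interval from the 1st of December to the 1st of March, years ignored.
winter = dict(startmonth=12, startday=1, endmonth=3, endday=1)
flags = flag_seasonal_range(
    [datetime(2020, 1, 10), datetime(2020, 7, 10)], [50.0, 50.0], 0.0, 10.0, **winter
)
```

Comparing `(month, day)` tuples keeps the year out of the decision, which matches the statement that only days and months define the interval.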

flagIsolated

flagIsolated(window, gap_window, group_window) 
| parameter | data type | default value | description |
| --------- | --------- | ------------- | ----------- |
| gap_window | offset string | | The minimum size of the gap before and after a group of valid values that makes this group count as isolated. See conditions (2) and (3). |
| group_window | offset string | | The maximum size of an isolated group of valid data. See condition (1). |

The function flags arbitrarily large groups of values if they are surrounded by sufficiently large data gaps. A gap is defined as a group of missing and/or flagged values.

A continuous group of values $x_{k}, x_{k+1}, ..., x_{k+n}$ with timestamps $t_{k}, t_{k+1}, ..., t_{k+n}$ is considered to be isolated if:

  1. $t_{k+n} - t_{k} \le$ group_window
  2. None of the values $x_{i}, ..., x_{k-1}$, with $t_{k-1} - t_{i} \ge$ gap_window, is both valid and unflagged
  3. None of the values $x_{k+n+1}, ..., x_{j}$, with $t_{j} - t_{k+n+1} \ge$ gap_window, is both valid and unflagged
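
The three conditions can be approximated with a short sketch that works on the timestamps of the valid (neither missing nor flagged) values only. The function name `find_isolated` is hypothetical, and two simplifications are assumed: a gap is detected when two neighbouring valid timestamps are more than gap_window apart, and groups at the very beginning or end of the record are never reported, because the gap on their outer side cannot be verified.

```python
from datetime import datetime, timedelta


def find_isolated(valid_times, gap_window, group_window):
    """Return the timestamps of valid values that belong to isolated groups.

    valid_times: sorted timestamps of values that are neither missing
    nor flagged. gap_window and group_window are timedelta objects.
    """
    if not valid_times:
        return []
    # Split the valid timestamps wherever the distance to the previous
    # valid timestamp exceeds gap_window (simplified conditions 2 and 3).
    groups = [[valid_times[0]]]
    for prev, cur in zip(valid_times, valid_times[1:]):
        if cur - prev > gap_window:
            groups.append([cur])
        else:
            groups[-1].append(cur)
    isolated = []
    # Assumption: the first and last group are skipped, since only one
    # of their surrounding gaps is observable.
    for group in groups[1:-1]:
        # Condition 1: the group must not span more than group_window.
        if group[-1] - group[0] <= group_window:
            isolated.extend(group)
    return isolated


base = datetime(2020, 1, 1)
hours = [0, 1, 2, 10, 11, 20, 21, 22]
times = [base + timedelta(hours=h) for h in hours]
result = find_isolated(times, timedelta(hours=5), timedelta(hours=2))
```

In this usage example, only the two values at hours 10 and 11 form a short group with a gap of more than five hours on both sides.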

flagMissing

flagMissing(nodata=NaN)
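| parameter | data type | default value | description |
| --------- | --------- | ------------- | ----------- |
| nodata | any | NaN | A value that defines missing data |

The function flags all values indicating missing data.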
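
A minimal sketch of how a nodata marker could be matched is given below. The helper name `flag_missing` is hypothetical; the only point it illustrates is that NaN needs a dedicated test, because NaN never compares equal to itself, while any other marker can be matched by equality.

```python
import math


def flag_missing(values, nodata=math.nan):
    """Return True for every value that equals the nodata marker."""
    nodata_is_nan = isinstance(nodata, float) and math.isnan(nodata)

    def is_missing(v):
        if nodata_is_nan:
            # NaN != NaN, so equality cannot be used here.
            return isinstance(v, float) and math.isnan(v)
        return v == nodata

    return [is_missing(v) for v in values]


nan_flags = flag_missing([1.0, math.nan, 3.0])
marker_flags = flag_missing([1, -9999, 3], nodata=-9999)
```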

flagDTW

flagDTW(refdatafield='SM1', window = 25, min_distance = 0.25, method_dtw = "fast")
| parameter | data type | default value | description |
| --------- | --------- | ------------- | ----------- |
| window | int | 25 | The number of data points to be included in each comparison window. |
| min_distance | float | 0.25 | The minimum distance of two graphs to be classified as "different". |
| method_dtw | string | "fast" | Implementation of the DTW algorithm: "exact" for the normal implementation of DTW, "fast" for the fast implementation. |
| ref_datafield | string | | Name of the reference datafield ("correct" values) with which the actual datafield is compared. |

This function compares the data with a reference datafield (given in ref_datafield) whose values are assumed to be correct. The comparison is window-based, i.e. the two datafields are compared window by window, with overlapping windows. For every window whose distance exceeds the minimum distance (given in min_distance), the function flags the value that lies in the middle of that window.

As the comparison algorithm, we use Dynamic Time Warping (DTW), which accounts for temporal and spatial offsets when calculating the distance. For a demonstration of DTW, see the Wiki entry "Results for rain data set" in Pattern Recognition with Wavelets.
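
The window-based comparison can be sketched with a textbook DTW distance in plain Python. This is an illustration under stated assumptions, not the library's implementation: the names `dtw_distance` and `flag_dtw` are hypothetical, the windows are assumed to advance by one data point, the local cost is the absolute difference, and the distance is not normalized. The sketch corresponds to the exact DTW variant only.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def flag_dtw(data, reference, window, min_distance):
    """Flag the middle value of every window whose DTW distance is too large."""
    flags = [False] * len(data)
    # Assumption: overlapping windows that are shifted by one data point.
    for start in range(len(data) - window + 1):
        stop = start + window
        if dtw_distance(data[start:stop], reference[start:stop]) > min_distance:
            flags[start + window // 2] = True
    return flags


data = [0, 0, 0, 5, 0, 0, 0]
reference = [0, 0, 0, 0, 0, 0, 0]
flags = flag_dtw(data, reference, window=3, min_distance=1.0)
```

Because every window that contains the spike exceeds the distance threshold, the middle values of all three such windows are flagged, not only the spike itself.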

clearFlags

clearFlags()

The function removes all previously set flags.

forceFlags

forceFlags(flag)
| parameter | data type | default value | description |
| --------- | --------- | ------------- | ----------- |
| flag | float/flagging constant | GOOD | The flag that is set unconditionally |

The function overwrites all previously set flags with the given flag.

flagDummy

flagDummy()

Identity function, i.e. the function does nothing.