
Update Miscellaneous.md · 0dc59c06
Juliane Geller authored 4 years ago


Miscellaneous

A collection of unrelated quality check functions.

Index

flagRange
flagSeasonalRange
flagIsolated
flagMissing
flagDTW
clearFlags
forceFlags
flagDummy

flagRange

flagRange(min, max)

parameter	data type	default value	description
min	float		The lower bound for valid values
max	float		The upper bound for valid values

The function flags all values outside the closed interval [min, max].
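The check itself is a plain comparison against both bounds. The following is a minimal stand-alone sketch, not the framework's implementation: `flag_range` is a hypothetical name, the trailing underscores only avoid shadowing Python's built-in `min`/`max`, and leaving NaN values unflagged (so they can be handled by flagMissing) is an assumption.

```python
def flag_range(values, min_, max_):
    """Return True for every value outside the closed interval [min_, max_].

    Assumption: NaN compares False with everything, so NaN values are
    not flagged by this check.
    """
    return [v < min_ or v > max_ for v in values]
```

For example, `flag_range([-1.0, 0.0, 10.0, 10.5], 0.0, 10.0)` flags only the first and the last value, because both bounds belong to the valid interval.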

flagSeasonalRange

flagSeasonalRange(min, max, startmonth=1, endmonth=12, startday=1, endday=31)

parameter	data type	default value	description
min	float		The lower bound for valid values
max	float		The upper bound for valid values
startmonth	integer	`1`	The interval start month
endmonth	integer	`12`	The interval end month
startday	integer	`1`	The interval start day
endday	integer	`31`	The interval end day

The function does the same as flagRange, but only for data points whose timestamps lie within a given interval that is built from days and months only. In particular, the year is not considered in the interval.

The left boundary is defined by startmonth and startday, the right boundary by endmonth and endday. Both boundaries are inclusive. If the left boundary occurs later in the year than the right one, the interval wraps around the turn of the year (e.g. for the interval [01/12, 01/03], given as day/month, the range check is applied to values from December, January and February, and, since the right boundary is inclusive, to values from 1 March).

NOTE: Only works for time-series-like datasets.
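The season test can be expressed by comparing `(month, day)` tuples, with a special case for intervals that wrap around the turn of the year. This is a minimal sketch under the assumption that timestamps are Python `datetime` objects; `in_season` and `flag_seasonal_range` are hypothetical names, and the NaN handling follows the flagRange sketch assumption (NaN is not flagged).

```python
from datetime import datetime


def in_season(ts, startmonth=1, endmonth=12, startday=1, endday=31):
    """True if ts lies in the inclusive (month, day) interval, ignoring the year."""
    key = (ts.month, ts.day)
    start = (startmonth, startday)
    end = (endmonth, endday)
    if start <= end:
        return start <= key <= end
    # Left boundary later in the year than the right one: wrap over New Year.
    return key >= start or key <= end


def flag_seasonal_range(timestamps, values, min_, max_,
                        startmonth=1, endmonth=12, startday=1, endday=31):
    """Apply the [min_, max_] range check only to values inside the season."""
    return [
        in_season(ts, startmonth, endmonth, startday, endday)
        and (v < min_ or v > max_)
        for ts, v in zip(timestamps, values)
    ]


# Usage: the winter interval [01/12, 01/03] from the text above.
stamps = [datetime(2021, 12, 15), datetime(2022, 3, 1), datetime(2022, 7, 1)]
print(flag_seasonal_range(stamps, [50.0, 50.0, 50.0], 0.0, 10.0,
                          startmonth=12, startday=1, endmonth=3, endday=1))
```

With the default arguments the season covers the whole year, so the function behaves like the flagRange check.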

flagIsolated

flagIsolated(window, gap_window, group_window)

parameter	data type	default value	description
gap_window	offset string		The minimum size of the gap before and after a group of valid values that makes the group count as isolated. See conditions (2) and (3).
group_window	offset string		The maximum size of an isolated group of valid data. See condition (1).

The function flags groups of values that are surrounded by sufficiently large data gaps. A group may contain any number of values, as long as it spans at most group_window. A gap is defined as a group of missing and/or flagged values.

A contiguous group of values $x_k, x_{k+1}, \dots, x_{k+n}$ with timestamps $t_k, t_{k+1}, \dots, t_{k+n}$ is considered isolated if:

(1) $t_{k+n} - t_k \le \text{group\_window}$
(2) none of the values $x_i, \dots, x_{k-1}$ is valid (i.e. each of them is missing or flagged), for some $i$ with $t_{k-1} - t_i \ge \text{gap\_window}$
(3) none of the values $x_{k+n+1}, \dots, x_j$ is valid, for some $j$ with $t_j - t_{k+n+1} \ge \text{gap\_window}$
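The three conditions can be checked by scanning for maximal runs of valid values and measuring the invalid stretches on either side. This is a minimal sketch under several assumptions: `flag_isolated` is a hypothetical name, timestamps are `datetime` objects and the windows are `timedelta` objects instead of offset strings, already existing flags are passed as a boolean list, and a group at the very start or end of the series is treated as not isolated because it has no gap on one side.

```python
import math
from datetime import datetime, timedelta


def flag_isolated(times, values, flagged, gap_window, group_window):
    """Return True for every value belonging to an isolated group.

    A value is valid if it is not NaN and not already flagged.
    """
    n = len(values)
    valid = [not (math.isnan(v) or f) for v, f in zip(values, flagged)]
    result = [False] * n
    i = 0
    while i < n:
        if not valid[i]:
            i += 1
            continue
        # Maximal run of valid values x_k .. x_e (e = k + n in the text).
        k = i
        while i + 1 < n and valid[i + 1]:
            i += 1
        e = i
        i += 1
        # Condition (1): the group spans at most group_window.
        if times[e] - times[k] > group_window:
            continue
        # Condition (2): invalid stretch x_p .. x_{k-1} before the group.
        p = k
        while p - 1 >= 0 and not valid[p - 1]:
            p -= 1
        before_ok = k > 0 and times[k - 1] - times[p] >= gap_window
        # Condition (3): invalid stretch x_{e+1} .. x_q after the group.
        q = e
        while q + 1 < n and not valid[q + 1]:
            q += 1
        after_ok = e < n - 1 and times[q] - times[e + 1] >= gap_window
        if before_ok and after_ok:
            for j in range(k, e + 1):
                result[j] = True
    return result
```

On a 10-minute grid, a single valid value framed by three missing values on each side satisfies condition (2) and (3) for a gap_window of 20 minutes, because each gap spans from its first to its last missing timestamp.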

flagMissing

flagMissing(nodata=NaN)

parameter	data type	default value	description
nodata	any	`NaN`	A value that defines missing data

The function flags all values indicating missing data.

flagDTW

flagDTW(refdatafield='SM1', window=25, min_distance=0.25, method_dtw="fast")

parameter	data type	default value	description
window	int	`25`	The number of data points to be included in each comparison window.
min_distance	float	`0.5`	The minimum distance of two graphs to be classified as "different".
method_dtw	string	`"fast"`	Implementation of the DTW algorithm: "exact" for the standard implementation of DTW, "fast" for the fast implementation.
ref_datafield	string		Name of the reference datafield ("correct" values) with which the actual datafield is compared.

This function compares the data with a reference datafield (given in ref_datafield) whose values are assumed to be correct. The comparison is window-based, i.e. the two data fields are compared window by window, using overlapping windows. The function flags the value in the middle of each window whose distance to the corresponding reference window exceeds a minimum distance (given in min_distance).

As comparison algorithm, we use the Dynamic Time Warping (DTW) algorithm, which accounts for temporal and spatial offsets when calculating the distance. For a demonstration of DTW, see the Wiki entry "Results for rain data set" in Pattern Recognition with Wavelets.
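The windowed comparison can be sketched with the textbook dynamic-programming form of DTW. This is an illustration under stated assumptions, not the function's actual implementation: `dtw_distance` and `flag_dtw` are hypothetical names, only an exact quadratic DTW is shown (the "fast" variant of method_dtw is not modelled), the distance is the unnormalized sum of absolute differences along the best warping path, windows advance by one data point, and the data and reference series are assumed to be aligned and of equal length.

```python
import math


def dtw_distance(a, b):
    """Exact O(len(a) * len(b)) DTW distance with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def flag_dtw(data, reference, window=25, min_distance=0.25):
    """Flag the middle value of every window whose DTW distance exceeds min_distance."""
    flags = [False] * len(data)
    half = window // 2
    for start in range(len(data) - window + 1):
        d = dtw_distance(data[start:start + window],
                         reference[start:start + window])
        if d > min_distance:
            flags[start + half] = True
    return flags
```

Because windows overlap, a single spike raises the distance of every window that contains it, so its neighbours can be flagged as well; values closer than half a window to either end of the series are never the middle of a window and therefore never flagged in this sketch.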

clearFlags

clearFlags()

The function removes all previously set flags.

forceFlags

forceFlags(flag)

parameter	data type	default value	description
flag	float/flagging constant	`GOOD`	The flag that is set unconditionally

The function overwrites all previously set flags with the given flag.

flagDummy

flagDummy()

Identity function, i.e. the function does nothing.