# Miscellaneous

A collection of unrelated quality check functions.

## Index

- [flagRange](#flagrange)
- [flagSeasonalRange](#flagseasonalrange)
- [flagIsolated](#flagisolated)
- [flagDTW](#flagdtw)
- [flagMissing](#flagmissing)
- [clearFlags](#clearflags)
- [forceFlags](#forceflags)
- [flagDummy](#flagdummy)



## flagRange

```
flagRange(min, max)
```
| parameter | data type | default value | description                      |
| --------- | --------- | ------------- | -----------                      |
| min       | float     |               | The lower bound for valid values |
| max       | float     |               | The upper bound for valid values |


The function flags all values outside the closed interval
$`[`$`min`, `max`$`]`$.
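
The range check can be illustrated with a minimal sketch in plain Python. The helper `flag_range` and its parameter names `vmin`/`vmax` are hypothetical and only mirror the description above; they are not the library's implementation. The sketch assumes the input contains ordinary numbers (no missing values).

```python
def flag_range(values, vmin, vmax):
    """Return True for every value outside the closed interval [vmin, vmax].

    Hypothetical helper for illustration; assumes all values are plain numbers.
    """
    # Both bounds are inclusive, so values equal to vmin or vmax stay unflagged.
    return [not (vmin <= v <= vmax) for v in values]


flags = flag_range([-1.0, 0.0, 5.0, 10.0, 10.5], vmin=0.0, vmax=10.0)
```

Only the first and the last value lie outside the interval and are marked.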

## flagSeasonalRange

```
flagSeasonalRange(min, max, startmonth=1, endmonth=12, startday=1, endday=31)
```

| parameter  | data type   | default value | description                      |
| ---------  | ----------- | ----          | -----------                      |
| min        | float       |               | The lower bound for valid values |
| max        | float       |               | The upper bound for valid values |
| startmonth | integer     | `1`           | The interval start month         |
| endmonth   | integer     | `12`          | The interval end month           |
| startday   | integer     | `1`           | The interval start day           |
| endday     | integer     | `31`          | The interval end day             |

The function does the same as `flagRange`, but only if the timestamp of the
data point lies in a defined interval, which is built from days and months only.
In particular, the *year* is not considered in the interval.

The left boundary is defined by `startmonth` and `startday`, the right boundary
by `endmonth` and `endday`. Both boundaries are inclusive. If the left side occurs
later in the year than the right side, the interval is extended over the change of
year (e.g. an interval of [01/12, 01/03], in day/month notation, will flag values
in December, January and February, as well as on 1 March).

NOTE: Only works for time-series-like datasets.
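
The interval logic, including the wrap over the change of year, can be sketched as follows. The helpers `in_season` and `flag_seasonal_range` are hypothetical names used for illustration, under the assumption that timestamps are `datetime` objects; this is not the library's implementation.

```python
from datetime import datetime


def in_season(ts, startmonth=1, endmonth=12, startday=1, endday=31):
    """True if the (month, day) of ts lies in the inclusive seasonal interval."""
    start = (startmonth, startday)
    end = (endmonth, endday)
    current = (ts.month, ts.day)  # the year is deliberately ignored
    if start <= end:
        return start <= current <= end
    # The start lies later in the year than the end: wrap over the change of year.
    return current >= start or current <= end


def flag_seasonal_range(timestamps, values, vmin, vmax, **season):
    """Flag values outside [vmin, vmax], but only inside the seasonal interval."""
    return [
        in_season(t, **season) and not (vmin <= v <= vmax)
        for t, v in zip(timestamps, values)
    ]


winter = dict(startmonth=12, startday=1, endmonth=3, endday=1)
flags = flag_seasonal_range(
    [datetime(2020, 1, 15), datetime(2020, 6, 15)], [50.0, 50.0], 0.0, 10.0, **winter
)
```

Both values are out of range, but only the January value falls inside the interval and is marked.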


## flagIsolated

```
flagIsolated(window, gap_window, group_window)
```

| parameter    | data type                                                     | default value | description                                                                                                                                    |
|--------------|---------------------------------------------------------------|---------------|------------------------------------------------------------------------------------------------------------------------------------------------|
| gap_window   | [offset string](docs/ParameterDescriptions.md#offset-strings) |               | The minimum size of the gap before and after a group of valid values, which makes this group considered as isolated. See condition (2) and (3) |
| group_window | [offset string](docs/ParameterDescriptions.md#offset-strings) |               | The maximum size of an isolated group of valid data. See condition (1).                                                                        |

The function flags arbitrarily large groups of values, if they are surrounded by
sufficiently large data gaps. A gap is defined as a group of missing and/or flagged values.

A continuous group of values
$`x_{k}, x_{k+1},...,x_{k+n}`$ with timestamps $`t_{k}, t_{k+1}, ..., t_{k+n}`$
is considered to be isolated, if:

1. $` t_{k+n} - t_{k} \le `$ `group_window`
2. None of the values $` x_i, ..., x_{k-1} `$, with $`t_{k-1} - t_{i} \ge `$ `gap_window`, is both valid and unflagged
3. None of the values $` x_{k+n+1}, ..., x_{j} `$, with $`t_{j} - t_{k+n+1} \ge `$ `gap_window`, is both valid and unflagged
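
The three conditions can be sketched in plain Python. This is one possible reading, not the library's implementation, and it rests on assumptions that the text above does not settle: missing or already flagged values are represented as `None`, windows are given as `timedelta` objects instead of offset strings, gaps are measured between the nearest valid neighbours, and a group at the very start or end of the series is treated as having a sufficient gap on that side. The helper name `flag_isolated` is hypothetical.

```python
from datetime import datetime, timedelta


def flag_isolated(timestamps, values, gap_window, group_window):
    """Return True for every valid value that belongs to an isolated group."""
    n = len(values)
    flags = [False] * n

    # Collect runs of consecutive valid values as (first_index, last_index).
    runs, start = [], None
    for i, v in enumerate(values):
        if v is not None and start is None:
            start = i
        if v is None and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, n - 1))

    for r, (first, last) in enumerate(runs):
        # Condition (1): the group itself is short enough.
        short = timestamps[last] - timestamps[first] <= group_window
        # Condition (2): no valid value within gap_window before the group.
        gap_before = r == 0 or (
            timestamps[first] - timestamps[runs[r - 1][1]] >= gap_window
        )
        # Condition (3): no valid value within gap_window after the group.
        gap_after = r == len(runs) - 1 or (
            timestamps[runs[r + 1][0]] - timestamps[last] >= gap_window
        )
        if short and gap_before and gap_after:
            for i in range(first, last + 1):
                flags[i] = True
    return flags


base = datetime(2020, 1, 1)
timestamps = [base + timedelta(hours=h) for h in range(15)]
values = [1, 1, 1, None, None, None, None, 5, None, None, None, None, 2, 2, 2]
flags = flag_isolated(
    timestamps, values, gap_window=timedelta(hours=4), group_window=timedelta(hours=1)
)
```

In this example only the single value at index 7 is marked: it is surrounded by five-hour gaps, while the two outer groups span two hours each and therefore exceed `group_window`.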


## flagMissing

```
flagMissing(nodata=NaN)
```

| parameter | data type  | default value  | description                       |
| --------- | ---------- | -------------- | -----------                       |
| nodata    | any        | `NaN`          | A value that defines missing data |

The function flags all values indicating missing data.
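
One detail worth illustrating is that `NaN` does not compare equal to itself, so a check against the `nodata` value needs a special case for it. The following sketch uses a hypothetical helper `flag_missing` and is not the library's implementation.

```python
import math


def flag_missing(values, nodata=math.nan):
    """Return True for every value that equals the `nodata` marker."""
    nodata_is_nan = isinstance(nodata, float) and math.isnan(nodata)

    def is_missing(v):
        if nodata_is_nan:
            # NaN != NaN, so equality cannot be used to detect it.
            return isinstance(v, float) and math.isnan(v)
        return v == nodata

    return [is_missing(v) for v in values]


nan_flags = flag_missing([1.0, math.nan, 3.0])
sentinel_flags = flag_missing([1, -9999, 3], nodata=-9999)
```

Both calls mark only the middle value, once via the default `NaN` marker and once via a sentinel value.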




## flagDTW

```
flagDTW(ref_datafield='SM1', window=25, min_distance=0.25, method_dtw="fast")
```


| parameter     | data type | default value | description                                                                                                         |
|---------------|-----------|---------------|---------------------------------------------------------------------------------------------------------------------|
| window        | int       | `25`          | The number of datapoints to be included in each comparison window.                                                  |
| min_distance  | float     | `0.5`         | The minimum distance of two graphs to be classified as "different".                                                 |
| method_dtw    | string    | `"fast"`      | Implementation of DTW algorithm - "exact" for the normal implementation of DTW, "fast" for the fast implementation. |
| ref_datafield | string    |               | Name of the reference datafield ("correct" values) with which the actual datafield is compared.                     |


This function compares the data with a reference datafield (given in `ref_datafield`) whose values are assumed to be correct. The comparison is window-based: the two datafields are compared window by window, with overlapping windows. The function flags those values that lie in the middle of a window whose distance to the reference exceeds a minimum distance (given in `min_distance`).

As the comparison algorithm, we use the [Dynamic Time Warping (DTW) Algorithm](https://en.wikipedia.org/wiki/Dynamic_time_warping), which accounts for temporal and spatial offsets when calculating the distance. For a demonstration of DTW, see the wiki entry "Results for rain data set" in [Pattern Recognition with Wavelets](https://git.ufz.de/rdm-software/saqc/-/wikis/Pattern-Recognition-with-Wavelets#Results).
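
The window-based DTW comparison can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: it uses the textbook ("exact") dynamic-programming DTW with the absolute difference as local cost, and it assumes a window stride of one datapoint, no normalisation of the distance, and a strict "greater than" comparison against `min_distance`. The helper names `dtw_distance` and `flag_dtw` are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook DTW distance between two sequences (absolute difference as cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def flag_dtw(data, reference, window, min_distance):
    """Flag the middle value of every window that deviates from the reference."""
    flags = [False] * len(data)
    # Overlapping windows, shifted by one datapoint each (assumed stride).
    for start in range(len(data) - window + 1):
        stop = start + window
        if dtw_distance(data[start:stop], reference[start:stop]) > min_distance:
            flags[start + window // 2] = True
    return flags


reference = [0.0] * 11
data = [0.0] * 11
data[5] = 10.0  # a single spike that the reference does not contain
flags = flag_dtw(data, reference, window=5, min_distance=0.25)
```

Every window that contains the spike exceeds the minimum distance, so the middle values of those five windows (indices 3 to 7) are marked. This shows that the flagged range depends on the window size and not only on the deviating value itself.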





## clearFlags

```
clearFlags()
```

The function removes all previously set flags.

## forceFlags

```
forceFlags(flag)
```
| parameter | data type                                                                   | default value | description                          |
| --------- | -----------                                                                 | ----          | -----------                          |
| flag      | float/[flagging constant](docs/ParameterDescriptions.md#flagging-constants) | GOOD          | The flag that is set unconditionally |

The function overwrites all previously set flags with the given flag.


## flagDummy

```
flagDummy()
```

Identity function, i.e. the function does nothing.