Commit cca49d10 authored by David Schäfer

Update GenericFunctions.md

parent 2aab13fd
# Generic Functions
Generic Functions provide a way to leverage cross-variable conditions
and to implement simple quality checks directly within the configuration.
## Why?
The underlying idea is that in most real-world datasets many errors
can be explained by the dataset itself. Think of an active, fan-cooled
measurement device: no matter how precisely the instrument works, problems
are to be expected when the fan stops working or the battery voltage
drops below a certain threshold. While these dependencies are easy to
[formalize](#a-real-world-example) on a per-dataset basis, it is quite
challenging to translate them into general-purpose source code.
## Specification
Generic functions are used in the same manner as their
[non-generic counterparts](docs/FunctionDescriptions.md). The basic
signature looks like this:
```sh
flagGeneric(func=<expression>, flag=<flagging_constant>)
```
where `<expression>` is composed of the [supported constructs](#supported-constructs)
and `<flagging_constant>` is one of the predefined
[flagging constants](docs/ParameterDescriptions.md#flagging-constants) (default: `BAD`).
## Examples
### Simple comparisons
#### Task
Flag all values of variable `x` when variable `y` falls below a certain threshold.
#### Configuration file
| varname | test |
|---------|-------------------------|
| x | flagGeneric(func=y < 0) |
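To make the intent of this row concrete, the following is a minimal sketch in plain Python. It is not how the library evaluates the expression; the data values and the `UNFLAGGED` placeholder are invented for illustration, and only `BAD` is taken from the specification above.

```python
# Hypothetical data; the values are invented for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, -0.1, 2.0, -3.0]

# Condition from the configuration row: y < 0, evaluated element-wise.
condition = [value < 0 for value in y]

# Flag `x` wherever the condition on `y` holds.
# "UNFLAGGED" is a placeholder name, not a constant from the documentation.
flags_x = ["BAD" if hit else "UNFLAGGED" for hit in condition]

print(flags_x)
```

The point of the sketch is that the condition is computed on `y`, while the resulting flags are attached to `x`, the variable named in the `varname` column.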
### Calculations
......@@ -35,9 +44,9 @@ Flag all values of variable `x` that exceed 3 standard deviations of variable `y
#### Configuration file
| varname | test |
|---------|-------------------------------------|
| x | flagGeneric(func=this > std(y) * 3) |
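As a rough illustration of what `this > std(y) * 3` computes, here is a plain-Python sketch. The data are invented, and treating `std` as the sample standard deviation is an assumption; the documentation shown here does not say which variant is used.

```python
import statistics

# Hypothetical data; the values are invented for illustration only.
x = [0.5, 1.2, 9.0, 0.8]  # the variable of interest, i.e. `this`
y = [1.0, 2.0, 3.0, 2.0]

# Assumption: `std` behaves like the sample standard deviation.
threshold = statistics.stdev(y) * 3

# `this > std(y) * 3`: compare every value of `x` with the scalar threshold.
flagged = [value > threshold for value in x]

print(flagged)
```

Note that `std(y)` reduces `y` to a single number, so the comparison is between each value of `x` and one threshold.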
### Special functions
......@@ -46,20 +55,53 @@ Flag variable `x` where variable `y` is flagged and variable `x` has missing val
#### Configuration file
| varname | test |
|---------|------------------------------------------------------|
| x | flagGeneric(func=this > isflagged(y) & ismissing(z)) |
### A real world example
Let's consider a dataset like the following:
| date | meas | fan | volt |
|------------------|------|-----|------|
| 2018-06-01 12:00 | 3.56 | 1 | 12.1 |
| 2018-06-01 12:10 | 4.7 | 0 | 12.0 |
| 2018-06-01 12:20 | 0.1 | 1 | 11.5 |
| 2018-06-01 12:30 | 3.62 | 1 | 12.1 |
| ... | | | |
#### Task
Flag variable `meas` where variable `fan` equals 0 or variable `volt`
is lower than `12.0`.
#### Configuration file
We can directly implement the condition as follows:
| varname | test |
|---------|---------------------------------------------------|
| meas | flagGeneric(func=(fan == 0) &vert; (volt < 12.0)) |
But we could also quality check our independent variables first
and then leverage this information later on:
| varname | test |
|---------|---------------------------------------------------------|
| * | missing() |
| fan | flagGeneric(func=this == 0) |
| volt | flagGeneric(func=this < 12.0) |
| meas | flagGeneric(func=isflagged(fan) &vert; isflagged(volt)) |
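The two-step configuration can be traced by hand on the four sample rows from the table above. The sketch below is plain Python, not the library's evaluation logic; it skips the `missing()` step because the sample rows contain no missing values, and it models "flagged" as a simple boolean, which is a simplification.

```python
# The four sample rows from the dataset shown above.
rows = [
    {"date": "2018-06-01 12:00", "meas": 3.56, "fan": 1, "volt": 12.1},
    {"date": "2018-06-01 12:10", "meas": 4.7, "fan": 0, "volt": 12.0},
    {"date": "2018-06-01 12:20", "meas": 0.1, "fan": 1, "volt": 11.5},
    {"date": "2018-06-01 12:30", "meas": 3.62, "fan": 1, "volt": 12.1},
]

# Step 1: flag the independent variables (`this == 0` and `this < 12.0`).
fan_flagged = [row["fan"] == 0 for row in rows]
volt_flagged = [row["volt"] < 12.0 for row in rows]

# Step 2: flag `meas` where either independent variable is flagged.
meas_flagged = [f or v for f, v in zip(fan_flagged, volt_flagged)]

for row, hit in zip(rows, meas_flagged):
    print(row["date"], row["meas"], "flagged" if hit else "unflagged")
```

Under these assumptions the readings at 12:10 (fan off) and 12:20 (low voltage) are the ones marked, which matches the two suspicious `meas` values in the sample data.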
## Variable References
All variables of the processed dataset are available within generic functions,
so arbitrary cross references are possible. The variable of interest
is furthermore available with the special reference `this`, so the second
[example](#calculations) could be rewritten as:
| varname | test |
|---------|----------------------------------|
| x | flagGeneric(func=x > std(y) * 3) |
When referencing other variables, their flags will be respected during evaluation
of the generic expression. So, in the example above only previously
......@@ -70,7 +112,7 @@ unflagged values of `x` and `y` are used within the expression `x > std(y)*3`.
### Operators
#### Comparison
The following comparison operators are available:
| Operator | Description |
......@@ -82,15 +124,15 @@ The following comparison operators are available:
| `>=` | `True` if the values of the left operand are greater than or equal to the values of the right operand |
| `<=` | `True` if the values of the left operand are less than or equal to the values of the right operand |
#### Arithmetic
The following arithmetic operators are supported:
| Operator | Description |
|----------|----------------|
| `+` | addition |
| `-` | subtraction |
| `*` | multiplication |
| `/` | division |
| `**` | exponentiation |
| `%` | modulus |
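The operator symbols in this table are the same ones Python uses. Assuming they also carry Python's semantics inside a generic expression (an assumption, not something this page states), their behaviour on two scalar operands looks like this:

```python
# Two arbitrary operands, chosen only for illustration.
a, b = 7, 2

results = {
    "+": a + b,    # addition
    "-": a - b,    # subtraction
    "*": a * b,    # multiplication
    "/": a / b,    # division (true division, yields a float)
    "**": a ** b,  # exponentiation
    "%": a % b,    # modulus (remainder of the division)
}

for operator, value in results.items():
    print(f"{a} {operator} {b} = {value}")
```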
#### Bitwise
......