# Generic Functions
## Generic Flagging Functions
Generic flagging functions provide a way to leverage cross-variable quality
constraints and to implement simple quality checks directly within the configuration.

### Why?
The underlying idea is that in most real-world datasets many errors
can be explained by the dataset itself. Think of an active, fan-cooled
measurement device: no matter how precisely the instrument works, problems
are to be expected when the fan stops working or the power supply 
drops below a certain threshold. While these dependencies are easy to 
[formalize](#a-real-world-example) on a per dataset basis, it is quite
challenging to translate them into generic source code.
### Specification
Generic flagging functions are used in the same manner as their
[non-generic counterparts](docs/FunctionIndex.md). The basic 
signature looks like this:
```sh
flagGeneric(func=<expression>, flag=<flagging_constant>)
```
where `<expression>` is composed of the [supported constructs](#supported-constructs)
and `<flagging_constant>` is one of the predefined
[flagging constants](ParameterDescriptions.md#flagging-constants) (default: `BAD`).
Generic flagging functions are expected to evaluate to a boolean value, i.e. only
constructs returning `True` or `False` are accepted. All other expressions will
fail at `SaQC` runtime.
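
The boolean requirement can be illustrated outside of SaQC. The following sketch uses plain pandas, not the SaQC API; the helper `is_boolean_result` and the sample values are hypothetical and only mimic the kind of check described above.

```python
import pandas as pd

# Hypothetical sample data standing in for a variable `y`.
y = pd.Series([-1.5, 0.2, 3.0, -0.1])


def is_boolean_result(result) -> bool:
    """Return True if an evaluated expression holds only True/False values."""
    return isinstance(result, pd.Series) and pd.api.types.is_bool_dtype(result)


comparison = y < 0  # element-wise comparison yields a boolean Series
arithmetic = y + 1  # arithmetic yields floats, so it is no flagging condition

print(is_boolean_result(comparison))
print(is_boolean_result(arithmetic))
```

An expression like `y < 0` passes such a check, while `y + 1` would be rejected.
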
### Examples
#### Simple comparisons
##### Task
Flag all values of variable `x` when variable `y` falls below a certain threshold.
##### Configuration file
```
varname ; test                    
#-------;------------------------
x       ; flagGeneric(func=y < 0) 
```
#### Calculations
##### Task
Flag all values of variable `x` that exceed three times the standard deviation of variable `y`.

##### Configuration file
```
varname ; test
#-------;---------------------------------
x       ; flagGeneric(func=x > std(y) * 3)
```
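
The arithmetic behind this condition can be sketched with plain pandas. The data below is hypothetical, and the sketch is not the SaQC implementation: pandas computes the sample standard deviation (`ddof=1`) by default, and whether SaQC's `std` uses the same convention is an assumption.

```python
import pandas as pd

# Hypothetical values for the variables `x` and `y`.
x = pd.Series([1.0, 2.0, 10.0, 3.0])
y = pd.Series([1.0, 2.0, 3.0, 4.0])

# std(y) collapses `y` to a single number, so the comparison
# is applied to every value of `x` against the same threshold.
threshold = y.std() * 3
mask = x > threshold

print(mask.tolist())
```

Only the third value of `x` exceeds the threshold in this sample, so only that value would be flagged.
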
#### Special functions
##### Task
Flag variable `x` where variable `y` is flagged and variable `z` has missing values.

##### Configuration file
```
varname ; test
#-------;----------------------------------------------
x       ; flagGeneric(func=isflagged(y) & ismissing(z))
```
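
As a rough analogue of `isflagged(y) & ismissing(z)`, the sketch below combines two boolean masks with pandas. The series `y_flagged` is a hypothetical stand-in for the flags of `y`; how SaQC stores and queries flags internally is not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: `z` contains gaps, `y_flagged` marks flagged values of `y`.
z = pd.Series([0.5, np.nan, 1.2, np.nan])
y_flagged = pd.Series([True, True, False, False])

# A value is selected only where both conditions hold at the same position.
mask = y_flagged & z.isna()

print(mask.tolist())
```
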
#### A real world example
Let's consider a dataset like the following:

| date             | meas | fan | volt |
|------------------|------|-----|------|
| 2018-06-01 12:00 | 3.56 |   1 | 12.1 |
| 2018-06-01 12:10 |  4.7 |   0 | 12.0 |
| 2018-06-01 12:20 |  0.1 |   1 | 11.5 |
| 2018-06-01 12:30 | 3.62 |   1 | 12.1 |
| ...              |      |     |      |

##### Task
Flag variable `meas` where variable `fan` equals 0 or variable `volt`
is lower than `12.0`.

##### Configuration file
We can directly implement the condition as follows:
```
varname ; test
#-------;-----------------------------------------------
meas    ; flagGeneric(func=(fan == 0) | (volt < 12.0))
```
But we could also quality check our independent variables first
and then leverage this information later on:
```
varname ; test
#-------;----------------------------------------------------
'.*'    ; flagMissing()
fan     ; flagGeneric(func=fan == 0)
volt    ; flagGeneric(func=volt < 12.0)
meas    ; flagGeneric(func=isflagged(fan) | isflagged(volt))
```
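
To see which rows the first configuration selects, the expression can be replayed on the four sample rows with plain pandas. This only illustrates the boolean logic, not how SaQC evaluates a configuration; the DataFrame name `data` is an assumption.

```python
import pandas as pd

# The four rows shown in the example table.
data = pd.DataFrame(
    {
        "meas": [3.56, 4.7, 0.1, 3.62],
        "fan": [1, 0, 1, 1],
        "volt": [12.1, 12.0, 11.5, 12.1],
    },
    index=pd.to_datetime(
        [
            "2018-06-01 12:00",
            "2018-06-01 12:10",
            "2018-06-01 12:20",
            "2018-06-01 12:30",
        ]
    ),
)

# Same condition as in the configuration: fan switched off or voltage too low.
mask = (data["fan"] == 0) | (data["volt"] < 12.0)

# The measurements that would receive a flag.
print(data.loc[mask, "meas"])
```

On the sample rows the mask is `True` at 12:10 (fan off) and 12:20 (voltage below `12.0`), the two rows whose `meas` values deviate from the others.
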

## Generic Processing

Generic processing functions provide a way to evaluate mathematical operations
and functions on the variables of a given dataset.
### Why?
In many real-world use cases, quality control is embedded into a larger data 
processing pipeline and it is not unusual to even have certain processing 
requirements as a part of the quality control itself. Generic processing 
functions make it easy to enrich a dataset through the evaluation of a
given expression.

### Specification
The basic signature looks like this:
```sh
procGeneric(func=<expression>)
```
where `<expression>` is composed of the [supported constructs](#supported-constructs).
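
Conceptually, a processing expression computes new values from existing variables instead of a boolean mask. The pandas sketch below shows that idea only; the column name `total` and the way the result is stored are assumptions, since the text does not specify how `procGeneric` names or stores its output.

```python
import pandas as pd

# Hypothetical dataset with two variables.
data = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]})

# Evaluate an arithmetic expression on the variables and keep the
# result alongside the original data (assumed storage convention).
data["total"] = data["x"] + data["y"]

print(data)
```
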


## Variable References
All variables of the processed dataset are available within generic functions,
so arbitrary cross references are possible. The variable of interest 
is furthermore available with the special reference `this`, so the second 
[example](#calculations) could be rewritten as: 
```
varname ; test
#-------;------------------------------------
x       ; flagGeneric(func=this > std(y) * 3)
```

When referencing other variables, their flags will be respected during evaluation
of the generic expression. So, in the example above, only values of `x` and `y` that
are not already flagged with `BAD` will be used in the evaluation of `x > std(y)*3`.
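
The effect of respecting existing flags can be sketched with pandas by masking flagged values before the expression is evaluated. The series `y_bad` is a hypothetical stand-in for `BAD` flags on `y`, and the masking step is an assumption about the behaviour described above, not SaQC's actual implementation.

```python
import pandas as pd

# Hypothetical data: the last value of `y` is an outlier already flagged BAD.
x = pd.Series([1.0, 2.0, 10.0, 3.0])
y = pd.Series([1.0, 2.0, 3.0, 1000.0])
y_bad = pd.Series([False, False, False, True])

# Replace flagged values with NaN; pandas skips NaN when computing std().
y_clean = y.where(~y_bad)

mask_respecting_flags = x > y_clean.std() * 3
mask_ignoring_flags = x > y.std() * 3

print(mask_respecting_flags.tolist())
print(mask_ignoring_flags.tolist())
```

With the flagged outlier excluded, the threshold stays small enough to catch the third value of `x`; with the outlier included, the standard deviation is inflated and nothing is selected.
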


## Supported constructs

### Operators

#### Comparison

The following comparison operators are available:

| Operator | Description                                                                                           |
|----------|-------------------------------------------------------------------------------------------------------|
| `==`     | `True` if the values of the operands are equal                                                        |
| `!=`     | `True` if the values of the operands are not equal                                                    |
| `>`      | `True` if the values of the left operand are greater than the values of the right operand             |
| `<`      | `True` if the values of the left operand are less than the values of the right operand                |
| `>=`     | `True` if the values of the left operand are greater than or equal to the values of the right operand |
| `<=`     | `True` if the values of the left operand are less than or equal to the values of the right operand    |

#### Arithmetics
The following arithmetic operators are supported:

| Operator | Description    |
|----------|----------------|
| `+`      | addition       |
| `-`      | subtraction    |
| `*`      | multiplication |
| `/`      | division       |
| `**`     | exponentiation |
| `%`      | modulus        |

#### Bitwise
The bitwise operators also act as logical operators in comparison chains.

| Operator | Description       |
|----------|-------------------|
| `&`      | binary and        |
| `\|`     | binary or         |
| `^`      | binary xor        |
| `~`      | binary complement |
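
Applied to boolean data, these operators work element-wise as logical `and`, `or`, `xor` and `not`. The sketch below shows this with two hypothetical pandas series; it demonstrates Python and pandas semantics and assumes that generic expressions follow them.

```python
import pandas as pd

# Two hypothetical boolean conditions covering all four combinations.
a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

both = a & b      # True where both conditions hold
either = a | b    # True where at least one condition holds
exactly_one = a ^ b  # True where exactly one condition holds
not_a = ~a        # inverts every value of `a`

print(both.tolist(), either.tolist(), exactly_one.tolist(), not_a.tolist())
```

In Python, `&`, `|` and `^` bind more tightly than comparison operators such as `==` and `<`, so comparisons that are combined with them need to be wrapped in parentheses.
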

### Functions
All functions expect a [variable reference](#variable-references)
as the only non-keyword argument (see [here](#special-functions)).

#### Mathematical Functions

| Name        | Description                       |
|-------------|-----------------------------------|
| `abs`       | absolute values of a variable     |
| `max`       | maximum value of a variable       |
| `min`       | minimum value of a variable       |
| `mean`      | mean value of a variable          |
| `sum`       | sum of a variable                 |
| `std`       | standard deviation of a variable  |
| `len`       | number of values of a variable    |

#### Special Functions

| Name        | Description                       |
|-------------|-----------------------------------|
| `ismissing` | check for missing values          |
| `isflagged` | check for flags                   |

### Constants
Generic functions support the same constants as normal functions; a detailed
list is available [here](ParameterDescriptions.md#constants).