Generic Functions
Generic Functions provide a way to leverage cross-variable conditions and to implement simple quality checks directly within the configuration.
Why?
The underlying idea is that in most real-world datasets many errors can be explained by the dataset itself. Think of an active, fan-cooled measurement device: no matter how precisely the instrument works, problems are to be expected when the fan stops working or the battery voltage drops below a certain threshold. While these dependencies are easy to formalize on a per-dataset basis, it is quite challenging to translate them into general-purpose source code.
Specification
Generic functions are used in the same manner as their non-generic counterparts. The basic signature looks like this:
flagGeneric(func=<expression>, flag=<flagging_constant>)
where `<expression>` is composed of the supported constructs and `<flagging_constant>` is one of the predefined flagging constants (default: `BAD`).
Examples
Simple comparisons
Task
Flag all values of variable `x` when variable `y` falls below a certain threshold.
Configuration file
varname | test |
---|---|
x | flagGeneric(func=y < 0) |
Calculations
Task
Flag all values of variable `x` that exceed three times the standard deviation of variable `y`.
Configuration file
varname | test |
---|---|
x | flagGeneric(func=this > std(y) * 3) |
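To make the semantics of this expression concrete, here is a minimal plain-Python sketch of what `this > std(y) * 3` computes. It is not the library's implementation: the helper name `flag_exceeding_std` and the sample data are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator `std` uses.

```python
import statistics


def flag_exceeding_std(x, y, factor=3.0):
    """Return a boolean mask that is True where a value of x exceeds factor * std(y)."""
    # Assumption: sample standard deviation; the documentation does not specify this.
    threshold = statistics.stdev(y) * factor
    return [value > threshold for value in x]


# Hypothetical data, for illustration only
x = [0.5, 10.0, 2.0, 7.5]
y = [1.0, 2.0, 3.0, 4.0, 5.0]

mask = flag_exceeding_std(x, y)
print(mask)  # [False, True, False, True]
```

The threshold is a single scalar derived from `y`, while the comparison is applied element-wise to `x`, so the result has one entry per value of `x`.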
Special functions
Task
Flag variable `x` where variable `y` is flagged and variable `z` has missing values.
Configuration file
varname | test |
---|---|
x | flagGeneric(func=isflagged(y) & ismissing(z)) |
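The combination of the two special functions can be pictured as an element-wise logical AND of two boolean masks. The following sketch uses plain-Python stand-ins that merely share the names `isflagged` and `ismissing`; the flag labels, the representation of missing values as `None` or `NaN`, and the data are all assumptions made for illustration.

```python
import math

# Hypothetical flag labels; the real flagging constants may differ.
UNFLAGGED, BAD = "UNFLAGGED", "BAD"


def isflagged(flags):
    """True where a value carries any flag other than UNFLAGGED."""
    return [flag != UNFLAGGED for flag in flags]


def ismissing(values):
    """True where a value is None or NaN (assumed encoding of missing data)."""
    return [v is None or (isinstance(v, float) and math.isnan(v)) for v in values]


# Hypothetical data: flags of y and values of z, aligned by position
y_flags = [UNFLAGGED, BAD, BAD, UNFLAGGED]
z = [1.0, float("nan"), 3.0, None]

# Element-wise AND, corresponding to the `&` in the generic expression
mask = [a and b for a, b in zip(isflagged(y_flags), ismissing(z))]
print(mask)  # [False, True, False, False]
```

Only the second position satisfies both conditions, so only that value of `x` would be flagged.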
A real world example
Let's consider a dataset like the following:
date | meas | fan | volt |
---|---|---|---|
2018-06-01 12:00 | 3.56 | 1 | 12.1 |
2018-06-01 12:10 | 4.7 | 0 | 12.0 |
2018-06-01 12:20 | 0.1 | 1 | 11.5 |
2018-06-01 12:30 | 3.62 | 1 | 12.1 |
... |
Task
Flag variable `meas` where variable `fan` equals 0 or variable `volt` is lower than `12.0`.
Configuration file
We can directly implement the condition as follows:
varname | test |
---|---|
meas | flagGeneric(func=(fan == 0) | (volt < 12.0)) |
But we could also quality-check our independent variables first and then leverage this information later on:
varname | test |
---|---|
* | missing() |
fan | flagGeneric(func=this == 0) |
volt | flagGeneric(func=this < 12.0) |
meas | flagGeneric(func=isflagged(fan) | isflagged(volt)) |
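As a rough check, the two configurations can be traced by hand on the four sample rows shown above. The sketch below is plain Python rather than the library's evaluation engine; the variable names are hypothetical, and the `missing()` step is left out because none of the four rows contains a missing value.

```python
# The four sample rows from the table above
meas = [3.56, 4.7, 0.1, 3.62]
fan = [1, 0, 1, 1]
volt = [12.1, 12.0, 11.5, 12.1]

# Variant 1: the condition applied directly to meas
direct = [(f == 0) or (v < 12.0) for f, v in zip(fan, volt)]

# Variant 2: flag fan and volt first, then reuse those flags for meas
fan_flagged = [f == 0 for f in fan]
volt_flagged = [v < 12.0 for v in volt]
two_step = [a or b for a, b in zip(fan_flagged, volt_flagged)]

print(direct)    # [False, True, True, False]
print(two_step)  # [False, True, True, False]
```

On this data both variants mark the readings at 12:10 (fan off) and 12:20 (voltage below 12.0). Note that the 12:10 row has a voltage of exactly 12.0, which does not satisfy the strict comparison; it is flagged only because of the fan.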
Variable References
All variables of the processed dataset are available within generic functions, so arbitrary cross-references are possible. The variable of interest is additionally available through the special reference `this`. Since `this` refers to `x` in the second example (Calculations), that example could equally be written as:
varname | test |
---|---|
x | flagGeneric(func=x > std(y) * 3) |
When referencing other variables, their flags will be respected during evaluation of the generic expression. So, in the example above, only previously unflagged values of `x` and `y` are used within the expression `x > std(y)*3`.
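The effect of respecting flags can be illustrated with a small plain-Python sketch. It assumes that already flagged values are simply excluded before the expression is evaluated and that flagged values of `x` are not flagged again; both are assumptions about the behaviour, and the helper `unflagged` and the data are hypothetical.

```python
import statistics


def unflagged(values, flagged):
    """Keep only the values whose flag mask entry is False."""
    return [v for v, is_flagged in zip(values, flagged) if not is_flagged]


# Hypothetical data: the outlier 100.0 in y was flagged by an earlier test
y = [1.0, 2.0, 3.0, 100.0]
y_flagged = [False, False, False, True]
x = [2.5, 4.0, 1.0]
x_flagged = [False, False, True]

# std is computed on the unflagged values of y only: stdev([1, 2, 3]) * 3 == 3.0
threshold = statistics.stdev(unflagged(y, y_flagged)) * 3

# Flagged values of x are skipped (assumption), the rest are compared
mask = [(not is_flagged) and value > threshold
        for value, is_flagged in zip(x, x_flagged)]
print(mask)  # [False, True, False]
```

Had the flagged outlier in `y` been included, the standard deviation would be far larger and no value of `x` would exceed the threshold, which is why the order of tests in a configuration matters.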