-
Juliane Geller authored12e018c8
Generic Functions
Generic Flagging Functions
Generic flagging functions provide for cross-variable quality constraints and to implement simple quality checks directly within the configuration.
Why?
In most real world datasets many errors can be explained by the dataset itself. Think of a an active, fan-cooled measurement device: no matter how precise the instrument may work, problems are to be expected when the fan stops working or the power supply drops below a certain threshold. While these dependencies are easy to formalize on a per dataset basis, it is quite challenging to translate them into generic source code.
Specification
Generic flagging functions are used in the same manner as their non-generic counterparts. The basic signature looks like that:
flagGeneric(func=<expression>, flag=<flagging_constant>)
where <expression>
is composed of the supported constructs
and <flag_constant>
is one of the predefined
flagging constants (default: BAD
).
Generic flagging functions are expected to return a boolean value, i.e. True
or False
. All other expressions will
fail during the runtime of SaQC
.
Examples
Simple comparisons
Task
Flag all values of x
where y
falls below 0.
Configuration file
varname ; test
#-------;------------------------
x ; flagGeneric(func=y < 0)
Calculations
Task
Flag all values of x
that exceed 3 standard deviations of y
.
Configuration file
varname ; test
#-------;---------------------------------
x ; flagGeneric(func=x > std(y) * 3)
Special functions
Task
Flag all values of x
where: y
is flagged and z
has missing values.
Configuration file
varname ; test
#-------;----------------------------------------------
x ; flagGeneric(func=isflagged(y) & ismissing(z))
A real world example
Let's consider the following dataset:
date | meas | fan | volt |
---|---|---|---|
2018-06-01 12:00 | 3.56 | 1 | 12.1 |
2018-06-01 12:10 | 4.7 | 0 | 12.0 |
2018-06-01 12:20 | 0.1 | 1 | 11.5 |
2018-06-01 12:30 | 3.62 | 1 | 12.1 |
... |
Task
Flag meas
where fan
equals 0 and volt
is lower than 12.0
.
Configuration file
There are various options. We can directly implement the condition as follows:
varname ; test
#-------;-----------------------------------------------
meas ; flagGeneric(func=(fan == 0) \| (volt < 12.0))
But we could also quality check our independent variables first and than leverage this information later on:
varname ; test
#-------;----------------------------------------------------
'.*' ; flagMissing()
fan ; flagGeneric(func=fan == 0)
volt ; flagGeneric(func=volt < 12.0)
meas ; flagGeneric(func=isflagged(fan) \| isflagged(volt))
Generic Processing
Generic processing functions provide a way to evaluate mathmetical operations and functions on the variables of a given dataset.
Why
In many real-world use cases, quality control is embedded into a larger data processing pipeline and it is not unusual to even have certain processing requirements as a part of the quality control itself. Generic processing functions make it easy to enrich a dataset through the evaluation of a given expression.
Specification
The basic signature looks like that:
procGeneric(func=<expression>)
where <expression>
is composed of the supported constructs.
Variable References
All variables of the processed dataset are available within generic functions,
so arbitrary cross references are possible. The variable of interest
is furthermore available with the special reference this
, so the second
example could be rewritten as:
varname ; test
#-------;------------------------------------
x ; flagGeneric(func=this > std(y) * 3)
When referencing other variables, their flags will be respected during evaluation
of the generic expression. So, in the example above only values of x
and y
, that
are not already flagged with BAD
will be used the avaluation of x > std(y)*3
.