Generic Functions
Generic Flagging Functions
Generic flagging functions make it possible to define custom cross-variable quality constraints, implemented directly using the :doc:`Python API <../gettingstarted/TutorialAPI>` or the :doc:`Configuration System <../documentation/ConfigurationFiles>`.
Why?
In most real-world datasets, many errors can be explained by the dataset itself. Think of an active, fan-cooled measurement device: no matter how precisely the instrument works, problems are to be expected when the fan stops working or the power supply drops below a certain threshold. While these dependencies are easy to :ref:`formalize <documentation/GenericFunctions:a real world example>` on a per-dataset basis, it is quite challenging to translate them into generic source code. That is why we equipped SaQC to cope with such situations.
Generic Flagging - Specification
Generic flagging functions are used in the same manner as their non-generic counterparts. The basic signature looks like this:
flagGeneric(field, func=<expression>, flag=<flag_constant>)
where ``<expression>`` is either a callable (Python API) or an expression composed of the supported constructs and ``<flag_constant>`` is either one of the predefined :ref:`flagging constants <modules/saqcConstants:Flagging Constants>` (default: ``BAD``) or a valid value of the chosen flagging scheme. Generic flagging functions are expected to return a collection of boolean values, i.e. one ``True`` or ``False`` for every value in ``field``. All other expressions will fail during runtime of SaQC.
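The return-value contract described above can be illustrated without SaQC itself. The following is a minimal sketch in plain Python, assuming simple lists stand in for the data columns; the helper ``is_valid_mask`` is hypothetical and not part of the SaQC API.

```python
def is_valid_mask(result, values):
    """Return True if `result` holds exactly one bool per entry of `values`."""
    if not isinstance(result, list):
        return False
    return len(result) == len(values) and all(isinstance(r, bool) for r in result)


x = [12, 87, 45, 31, 18, 99]

# A comparison applied to every value yields one True/False per value.
good = [value > 50 for value in x]

# An aggregate collapses the column to a single number, which is not a mask.
bad = sum(x)

print(is_valid_mask(good, x))  # the comparison result satisfies the contract
print(is_valid_mask(bad, x))   # the aggregate does not
```

Only results of the first kind match what a generic flagging function is expected to return.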
Examples
The following sections show some contrived but realistic examples, highlighting the potential of :py:meth:`flagGeneric <saqc.SaQC.flagGeneric>`. Let's first generate a dummy dataset to lead us through the following code snippets:
>>> qc.data #doctest:+NORMALIZE_WHITESPACE
x | y | z |
============== | =============== | ============== |
2020-01-01 12 | 2020-01-01 2 | 2020-01-01 34 |
2020-01-02 87 | 2020-01-02 12 | 2020-01-02 23 |
2020-01-03 45 | 2020-01-03 33 | 2020-01-03 89 |
2020-01-04 31 | 2020-01-04 133 | 2020-01-04 56 |
2020-01-05 18 | 2020-01-05 8 | 2020-01-05 5 |
2020-01-06 99 | 2020-01-06 33 | 2020-01-06 1 |
Simple constraints
Task: Flag all values of ``x`` where ``x`` is smaller than 30
As expected, the usual comparison operators are supported.
Cross variable constraints
Task: Flag all values of ``x`` where ``y`` is larger than 30
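To make the cross-variable idea concrete, here is a minimal sketch in plain Python that mimics the logic on the dummy dataset. It assumes lists in place of the SaQC data columns and does not call SaQC: the condition is evaluated on ``y``, while the resulting mask selects entries of ``x``.

```python
dates = ["2020-01-01", "2020-01-02", "2020-01-03",
         "2020-01-04", "2020-01-05", "2020-01-06"]
x = [12, 87, 45, 31, 18, 99]
y = [2, 12, 33, 133, 8, 33]

# The condition is evaluated on y ...
mask = [value > 30 for value in y]

# ... but the resulting flags belong to the corresponding entries of x.
flagged_x = {date: value for date, value, m in zip(dates, x, mask) if m}
print(flagged_x)
```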
Multiple cross variable constraints
Task: Flag all values of ``x`` where ``y`` is larger than 30 and ``z`` is smaller than 50:
In order to pass multiple variables to ``func``, we also need to specify multiple ``field`` elements.
Note: to combine boolean expressions using one of the available logical operators, the single expressions need to be put in parentheses.
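The need for parentheses follows from Python's operator precedence, where ``&`` binds more tightly than the comparison operators. The sketch below uses plain Python values from the dummy dataset (not SaQC objects) to show the combined mask and how an unparenthesized expression is parsed differently.

```python
y = [2, 12, 33, 133, 8, 33]
z = [34, 23, 89, 56, 5, 1]

# Each comparison is parenthesized before being combined with `&`.
mask = [(yi > 30) & (zi < 50) for yi, zi in zip(y, z)]
print(mask)

# Third row of the dataset: y = 33, z = 89.
yi, zi = 33, 89
with_parens = (yi > 30) & (zi < 50)

# Without parentheses this is parsed as `yi > (30 & zi) < 50`.
without_parens = yi > 30 & zi < 50

print(with_parens, without_parens)
```

With plain integers the unparenthesized form silently evaluates to a different result, which is why the parentheses should never be omitted.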
Arithmetics
Task: Flag all values of ``x`` that exceed the arithmetic mean of ``y`` and ``z``
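As an illustration of the arithmetic involved, the following plain-Python sketch computes the mask on the dummy dataset. It assumes the task refers to the row-wise mean of ``y`` and ``z`` and uses lists instead of SaQC data structures.

```python
x = [12, 87, 45, 31, 18, 99]
y = [2, 12, 33, 133, 8, 33]
z = [34, 23, 89, 56, 5, 1]

# Row-wise arithmetic mean of y and z.
row_means = [(yi + zi) / 2 for yi, zi in zip(y, z)]

# x is flagged wherever it exceeds that mean.
mask = [xi > m for xi, m in zip(x, row_means)]

print(row_means)
print(mask)
```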
Special functions
Task: Flag all values of ``x`` that exceed 2 standard deviations of ``z``.
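The threshold logic of this task can be sketched with Python's ``statistics`` module. Whether SaQC's ``std`` uses the population or the sample convention is not stated here, so the sketch computes both as an assumption; on the dummy dataset they lead to the same mask.

```python
import statistics

x = [12, 87, 45, 31, 18, 99]
z = [34, 23, 89, 56, 5, 1]

# Two common conventions for the standard deviation of z.
threshold_population = 2 * statistics.pstdev(z)  # divides by n
threshold_sample = 2 * statistics.stdev(z)       # divides by n - 1

mask_population = [xi > threshold_population for xi in x]
mask_sample = [xi > threshold_sample for xi in x]

print(round(threshold_population, 2), mask_population)
print(round(threshold_sample, 2), mask_sample)
```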
Task: Flag all values of ``x`` where ``y`` is flagged.
A real world example
Let's consider the following dataset:
>>> qc.data #doctest:+NORMALIZE_WHITESPACE
meas | fan | volt |
========================= | ======================= | ========================= |
2018-06-01 12:00:00 3.56 | 2018-06-01 12:00:00 1 | 2018-06-01 12:00:00 12.1 |
2018-06-01 12:10:00 4.70 | 2018-06-01 12:10:00 0 | 2018-06-01 12:10:00 12.0 |
2018-06-01 12:20:00 0.10 | 2018-06-01 12:20:00 1 | 2018-06-01 12:20:00 11.5 |
2018-06-01 12:30:00 3.62 | 2018-06-01 12:30:00 1 | 2018-06-01 12:30:00 12.1 |
Task: Flag ``meas`` where ``fan`` equals 0 and ``volt`` is lower than ``12.0``.
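Before turning to the configuration, it can help to evaluate the stated condition by hand. The following plain-Python sketch does so on the four sample rows; it uses lists in place of the SaQC data columns and is only meant to show how the two partial conditions combine.

```python
times = ["12:00", "12:10", "12:20", "12:30"]
fan = [1, 0, 1, 1]
volt = [12.1, 12.0, 11.5, 12.1]

# Partial conditions, one boolean per row.
fan_off = [f == 0 for f in fan]
low_volt = [v < 12.0 for v in volt]

# Both conditions combined with a logical "and".
mask = [a & b for a, b in zip(fan_off, low_volt)]

for t, a, b, m in zip(times, fan_off, low_volt, mask):
    print(t, a, b, m)
```

On these four rows the two conditions never hold at the same time: at 12:10 the fan is off but ``volt`` is exactly 12.0, and at 12:20 ``volt`` is low while the fan is running.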
Configuration file: There are various options. We can directly implement the condition as follows:
But we could also quality check our independent variables first and then leverage this information later on:
Generic Processing
Generic processing functions provide a way to evaluate mathematical operations and functions on the variables of a given dataset.
Why?
In many real-world use cases, quality control is embedded into a larger data processing pipeline. It is not unusual to even have certain processing requirements as a part of the quality control itself. Generic processing functions make it easy to enrich a dataset through the evaluation of a given expression.
Generic Processing - Specification
The basic signature looks like this:
processGeneric(field, func=<expression>)
where ``<expression>`` is either a callable (Python API) or an expression composed of the supported constructs (Configuration File).
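In contrast to a flagging expression, a processing expression yields new data values rather than booleans. The sketch below illustrates this in plain Python with a row-wise mean; it assumes the ``x``, ``y`` and ``z`` columns of the dummy dataset from the flagging examples and does not call SaQC.

```python
x = [12, 87, 45, 31, 18, 99]
y = [2, 12, 33, 133, 8, 33]
z = [34, 23, 89, 56, 5, 1]

# The expression is evaluated row by row and produces one new value per row.
row_mean = [(xi + yi + zi) / 3 for xi, yi, zi in zip(x, y, z)]

print([round(value, 3) for value in row_mean])
```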
Example
Let's use :py:meth:`processGeneric <saqc.SaQC.processGeneric>` to calculate the mean value of several variables in a given dataset. We start with dummy data again:
Supported constructs
Operators
Comparison Operators
The following comparison operators are available:
Operator | Description |
---|---|
``==`` | ``True`` if the values of the operands are equal |
``!=`` | ``True`` if the values of the operands are not equal |
``>`` | ``True`` if the values of the left operand are greater than the values of the right operand |
``<`` | ``True`` if the values of the left operand are smaller than the values of the right operand |
``>=`` | ``True`` if the values of the left operand are greater than or equal to the values of the right operand |
``<=`` | ``True`` if the values of the left operand are smaller than or equal to the values of the right operand |
Logical operators
The bitwise operators act as logical operators in comparison chains:
Operator | Description |
---|---|
``&`` | binary and |
``|`` | binary or |
``^`` | binary xor |
``~`` | binary complement |
Arithmetic Operators
The following arithmetic operators are supported:
Operator | Description |
---|---|
``+`` | addition |
``-`` | subtraction |
``*`` | multiplication |
``/`` | division |
``**`` | exponentiation |
``%`` | modulus |
Functions
Mathematical Functions
Name | Description |
---|---|
``abs`` | Pointwise absolute value of a variable. |
``max`` | Maximum value of a variable. Ignores NaN. |
``min`` | Minimum value of a variable. Ignores NaN. |
``mean`` | Mean value of a variable. Ignores NaN. |
``sum`` | Sum of a variable. Ignores NaN. |
``exp`` | Pointwise exponential. |
``log`` | Pointwise logarithm. |
``nanLog`` | Logarithm, returning NaN for zero input instead of -inf. |
``std`` | Standard deviation of a variable. Ignores NaN. |
``var`` | Variance. Ignores NaN. |
``median`` | Median. Ignores NaN. |
``count`` | Number of values. Ignores NaN. |
``id`` | Identity. |
``diff`` | Returns the diff of a series. |
``scale`` | Scales data to the [0, 1] interval. |
``zScore`` | Standardizes with the standard deviation. |
``madScore`` | Standardizes with the median and MAD. |
``iqsScore`` | Standardizes with the median and interquantile range. |
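To give an intuition for two of the pointwise transformations listed above, here is a minimal sketch in plain Python. It assumes that ``scale`` is a min-max scaling and that ``zScore`` subtracts the mean and divides by the population standard deviation; the function names ``scale_sketch`` and ``z_score_sketch`` are hypothetical, and the actual SaQC implementations may differ in detail (for example in NaN handling).

```python
import statistics


def scale_sketch(values):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]


def z_score_sketch(values):
    """Standardize: subtract the mean, divide by the population std."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]


x = [12, 87, 45, 31, 18, 99]

scaled = scale_sketch(x)
standardized = z_score_sketch(x)

print([round(v, 3) for v in scaled])
print([round(v, 3) for v in standardized])
```

Both functions return one value per input value, so their results can be used in further pointwise comparisons.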
Miscellaneous Functions
Name | Description |
---|---|
``isflagged`` | Pointwise, checks if a value is flagged |
``len`` | Returns the length of the passed series |