Generic Functions

Generic Flagging Functions

Generic flagging functions provide for custom cross-variable quality constraints, directly implemented using the :doc:`Python API <../gettingstarted/TutorialAPI>` or the :doc:`Configuration System <../documentation/ConfigurationFiles>`.

Why?

In most real-world datasets, many errors can be explained by the dataset itself. Think of an active, fan-cooled measurement device: no matter how precisely the instrument works, problems are to be expected when the fan stops working or the power supply drops below a certain threshold. While these dependencies are easy to :ref:`formalize <documentation/GenericFunctions:a real world example>` on a per-dataset basis, it is quite challenging to translate them into generic source code. That is why we equipped SaQC to cope with such situations.

Generic Flagging - Specification

Generic flagging functions are used in the same manner as their non-generic counterparts. The basic signature looks like this:

flagGeneric(field, func=<expression>, flag=<flag_constant>)

where <expression> is either a callable (Python API) or an expression composed of the supported constructs (Configuration File), and <flag_constant> is either one of the predefined :ref:`flagging constants <modules/saqcConstants:Flagging Constants>` (default: BAD) or a valid value of the chosen flagging scheme. Generic flagging functions are expected to return a collection of boolean values, i.e. one True or False for every value in field. Expressions that return anything else will fail at runtime.

Examples

The following sections show some contrived but realistic examples, highlighting the potential of :py:meth:`flagGeneric <saqc.SaQC.flagGeneric>`. Let's first generate a dummy dataset to lead us through the following code snippets:

>>> qc.data  #doctest:+NORMALIZE_WHITESPACE
             x |               y |              z |
============== | =============== | ============== |
2020-01-01  12 | 2020-01-01    2 | 2020-01-01  34 |
2020-01-02  87 | 2020-01-02   12 | 2020-01-02  23 |
2020-01-03  45 | 2020-01-03   33 | 2020-01-03  89 |
2020-01-04  31 | 2020-01-04  133 | 2020-01-04  56 |
2020-01-05  18 | 2020-01-05    8 | 2020-01-05   5 |
2020-01-06  99 | 2020-01-06   33 | 2020-01-06   1 |
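The dummy dataset shown above can be rebuilt with plain pandas, as in the following sketch. The values are taken from the table; the constructor call in the comment is an assumption about how the data would be handed to SaQC and is not executed here.

```python
import pandas as pd

# Six daily timestamps, matching the index shown in the table above.
index = pd.date_range("2020-01-01", periods=6, freq="D")

data = pd.DataFrame(
    {
        "x": [12, 87, 45, 31, 18, 99],
        "y": [2, 12, 33, 133, 8, 33],
        "z": [34, 23, 89, 56, 5, 1],
    },
    index=index,
)

# Assumed usage (not executed in this sketch): qc = saqc.SaQC(data)
print(data)
```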

Simple constraints

Task: Flag all values of x where x is smaller than 30

As expected, the usual comparison operators are supported.
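A minimal sketch of this task, evaluated on plain pandas data rather than on a SaQC object: the callable below is what would be passed as ``func``, and it returns one boolean per value of x, as the specification requires. The ``flagGeneric`` call in the comment follows the signature given above and is not executed here.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
x = pd.Series([12, 87, 45, 31, 18, 99], index=index, name="x")


def smaller_than_30(x):
    """Return True for every value of x that is smaller than 30."""
    return x < 30


# Python API call following the signature above (not executed here):
# qc = qc.flagGeneric(field="x", func=lambda x: x < 30)
mask = smaller_than_30(x)
print(x[mask])  # the values of x that would be flagged
```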

Cross variable constraints

Task: Flag all values of x where y is larger than 30
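The same idea with two variables, again as a plain-pandas sketch: the condition is evaluated on y, and the resulting mask selects the values of x to flag. How the flagged variable is named in the actual call is an assumption here (the comment assumes a ``target`` keyword).

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
x = pd.Series([12, 87, 45, 31, 18, 99], index=index, name="x")
y = pd.Series([2, 12, 33, 133, 8, 33], index=index, name="y")

# The condition only looks at y ...
mask = y > 30

# ... but the mask is applied to x, because both share the same index.
# Assumed call (not executed here):
# qc = qc.flagGeneric(field="y", target="x", func=lambda y: y > 30)
print(x[mask])
```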

Multiple cross variable constraints

Task: Flag all values of x where y is larger than 30 and z is smaller than 50:

To pass multiple variables to func, we also need to specify multiple field elements. Note: to combine boolean expressions using one of the available logical operators, the individual expressions need to be put in parentheses.
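The note on parentheses can be checked with plain pandas. The sketch below builds the combined mask and also shows what happens when the parentheses are left out; the ``flagGeneric`` call in the comment is an assumption (in particular the ``target`` keyword) and is not executed.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
y = pd.Series([2, 12, 33, 133, 8, 33], index=index, name="y")
z = pd.Series([34, 23, 89, 56, 5, 1], index=index, name="z")

# Each comparison is wrapped in parentheses before combining with &.
mask = (y > 30) & (z < 50)

# Without parentheses, & binds tighter than the comparisons, so Python
# reads the expression as the chained comparison y > (30 & z) < 50 and
# fails when it asks for the truth value of a whole Series.
try:
    y > 30 & z < 50
    unparenthesized_failed = False
except ValueError:
    unparenthesized_failed = True

# Assumed call (not executed here):
# qc = qc.flagGeneric(field=["y", "z"], target="x",
#                     func=lambda y, z: (y > 30) & (z < 50))
print(mask)
```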

Arithmetics

Task: Flag all values of x that exceed the arithmetic mean of y and z
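As a plain-pandas sketch, the arithmetic mean of y and z is computed value by value and then compared against x. The ``flagGeneric`` call in the comment is an assumption and is not executed.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
x = pd.Series([12, 87, 45, 31, 18, 99], index=index, name="x")
y = pd.Series([2, 12, 33, 133, 8, 33], index=index, name="y")
z = pd.Series([34, 23, 89, 56, 5, 1], index=index, name="z")

# Pointwise mean of y and z, one value per timestamp.
pointwise_mean = (y + z) / 2

mask = x > pointwise_mean

# Assumed call (not executed here):
# qc = qc.flagGeneric(field=["x", "y", "z"], target="x",
#                     func=lambda x, y, z: x > (y + z) / 2)
print(x[mask])
```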

Special functions

Task: Flag all values of x that exceed 2 standard deviations of z.
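A sketch of this task in plain pandas. Note that ``Series.std`` uses the sample standard deviation (``ddof=1``) by default; whether the ``std`` function available in generic expressions uses the same convention is not verified here. For this dataset the resulting mask is the same under both conventions.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
x = pd.Series([12, 87, 45, 31, 18, 99], index=index, name="x")
z = pd.Series([34, 23, 89, 56, 5, 1], index=index, name="z")

# A single scalar threshold derived from the whole z series.
threshold = 2 * z.std()

# Comparing a Series with a scalar still yields one boolean per value.
mask = x > threshold
print(x[mask])
```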

Task: Flag all values of x where y is flagged.
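The following sketch only models the idea: SaQC's flags are replaced by a plain boolean Series, and the lookup that ``isflagged`` performs in a generic expression is replaced by reading that Series. The earlier test on y (``y < 10``) is a made-up example.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
x = pd.Series([12, 87, 45, 31, 18, 99], index=index, name="x")
y = pd.Series([2, 12, 33, 133, 8, 33], index=index, name="y")

# Stand-in for the flags of y: pretend an earlier, hypothetical test
# flagged every value of y that is smaller than 10.
y_flagged = y < 10

# Flag x wherever y carries a flag. In a generic expression this lookup
# would be written with isflagged(y); here the boolean Series plays that role.
mask = y_flagged
print(x[mask])
```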

A real world example

Let's consider the following dataset:

>>> qc.data  #doctest:+NORMALIZE_WHITESPACE
                     meas |                     fan |                      volt |
========================= | ======================= | ========================= |
2018-06-01 12:00:00  3.56 | 2018-06-01 12:00:00   1 | 2018-06-01 12:00:00  12.1 |
2018-06-01 12:10:00  4.70 | 2018-06-01 12:10:00   0 | 2018-06-01 12:10:00  12.0 |
2018-06-01 12:20:00  0.10 | 2018-06-01 12:20:00   1 | 2018-06-01 12:20:00  11.5 |
2018-06-01 12:30:00  3.62 | 2018-06-01 12:30:00   1 | 2018-06-01 12:30:00  12.1 |

Task: Flag meas where fan equals 0 and volt is lower than 12.0.

Configuration file: There are various options. We can directly implement the condition as follows:
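The condition itself can be sketched in plain pandas on the four sample rows. In this sample no timestamp has both a stopped fan and a low voltage, so the literal ``&`` combination selects nothing; if the intent is to flag meas on either failure, ``|`` is the operator to use. Both variants are shown.

```python
import pandas as pd

index = pd.date_range("2018-06-01 12:00", periods=4, freq="10min")
meas = pd.Series([3.56, 4.70, 0.10, 3.62], index=index, name="meas")
fan = pd.Series([1, 0, 1, 1], index=index, name="fan")
volt = pd.Series([12.1, 12.0, 11.5, 12.1], index=index, name="volt")

fan_stopped = fan == 0
volt_low = volt < 12.0

# Strict reading of the task: both conditions hold at the same timestamp.
both = fan_stopped & volt_low

# Alternative reading: at least one of the two conditions holds.
either = fan_stopped | volt_low

print(meas[both])
print(meas[either])
```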

But we could also quality check our independent variables first and then leverage this information later on:
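A sketch of this two-step approach, with a plain dictionary of boolean Series standing in for SaQC's flags (a simplification made for illustration). The second step combines the stored flags with ``|`` so that either flagged check propagates to meas; swap in ``&`` for the strict reading of the task.

```python
import pandas as pd

index = pd.date_range("2018-06-01 12:00", periods=4, freq="10min")
fan = pd.Series([1, 0, 1, 1], index=index, name="fan")
volt = pd.Series([12.1, 12.0, 11.5, 12.1], index=index, name="volt")

# Step 1: quality check the independent variables on their own.
flags = {
    "fan": fan == 0,
    "volt": volt < 12.0,
}

# Step 2: reuse the stored flags instead of repeating the conditions.
flags["meas"] = flags["fan"] | flags["volt"]

print(flags["meas"])
```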

Generic Processing

Generic processing functions provide a way to evaluate mathematical operations and functions on the variables of a given dataset.

Why?

In many real-world use cases, quality control is embedded into a larger data processing pipeline. It is not unusual to even have certain processing requirements as a part of the quality control itself. Generic processing functions make it easy to enrich a dataset through the evaluation of a given expression.

Generic Processing - Specification

The basic signature looks like this:

processGeneric(field, func=<expression>)

where <expression> is either a callable (Python API) or an expression composed of the supported constructs (Configuration File).

Example

Let's use :py:meth:`processGeneric <saqc.SaQC.processGeneric>` to calculate the mean value of several variables in a given dataset. We start with dummy data again:
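As a stand-in for the dummy data, the sketch below reuses the x, y and z values from the flagging examples (an assumption; the original processing example may use different data). It computes the pointwise mean with plain pandas and stores it as a new column. The ``processGeneric`` call in the comment, including the ``target`` keyword, is an assumption and is not executed.

```python
import pandas as pd

index = pd.date_range("2020-01-01", periods=6, freq="D")
data = pd.DataFrame(
    {
        "x": [12, 87, 45, 31, 18, 99],
        "y": [2, 12, 33, 133, 8, 33],
        "z": [34, 23, 89, 56, 5, 1],
    },
    index=index,
)

# Unlike a flagging function, a processing function returns new data
# values (here: one mean per timestamp) instead of booleans.
data["mean"] = (data["x"] + data["y"] + data["z"]) / 3

# Assumed call (not executed here):
# qc = qc.processGeneric(field=["x", "y", "z"], target="mean",
#                        func=lambda x, y, z: (x + y + z) / 3)
print(data)
```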

Supported constructs

Operators

Comparison Operators

The following comparison operators are available:

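======== ====================================================================================================
Operator Description
======== ====================================================================================================
``==``   True if the values of the operands are equal
``!=``   True if the values of the operands are not equal
``>``    True if the values of the left operand are greater than the values of the right operand
``<``    True if the values of the left operand are smaller than the values of the right operand
``>=``   True if the values of the left operand are greater than or equal to the values of the right operand
``<=``   True if the values of the left operand are smaller than or equal to the values of the right operand
======== ====================================================================================================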

Logical operators

The bitwise operators act as logical operators in comparison chains.

======== =================
Operator Description
======== =================
``&``    binary and
``|``    binary or
``^``    binary xor
``~``    binary complement
======== =================
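On boolean pandas Series, these four operators work element by element, which is why they can stand in for ``and``, ``or`` and ``not`` in comparison chains. A short sketch with two hand-made masks:

```python
import pandas as pd

a = pd.Series([True, True, False, False])
b = pd.Series([True, False, True, False])

both = a & b         # True where a and b are both True
either = a | b       # True where at least one of a, b is True
exactly_one = a ^ b  # True where exactly one of a, b is True
not_a = ~a           # element-wise negation of a

print(pd.DataFrame({"a & b": both, "a | b": either, "a ^ b": exactly_one, "~a": not_a}))
```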

Arithmetic Operators

The following arithmetic operators are supported:

======== ==============
Operator Description
======== ==============
``+``    addition
``-``    subtraction
``*``    multiplication
``/``    division
``**``   exponentiation
``%``    modulus
======== ==============

Functions

Mathematical Functions

============ =======================================================
Name         Description
============ =======================================================
``abs``      Pointwise absolute value.
``max``      Maximum value. Ignores NaN.
``min``      Minimum value. Ignores NaN.
``mean``     Mean value. Ignores NaN.
``sum``      Summation. Ignores NaN.
``len``      Length of the passed series.
``exp``      Pointwise exponential.
``log``      Pointwise logarithm.
``nanLog``   Logarithm, returning NaN for zero input instead of -inf.
``std``      Standard deviation. Ignores NaN.
``var``      Variance. Ignores NaN.
``median``   Median. Ignores NaN.
``count``    Number of values. Ignores NaN.
``id``       Identity.
``diff``     Returns the diff of a series.
``scale``    Scales data to the [0, 1] interval.
``zScore``   Standardize with standard deviation.
``madScore`` Standardize with median and MAD.
``iqsScore`` Standardize with median and inter-quantile range.
============ =======================================================
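To make two of the table entries concrete, the sketch below writes out the conventional definitions of min-max scaling and the z-score in plain pandas. The function names are hypothetical, and the exact conventions used by ``scale`` and ``zScore`` (for example the degrees of freedom of the standard deviation) are not verified here.

```python
import pandas as pd

z = pd.Series([34.0, 23.0, 89.0, 56.0, 5.0, 1.0])


def scale_sketch(s):
    """Min-max scaling: map the smallest value to 0 and the largest to 1."""
    return (s - s.min()) / (s.max() - s.min())


def zscore_sketch(s):
    """Z-score: subtract the mean, divide by the sample standard deviation."""
    return (s - s.mean()) / s.std()


scaled = scale_sketch(z)
standardized = zscore_sketch(z)
print(scaled)
print(standardized)
```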

Miscellaneous Functions

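============= =======================================
Name          Description
============= =======================================
``isflagged`` Pointwise, checks if a value is flagged
``len``       Returns the length of passed series
============= =======================================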