Skip to content
Snippets Groups Projects
Lennart Schmidt's avatar
466b75e4

System for automated Quality Control (SaQC)

Quality Control of numerical data requires a significant amount of domain knowledge and practical experience. Finding a robust setup of quality tests is usually a time-consuming and iterative endeavor, even for experts.

SaQC is a Python-framework that addresses the explorative nature of quality control by offering a high degree of functionality as well as flexibility regarding its set-up and configuration. At the same time, it is easy-to-use: Most aspects of quality control, like

  • test parametrization
  • test evaluation and
  • test exploration

are easy to configure in plain text files.

Below its userinterface, though, SaQC is highly customizable and extensible. A modular structure and well-defined interfaces allow the extension with new quality check routines. Additionally, many core components, like the flagging scheme, are exchangeable.

Example config

Why?

When implementating data workflows in environmental sciences, our experience as research data managers is that there is a significant knowledge gap between the people that collect the data and those responsible for the processing and the quality-control of these datasets. While the former usually have a solid understanding of the underlying measurement principles and the errors that might result from it, the latter are mostly software developers with expertise in data processing.

The main objective of SaQC is to bridge this gap by allowing both parties to focus on their strengths: The data collector/owner should be able to express his/her ideas in an easy and succint way while the actual implementation of the data processing and quality-control is left to the respective developers.

How?

The most import aspect of SaQC, the general configuration of the system, is text-based. All the magic takes place in a semicolon-separated table file listing the variables within the dataset to inspect, quality control and/or modify.

Example config

While a good (but still growing) number of predefined and highly configurable functions are included and ready to use, SaQC additionally ships with a python based extension language. The extension language allows the user to define custom tests directly in the configuration text file. This way, cross-variable conditions can be implemented to identify erroneous measurements, e.g. "The measurement of my variable of interest is erroneous if the battery voltage of the measurement device drops below 12 volts".

For a more specific round trip to some of SaQC's possibilities, please refer to our GettingStarted.

Installation

pip

SaQC is available on the Python Package Index (PyPI) and can be installed using pip:

python -m pip install saqc

Manual installation

The latest development version is directly available from the gitlab server of the Helmholtz Center for Environmental Research. All the dependenencies are listed here and are easily resolvable with:

python -m pip install -r requirements.txt

Usage

Command line interface (CLI)

SaQC provides a basic CLI to get you started. As soon as tha basic inputs, a dataset and the configuration file are prepared, running SaQC is as simple as:

python -m saqc \
    --config path_to_configuration.txt \
    --data path_to_data.csv \
    --outfile path_to_output.csv

Integration into larger workfows

The main function is exposed and can be used in within your own programs.

License

Copyright(c) 2019, Helmholtz-Zentrum fuer Umweltforschung GmbH - UFZ. All rights reserved.

The "System for Automated Quality Control" is free software. You can redistribute it and/or modify it under the terms of the GNU General Public License as published by the free Software Foundation either version 3 of the License, or (at your option) any later version. See the license for detaily.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.