diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index d10ea5b19bab0fe4d1a0adcda964defdc3162f1f..90a3e1f9826b61dc44c39346dafa71898b19dec4 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,19 +1,6 @@ # Development Environment -We recommend an virtual python environment for development. The setup process consists of the follwing simply steps: - -1. Create a fresh environment with: - ```sh - python -m venv saqc_dev - ``` -2. Activate the created environment - ``` - source saqc_dev/bin/activate - ``` -3. Install the dependencies - ```sh - python -m pip install -r requirements.txt - ``` - +We recommend a virtual python environment for development. The setup process is described in detail in our [GettingStarted](docs/GettingStarted.md). + # Testing SaQC comes with an extensive test suite based on [pytest](https://docs.pytest.org/en/latest/). In order to run all tests execute: @@ -26,7 +13,7 @@ python -m pytest . ## Naming ### Code -We follow the follwing naming conventions +We follow the following naming conventions: - Classes: CamelCase - Functions: camelCase - Variables/Arguments: snake_case @@ -36,9 +23,9 @@ We follow the follwing naming conventions ## Formatting We use (black)[https://black.readthedocs.io/en/stable/] with a line length if 120 characters. -Within the `SaQC` root directory run `black -l 120` +Within the `SaQC` root directory run `black -l 120`. ## Imports -Only absolute imports are accepted +Only absolute imports are accepted. diff --git a/README.md b/README.md index 562230bde710142252d75c2fa374c1ee9e4aa7a8..96f878eb5da3dfd5d3e1aac58a6e03376cc07fce 100644 --- a/README.md +++ b/README.md @@ -3,7 +3,7 @@ Quality Control of numerical data requires a significant amount of domain knowledge and practical experience. Finding a robust setup of quality tests that identifies as many suspicious values as possible, without -removing valid data, is usually a time-consuming and iterative endeavor, +removing valid data, is usually a time-consuming endeavor, even for experts. SaQC is both, a Python framework and a command line application, that @@ -13,8 +13,8 @@ and simple configuration system. Below its user interface, SaQC is highly customizable and extensible. A modular structure and well-defined interfaces make it easy to extend -the system with custom quality checks and even core components, like -the flagging scheme, are exchangeable. +the system with custom quality checks. Furthermore, even core components like +the flagging scheme are exchangeable.  @@ -30,13 +30,13 @@ data processing. The main objective of SaQC is to bridge this gap by allowing both parties to focus on their strengths: The data collector/owner should be -able to express his/her ideas in an easy and succinct way, while the actual +able to express his/her ideas in an easy way, while the actual implementation of the algorithms is left to the respective developers. ## How? -`SaQC` is both a command line application controlled by text based and a python +`SaQC` is both a command line application controlled by a text based configuration file and a python module with a simple API. While a good (but still growing) number of predefined and highly configurable @@ -45,13 +45,13 @@ additionally ships with a python based [extension language](docs/GenericFunctions.md) for quality and general purpose data processing. -For a more specific round trip to some of SaQC's possibilities, please refer to +For a more specific round trip to some of SaQC's possibilities, we refer to our [GettingStarted](docs/GettingStarted.md). 
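For a first impression, a typical invocation of the command line interface looks like the sketch below; the file paths are placeholders, and the `-c`/`-d` options are the ones used throughout the GettingStarted guide:

```sh
# run the quality checks listed in a configuration file against a dataset
python -m saqc -c path/to/config.csv -d path/to/data.csv
```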
### SaQC as a command line application Most of the magic is controlled by a -[semicolon-separated table file](saqc/docs/ConfigurationFiles.md) listing the variables of the +[semicolon-separated text file](saqc/docs/ConfigurationFiles.md) listing the variables of the dataset and the routines to inspect, quality control and/or process them. The content of such a configuration could look like this: @@ -98,6 +98,7 @@ can be installed using [pip](https://pip.pypa.io/en/stable/): ```sh python -m pip install saqc ``` +For a more detailed installation guide, see [GettingStarted](docs/GettingStarted.md). ### Anaconda Currently we don't provide pre-build conda packages but the installing of `SaQC` @@ -117,11 +118,11 @@ straightforward: The latest development version is directly available from the [gitlab](https://git.ufz.de/rdm-software/saqc) server of the [Helmholtz Center for Environmental Research](https://www.ufz.de/index.php?en=33573). -More details on how to setup an respective environment are available -[here](CONTRIBUTING.md#development-environment) +More details on how to install using the gitlab server are available +[here](docs/GettingStarted.md). ### Python version -The minimum Python version required is 3.6. +The minimum Python version required is 3.6.1. ## Usage diff --git a/docs/ConfigurationFiles.md b/docs/ConfigurationFiles.md index aa60017d3411c1b3e15ffe069cf991432348e2d3..b660206945cf20993f7c0d8ad9b6dc3a19fc8122 100644 --- a/docs/ConfigurationFiles.md +++ b/docs/ConfigurationFiles.md @@ -4,8 +4,7 @@ The behaviour of SaQC is completely controlled by a text based configuration fil ## Format SaQC expects its configuration files to be semicolon-separated text files with a fixed header. Each row of the configuration file lists -one variable and one or several test functions, which will be evaluated to -procduce a result for the given variable. +one variable and one or several test functions that are applied to the given variable. ### Header names @@ -27,15 +26,15 @@ many other programming languages and looks like this: flagRange(min=0, max=100) ``` Here the function `flagRange` is called and the values `0` and `100` are passed -to the parameters `min` and `max` respectively. As we (currently) value readablity +to the parameters `min` and `max` respectively. As we value readability of the configuration more than conciseness of the extrension language, only keyword arguments are supported. That means that the notation `flagRange(0, 100)` is not a valid replacement for the above example. ## Examples ### Single Test -Every row lists one test per variable, if you want to call multiple tests on -a specific variable (and you probably want to), list them in separate rows +Every row lists one test per variable. If you want to call multiple tests on +a specific variable (and you probably want to), list them in separate rows: ``` varname | test #-------|---------------------------------- @@ -47,7 +46,7 @@ y | flagRange(min=-10, max=40) ### Multiple Tests A row lists multiple tests for a specific variable in separate columns. All test -columns need to share the common prefix `test`. +columns need to share the common prefix `test`: ``` varname ; test_1 ; test_2 ; test_3 @@ -70,11 +69,9 @@ x ; constants_flagBasic(window="3h") ### Plotting As the process of finding a good quality check setup is somewhat experimental, SaQC -provides a possibility to plot the results of the test functions.
In -order to opt-into this feture add the optional columns `plot` and set it -to `True` whenever you want to see the result of the evaluation. These plots are -meant to provide a quick and easy visual evaluation of the test setup and not to -yield 'publication-ready' results +provides a possibility to plot the results of the test functions. To use this feature, add the optional column `plot` and set it +to `True` for all results you want to plot. These plots are +meant to provide a quick and easy visual evaluation of the test. ``` varname ; test ; plot #-------;----------------------------------;----- @@ -84,13 +81,12 @@ x ; constants_flagBasic(window="3h") ; True y ; flagRange(min=-10, max=40)` ; ``` -### Regular Expressions -Some of the most basic tests (e.g. checks for missing values or range tests) but -also the more elaborated functions available (e.g. aggregation or interpolation +### Regular Expressions in `varname` column +Some of the tests (e.g. checks for missing values, range tests or interpolation functions) are very likely to be used on all or at least several variables of the processed dataset. As it becomes quite cumbersome to list all these variables seperately, only to call the same functions with the same -parameters over and over again, SaQC supports regular expressions +parameters, SaQC supports regular expressions within the `varname` column. Please not that a `varname` needs to be quoted (with `'` or `"`) in order to be interpreted as a regular expression. diff --git a/docs/Customizations.md b/docs/Customizations.md index b5438944ce2181f0cc4b17733281ac1a96ed74af..af438fcf6dc622f0a3aacfa679fc8ab7712de83c 100644 --- a/docs/Customizations.md +++ b/docs/Customizations.md @@ -2,28 +2,26 @@ SaQC comes with a continuously growing number of pre-implemented [quality check and processing routines](docs/FunctionIndex.md) and flagging schemes. -For any sufficiently large use case however, chances are high, that the +For any sufficiently large use case, however, it is very likely that the functions provided won't fulfill all your needs and requirements. -Acknowledging our insufficiency to address all (im-)possible use cases, we -designed the system in a way, that makes it's extension and customization as -simple as possible. The main extensions options, namely +Acknowledging the impossibility to address all imaginable use cases, we +designed the system to allow for extensions and customizations. The main extension options, namely [quality check routines](#custom-quality-check-routines) and the [flagging scheme](#custom-flagging-schemes) are described within this documents. ## Custom quality check routines In case you are missing quality check routines, you are of course very -welcome to file an feature request issue on the project's +welcome to file a feature request issue on the project's [gitlab repository](https://git.ufz.de/rdm-software/saqc). However, if -you are more the no-biggie-I-get-this-done-by-myself type of person, +you are more the "no-way-I-get-this-done-by-myself" type of person, SaQC provides two ways to integrate custom routines into the system: 1. The [extension language](docs/GenericFunctions.md) 2. 
An [interface](#interface) to the evaluation machinery ### Interface -In order to make a function usable within the evaluation framework of SaQC it needs -to implement the following interface: +In order to make a function usable within the evaluation framework of SaQC, the following interface is needed: ```python def yourTestFunction( @@ -39,17 +37,16 @@ def yourTestFunction( | Name | Description | |-----------|--------------------------------------------------------------------------------------------------| -| `data` | The actual dataset | -| `field` | The field/column within `data`, the function is checking/processing | -| `flagger` | A instance of a flagger, responsible for the translation of test results into quality attributes | -| `args` | Any other arguments needed to parameterize the function | -| `kwargs` | Any other keyword arguments needed to parameterize the function | +| `data` | The actual dataset. | +| `field` | The field/column within `data` that the function is processing. | +| `flagger` | An instance of a flagger, responsible for the translation of test results into quality attributes. | +| `args` | Any other arguments needed to parameterize the function. | +| `kwargs` | Any other keyword arguments needed to parameterize the function. | ### Integrate into SaQC In order make your function available to the system it needs to be registered. We provide the decorator -[`register`](saqc/functions/register.py) in the module `saqc.functions.register`, to integrate your -test functions into SaQC. A complete, yet useless example might -look like that: +[`register`](saqc/functions/register.py) in the module `saqc.functions.register` to integrate your +test functions into SaQC. Here is a complete dummy example: ```python from saqc.functions.register import register @@ -61,8 +58,7 @@ def yourTestFunction(data, field, flagger, *args, **kwargs): ### Example The function [`flagRange`](saqc/funcs/functions.py) provides a simple, yet complete implementation of -a quality check routine. You might want to look into its implementation before you start writing your -own. +a quality check routine. You might want to look into its implementation as a reference for your own. ## Custom flagging schemes diff --git a/docs/FlaggingSchemes.md b/docs/FlaggingSchemes.md index e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..edca5253ca8747048e2c4f3738dc9da522726222 100644 --- a/docs/FlaggingSchemes.md +++ b/docs/FlaggingSchemes.md @@ -0,0 +1,10 @@ +# DMP flagging scheme + +## Possible flags + +The DMP scheme produces the following flag constants: + +* "ok" +* "doubtfull" +* "bad" + diff --git a/docs/GenericFunctions.md b/docs/GenericFunctions.md index ba3bfc3fc5fc577734778abc00bd4bd1a19faeb8..9f91d4fd53385f2b7f7f795ad3b83c5ee21085bc 100644 --- a/docs/GenericFunctions.md +++ b/docs/GenericFunctions.md @@ -2,11 +2,11 @@ ## Generic Flagging Functions -Generic flagging functions provide a way to leverage cross-variable quality +Generic flagging functions provide a way to express cross-variable quality constraints and to implement simple quality checks directly within the configuration. ### Why? -The underlying idea is, that in most real world datasets many errors +In most real world datasets many errors can be explained by the dataset itself. 
Think of a an active, fan-cooled measurement device: no matter how precise the instrument may work, problems are to be expected when the fan stops working or the power supply @@ -23,9 +23,8 @@ flagGeneric(func=<expression>, flag=<flagging_constant>) ``` where `<expression>` is composed of the [supported constructs](#supported-constructs) and `<flag_constant>` is one of the predefined -[flagging constants](ParameterDescriptions.md#flagging-constants) (default: `BAD`) -Generic flagging functions are expected to evaluate to a boolean value, i.e. only -constructs returning `True` or `False` are accepted. All other expressions will +[flagging constants](ParameterDescriptions.md#flagging-constants) (default: `BAD`). +Generic flagging functions are expected to return a boolean value, i.e. `True` or `False`. All other expressions will fail during the runtime of `SaQC`. @@ -34,7 +33,7 @@ fail during the runtime of `SaQC`. #### Simple comparisons ##### Task -Flag all values of variable `x` when variable `y` falls below a certain threshold +Flag all values of `x` where `y` falls below 0. ##### Configuration file ``` @@ -46,7 +45,7 @@ x ; flagGeneric(func=y < 0) #### Calculations ##### Task -Flag all values of variable `x` that exceed 3 standard deviations of variable `y` +Flag all values of `x` that exceed 3 standard deviations of `y`. ##### Configuration file ``` @@ -57,7 +56,7 @@ x ; flagGeneric(func=x > std(y) * 3) #### Special functions ##### Task -Flag variable `x` where variable `y` is flagged and variable `x` has missing values +Flag all values of `x` where `y` is flagged and `z` has missing values. ##### Configuration file ``` @@ -67,7 +66,7 @@ x ; flagGeneric(func=isflagged(y) & ismissing(z)) ``` #### A real world example -Let's consider a dataset like the following: +Let's consider the following dataset: | date | meas | fan | volt | |------------------|------|-----|------| @@ -78,11 +77,11 @@ Let's consider a dataset like the following: | ... | | | | ##### Task -Flag variable `meas` where variable `fan` equals 0 and variable `volt` +Flag `meas` where `fan` equals 0 and `volt` is lower than `12.0`. ##### Configuration file -We can directly implement the condition as follows: +There are various options. We can directly implement the condition as follows: ``` varname ; test #-------;----------------------------------------------- diff --git a/docs/GettingStarted.md b/docs/GettingStarted.md index 272a8875affcb2b6fcd77c53950c25171ed2b585..d2b6c90afe0af08bbbf435068792faeee668eab1 100644 --- a/docs/GettingStarted.md +++ b/docs/GettingStarted.md @@ -1,7 +1,7 @@ # Getting started with SaQC -This "getting started" assumes that you have Python version 3.6 or 3.7 -installed. +Requirements: this tutorial assumes that you have Python version 3.6.1 or newer +installed, and that both your operating system and Python version are 64-bit. ## Contents @@ -25,57 +25,69 @@ for your needs is using the Python Package Index (PyPI). 
Following good Python practice, you will first want to create a new virtual environment that you install SaQC into by typing the following in your console: -### On Unix/Mac-systems + ```sh # if you have not installed venv yet, do so: -python3 -m pip install --user virtualenv +python -m pip install --user virtualenv # move to the directory where you want to create your virtual environment cd YOURDIR # create virtual environment called "env_saqc" -python3 -m venv env_saqc - +python -m venv env_saqc + +``` +To activate your virtual environment, you need to type the following: + + +##### On Unix/Mac-systems + +```sh # activate the virtual environment source env_saqc/bin/activate ``` -### On Windows +##### On Windows ```sh -# if you have not installed venv yet, do so: -pip install virtualenv - -# move to the directory where you want to create your virtual environment -cd YOURDIR - -# create virtual environment called "env_saqc" -virtualenv env_saqc - # move to the Scripts directory in "env_saqc" cd env_saqc/Scripts -# activate the environment +# activate the virtual environment ./activate ``` ## 2. Get SaQC -Now get saqc via PyPI as well: +### Via PyPI + +Type ```sh python -m pip install saqc ``` -or download it directly from the [GitLab-repository](https://git.ufz.de/rdm/saqc). -Get all required packages: +### From Gitlab repository + +Download SaQC directly from the [GitLab-repository](https://git.ufz.de/rdm/saqc) to make sure you use the most recent version: ```sh +# clone the gitlab repository +git clone https://git.ufz.de/rdm-software/saqc + +# switch to the folder where you cloned saqc +cd saqc + +# install all required packages pip install -r requirements.txt + +# install all required submodules +git submodule update --init --recursive ``` + ## 3. Training tour The following passage guides you through the essentials of the usage of SaQC via @@ -122,10 +134,15 @@ flags that are set during one test are always passed on to the subsequent one. ### Run SaQC Remember to have your virtual environment activated: + +##### On Unix/Mac-systems + ```sh source env_saqc/bin/activate ``` -or respectively on Windows: + +##### On Windows + ```sh cd env_saqc/Scripts ./activate @@ -140,10 +157,11 @@ cd saqc From here, you can run saqc and tell it to run the tests from the toy config-file on the toy dataset via the `-c` and `-d` options: ```sh -saqc -c ressources/data/myconfig.csv -d ressources/data/data.csv +python -m saqc -c ressources/data/myconfig.csv -d ressources/data/data.csv ``` +If you installed saqc via PyPI, you can omit `python -m`. -Which will output this plot: +The command will output this plot:  diff --git a/docs/ParameterDescriptions.md b/docs/ParameterDescriptions.md index 15eccf64ac0609267f0fcefe8e3973f472112c59..8fcfa0511100177240701bb9338174bf4dfde27a 100644 --- a/docs/ParameterDescriptions.md +++ b/docs/ParameterDescriptions.md @@ -25,8 +25,8 @@ The following flag constants are available and can be used to mark the quality o | `BAD` | At least on test failed on the values and is therefore considered to be invalid | | `UNFLAGGED` | The value has not got a flag yet. This might mean, that all tests passed or that no tests ran | -How these aliases will be translated into 'real' flags (output of SaQC) dependes on the flagger implementation -and might range from numerical values to string concstants +How these aliases will be translated into 'real' flags (output of SaQC) depends on the [flagging scheme](FlaggingSchemes.md) +and might range from numerical values to string constants. 
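As a sketch of how such a constant is used in practice, it can be passed to a test through the `flag` parameter of a generic flagging function; the variable name and threshold below are made up, and the default `BAD` is only spelled out for illustration:

```
varname ; test
#-------;-----------------------------------
x       ; flagGeneric(func=x < 0, flag=BAD)
```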
### Numerical Constants | Alias | Description | diff --git a/requirements.txt b/requirements.txt index 08c5d471a58678d86c92d77fb0e13e40310c94d6..5a8333fca668a0c43377f5ae0467bd4140c10209 100644 --- a/requirements.txt +++ b/requirements.txt @@ -2,12 +2,12 @@ attrs==19.3.0 Click==7.1.2 cycler==0.10.0 dtw==1.4.0 +kiwisolver==1.2.0 importlib-metadata==1.7.0 joblib==0.16.0 -kiwisolver==1.1.0 llvmlite==0.31.0 +mlxtend==0.17.3 matplotlib==3.3.0 -mlxtend==0.17.2 more-itertools==8.4.0 numba==0.48.0 numpy==1.19.1 @@ -17,18 +17,18 @@ outlier-utils==0.0.3 packaging==20.4 pandas==1.0.1 pluggy==0.13.1 +pyparsing==2.4.7 py==1.9.0 pyarrow==1.0.0 -pyparsing==2.4.6 pytest-lazy-fixture==0.6.3 pytest==6.0.1 python-dateutil==2.8.1 -python-intervals==1.10.0 +python-intervals==1.10.0.post1 pytz==2020.1 PyWavelets==1.1.1 +zipp==3.1.0 +wcwidth==0.2.5 scipy==1.5.2 -scikit-learn==0.23.1 +scikit-learn==0.23.2 six==1.15.0 -wcwidth==0.1.8 -zipp==2.2.0 astor==0.8.1 diff --git a/saqc/core/register.py b/saqc/core/register.py index e5163c3b6ba24f13ab6d8a4ed69e432b7d4137c8..2863f182556a0a0ca7cd2e0a20ae8c2f73e173b6 100644 --- a/saqc/core/register.py +++ b/saqc/core/register.py @@ -179,8 +179,8 @@ class SaQCFunc(Func): def _unmaskData(self, data_old, flagger_old, data_new, flagger_new): to_mask = flagger_old.BAD if self.to_mask is None else self.to_mask - mask_old = flagger_old.isFlagged(flag=to_mask) - mask_new = flagger_new.isFlagged(flag=to_mask) + mask_old = flagger_old.isFlagged(flag=to_mask, comparator="==") + mask_new = flagger_new.isFlagged(flag=to_mask, comparator="==") for col, left in data_new.indexes.iteritems(): if col not in mask_old: diff --git a/saqc/funcs/harm_functions.py b/saqc/funcs/harm_functions.py index 37f745c2a92011ea9be02f86dc82d5326fbeffec..974ed3fc5fc54c959e0e008137fb5dd097b9ffb2 100644 --- a/saqc/funcs/harm_functions.py +++ b/saqc/funcs/harm_functions.py @@ -20,18 +20,18 @@ from saqc.funcs.proc_functions import ( logger = logging.getLogger("SaQC") @register -def harm_shift2Grid(data, field, flagger, freq, method="nshift", drop_flags=None, **kwargs): +def harm_shift2Grid(data, field, flagger, freq, method="nshift", to_drop=None, **kwargs): data, flagger = proc_fork(data, field, flagger) data, flagger = proc_shift( - data, field, flagger, freq, method, drop_flags=drop_flags, empty_intervals_flag=flagger.UNFLAGGED, **kwargs + data, field, flagger, freq, method, to_drop=to_drop, empty_intervals_flag=flagger.UNFLAGGED, **kwargs ) return data, flagger @register def harm_aggregate2Grid( - data, field, flagger, freq, value_func, flag_func=np.nanmax, method="nagg", drop_flags=None, **kwargs + data, field, flagger, freq, value_func, flag_func=np.nanmax, method="nagg", to_drop=None, **kwargs ): data, flagger = proc_fork(data, field, flagger) @@ -44,7 +44,7 @@ def harm_aggregate2Grid( flag_agg_func=flag_func, method=method, empty_intervals_flag=flagger.UNFLAGGED, - drop_flags=drop_flags, + to_drop=to_drop, all_na_2_empty=True, **kwargs, ) @@ -52,17 +52,17 @@ def harm_aggregate2Grid( @register -def harm_linear2Grid(data, field, flagger, freq, drop_flags=None, **kwargs): +def harm_linear2Grid(data, field, flagger, freq, to_drop=None, **kwargs): data, flagger = proc_fork(data, field, flagger) data, flagger = proc_interpolateGrid( - data, field, flagger, freq, "time", drop_flags=drop_flags, empty_intervals_flag=flagger.UNFLAGGED, **kwargs + data, field, flagger, freq, "time", to_drop=to_drop, empty_intervals_flag=flagger.UNFLAGGED, **kwargs ) return data, flagger @register def harm_interpolate2Grid( - data, field, 
flagger, freq, method, order=1, drop_flags=None, **kwargs, + data, field, flagger, freq, method, order=1, to_drop=None, **kwargs, ): data, flagger = proc_fork(data, field, flagger) data, flagger = proc_interpolateGrid( @@ -72,7 +72,7 @@ def harm_interpolate2Grid( freq, method=method, inter_order=order, - drop_flags=drop_flags, + to_drop=to_drop, empty_intervals_flag=flagger.UNFLAGGED, **kwargs, ) @@ -80,10 +80,10 @@ def harm_interpolate2Grid( @register -def harm_deharmonize(data, field, flagger, method, drop_flags=None, **kwargs): +def harm_deharmonize(data, field, flagger, method, to_drop=None, **kwargs): data, flagger = proc_projectFlags( - data, str(field) + ORIGINAL_SUFFIX, flagger, method, source=field, drop_flags=drop_flags, **kwargs + data, str(field) + ORIGINAL_SUFFIX, flagger, method, source=field, to_drop=to_drop, **kwargs ) data, flagger = proc_drop(data, field, flagger) data, flagger = proc_rename(data, str(field) + ORIGINAL_SUFFIX, flagger, field) diff --git a/saqc/funcs/proc_functions.py b/saqc/funcs/proc_functions.py index 1a10b1c03633669beb17f8931bc82b11d0f6075c..9d889603402eafa8ff9428ddc0ba0ec98e30d0b8 100644 --- a/saqc/funcs/proc_functions.py +++ b/saqc/funcs/proc_functions.py @@ -173,7 +173,7 @@ def proc_interpolateGrid( freq, method, inter_order=2, - drop_flags=None, + to_drop=None, downgrade_interpolation=False, empty_intervals_flag=None, **kwargs @@ -200,7 +200,7 @@ def proc_interpolateGrid( If there your selected interpolation method can be performed at different 'orders' - here you pass the desired order. - drop_flags : list or string, default None + to_drop : list or string, default None Flags that refer to values you want to drop before interpotion - effectively excluding grid points from interpolation, that are only surrounded by values having a flag in them, that is listed in drop flags. Default results in the flaggers 'BAD' flag to be the drop_flag. @@ -221,7 +221,7 @@ def proc_interpolateGrid( if empty_intervals_flag is None: empty_intervals_flag = flagger.BAD - drop_mask = dropper(field, drop_flags, flagger, flagger.BAD) + drop_mask = dropper(field, to_drop, flagger, flagger.BAD) drop_mask |= flagscol.isna() drop_mask |= datcol.isna() datcol[drop_mask] = np.nan @@ -324,7 +324,7 @@ def proc_resample( max_invalid_total_f=np.inf, flag_agg_func=max, empty_intervals_flag=None, - drop_flags=None, + to_drop=None, all_na_2_empty=False, **kwargs ): @@ -392,10 +392,10 @@ na_ser.resample('10min').apply(lambda x: x.count()) checking for "max_total_invalid_f" and "max_consec_invalid_f patterns". Default triggers flagger.BAD to be assigned. - drop_flags : list or string, default None + to_drop : list or string, default None Flags that refer to values you want to drop before resampling - effectively excluding values that are flagged - with a flag in drop_flags from the resampling process - this means that they also will not be counted in the - the max_consec/max_total evaluation. Drop_flags = None results in NO flags being dropped initially. + with a flag in to_drop from the resampling process - this means that they also will not be counted in the + the max_consec/max_total evaluation. to_drop = None results in NO flags being dropped initially. 
""" data = data.copy() @@ -404,7 +404,7 @@ na_ser.resample('10min').apply(lambda x: x.count()) if empty_intervals_flag is None: empty_intervals_flag = flagger.BAD - drop_mask = dropper(field, drop_flags, flagger, []) + drop_mask = dropper(field, to_drop, flagger, []) datcol.drop(datcol[drop_mask].index, inplace=True) flagscol.drop(flagscol[drop_mask].index, inplace=True) if all_na_2_empty: @@ -446,12 +446,12 @@ na_ser.resample('10min').apply(lambda x: x.count()) @register -def proc_shift(data, field, flagger, freq, method, drop_flags=None, empty_intervals_flag=None, **kwargs): +def proc_shift(data, field, flagger, freq, method, to_drop=None, empty_intervals_flag=None, **kwargs): """ Function to shift data points to regular (equidistant) timestamps. Values get shifted according to the keyword passed to 'method'. - Note: all data nans get excluded defaultly from shifting. If drop_flags is None - all BAD flagged values get + Note: all data nans get excluded defaultly from shifting. If to_drop is None - all BAD flagged values get excluded as well. 'nshift' - every grid point gets assigned the nearest value in its range ( range = +/-(freq/2) ) @@ -474,9 +474,9 @@ def proc_shift(data, field, flagger, freq, method, drop_flags=None, empty_interv A Flag, that you want to assign to grid points, where no values are avaible to be shifted to. Default triggers flagger.BAD to be assigned. - drop_flags : list or string, default None + to_drop : list or string, default None Flags that refer to values you want to drop before shifting - effectively, excluding values that are flagged - with a flag in drop_flags from the shifting process. Default - Drop_flags = None - results in flagger.BAD + with a flag in to_drop from the shifting process. Default - to_drop = None - results in flagger.BAD values being dropped initially. """ @@ -487,7 +487,7 @@ def proc_shift(data, field, flagger, freq, method, drop_flags=None, empty_interv if empty_intervals_flag is None: empty_intervals_flag = flagger.BAD - drop_mask = dropper(field, drop_flags, flagger, flagger.BAD) + drop_mask = dropper(field, to_drop, flagger, flagger.BAD) drop_mask |= datcol.isna() datcol[drop_mask] = np.nan datcol.dropna(inplace=True) @@ -529,7 +529,7 @@ def proc_transform(data, field, flagger, func, **kwargs): @register -def proc_projectFlags(data, field, flagger, method, source, freq=None, drop_flags=None, **kwargs): +def proc_projectFlags(data, field, flagger, method, source, freq=None, to_drop=None, **kwargs): """ The Function projects flags of "source" onto flags of "field". Wherever the "field" flags are "better" then the @@ -575,7 +575,7 @@ def proc_projectFlags(data, field, flagger, method, source, freq=None, drop_flag The freq determines the projection range for the projection method. See above description for more details. Defaultly (None), the sampling frequency of source is used. - drop_flags: list or String + to_drop: list or String Flags referring to values that are to drop before flags projection. Relevant only when projecting wiht an inverted shift method. Defaultly flagger.BAD is listed. 
@@ -620,7 +620,7 @@ def proc_projectFlags(data, field, flagger, method, source, freq=None, drop_flag # # starting with the dropping and its memorization: - drop_mask = dropper(field, drop_flags, flagger, flagger.BAD) + drop_mask = dropper(field, to_drop, flagger, flagger.BAD) drop_mask |= target_datcol.isna() target_flagscol_drops = target_flagscol[drop_mask] target_flagscol.drop(drop_mask[drop_mask].index, inplace=True) diff --git a/saqc/lib/plotting.py b/saqc/lib/plotting.py index 859653899f935be0786063d156ca1c9aed3e8f3b..9b780d70198b3e6085b6fda21f5524c7d9c619a6 100644 --- a/saqc/lib/plotting.py +++ b/saqc/lib/plotting.py @@ -165,11 +165,18 @@ def _plotMultipleVariables( ncols += [ncols_rest] gs_kw = dict(width_ratios=_layout_data_to_table_ratio) - layout = dict(figsize=_figsize, sharex=True, tight_layout=True, squeeze=False, gridspec_kw=gs_kw) + layout = dict( + figsize=_figsize, + sharex=True, + tight_layout=True, + squeeze=False, + gridspec_kw=gs_kw if show_tab else {} + ) # plot max. 4 plots per figure allaxs = [] for n in range(nfig): + fig, axs = plt.subplots(nrows=ncols[n], ncols=2 if show_tab else 1, **layout) for ax in axs: @@ -180,7 +187,7 @@ def _plotMultipleVariables( plot_ax, tab_ax = ax _plotInfoTable(tab_ax, tar, _plotstyle, len(tar["data"])) else: - plot_ax = ax + plot_ax = ax[0] _plotFromDicts(plot_ax, tar, _plotstyle) diff --git a/saqc/lib/tools.py b/saqc/lib/tools.py index 43f7769ab9de54dfe20af4611a8afb2ab62c6cf2..3f1121a5363b87e92b6fb2d0963a8ee794e50642 100644 --- a/saqc/lib/tools.py +++ b/saqc/lib/tools.py @@ -334,13 +334,13 @@ def isQuoted(string): return bool(re.search(r"'.*'|\".*\"", string)) -def dropper(field, drop_flags, flagger, default): +def dropper(field, to_drop, flagger, default): drop_mask = pd.Series(False, flagger.getFlags(field).index) - if drop_flags is None: - drop_flags = default - drop_flags = toSequence(drop_flags) - if len(drop_flags) > 0: - drop_mask |= flagger.isFlagged(field, flag=drop_flags) + if to_drop is None: + to_drop = default + to_drop = toSequence(to_drop) + if len(to_drop) > 0: + drop_mask |= flagger.isFlagged(field, flag=to_drop) return drop_mask diff --git a/test/funcs/test_harm_funcs.py b/test/funcs/test_harm_funcs.py index ecf1d78e7d2d787f6b6c1a8319fe41c247f4d1f6..97cd96f05aca1e3b2f92dc77624ff6865839921c 100644 --- a/test/funcs/test_harm_funcs.py +++ b/test/funcs/test_harm_funcs.py @@ -220,7 +220,7 @@ def test_wrapper(data, flagger): freq = "15min" flagger = flagger.initFlags(data) - harm_linear2Grid(data, field, flagger, freq, drop_flags=None) - harm_aggregate2Grid(data, field, flagger, freq, value_func=np.nansum, method="nagg", drop_flags=None) - harm_shift2Grid(data, field, flagger, freq, method="nshift", drop_flags=None) + harm_linear2Grid(data, field, flagger, freq, to_drop=None) + harm_aggregate2Grid(data, field, flagger, freq, value_func=np.nansum, method="nagg", to_drop=None) + harm_shift2Grid(data, field, flagger, freq, method="nshift", to_drop=None) harm_interpolate2Grid(data, field, flagger, freq, method="spline")
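Taken together, the changes in `saqc/funcs/harm_functions.py`, `saqc/funcs/proc_functions.py` and `saqc/lib/tools.py` rename the `drop_flags` keyword to `to_drop`. A minimal sketch of what this means for callers, modelled on the test above (`data`, `field` and `flagger` are assumed to be set up as in the test fixtures, e.g. via `flagger.initFlags(data)`):

```python
import numpy as np

from saqc.funcs.harm_functions import harm_aggregate2Grid, harm_shift2Grid

# previously: harm_shift2Grid(data, field, flagger, "15min", method="nshift", drop_flags=None)
data, flagger = harm_shift2Grid(data, field, flagger, "15min", method="nshift", to_drop=None)

# flags passed via `to_drop` are excluded before the regularization;
# here all BAD-flagged values are dropped prior to the aggregation
data, flagger = harm_aggregate2Grid(
    data, field, flagger, "15min", value_func=np.nansum, method="nagg", to_drop=flagger.BAD
)
```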