Commit af811929 authored by Peter Lünenschloß's avatar Peter Lünenschloß
replaced links to temporal markdown processing products

parent 525447b2
Showing 42 additions and 2409 deletions
@@ -27,7 +27,7 @@ clean:
 rm -rf _build _static
 rm -rf ../$(FUNCTIONS)
 mkdir _static
-for k in $(MDLIST); do rm -r "$$k"_m2r; done
+# for k in $(MDLIST); do rm -r "$$k"_m2r; done
 # trigger (saqc) customized documentation pipeline
 doc:
@@ -36,7 +36,7 @@ doc:
 # generate environment table from dictionary
 python make_env_tab.py
 # make rest folders from markdown folders
-for k in $(MDLIST); do python make_md_to_rst.py -p sphinx-doc/"$$k" -sr ".."; done
+# for k in $(MDLIST); do python make_md_to_rst.py -p sphinx-doc/"$$k" -sr ".."; done
 # make the html build
 @$(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
@@ -13,52 +13,52 @@ The tutorial guides through the following steps:
 Initially, we introduce and motivate regularisation techniques and we do import the tutorial data.
-* :ref:`Why Regularisation <cook_books_md_m2r/DataRegularisation:Why Regularisation>`
-* :ref:`Tutorial Data <cook_books_md_m2r/DataRegularisation:Tutorial Data>`
+* :ref:`Why Regularisation <cook_books/DataRegularisation:Why Regularisation>`
+* :ref:`Tutorial Data <cook_books/DataRegularisation:Tutorial Data>`
 #.
-We will get an overview over the main :ref:`Regularisation <cook_books_md_m2r/DataRegularisation:regularisations>` methods, starting with the shift.
+We will get an overview over the main :ref:`Regularisation <cook_books/DataRegularisation:regularisations>` methods, starting with the shift.
-* :ref:`Shift <cook_books_md_m2r/DataRegularisation:shift>`
+* :ref:`Shift <cook_books/DataRegularisation:shift>`
-* :ref:`Target Parameter <cook_books_md_m2r/DataRegularisation:target parameter>`
+* :ref:`Target Parameter <cook_books/DataRegularisation:target parameter>`
-* :ref:`Freq Parameter <cook_books_md_m2r/DataRegularisation:freq parameter>`
-* :ref:`Method Parameter <cook_books_md_m2r/DataRegularisation:shifting method>`
-* :ref:`Valid Data <cook_books_md_m2r/DataRegularisation:Valid Data>`
+* :ref:`Freq Parameter <cook_books/DataRegularisation:freq parameter>`
+* :ref:`Method Parameter <cook_books/DataRegularisation:shifting method>`
+* :ref:`Valid Data <cook_books/DataRegularisation:Valid Data>`
 #.
 We introduce the notion of *valid* data and see how sparse intervals and those with multiple values interact with
 regularisation.
-* :ref:`Data Loss and Empty Intervals <cook_books_md_m2r/DataRegularisation:data loss and empty intervals>`
+* :ref:`Data Loss and Empty Intervals <cook_books/DataRegularisation:data loss and empty intervals>`
-* :ref:`Empty Intervals <cook_books_md_m2r/DataRegularisation:empty intervals>`
+* :ref:`Empty Intervals <cook_books/DataRegularisation:empty intervals>`
-* :ref:`Valid Data <cook_books_md_m2r/DataRegularisation:Valid Data>`
-* :ref:`Data Reduction <cook_books_md_m2r/DataRegularisation:data reduction>`
-* :ref:`Minimize Shifting <cook_books_md_m2r/DataRegularisation:minimize shifting distance>`
+* :ref:`Valid Data <cook_books/DataRegularisation:Valid Data>`
+* :ref:`Data Reduction <cook_books/DataRegularisation:data reduction>`
+* :ref:`Minimize Shifting <cook_books/DataRegularisation:minimize shifting distance>`
 #.
 We use the Aggregation and the Interpolation method.
-* :ref:`Aggregation <cook_books_md_m2r/DataRegularisation:aggregation>`
+* :ref:`Aggregation <cook_books/DataRegularisation:aggregation>`
-* :ref:`Function Parameter <cook_books_md_m2r/DataRegularisation:aggregation functions>`
-* :ref:`Method Parameter <cook_books_md_m2r/DataRegularisation:shifting method>`
+* :ref:`Function Parameter <cook_books/DataRegularisation:aggregation functions>`
+* :ref:`Method Parameter <cook_books/DataRegularisation:shifting method>`
-* :ref:`Interpolation <cook_books_md_m2r/DataRegularisation:interpolation>`
+* :ref:`Interpolation <cook_books/DataRegularisation:interpolation>`
-* :ref:`Representing Data Sparsity <cook_books_md_m2r/DataRegularisation:interpolation and data sparsity>`
+* :ref:`Representing Data Sparsity <cook_books/DataRegularisation:interpolation and data sparsity>`
 #.
 We see how regularisation interacts with Flags.
-* :ref:`Flags and Regularisation <cook_books_md_m2r/DataRegularisation:flags and regularisation>`
+* :ref:`Flags and Regularisation <cook_books/DataRegularisation:flags and regularisation>`
 Why Regularisation
 ------------------
@@ -137,9 +137,9 @@ Basically, there are three types of :doc:`regularisation <function_cats/regulari
 #. We could keep the values as they are, and thus,
-just :ref:`shift <cook_books_md_m2r/DataRegularisation:Shift>` them in time to match the equidistant *10* minutes frequency grid, we want the data to exhibit.
-#. We could calculate new, synthetic data values for the regular timestamps, via an :ref:`interpolation <cook_books_md_m2r/DataRegularisation:Interpolation>` method.
-#. We could apply some :ref:`aggregation <cook_books_md_m2r/DataRegularisation:Resampling>` to up- or down sample the data.
+just :ref:`shift <cook_books/DataRegularisation:Shift>` them in time to match the equidistant *10* minutes frequency grid, we want the data to exhibit.
+#. We could calculate new, synthetic data values for the regular timestamps, via an :ref:`interpolation <cook_books/DataRegularisation:Interpolation>` method.
+#. We could apply some :ref:`aggregation <cook_books/DataRegularisation:Resampling>` to up- or down sample the data.
 Shift
 -----
@@ -227,7 +227,7 @@ transformation as well. That change stems from 2 sources mainly:
 Empty Intervals
 ^^^^^^^^^^^^^^^
-If there is no :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` data point available within an interval of the passed frequency,
+If there is no :ref:`valid <cook_books/DataRegularisation:valid data>` data point available within an interval of the passed frequency,
 that could be shifted to match a multiple of the frequency, a ``NaN`` value gets inserted to represent the fact,
 that in the interval that is represented by that date time index, there was data missing.
@@ -247,8 +247,8 @@ Data points are referred to, as *valid*\ , in context of a regularisation, if:
 Note, that, from point *2* above, it follows, that flagging data values
 before regularisation, will effectively exclude them from the regularistaion process. See chapter
-:ref:`flagging and resampling <cook_books_md_m2r/DataRegularisation:flagging and resampling>` for an example of this effect and how it can help
-control :ref:`data reduction <cook_books_md_m2r/DataRegularisation:data reduction>`.
+:ref:`flagging and resampling <cook_books/DataRegularisation:flagging and resampling>` for an example of this effect and how it can help
+control :ref:`data reduction <cook_books/DataRegularisation:data reduction>`.
 data reduction
 ^^^^^^^^^^^^^^
@@ -259,7 +259,7 @@ interval.
 This reduction depends on the selected :doc:`regularisation <../function_cats/regularisation>` method.
-For example, :ref:`above <cook_books_md_m2r/DataRegularisation:shift>`\ , we applied a backwards :py:func:`shift <Functions.saqc.shift>` with a *10* minutes frequency.
+For example, :ref:`above <cook_books/DataRegularisation:shift>`\ , we applied a backwards :py:func:`shift <Functions.saqc.shift>` with a *10* minutes frequency.
 As a result, the first value, encountered after any multiple of *10* minutes, gets shifted backwards to be aligned with
 the desired frequency and any other value in that *10* minutes interval just gets discarded.
@@ -390,13 +390,13 @@ for calculating the median, ``sum``\ , for assigning the value sum, and so on.)
 Aggregation method
 ^^^^^^^^^^^^^^^^^^
-As it is with the :ref:`shift <cook_books_md_m2r/DataRegularisation:Shift>` functionality, a ``method`` keyword controlls, weather the
+As it is with the :ref:`shift <cook_books/DataRegularisation:Shift>` functionality, a ``method`` keyword controlls, weather the
 aggregation result for the interval in between 2 regular timestamps gets assigned to the left (=\ ``bagg``\ ) or to the
 right (\ ``fagg``\ ) boundary timestamp.
 * Also, analogous to to the shift functionality, intervals of size ``freq``\ , that do
-not contain any :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` data, that could be aggregated, get ``ǹp.nan`` assigned.
+not contain any :ref:`valid <cook_books/DataRegularisation:valid data>` data, that could be aggregated, get ``ǹp.nan`` assigned.
 Interpolation
 -------------
@@ -449,15 +449,15 @@ Interpolation and Data Sparsity
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 The regularisation by interpolation is strict in the sense, that regular timestamps *only* get
-interpolated, if they have at least one :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` data value preceeding them *and* one
+interpolated, if they have at least one :ref:`valid <cook_books/DataRegularisation:valid data>` data value preceeding them *and* one
 succeeding them *within* the given frequency range (wich is controlled by the ``freq`` keyword.).
 Thats, why, you have no interpolation value at ``2021-03-20 07:30:00`` - bacause it is preceeded
-by a :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` value at ``2021-03-20 07:26:16``\ , but there is no :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` value
+by a :ref:`valid <cook_books/DataRegularisation:valid data>` value at ``2021-03-20 07:26:16``\ , but there is no :ref:`valid <cook_books/DataRegularisation:valid data>` value
 available in between the succeeding *10* minutes interval from ``2021-03-20 07:30:00`` to ``2021-03-20 07:30:00``.
 On the other hand, there is an interpolated value assigned to ``2021-03-20 07:50:00``\ , it is preceeded by
-a :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` value at ``2021-03-20 07:40:37`` and one succeeding at ``2021-03-20 07:54:59``.
+a :ref:`valid <cook_books/DataRegularisation:valid data>` value at ``2021-03-20 07:40:37`` and one succeeding at ``2021-03-20 07:54:59``.
 This behavior is intended to reflect the sparsity of the original data in the
 regularized data set. The behavior can be circumvented by applying the more general
@@ -472,7 +472,7 @@ Flags and Regularisation
 ------------------------
 Since data, that is flagged by a level higher or equal to the passed ``to_mask`` value
-(default=:py:const:~saqc.constants.BAD), is not regarded :ref:`valid <cook_books_md_m2r/DataRegularisation:valid data>` by the applied function,
+(default=:py:const:~saqc.constants.BAD), is not regarded :ref:`valid <cook_books/DataRegularisation:valid data>` by the applied function,
 it can be of advantage, to flag data before regularisation in order to effectively exclude it
 from the resulting regularly sampled data set. Lets see an example for the *SoilMoisture* data set.
@@ -8,11 +8,11 @@ Mainly we will see how to apply Drift Corrections onto the data and how to perfo
 #.
-* :ref:`Data Preparation <cook_books_md_m2r/MultivariateFlagging:Data Preparation>`
+* :ref:`Data Preparation <cook_books/MultivariateFlagging:Data Preparation>`
 #.
-* :ref:`Drift Correction <cook_books_md_m2r/MultivariateFlagging:Drift Correction>`
+* :ref:`Drift Correction <cook_books/MultivariateFlagging:Drift Correction>`
 #.
@@ -37,7 +37,7 @@ Exponential Drift
 * The variables *SAK254* and *Turbidity* show drifting behavior originating from dirt, that accumulates on the light sensitive sensor surfaces over time.
 * The effect, the dirt accumulation has on the measurement values, is assumed to be properly described by an exponential model.
 * The Sensors are cleaned periodocally, resulting in a periodical reset of the drifting effect.
-* The Dates and Times of the maintenance events are input to the :py:func:`correctDrift <Functions.saqc.correctDrift>`, that will correct the data in between any two such maintenance intervals. (Find some formal description of the process :doc:`here <../misc_md_m2r/ExponentialModel>`.)
+* The Dates and Times of the maintenance events are input to the :py:func:`correctDrift <Functions.saqc.correctDrift>`, that will correct the data in between any two such maintenance intervals. (Find some formal description of the process :doc:`here <../misc/ExponentialModel>`.)
 Linear Long Time Drift
 ^^^^^^^^^^^^^^^^^^^^^^
Multivariate Flagging
=====================
The tutorial aims to introduce the usage of SaQC in the context of some more complex flagging and processing techniques.
Mainly we will see how to apply Drift Corrections onto the data and how to perform multivariate flagging.
#.
* :ref:`Data Preparation <cook_books_md_m2r/MultivariateFlagging:Data Preparation>`
#.
* :ref:`Drift Correction <cook_books_md_m2r/MultivariateFlagging:Drift Correction>`
#.
* `Multivariate Flagging (odd Water) <#Multivariate-Flagging>`_
Data preparation
----------------
* Flagging missing values via :py:func:`flagMissing <Functions.saqc.flagMissing>`.
* Flagging out of range values via :py:func:`flagRange <Functions.saqc.flagRange>`.
* Flagging values, where the Specific Conductance (\ *K25*\ ) drops down to near zero. (via :py:func:`flagGeneric <Functions.saqc.flag>`)
* Resampling the data via linear Interpolation (:py:func:`linear <Functions.saqc.linear>`).
Drift Correction
----------------
Exponential Drift
^^^^^^^^^^^^^^^^^
* The variables *SAK254* and *Turbidity* show drifting behavior originating from dirt, that accumulates on the light sensitive sensor surfaces over time.
* The effect, the dirt accumulation has on the measurement values, is assumed to be properly described by an exponential model.
* The Sensors are cleaned periodically, resulting in a periodical reset of the drifting effect.
* The Dates and Times of the maintenance events are input to the :py:func:`correctDrift <Functions.saqc.correctDrift>`, that will correct the data in between any two such maintenance intervals. (Find some formal description of the process :doc:`here <../misc_md_m2r/ExponentialModel>`.)
Linear Long Time Drift
^^^^^^^^^^^^^^^^^^^^^^
* Afterwards, a long-term linear drift remains in the *SAK254* and *Turbidity* measurements, originating from scratches that accumulate on the sensors' glass lenses over time
* The lenses are replaced periodically, resulting in a periodical reset of that long time drifting effect
* The Dates and Times of the lenses replacements are input to the :py:func:`correctDrift <Functions.saqc.correctDrift>`, that will correct the data in between any two such maintenance intervals according to the assumption of a linearly increasing bias.
Maintenance Intervals Flagging
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* The *SAK254* and *Turbidity* values obtained during maintenance are, of course, not trustworthy; thus, all values obtained during maintenance get flagged via the :py:func:`flagManual <Functions.saqc.flagManual>` method.
* When maintaining the *SAK254* and *Turbidity* sensors, the *NO3* sensors also get removed from the water - thus, they too have to be flagged via the :py:func:`flagManual <Functions.saqc.flagManual>` method.
Multivariate Flagging
---------------------
Basically following the *oddWater* procedure, as suggested in *Talagala, P.D. et al (2019): A Feature-Based Procedure for Detecting Technical Outliers in Water-Quality Data From In Situ Sensors. Water Resources Research, 55(11), 8547-8568.*
* Variables *SAK254*\ , *Turbidity*\ , *Pegel*\ , *NO3N*\ , *WaterTemp* and *pH* get transformed to comparable scales
* We obtain nearest neighbor scores and assign them to a new variable, via :py:func:`assignKNNScores <Functions.saqc.assignKNNScores>`.
* We apply the *STRAY* algorithm to find the cut-off points for the scores, above which values qualify as outliers. (:py:func:`flagByStray <Functions.saqc.flagByStray>`)
* We project the calculated flags onto the input variables via :py:func:`assignKNNScore <Functions.saqc.assignKNNScore>`.
Postprocessing
--------------
* (Flags reduction onto subspaces)
* Back projection of the calculated flags from the resampled data onto the original data via :py:func:`mapToOriginal <Functions.saqc.mapToOriginal>`
Configuration Files
===================
The behaviour of SaQC can be completely controlled by a text based configuration file.
Format
------
SaQC expects configuration files to be semicolon-separated text files with a
fixed header. Each row of the configuration file lists
one variable and one or several test functions that are applied on the given variable.
Header names
^^^^^^^^^^^^
The header names are basically fixed, but if you really insist on custom
configuration headers, have a look `here <saqc/core/config.py>`_.
.. list-table::
:header-rows: 1
* - Name
- Data Type
- Description
- Required
* - varname
- string
- name of a variable
- yes
* - test
- :ref:`function notation <getting_started_md_m2r/ConfigurationFiles:test function notation>`
- test function
- yes
* - plot
- boolean (\ ``True``\ /\ ``False``\ )
- plot the test's result
- no
Test function notation
^^^^^^^^^^^^^^^^^^^^^^
The notation of test functions follows the function call notation of Python and
many other programming languages and looks like this:
.. code-block::
flagRange(min=0, max=100)
Here the function ``flagRange`` is called and the values ``0`` and ``100`` are passed
to the parameters ``min`` and ``max`` respectively. As we value readability
of the configuration more than conciseness of the extension language, only
keyword arguments are supported. That means that the notation ``flagRange(0, 100)``
is not a valid replacement for the above example.
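The keyword-only rule is easy to enforce mechanically. As an illustrative sketch (this is not SaQC's actual configuration parser), Python's ``ast`` module can split such a call string into the function name and its keyword arguments while rejecting positional ones:

```python
import ast

def parse_test(call: str):
    """Split a call string like 'flagRange(min=0, max=100)' into the
    function name and a dict of keyword arguments. Positional
    arguments are rejected, mirroring the keyword-only rule above."""
    node = ast.parse(call, mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a plain function call: {call!r}")
    if node.args:
        raise ValueError("only keyword arguments are supported")
    return node.func.id, {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
```

So ``parse_test('flagRange(min=0, max=100)')`` yields the name ``flagRange`` together with the keyword mapping, while the positional form ``flagRange(0, 100)`` raises an error.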
Examples
--------
Single Test
^^^^^^^^^^^
Every row lists one test per variable. If you want to call multiple tests on
a specific variable (and you probably want to), list them in separate rows:
.. code-block::
varname | test
#-------|----------------------------------
x | flagMissing()
x | flagRange(min=0, max=100)
x | constants_flagBasic(window="3h")
y | flagRange(min=-10, max=40)
Multiple Tests
^^^^^^^^^^^^^^
A row lists multiple tests for a specific variable in separate columns. All test
columns need to share the common prefix ``test``\ :
.. code-block::
varname ; test_1 ; test_2 ; test_3
#-------;----------------------------;---------------------------;---------------------------------
x ; flagMissing() ; flagRange(min=0, max=100) ; constants_flagBasic(window="3h")
y ; flagRange(min=-10, max=40) ; ;
The evaluation of such a configuration is in columns-major order, so the given
example is identical to the following:
.. code-block::
varname ; test_1
#-------;---------------------------------
x ; flagMissing()
y ; flagRange(min=-10, max=40)
x ; flagRange(min=0, max=100)
x ; constants_flagBasic(window="3h")
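The column-major rule can be sketched in a few lines of plain Python (an illustrative sketch of the ordering only, not SaQC's actual configuration reader):

```python
import csv
import io

def column_major_calls(config_text: str):
    """Return (varname, test) pairs in column-major evaluation order:
    the whole 'test_1' column first, then 'test_2', and so on."""
    rows = list(csv.DictReader(io.StringIO(config_text), delimiter=";"))
    var_col = next(c for c in rows[0] if c.strip() == "varname")
    test_cols = [c for c in rows[0] if c.strip().startswith("test")]
    calls = []
    for col in test_cols:            # walk columns first ...
        for row in rows:             # ... then rows within each column
            test = (row.get(col) or "").strip()
            if test:
                calls.append((row[var_col].strip(), test))
    return calls
```

Applied to the multi-column example, this reproduces exactly the flattened single-column ordering shown above.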
Plotting
^^^^^^^^
As the process of finding a good quality check setup is somewhat experimental, SaQC
provides a possibility to plot the results of the test functions. To use this feature, add the optional column ``plot`` and set it
to ``True`` for all results you want to plot. These plots are
meant to provide a quick and easy visual evaluation of the test.
.. code-block::
varname ; test ; plot
#-------;----------------------------------;-----
x ; flagMissing() ;
x ; flagRange(min=0, max=100) ; False
x ; constants_flagBasic(window="3h") ; True
y ; flagRange(min=-10, max=40) ;
Regular Expressions in ``varname`` column
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some of the tests (e.g. checks for missing values, range tests or interpolation
functions) are very likely to be used on all or at least several variables of
the processed dataset. As it becomes quite cumbersome to list all these
variables separately, only to call the same functions with the same
parameters, SaQC supports regular expressions
within the ``varname`` column. Please note that a ``varname`` needs to be quoted
(with ``'`` or ``"``\ ) in order to be interpreted as a regular expression.
.. code-block::
varname ; test
#----------;------------------------------
'.*' ; harm_shift2Grid(freq="15Min")
'(x | y)' ; flagMissing()
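Conceptually, such a quoted pattern is matched against every column of the dataset; the following is a sketch of that idea (SaQC's exact matching semantics may differ):

```python
import re

def expand_varname(pattern: str, columns):
    """Return all dataset columns matched by a regular-expression varname."""
    rx = re.compile(pattern)
    return [col for col in columns if rx.fullmatch(col)]
```

For a dataset with columns ``x``, ``y`` and ``temp``, the pattern ``.*`` selects all three columns, while ``(x|y)`` selects only ``x`` and ``y``.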
Customizations
==============
SaQC comes with a continuously growing number of pre-implemented
:doc:`quality check and processing routines <FunctionIndex>` and
flagging schemes.
For any sufficiently large use case however it is very likely that the
functions provided won't fulfill all your needs and requirements.
Acknowledging the impossibility to address all imaginable use cases, we
designed the system to allow for extensions and customizations. The main extension options, namely
:ref:`quality check routines <getting_started_md_m2r/Customizations:custom quality check routines>`
and the :ref:`flagging scheme <getting_started_md_m2r/Customizations:custom flagging schemes>`,
are described within this document.
Custom quality check routines
-----------------------------
In case you are missing quality check routines, you are of course very
welcome to file a feature request issue on the project's
`gitlab repository <https://git.ufz.de/rdm-software/saqc>`_. However, if
you are more the "no-way-I-get-this-done-by-myself" type of person,
SaQC provides two ways to integrate custom routines into the system:
#. The :doc:`extension language <GenericFunctions>`
#. An :ref:`interface <getting_started_md_m2r/Customizations:interface>` to the evaluation machinery
Interface
^^^^^^^^^
In order to make a function usable within the evaluation framework of SaQC the following interface is needed:
.. code-block:: python
import pandas
import dios
import saqc
def yourTestFunction(
data: pandas.DataFrame,
field: str,
flags: saqc.Flags,
*args,
**kwargs
) -> (dios.DictOfSeries, saqc.Flags):
    ...
Argument Descriptions
~~~~~~~~~~~~~~~~~~~~~
.. list-table::
:header-rows: 1
* - Name
- Description
* - ``data``
- The actual dataset.
* - ``field``
- The field/column within ``data``\ , that function is processing.
* - ``flags``
- An instance of Flags, responsible for the translation of test results into quality attributes.
* - ``args``
- Any other arguments needed to parameterize the function.
* - ``kwargs``
- Any other keyword arguments needed to parameterize the function.
Integrate into SaQC
^^^^^^^^^^^^^^^^^^^
In order to make your function available to the system, it needs to be registered. We provide a decorator
`register <saqc/functions/register.py>`_ with saqc to integrate your
test functions into SaQC. Here is a complete dummy example:
.. code-block:: python
from saqc import register
@register()
def yourTestFunction(data, field, flags, *args, **kwargs):
return data, flags
Example
^^^^^^^
The function `flagRange <saqc/funcs/functions.py>`_ provides a simple, yet complete implementation of
a quality check routine. You might want to look into its implementation as a reference for your own.
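To make the shape of such a routine concrete without pulling in SaQC itself, here is a dependency-free toy version of a range check. The function name ``flagRangeSketch``, the ``BAD`` level, and the plain-dict containers are illustrative stand-ins only; SaQC passes its own ``dios``/``Flags`` objects and constants:

```python
BAD = 255.0  # illustrative stand-in for saqc.constants.BAD

def flagRangeSketch(data, field, flags, *, min=float("-inf"), max=float("inf"), **kwargs):
    """Toy range check following the interface above: set flags[field]
    to BAD wherever data[field] leaves the interval [min, max].
    Here `data` and `flags` are plain dicts of lists."""
    flags[field] = [
        BAD if not (min <= value <= max) else flag
        for value, flag in zip(data[field], flags[field])
    ]
    return data, flags
```

A call like ``flagRangeSketch(data, "x", flags, min=0, max=100)`` leaves in-range values untouched and raises the flag of every out-of-range value to ``BAD``.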
Custom flagging schemes
-----------------------
Sorry for the inconvenience! Coming soon...
DMP flagging scheme
===================
Possible flags
--------------
The DMP scheme produces the following flag constants:
* "ok"
* "doubtfull"
* "bad"
Generic Functions
=================
Generic Flagging Functions
--------------------------
Generic flagging functions allow for cross-variable quality
constraints and for implementing simple quality checks directly within the configuration.
Why?
^^^^
In most real world datasets many errors
can be explained by the dataset itself. Think of an active, fan-cooled
measurement device: no matter how precise the instrument may work, problems
are to be expected when the fan stops working or the power supply
drops below a certain threshold. While these dependencies are easy to
:ref:`formalize <getting_started_md_m2r/GenericFunctions:a real world example>` on a per dataset basis, it is quite
challenging to translate them into generic source code.
Specification
^^^^^^^^^^^^^
Generic flagging functions are used in the same manner as their
:doc:`non-generic counterparts <FunctionIndex>`. The basic
signature looks like this:
.. code-block:: sh
flagGeneric(func=<expression>, flag=<flagging_constant>)
where ``<expression>`` is composed of the :ref:`supported constructs <getting_started_md_m2r/GenericFunctions:supported constructs>`
and ``<flagging_constant>`` is one of the predefined
:ref:`flagging constants <getting_started_md_m2r/ParameterDescriptions:flagging constants>` (default: ``BAD``\ ).
Generic flagging functions are expected to return a boolean value, i.e. ``True`` or ``False``. All other expressions will
fail during the runtime of ``SaQC``.
Examples
^^^^^^^^
Simple comparisons
~~~~~~~~~~~~~~~~~~
Task
""""
Flag all values of ``x`` where ``y`` falls below 0.
Configuration file
""""""""""""""""""
.. code-block::
varname ; test
#-------;------------------------
x ; flagGeneric(func=y < 0)
Calculations
~~~~~~~~~~~~
Task
""""
Flag all values of ``x`` that exceed 3 standard deviations of ``y``.
Configuration file
""""""""""""""""""
.. code-block::
varname ; test
#-------;---------------------------------
x ; flagGeneric(func=x > std(y) * 3)
Special functions
~~~~~~~~~~~~~~~~~
Task
""""
Flag all values of ``x`` where: ``y`` is flagged and ``z`` has missing values.
Configuration file
""""""""""""""""""
.. code-block::
varname ; test
#-------;----------------------------------------------
x ; flagGeneric(func=isflagged(y) & ismissing(z))
A real world example
~~~~~~~~~~~~~~~~~~~~
Let's consider the following dataset:
.. list-table::
:header-rows: 1
* - date
- meas
- fan
- volt
* - 2018-06-01 12:00
- 3.56
- 1
- 12.1
* - 2018-06-01 12:10
- 4.7
- 0
- 12.0
* - 2018-06-01 12:20
- 0.1
- 1
- 11.5
* - 2018-06-01 12:30
- 3.62
- 1
- 12.1
* - ...
-
-
-
Task
""""
Flag ``meas`` where ``fan`` equals 0 and ``volt``
is lower than ``12.0``.
Configuration file
""""""""""""""""""
There are various options. We can directly implement the condition as follows:
.. code-block::
varname ; test
#-------;-----------------------------------------------
meas ; flagGeneric(func=(fan == 0) | (volt < 12.0))
But we could also quality check our independent variables first
and then leverage this information later on:
.. code-block::
varname ; test
#-------;----------------------------------------------------
'.*' ; flagMissing()
fan ; flagGeneric(func=fan == 0)
volt ; flagGeneric(func=volt < 12.0)
meas ; flagGeneric(func=isflagged(fan) | isflagged(volt))
Generic Processing
------------------
Generic processing functions provide a way to evaluate mathematical operations
and functions on the variables of a given dataset.
Why
^^^
In many real-world use cases, quality control is embedded into a larger data
processing pipeline and it is not unusual to even have certain processing
requirements as a part of the quality control itself. Generic processing
functions make it easy to enrich a dataset through the evaluation of a
given expression.
Specification
^^^^^^^^^^^^^
The basic signature looks like this:
.. code-block:: sh
procGeneric(func=<expression>)
where ``<expression>`` is composed of the :ref:`supported constructs <getting_started_md_m2r/GenericFunctions:supported constructs>`.
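A hypothetical configuration row might look as follows, assuming (as with the flagging functions) that the row's ``varname`` names the variable receiving the result; check the function index for the authoritative signature:

```
varname ; test
#-------;-----------------------------
z       ; procGeneric(func=abs(x - y))
```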
Variable References
-------------------
All variables of the processed dataset are available within generic functions,
so arbitrary cross references are possible. The variable of interest
is furthermore available with the special reference ``this``\ , so the second
:ref:`example <getting_started_md_m2r/GenericFunctions:calculations>` could be rewritten as:
.. code-block::
varname ; test
#-------;------------------------------------
x ; flagGeneric(func=this > std(y) * 3)
When referencing other variables, their flags will be respected during evaluation
of the generic expression. So, in the example above, only values of ``x`` and ``y`` that
are not already flagged with ``BAD`` will be used in the evaluation of ``x > std(y)*3``.
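The masking step can be pictured with plain Python (a simplified sketch; SaQC evaluates expressions on its own series containers, and the ``BAD`` level shown is an illustrative stand-in): values whose flag is at or above the masking threshold are dropped before the expression is computed.

```python
import statistics

BAD = 255.0  # illustrative stand-in flag level

def masked(values, flags, to_mask=BAD):
    """Keep only values whose flag lies below the masking threshold."""
    return [v for v, f in zip(values, flags) if f < to_mask]

# Evaluating the y-part of `x > std(y) * 3` with one value of y flagged BAD:
y = [1.0, 1.1, 900.0, 0.9]
y_flags = [0.0, 0.0, BAD, 0.0]  # the outlier was flagged beforehand
threshold = statistics.pstdev(masked(y, y_flags)) * 3
```

Because the flagged outlier ``900.0`` is excluded, it does not inflate the standard deviation, and the threshold stays near the scale of the unflagged data.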
Supported constructs
--------------------
Operators
^^^^^^^^^
Comparison
~~~~~~~~~~
The following comparison operators are available:
.. list-table::
:header-rows: 1
* - Operator
- Description
* - ``==``
- ``True`` if the values of the operands are equal
* - ``!=``
- ``True`` if the values of the operands are not equal
* - ``>``
- ``True`` if the values of the left operand are greater than the values of the right operand
* - ``<``
- ``True`` if the values of the left operand are smaller than the values of the right operand
* - ``>=``
- ``True`` if the values of the left operand are greater than or equal to the values of the right operand
* - ``<=``
- ``True`` if the values of the left operand are smaller than or equal to the values of the right operand
Arithmetics
~~~~~~~~~~~
The following arithmetic operators are supported:
.. list-table::
:header-rows: 1
* - Operator
- Description
* - ``+``
- addition
* - ``-``
- subtraction
* - ``*``
- multiplication
* - ``/``
- division
* - ``**``
- exponentiation
* - ``%``
- modulus
Bitwise
~~~~~~~
The bitwise operators also act as logical operators in comparison chains.
.. list-table::
:header-rows: 1
* - Operator
- Description
* - ``&``
- binary and
* - ``|``
- binary or
* - ``^``
- binary xor
* - ``~``
- binary complement
Functions
^^^^^^^^^
All functions expect a :ref:`variable reference <getting_started_md_m2r/GenericFunctions:variable references>`
as the only non-keyword argument (see :ref:`here <getting_started_md_m2r/GenericFunctions:special functions>`\ ).
Mathematical Functions
~~~~~~~~~~~~~~~~~~~~~~
.. list-table::
:header-rows: 1
* - Name
- Description
* - ``abs``
- absolute values of a variable
* - ``max``
- maximum value of a variable
* - ``min``
- minimum value of a variable
* - ``mean``
- mean value of a variable
* - ``sum``
- sum of a variable
* - ``std``
- standard deviation of a variable
* - ``len``
- the number of values of a variable
Special Functions
~~~~~~~~~~~~~~~~~
.. list-table::
:header-rows: 1
* - Name
- Description
* - ``ismissing``
- check for missing values
* - ``isflagged``
- check for flags
Constants
^^^^^^^^^
Generic functions support the same constants as normal functions, a detailed
list is available :ref:`here <getting_started_md_m2r/ParameterDescriptions:constants>`.
Getting started with SaQC
=========================
Requirements: this tutorial assumes that you have Python version 3.6.1 or newer
installed, and that both your operating system and Python build are 64-bit.
Contents
--------
#. :ref:`Set up your environment <getting_started_md_m2r/GettingStarted:1. set up your environment>`
#. :ref:`Get SaQC <getting_started_md_m2r/GettingStarted:2. get saqc>`
#. :ref:`Training tour <getting_started_md_m2r/GettingStarted:3. training tour>`
* :ref:`3.1 Get toy data and configuration <getting_started_md_m2r/GettingStarted:get toy data and configuration>`
* :ref:`3.2 Run SaQC <getting_started_md_m2r/GettingStarted:run saqc>`
* :ref:`3.3 Configure SaQC <getting_started_md_m2r/GettingStarted:configure saqc>`
* :ref:`Change test parameters <getting_started_md_m2r/GettingStarted:change test parameters>`
* :ref:`3.4 Explore the functionality <getting_started_md_m2r/GettingStarted:explore the functionality>`
* :ref:`Process multiple variables <getting_started_md_m2r/GettingStarted:process multiple variables>`
* :ref:`Data harmonization and custom functions <getting_started_md_m2r/GettingStarted:data harmonization and custom functions>`
* :ref:`Save outputs to file <getting_started_md_m2r/GettingStarted:save outputs to file>`
1. Set up your environment
--------------------------
SaQC is written in Python, so the easiest way to set up your system to use SaQC
for your needs is using the Python Package Index (PyPI). Following good Python
practice, you will first want to create a new virtual environment that you
install SaQC into by typing the following in your console:
On Unix/Mac-systems
"""""""""""""""""""
.. code-block:: sh
# if you have not installed venv yet, do so:
python3 -m pip install --user virtualenv
# move to the directory where you want to create your virtual environment
cd YOURDIR
# create virtual environment called "env_saqc"
python3 -m venv env_saqc
# activate the virtual environment
source env_saqc/bin/activate
On Windows-systems
""""""""""""""""""
.. code-block:: sh
# if you have not installed venv yet, do so:
py -3 -m pip install --user virtualenv
# move to the directory where you want to create your virtual environment
cd YOURDIR
# create virtual environment called "env_saqc"
py -3 -m venv env_saqc
# move to the Scripts directory in "env_saqc"
cd env_saqc/Scripts
# activate the virtual environment
./activate
2. Get SaQC
-----------
Via PyPI
^^^^^^^^
Type the following:
On Unix/Mac-systems
"""""""""""""""""""
.. code-block:: sh
python3 -m pip install saqc
On Windows-systems
""""""""""""""""""
.. code-block:: sh
py -3 -m pip install saqc
From Gitlab repository
^^^^^^^^^^^^^^^^^^^^^^
Download SaQC directly from the `GitLab-repository <https://git.ufz.de/rdm/saqc>`_ to make sure you use the most recent version:
.. code-block:: sh
# clone gitlab - repository
git clone https://git.ufz.de/rdm-software/saqc
# switch to the folder where you installed saqc
cd saqc
# install all required packages
pip install -r requirements.txt
# install all required submodules
git submodule update --init --recursive
3. Training tour
----------------
The following passage guides you through the essentials of the usage of SaQC via
a toy dataset and a toy configuration.
Get toy data and configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you take a look into the folder ``saqc/ressources/data`` you will find a toy
dataset ``data.csv`` which contains the following:
.. code-block::
Date,Battery,SM1,SM2
2016-04-01 00:05:48,3573,32.685,29.3157
2016-04-01 00:20:42,3572,32.7428,29.3157
2016-04-01 00:35:37,3572,32.6186,29.3679
2016-04-01 00:50:32,3572,32.736999999999995,29.3679
...
The file contains two time series of soil moisture (SM1 and SM2) and the battery
voltage of the measuring device over time. Generally, this is how your data
should look if you want to run SaQC on it. Note, however, that you do not
necessarily need a series of dates to reference to, and that you are free to use
more columns with any names you like.
Now create your own configuration file ``saqc/ressources/data/myconfig.csv``
and paste the following lines into it:
.. code-block::
varname;test;plot
SM2;flagRange(min=10, max=60);False
SM2;flagMad(window="30d", z=3.5);True
These lines illustrate how different quality control tests can be specified for
different variables by following the pattern:
.. code-block::

   varname ; testname(testparameters) ; plottingoption
In this case, we define a range-test that flags all values outside the range
[10,60] and a test to detect spikes using the MAD-method. You can find an
overview of all available quality control tests in the
:doc:`documentation <FunctionIndex>`. Note that the tests are
*executed in the order that you define in the configuration file*. The quality
flags that are set during one test are always passed on to the subsequent one.
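Mechanically, each configuration row is just three semicolon-separated fields. The following stand-alone sketch reads this format with nothing but the standard library (it is not SaQC's actual config reader, just an illustration of the row structure):

```python
import csv
import io

# A toy configuration in the varname;test;plot format described above.
config_text = """varname;test;plot
SM2;flagRange(min=10, max=60);False
SM2;flagMad(window="30d", z=3.5);True
"""

def parse_config(text):
    """Split each config row into (varname, test expression, plot flag)."""
    rows = csv.DictReader(io.StringIO(text), delimiter=";")
    return [(r["varname"], r["test"], r["plot"] == "True") for r in rows]

for varname, test, plot in parse_config(config_text):
    print(varname, "->", test, "| plot:", plot)
```

Note that the test column may itself contain commas and quotes; only the semicolons separate the fields.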
Run SaQC
^^^^^^^^
Remember to have your virtual environment activated:
On Unix/Mac-systems
"""""""""""""""""""
.. code-block:: sh
source env_saqc/bin/activate
On Windows
""""""""""
.. code-block:: sh
cd env_saqc/Scripts
./activate
Via your console, move into the folder you downloaded saqc into:
.. code-block:: sh
cd saqc
From here, you can run saqc and tell it to run the tests from the toy
config-file on the toy dataset via the ``-c`` and ``-d`` options:
On Unix/Mac-systems
"""""""""""""""""""
.. code-block:: sh
python3 -m saqc -c ressources/data/myconfig.csv -d ressources/data/data.csv
On Windows
""""""""""
.. code-block:: sh
py -3 -m saqc -c ressources/data/myconfig.csv -d ressources/data/data.csv
If you installed SaQC via PyPI, you can call ``saqc`` directly and omit the ``python -m`` part.
The command will output this plot:
.. image:: ../ressources/images/example_plot_1.png
:target: ../ressources/images/example_plot_1.png
:alt: Toy Plot
So, what do we see here?
* The plot shows the data as well as the quality flags that were set by the
tests for the variable ``SM2``\ , as defined in the config-file
* Following our definition in the config-file, first the ``flagRange``\ -test that flags
all values outside the range [10,60] was executed and after that,
the ``flagMad``\ -test to identify spikes in the data
* In the config, we set the plotting option to ``True`` for ``flagMad``
  only. Thus, the plot aggregates all preceding tests (here: ``flagRange``\ ) to black
  points and highlights the flags of the selected test as red points.
Save outputs to file
~~~~~~~~~~~~~~~~~~~~
If you want the final results to be saved to a csv-file, you can do so by the
use of the ``-o`` option:
.. code-block:: sh
saqc -c ressources/data/config.csv -d ressources/data/data.csv -o ressources/data/out.csv
This saves a dataframe that contains both the original data and the quality
flags that were assigned by SaQC for each of the variables:
.. code-block::
Date,SM1,SM1_flags,SM2,SM2_flags
2016-04-01 00:05:48,32.685,OK,29.3157,OK
2016-04-01 00:20:42,32.7428,OK,29.3157,OK
2016-04-01 00:35:37,32.6186,OK,29.3679,OK
2016-04-01 00:50:32,32.736999999999995,OK,29.3679,OK
...
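Since the output is plain CSV, downstream tooling can consume it directly. As a stand-alone sketch (standard library only; the ``BAD`` entries below are invented for the example), here is how one might count flag values per variable:

```python
import csv
import io

# Output in the Date,SM1,SM1_flags,SM2,SM2_flags layout shown above;
# the BAD entries are invented for this illustration.
out_text = """Date,SM1,SM1_flags,SM2,SM2_flags
2016-04-01 00:05:48,32.685,OK,29.3157,OK
2016-04-01 00:20:42,32.7428,BAD,29.3157,OK
2016-04-01 00:35:37,32.6186,OK,29.3679,BAD
2016-04-01 00:50:32,32.736999999999995,OK,29.3679,OK
"""

def count_flags(text, variable):
    """Count how often each flag value occurs for one variable."""
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        flag = row[variable + "_flags"]
        counts[flag] = counts.get(flag, 0) + 1
    return counts

print(count_flags(out_text, "SM1"))
print(count_flags(out_text, "SM2"))
```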
Configure SaQC
^^^^^^^^^^^^^^
Change test parameters
~~~~~~~~~~~~~~~~~~~~~~
Now you can start to change the settings in the config-file and investigate the
effect this has on how many data points are flagged as "BAD". When using your
own data, this is the way to configure the tests according to your needs. For
example, you could modify your ``myconfig.csv`` and change the parameters of the
range test:
.. code-block::
varname;test;plot
SM2;flagRange(min=-20, max=60);False
SM2;flagMad(window="30d", z=3.5);True
Rerunning SaQC as above produces the following plot:
.. image:: ../ressources/images/example_plot_2.png
:target: ../ressources/images/example_plot_2.png
:alt: Changing the config
You can see that the changes we made to the parameters of the range test
take effect, so that only values > 60 are flagged by it (black points). This,
in turn, leaves more erroneous data to be identified by the subsequent
spike test (red points).
Explore the functionality
^^^^^^^^^^^^^^^^^^^^^^^^^
Process multiple variables
~~~~~~~~~~~~~~~~~~~~~~~~~~
You can also define multiple tests for multiple variables in your data. These
are then executed sequentially and can be plotted separately. E.g. you could do
something like this:
.. code-block::
varname;test;plot
SM1;flagRange(min=10, max=60);False
SM2;flagRange(min=10, max=60);False
SM1;flagMad(window="15d", z=3.5);True
SM2;flagMad(window="30d", z=3.5);True
which gives you separate plots for each line where the plotting option is set to
``True`` as well as one summary "data plot" that depicts the joint flags from all
tests:
.. list-table::
   :header-rows: 1

   * - SM1
     - SM2
   * - .. image:: ../ressources/images/example_plot_31.png
          :target: ../ressources/images/example_plot_31.png
          :alt: Flags for SM1
     - .. image:: ../ressources/images/example_plot_32.png
          :target: ../ressources/images/example_plot_32.png
          :alt: Flags for SM2

.. image:: ../ressources/images/example_plot_33.png
   :target: ../ressources/images/example_plot_33.png
   :alt: Joint data plot
Data harmonization and custom functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
SaQC includes functionality to harmonize the timestamps of one or more data
series. You can also write your own tests using a python-based
:doc:`extension language <GenericFunctions>`. A configuration using both could look like this:
.. code-block::
varname;test;plot
SM2;shiftToFreq(freq="15Min");False
SM2;generic(func=(SM2 < 30));True
The above executes an internal framework that harmonizes the timestamps of SM2
to a 15min-grid (see data below). Further information about this routine can be
found in the :ref:`Flagging Functions Overview <flaggingFunctions>`.
.. code-block::
Date,SM1,SM1_flags,SM2,SM2_flags
2016-04-01 00:00:00,,,29.3157,OK
2016-04-01 00:05:48,32.685,OK,,
2016-04-01 00:15:00,,,29.3157,OK
2016-04-01 00:20:42,32.7428,OK,,
...
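The core idea behind such a harmonization - snapping irregular timestamps onto a regular grid - can be sketched with the standard library alone (this is not SaQC's implementation, which additionally handles value interpolation and flag propagation):

```python
from datetime import datetime, timedelta

def snap_to_grid(ts, minutes=15):
    """Round a timestamp to the nearest point on a regular minute grid."""
    grid = timedelta(minutes=minutes)
    day_start = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    # number of whole grid steps closest to the timestamp's offset into the day
    steps = round((ts - day_start) / grid)
    return day_start + steps * grid

print(snap_to_grid(datetime(2016, 4, 1, 0, 5, 48)))   # 2016-04-01 00:00:00
print(snap_to_grid(datetime(2016, 4, 1, 0, 20, 42)))  # 2016-04-01 00:15:00
```

The two example timestamps land on the same grid points as in the harmonized table above.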
Also, all values where SM2 is below 30 are flagged via the custom function (see
plot below). You can learn more about the syntax of these custom functions
:doc:`here <GenericFunctions>`.
.. image:: ../ressources/images/example_plot_4.png
:target: ../ressources/images/example_plot_4.png
:alt: Example custom function
Offset Strings
--------------
All the `pandas offset aliases <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ are supported by SaQC. The following table lists some of the more relevant options:
.. list-table::
:header-rows: 1
* - Alias
- Description
* - ``"S"``\ , ``"s"``
- second
* - ``"T"``\ , ``"Min"``\ , ``"min"``
- minute
* - ``"H"``\ , ``"h"``
- hour
* - ``"D"``\ , ``"d"``
- calendar day
* - ``"W"``\ , ``"w"``
- week
* - ``"M"``\ , ``"m"``
- month
* - ``"Y"``\ , ``"y"``
- year
Multiples are built by prefixing the alias with the desired multiplier (e.g. ``"5Min"``\ , ``"4W"``\ ).
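For the fixed-length aliases, such an offset string can be decomposed into a multiplier and a unit. A rough stand-alone sketch (not pandas' actual parser, which also covers calendar-dependent units like months and years):

```python
import re

# Seconds per unit for the fixed-length aliases listed above
# (calendar-dependent units like month and year are left out on purpose).
UNIT_SECONDS = {"s": 1, "t": 60, "min": 60, "h": 3600, "d": 86400, "w": 604800}

def offset_to_seconds(alias):
    """Parse an offset string such as '5Min' into a number of seconds."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", alias)
    if match is None:
        raise ValueError(f"not an offset alias: {alias!r}")
    multiplier = int(match.group(1) or 1)
    return multiplier * UNIT_SECONDS[match.group(2).lower()]

print(offset_to_seconds("5Min"))  # 300
print(offset_to_seconds("4W"))    # 2419200
```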
Constants
---------
Flagging Constants
^^^^^^^^^^^^^^^^^^
The following flag constants are available and can be used to mark the quality of a data point:
.. list-table::
:header-rows: 1
* - Alias
- Description
* - ``GOOD``
- The value passed all tests and is therefore considered valid
* - ``BAD``
- At least one test failed on the value, which is therefore considered invalid
* - ``UNFLAGGED``
- The value has not been assigned a flag yet; this might mean that all tests passed or that no test ran
How these aliases are translated into 'real' flags (the output of SaQC) depends on the :doc:`flagging scheme <FlaggingSchemes>`
and might range from numerical values to string constants.
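As an illustration of what such a translation might look like (the concrete values below are invented for the example and are not SaQC's actual schemes):

```python
# Invented translation tables - the real output schemes are defined by SaQC.
SCHEMES = {
    "numeric": {"UNFLAGGED": -1.0, "GOOD": 0.0, "BAD": 255.0},
    "string": {"UNFLAGGED": "", "GOOD": "OK", "BAD": "BAD"},
}

def translate(flags, scheme):
    """Map the abstract flag aliases onto one concrete output scheme."""
    table = SCHEMES[scheme]
    return [table[flag] for flag in flags]

print(translate(["GOOD", "BAD", "UNFLAGGED"], "string"))   # ['OK', 'BAD', '']
print(translate(["GOOD", "BAD", "UNFLAGGED"], "numeric"))  # [0.0, 255.0, -1.0]
```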
Numerical Constants
^^^^^^^^^^^^^^^^^^^
.. list-table::
:header-rows: 1
* - Alias
- Description
* - ``NAN``
- Not a number
Documentation Guide
===================
We document our code via docstrings in NumPy style.
Features, install and usage instructions, and other more text-intensive content
are written in separate documents.
The documents and the docstrings then are collected and rendered using `sphinx <https://www.sphinx-doc.org/>`_.
Documentation Strings
---------------------
*
Write docstrings for all public modules, functions, classes, and methods.
Docstrings are not necessary for non-public methods,
but you should have a comment that describes what the method does.
This comment should appear after the def line.
[\ `PEP8 <https://www.python.org/dev/peps/pep-0008/#documentation-strings>`_\ ]
*
Note that most importantly, the ``"""`` that ends a multiline docstring should be on a line by itself [\ `PEP8 <https://www.python.org/dev/peps/pep-0008/#documentation-strings>`_\ ] :
.. code-block:: python
"""Return a foobang
Optional plotz says to frobnicate the bizbaz first.
"""
*
For one liner docstrings, please keep the closing ``"""`` on the same line.
[\ `PEP8 <https://www.python.org/dev/peps/pep-0008/#documentation-strings>`_\ ]
Pandas Style
^^^^^^^^^^^^
We use `Pandas-style <https://pandas.pydata.org/pandas-docs/stable/development/contributing_docstring.html>`_ docstrings:
Flagger, data, field, etc.
--------------------------
use this:
.. code-block:: py
def foo(data, field, flagger):
"""
data : dios.DictOfSeries
A saqc-data object.
field : str
A field denoting a column in data.
flagger : saqc.flagger.BaseFlagger
A saqc-flagger object.
"""
IDE helper
^^^^^^^^^^
In PyCharm, you can activate auto-generation of NumPy-style docstrings like so:
#. ``File->Settings...``
#. ``Tools->Python Integrated Tools``
#. ``Docstrings->Docstring format``
#. Choose ``NumPy``
Docstring formatting pitfalls
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Latex is included via
.. code-block::
:math:`<latex_code>`
*
Latex commands need to be signified with **double** backlash! (\ ``\\mu`` instead of ``\mu``\ )
*
Nested lists need to be all of the same kind (either numbered or bulleted - otherwise the result is salad)
* List items covering several lines in the docstring have to be aligned - (so, not only the continuation lines, but ALL of them, including the first one - otherwise the result is salad)
* The start of a list has to be separated from the preceding docstring code by *one blank line* - (otherwise the list items just get chained into one line and the result is salad)
* Most formatting signifiers are not allowed to start or end with a space. (so no :math: `1+1 `, ` var2`, `` a=1 ``, ...)
* Do not include lines *only* containing two or more ``-`` signs, except it is the underline of a section heading (otherwise the resulting html representation could be messed up)
hyperlinking docstrings
-----------------------
*
Link code content/modules via python roles.
*
Cite/link via the py domain roles. Link content ``bar``\ , that is registered to the API with the address ``foo.bar`` and
shall be represented by the name ``link_name``\ , via:
.. code-block::
:py:role:`link_name <foo.bar>`
*
check out the *_api* folder in the `repository <https://git.ufz.de/rdm-software/saqc/-/tree/develop/sphinx-doc>`_ to get an
overview of already registered paths. Most important may be:
*
constants are available via ``saqc.constants`` - for example:
.. code-block::
:py:const:`~saqc.constants.BAD`
*
the ``~`` is a shorthand for hiding the module path and only displaying ``BAD``.
*
Functions are available via the "simulated" module ``Functions.saqc`` - for example:
.. code-block::
:py:func:`saqc.flagRange <saqc.Functions.flagRange>`
* Modules are available via the "simulated" package ``Functions.`` - for example:
.. code-block::
:py:mod:`generic <Functions.generic>`
* The saqc object and/or its content is available via:
.. code-block::
:py:class:`saqc.SaQC`
:py:meth:`saqc.SaQC.getResults`
* The Flags object and/or its content is available via:
.. code-block::
:py:class:`saqc.Flags`
*
you can add .rst files containing ``automodapi`` directives to the modulesAPI folder to make available more modules via pyroles
*
the Environment table, including the variables available via config files, is available as a rest file located in the environment folder. (Use the include directive to include it, or linking syntax to link it.)
Adding Markdown content to the Documentation
--------------------------------------------
*
By linking the markdown file "foo/bar.md", or any folder that contains markdown files directly,
you can trigger sphinx - ``recommonmark``\ , which is fine for not-too complex markdown documents.
*
Especially, if you have multiple markdown files that are mutually linked and/or contain tables of a certain fanciness (tables with figures),
you will have to take some minor extra steps:
*
You will have to gather all markdown files in subfolders of "sphinx-doc" directory (you can have multiple subfolders).
*
To include a folder named ``foo`` of markdown files in the documentation, or refer to content in ``foo``\ , you will have
to append the folder name to the MDLIST variable in the Makefile:
*
The markdown files must be in one of the subfolders listed in MDLIST - they can't be gathered in nested subfolders.
*
You cannot link to sections in other markdown files that contain the ``-`` character (sorry).
*
The section structure/ordering must be consistent in the ReST sense (otherwise the sections won't appear - that's also required if you use plain ``recommonmark``\ ).
*
You can link to resources - like pictures - and include them in the markdown, if the pictures are in a (possibly different) folder in ``sphinx-doc`` and the paths to these resources are given relatively!
*
You can include a markdown file in a rest document, by appending '_m2r' to the folder name when linking it path_wise.
So, to include the markdown file 'foo/bar.md' in a toc tree for example - you would do something like:
.. code-block:: python
.. toctree::
:hidden:
:maxdepth: 1
foo_m2r/bar
Linking ReST sources in markdown documentation
----------------------------------------------
*
If you want to hyperlink/include other sources from the sphinx documentation that are rest files (and docstrings),
you will not be able to include them in a way that they appear in your markdown rendering - however, there is
the possibility to just include the respective rest directives (see directive/link :ref:`examples <how_to_doc/HowToDoc:hyperlinking docstrings>`\ ).
*
This will mess up your markdown code - meaning that you will have
those rest snippets flying around - but when the markdown file gets converted to a rest file and built into the
sphinx html build, the linked sources will be integrated properly. The syntax for linking rest sources is as
follows:
*
to include the link to the rest source ``functions.rst`` in the folder ``foo``\ , under the name ``bar``\ , you would need to insert:
.. code-block::

   :doc:`bar <foo/functions>`
*
to link to a section with name ``foo`` in a rest source named ``bumm.rst``\ , under the name ``bar``\ , you would just insert:
.. code-block::
:ref:`bar <relative/path/from/sphinx/root/bumm:foo>`
*
in that manner you might be able to smuggle most rest directives through into the resulting html build. Especially if you want to link to the docstrings of certain (domain specific) objects. Lets say you want to link to the *function* ``saqc.funcs.flagRange`` under the name ``ranger`` - you just include:
.. code-block::
:py:func:`Ranger <saqc.funcs.flagRange>`
whereas the ``:func:`` part determines the role the object is documented as. See `this page <https://www.sphinx-doc.org/en/master/#ref-role>`_ for an overview of the available roles.
Exponential Drift Model and Correction
======================================
It is assumed that, in between maintenance events, there is a drift effect shifting the measurements in a way that the resulting value course can be described by the exponential model :math:`M`:

.. math::

   M(t, a, b, c) = a + b(e^{ct} - 1)

We consider the timespan in between maintenance events to be scaled to the :math:`[0,1]` interval.
To additionally make sure the modeled curve can be used to calibrate the value course, we add the following two conditions:

.. math::

   M(0, a, b, c) = y_0

   M(1, a, b, c) = y_1

With :math:`y_0` denoting the mean value obtained from the first 6 measurements directly after the last maintenance event, and :math:`y_1` denoting the mean over the 6 measurements directly preceding the beginning of the next maintenance event.
Solving the equations, one obtains the one-parameter model:

.. math::

   M_{drift}(t, c) = y_0 + \left( \frac{y_1 - y_0}{e^c - 1} \right) (e^{ct} - 1)

for every data chunk in between maintenance events.
After having found the parameter :math:`c^*` that minimizes the squared residues between data and drift model, the correction is performed by bending the fitted curve :math:`M_{drift}(t, c^*)` in a way that it matches :math:`y_2` at :math:`t=1` (with :math:`y_2` being the mean value observed directly after the end of the next maintenance event).
This bent curve is given by:

.. math::

   M_{shift}(t, c^*) = M\left(t, y_0, \frac{y_2 - y_0}{e^{c^*} - 1}, c^*\right)

The new values :math:`y_{shifted}` are computed via:

.. math::

   y_{shifted} = y + M_{shift} - M_{drift}
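Putting the formulas together, a minimal numerical sketch (the search for the optimal :math:`c^*` is omitted; the anchor means and the drift parameter below are invented for the example):

```python
import math

def m_drift(t, c, y0, y1):
    """Drift model M(t, y0, (y1 - y0)/(e^c - 1), c): hits y0 at t=0, y1 at t=1."""
    return y0 + (y1 - y0) / (math.exp(c) - 1) * (math.exp(c * t) - 1)

def correct(y, t, c, y0, y1, y2):
    """Shift a measured value y at scaled time t so the bent curve ends at y2."""
    m_shift = y0 + (y2 - y0) / (math.exp(c) - 1) * (math.exp(c * t) - 1)
    return y + m_shift - m_drift(t, c, y0, y1)

# invented anchor means and drift parameter
y0, y1, y2, c = 10.0, 12.5, 9.8, 1.3

# the model reproduces the anchor means at the interval boundaries
print(round(m_drift(0.0, c, y0, y1), 6))  # 10.0
print(round(m_drift(1.0, c, y0, y1), 6))  # 12.5
# a value sitting exactly on the drift curve at t=1 is bent onto y2
print(round(correct(y1, 1.0, c, y0, y1, y2), 6))  # 9.8
```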
saqc
====
.. automodapi:: Functions.saqc
:no-heading: