Commit fabdda9e authored by Peter Lünenschloß

solved some warnings

parent 01034fed
@@ -5,7 +5,7 @@ The tutorial aims to introduce the usage of ``SaQC`` methods, in order to obtain
from given time series data input. Regularly sampled time series data is data that exhibits a constant temporal
spacing between subsequent data points.
-In the following steps, the tutorial guides through the usage of the *SaQC* :py:mod:`resampling <saqc.resampling>`
+In the following steps, the tutorial guides through the usage of the *SaQC* :doc:`resampling <../funcSummaries/generic>`
library.
#. Initially, we introduce and motivate regularisation techniques and import the tutorial data.
@@ -118,7 +118,7 @@ Regularisations
So let's transform the measurement timestamps to have a regular *10* minutes frequency. In order to do so,
we have to decide what to do with each timestamp's associated data when we alter the timestamp's value.
-Basically, there are three types of :doc:`regularisation <../moduleAPIs/Functionsresampling>` methods:
+Basically, there are three types of :doc:`regularisation <../funcSummaries/resampling>` methods:
#. We could keep the values as they are, and thus,
@@ -129,7 +129,7 @@ Basically, there are three types of :doc:`regularisations <../moduleAPIs/Functio
Shift
-----
-Let's apply a simple shift via the :py:func:`shift <Functions.saqc.shift>` method.
+Let's apply a simple shift via the :py:meth:`~saqc.SaQC.shift` method.
.. doctest::
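   >>> # A sketch of the collapsed shift call, assuming the tutorial's
   >>> # *SoilMoisture* variable and the keywords discussed below; this is
   >>> # not the original doctest body.
   >>> qc = qc.shift('SoilMoisture', freq='10min', method='bshift')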
@@ -149,7 +149,7 @@ Freq parameter
We passed the intended sampling frequency to the ``freq`` keyword in terms of a
`date alias <https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases>`_ string. All of
-the :doc:`regularisations <../moduleAPIs/Functionsresampling>` methods have such a frequency keyword,
+the :doc:`regularisations <../funcSummaries/resampling>` methods have such a frequency keyword,
and it simply determines the sampling rate the resulting regular timeseries will have.
Shifting Method
@@ -157,7 +157,7 @@ Shifting Method
With the ``method`` keyword, we determined the direction of the shift. We passed it the string ``bshift``,
which applies a *backwards* shift, so data points get shifted *backwards*\ , until they match a timestamp
-that is a multiple of *10* minutes. (See :py:func:`saqc.shift <Functions.saqc.shift>` documentation for more
+that is a multiple of *10* minutes. (See :py:meth:`~saqc.SaQC.shift` documentation for more
details on the keywords.)
Let's see how the data is now sampled. For that, we use the ``data_raw`` attribute of the
@@ -193,7 +193,7 @@ We see, the first and last *10* datapoints of both, the original data time serie
Obviously, the shifted data series now exhibits a regular sampling rate of *10* minutes, with the index
ranging from the latest timestamp that is a multiple of *10* minutes and precedes the initial timestamp
of the original data, up to the first *10* minutes multiple that succeeds the last timestamp of the original data.
-This is the default behavior of all the :doc:`regularisations <../moduleAPIs/Functionsresampling>` provided by ``saqc``.
+This is the default behavior of all the :doc:`regularisations <../funcSummaries/resampling>` provided by ``saqc``.
Data Loss and Empty Intervals
-----------------------------
@@ -234,9 +234,9 @@ If there are multiple values present within an interval with size according to t
``freq``\ , these values get reduced to one single value that will get assigned to the timestamp associated with the
interval.
-This reduction depends on the selected :doc:`regularisation <../moduleAPIs/Functionsresampling>` method.
+This reduction depends on the selected :doc:`regularisation <../funcSummaries/resampling>` method.
-For example, :ref:`above <cook_books/DataRegularisation:shift>`\ , we applied a backwards :py:func:`shift <Functions.saqc.shift>` with a *10* minutes frequency.
+For example, :ref:`above <cook_books/DataRegularisation:shift>`\ , we applied a backwards :py:meth:`~saqc.SaQC.shift` with a *10* minutes frequency.
As a result, the first value encountered after any multiple of *10* minutes gets shifted backwards to align with
the desired frequency, and any other value in that *10* minutes interval gets discarded.
@@ -309,7 +309,7 @@ Aggregation
-----------
If we want to combine several values by aggregation and assign the result to the new regular timestamp, instead of
-selecting a single one, we can do this with the :py:func:`saqc.resample <Functions.saqc.resample>` method.
+selecting a single one, we can do this with the :py:meth:`~saqc.SaQC.resample` method.
Let's resample the *SoilMoisture* data to have a *20* minutes sample rate by aggregating every *20* minutes
interval's content with the arithmetic mean (as implemented by numpy's ``numpy.mean`` function, for example).
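A sketch of such a resampling call, assuming the aggregation function is passed via a ``func`` keyword as described in this section:

>>> import numpy as np
>>> qc = qc.resample('SoilMoisture', freq='20min', func=np.mean)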
@@ -366,13 +366,13 @@ Interpolation
Another common way of obtaining regular timestamps is the interpolation of data at regular timestamps.
The pool of :py:mod:`regularisation <Functions.saqc.resampling>` methods also includes the
-:py:func:`saqc.interpolate <Functions.saqc.interpolate>` method.
+:py:meth:`~saqc.SaQC.interpolate` method.
Let's apply a linear interpolation to the dataset. To access
linear interpolation, we pass the string ``"time"`` to the ``method`` parameter. This
applies an interpolation that is sensitive to the differences in temporal gaps
(as opposed to ``"linear"``\ , which expects all the gaps to be equal). Get an overview
-of the possible interpolation methods in the :py:func:`saqc.interpolate <Functions.saqc.interpolate>`
+of the possible interpolation methods in the :py:meth:`~saqc.SaQC.interpolate`
documentation. Let's check the results:
>>> qc = qc.interpolate('SoilMoisture', target='SoilMoisture_linear', freq='10min', method='time')
@@ -419,12 +419,12 @@ a :ref:`valid <cook_books/DataRegularisation:valid data>` value at ``2021-03-20
This behavior is intended to reflect the sparsity of the original data in the
regularized data set. The behavior can be circumvented by applying the more general
-:py:func:`saqc.interpolateIndex <Functions.saqc.interpolateIndex>`.
+:py:meth:`~saqc.SaQC.interpolateIndex`.
Linear Interpolation
~~~~~~~~~~~~~~~~~~~~
-Note that there is a wrapper available for linear interpolation: :py:func:`saqc.linear <Functions.saqc.linear>`.
+Note that there is a wrapper available for linear interpolation: :py:meth:`~saqc.SaQC.linear`.
Flags and Regularisation
------------------------
@@ -461,7 +461,7 @@ We can circumvent having that gap, by flagging that value before interpolation.
works because there is actually another, now valid, value available in the interval
between ``2021-01-01 15:40:00`` and ``2021-01-01 15:50:00``\ , that can serve as the right pillow point
for the interpolation at ``2021-01-01 15:40:00``. So let's flag all the values smaller than *0*
-with the :py:func:`saqc.flagRange <Functions.saqc.flagRange>` method and, after this,
+with the :py:meth:`~saqc.SaQC.flagRange` method and, after this,
do the interpolation.
>>> qc = qc.flagRange('SoilMoisture', min=0)
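After the flagging, the interpolation from above can simply be repeated; a sketch reusing the exact call from the previous section (the collapsed doctest may use a different target name):

>>> qc = qc.interpolate('SoilMoisture', target='SoilMoisture_linear', freq='10min', method='time')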
@@ -109,7 +109,7 @@ that refers to the loaded data.
>>> qc = saqc.SaQC(data)
The only timeseries we have here is the *incidents* dataset. We can have a look at the data and obtain the above plot through
-the method :py:meth:`plot <Functions.saqc.plot>`:
+the method :py:meth:`~saqc.SaQC.plot`:
.. doctest:: exampleOD
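   >>> # Sketch of the collapsed plotting call, assuming the *incidents*
   >>> # variable name from the surrounding text.
   >>> qc.plot('incidents')  # doctest: +SKIP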
@@ -125,7 +125,7 @@ Rolling Mean
^^^^^^^^^^^^
The easiest thing to do would be to apply some rolling mean
-model via the method :py:meth:`roll <Functions.saqc.roll>`.
+model via the method :py:meth:`saqc.SaQC.roll`.
.. doctest:: exampleOD
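   >>> # Sketch of a rolling mean fit; the window size and target name are
   >>> # illustrative assumptions, not the collapsed original values.
   >>> import numpy as np
   >>> qc = qc.roll('incidents', target='incidents_mean', window='13D', func=np.mean)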
@@ -161,7 +161,7 @@ Polynomial Fit
^^^^^^^^^^^^^^
Another common approach is to fit polynomials of certain degrees to the data.
-:py:class:`SaQC <Core.Core.SaQC>` provides the polynomial fit function :py:meth:`fitPolynomial <Core.Core.SaQC.fitPolynomial>`:
+:py:class:`SaQC <Core.Core.SaQC>` provides the polynomial fit function :py:meth:`~saqc.SaQC.fitPolynomial`:
.. doctest:: exampleOD
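   >>> # Sketch of a polynomial fit; window and order are illustrative
   >>> # assumptions, not the collapsed original values.
   >>> qc = qc.fitPolynomial('incidents', target='incidents_polynomial', window='13D', order=2)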
@@ -174,7 +174,7 @@ Custom Models
^^^^^^^^^^^^^
If you want to apply a completely arbitrary function to your data, without pre-chunking it by a rolling window,
-you can make use of the more general :py:meth:`processGeneric <Functions.saqc.process>` function.
+you can make use of the more general :py:meth:`~saqc.SaQC.process` function.
Let's apply a smoothing filter from the `scipy.signal <https://docs.scipy.org/doc/scipy/reference/signal.html>`_
module. We wrap the filter generator up into a function first:
@@ -187,7 +187,7 @@ module. We wrap the filter generator up into a function first:
return pd.Series(filtfilt(b, a, x), index=x.index)
-This function object, we can pass on to the :py:meth:`processGeneric <Core.Core.SaQC.processGeneric>` method's ``func`` argument.
+This function object, we can pass on to the :py:meth:`~saqc.SaQC.processGeneric` method's ``func`` argument.
.. doctest:: exampleOD
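   >>> # Sketch of the wrapped filter and its application; the Butterworth
   >>> # parameters and the target name are illustrative assumptions (the
   >>> # original body is collapsed in this diff).
   >>> import pandas as pd
   >>> from scipy.signal import butter, filtfilt
   >>> def smoothed(x):
   ...     b, a = butter(1, 0.05)  # first-order low-pass filter coefficients
   ...     return pd.Series(filtfilt(b, a, x), index=x.index)
   >>> qc = qc.processGeneric('incidents', target='incidents_smoothed', func=smoothed)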
@@ -224,7 +224,7 @@ Residues
We want to evaluate the residues of one of our models, in order to score the outlierishness of every point.
Therefore, we just stick to the initially calculated rolling mean curve.
-First, we retrieve the residues via the :py:meth:`processGeneric <Core.Core.SaQC.processGeneric>` method.
+First, we retrieve the residues via the :py:meth:`~saqc.SaQC.processGeneric` method.
This method always comes into play when we want to obtain variables resulting from basic algebraic
manipulations of one or more input variables.
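A sketch of such a residue computation, assuming the rolling mean curve was stored under the hypothetical name ``incidents_mean``:

>>> qc = qc.processGeneric(field=['incidents', 'incidents_mean'], target='incidents_residues',
...                        func=lambda x, y: x - y)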
@@ -249,7 +249,7 @@ for the point lying in the center of every window, we would define our function
>>> z_score = lambda D: abs((D[14] - np.mean(D)) / np.std(D))
-And subsequently, use the :py:meth:`~Core.Core.SaQC.roll` method to apply a rolling window with the scoring
+And subsequently, use the :py:meth:`~saqc.SaQC.roll` method to apply a rolling window with the scoring
function:
.. doctest:: exampleOD
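   >>> # Sketch of the scoring roll; the integer window of 29 values (with
   >>> # center index 14, matching D[14] above) is an assumption.
   >>> qc = qc.roll('incidents_residues', target='incidents_scores', window=29, func=z_score)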
@@ -273,7 +273,7 @@ So the attempt works fine, only because our data set is small and strictly regul
Meaning that it has constant temporal distances between subsequent measurements.
In order to tweak our calculations and make them much more stable, it might be useful to decompose the scoring
-into separate calls to the :py:meth:`roll <Functions.saqc.roll>` function, by calculating the series of the
+into separate calls to the :py:meth:`~saqc.SaQC.roll` function, by calculating the series of the
residues *mean* and *standard deviation* separately:
.. doctest:: exampleOD
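   >>> # Sketch of the decomposed calls; window size and target names are
   >>> # illustrative assumptions carried over from above.
   >>> import numpy as np
   >>> qc = qc.roll('incidents_residues', target='residues_mean', window=29, func=np.mean)
   >>> qc = qc.roll('incidents_residues', target='residues_std', window=29, func=np.std)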
@@ -285,12 +285,12 @@ residues *mean* and *standard deviation* seperately:
With huge datasets, this will be noticeably faster compared to the method presented :ref:`initially <cook_books/OutlierDetection:Scores>`\ ,
because ``saqc`` dispatches the rolling with the basic numpy statistic methods to an optimized pandas built-in.
-Also, as a result of the :py:meth:`~Core.Core.SaQC.roll` assigning its results to the center of every window,
+Also, as a result of the :py:meth:`~saqc.SaQC.roll` assigning its results to the center of every window,
all the values are centered and we don't have to care about window center indices when we are generating
the *Z*\ -scores from the two series.
We simply combine them via the
-:py:meth:`~Core.Core.SaQC.processGeneric` method, in order to obtain the scores:
+:py:meth:`~saqc.SaQC.processGeneric` method, in order to obtain the scores:
.. doctest:: exampleOD
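   >>> # Sketch of combining the centered series into Z-scores; field and
   >>> # target names follow the assumptions above.
   >>> qc = qc.processGeneric(field=['incidents_residues', 'residues_mean', 'residues_std'],
   ...                        target='incidents_scores',
   ...                        func=lambda this, mean, std: abs((this - mean) / std))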
@@ -315,7 +315,7 @@ Flagging the Scores
We can now implement the common `rule of thumb <https://en.wikipedia.org/wiki/68%E2%80%9395%E2%80%9399.7_rule>`_\ ,
that any *Z*\ -score value above *3* may indicate an outlierish data point,
-by applying the :py:meth:`~Core.Core.SaQC.flagRange` method with a ``max`` value of *3*.
+by applying the :py:meth:`~saqc.SaQC.flagRange` method with a ``max`` value of *3*.
.. doctest:: exampleOD
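   >>> # Sketch of the range check on the score series assumed above.
   >>> qc = qc.flagRange('incidents_scores', max=3)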
@@ -384,7 +384,7 @@ some flags based on some condition.
In order to *unflag* those values that do not relate to
sufficiently large residues, we assign them the :py:const:`~saqc.constants.UNFLAGGED` flag.
-Therefore, we make use of the :py:meth:`~Core.Core.SaQC.flagGeneric` method.
+Therefore, we make use of the :py:meth:`~saqc.SaQC.flagGeneric` method.
This method usually comes into play when we want to assign flags based on the evaluation of logical expressions.
So, we check which residues evaluate to a level below *20*\ , and assign the
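(The hunk breaks off mid-sentence here. A sketch of the unflagging call being described, with the score and residue names assumed earlier; ``isflagged`` is resolved inside generic expressions, and the exact collapsed call may differ:)

>>> from saqc.constants import UNFLAGGED
>>> qc = qc.flagGeneric(field=['incidents_scores', 'incidents_residues'], target='incidents_scores',
...                     func=lambda x, y: isflagged(x) & (y < 20), flag=UNFLAGGED)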
@@ -416,7 +416,7 @@ Including multiple conditions
If we do not want to first set flags, only to remove the majority of them in the next step, we can also
circumvent the :ref:`unflagging <cook_books/OutlierDetection:Unflagging>` step, by adding to the call to
-:py:meth:`~Core.Core.SaQC.flagRange` the condition that the residues have to be above *20*
+:py:meth:`~saqc.SaQC.flagRange` the condition that the residues have to be above *20*
.. doctest:: exampleOD
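   >>> # The residue condition cannot be expressed through flagRange's own
   >>> # keywords; this sketch expresses the combined condition via
   >>> # flagGeneric instead, with the variable names assumed above.
   >>> qc = qc.flagGeneric(field=['incidents_scores', 'incidents_residues'], target='incidents',
   ...                     func=lambda x, y: (x > 3) & (y > 20))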
This diff is collapsed.
@@ -67,7 +67,7 @@ available functions appear as methods of the ``SaQC`` class, so we can add a te
qc = qc.flagRange("a", min=20, max=80)
-:py:func:`flagRange <Functions.saqc.flagRange>` is the easiest of all functions and simply marks all values
+:py:meth:`~saqc.SaQC.flagRange` is the easiest of all functions and simply marks all values
smaller than ``min`` and larger than ``max``. This feature by itself wouldn't be worth the trouble of getting
into ``SaQC``, but it serves as a simple example. All functions expect the name of a column in the given
``data`` as their first positional argument (called ``field``). The function ``flagRange`` (like all other
@@ -75,8 +75,8 @@ functions for that matter) is then called on the given ``field`` (only).
Each call to a ``SaQC`` method returns a new object (all intermediate objects share the main internal data
structures, so we only create shallow copies). Setting up more complex quality control suites (here by calling
-the additional methods :py:func:`flagConstants <Functions.saqc.flagConstants>` and
-:py:func:`flagByGrubbs <Functions.saqc.flagByGrubbs>`) is therefore simply a matter of method chaining.
+the additional methods :py:meth:`~saqc.SaQC.flagConstants` and
+:py:meth:`~saqc.SaQC.flagByGrubbs`) is therefore simply a matter of method chaining.
.. testcode:: python
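   # A sketch of the chaining described above; the parameters passed to
   # flagConstants and flagByGrubbs are illustrative assumptions.
   qc = (
       qc.flagRange("a", min=20, max=80)
         .flagConstants("a", thresh=0.1, window="4h")
         .flagByGrubbs("a", window="10D")
   )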