Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Source: berntm/saqc — Target: rdm-software/saqc
Commits on Source (12)
Showing 194 additions and 76 deletions
@@ -9,13 +9,18 @@ SPDX-License-Identifier: GPL-3.0-or-later
 This changelog starts with version 2.0.0. Basically all parts of the system, including the format of this changelog, have been reworked between the releases 1.4 and 2.0. Preceding the major breaking release 2.0, the maintenance of this file was rather sloppy, so we won't provide a detailed change history for early versions.
-## [Unreleased]
+## Unreleased
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.0.1...develop)
 ### Added
 ### Changed
+- `flagOffset` parameters `thresh` and `thresh_relative` are now both optional
 ### Removed
 ### Fixed
+- `flagOffset` bug with zero-valued threshold
-## [2.0.1] - 2021-12-20
+## [2.0.1](https://git.ufz.de/rdm-software/saqc/-/tags/v2.0.1) - 2021-12-20
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.0.0...v2.0.1)
 ### Added
 - CLI now accepts remote configuration and data files as URL
 - new function `transferFlags`
@@ -41,5 +46,7 @@ This changelog starts with version 2.0.0. Basically all parts of the system, inc
 - `field` was not masked for resampling functions
 - allow custom registered functions to overwrite built-ins.
-## [2.0.0] - 2021-11-25
+## [2.0.0](https://git.ufz.de/rdm-software/saqc/-/tags/v2.0.0) - 2021-11-25
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v1.5.0...v2.0.0)
 This release marks the beginning of a new release cycle. Basically the entire system got reworked between versions 1.4 and 2.0, a detailed changelog is not recoverable and/or useful.
@@ -3,7 +3,7 @@ title: SaQC - System for automated Quality Control
 message: "Please cite this software using these metadata."
 type: software
 version: 2.0.0
-doi:
+doi: https://doi.org/10.5281/zenodo.5888547
 date-released: "2021-11-25"
 license: "GPL-3.0"
 repository-code: "https://git.ufz.de/rdm-software/saqc"
@@ -24,7 +24,7 @@ authors:
   affiliation: >-
     Helmholtz Centre for Environmental Research -
     UFZ
-  orcid: 'https://orcid.org/0000-0000-0000-0000'
+  orcid: 'https://orcid.org/0000-0001-5106-9057'
 - given-names: Peter
   family-names: Lünenschloß
   email: peter.luenenschloss@ufz.de
...
@@ -24,41 +24,38 @@ We implement the following naming conventions:
 ### Argument names in public function signatures
-first, its not necessary to have *talking* arg-names, in contrast to variable names in
-code. This is, because one always must read the documentation. To use and parameterize a function,
-just by guessing the meaning of the argument names and not read the docs,
-will almost never work. thats why, we dont have the obligation to make names (very)
-talkative.
+First, in contrast to variable names in code, it is not necessary to have *talking* function argument names.
+A user is always expected to have read the documentation. Using and parameterizing a function
+just by guessing the meaning of the argument names, without having read the documentation,
+will almost never work. That is why we are not obliged to make names (very)
+talkative.
-second, because of the nature of a function (to have a *simple* way to use complex code),
-its common to use simple and short names. This means, to omit any *irrelevant* information.
+Second, from the nature of a function to deliver a *simple* way of using complex code, it follows that simple and short names are preferable. The encoding of *irrelevant* information in names should therefore be omitted.
-For example if we have a function that fit a polynomial on some data with three arguments.
+For example, take a function of three arguments that fits a polynomial to some data.
 Let's say we have:
 - the data input,
-- a threshold that defines a cutoff point for a calculation on a polynomial and
+- a threshold, that defines a cutoff point for a calculation on a polynomial and
 - a third argument.
-one could name the args `data, poly_cutoff_threshold, ...`, but much better names would
-be `data, thresh, ...`, because a caller dont need the extra information,
-stuffed in the name.
+One could name the corresponding arguments `data, poly_cutoff_threshold, ...`. However, much better names would
+be `data, thresh, ...`, because a caller who is aware of a function's documentation does not need the extra information
+encoded in the name.
 If the third argument is also some kind of threshold,
 one can use `data, cutoff, thresh`, because the *thresh-* information of the `cutoff`
-parameter is not crucial and the caller knows that this is a threshold from the docstring.
+parameter is not crucial, and the caller knows that this is a threshold from having studied the docstring anyway.
-third, underscores give a nice feedback if one doing wrong or over complex.
-No underscore is fine, one underscore is ok, if the information is *really necessary* (see above),
-but if one use two or more underscores, one should think of a better naming,
-or omit some information.
-Sure, seldom but sometimes it is necessary to use 2 underscores, but we consider it as bad style.
-Using 3 or more underscores, is not allowed unless have write an reasoning and get it
-signed by at least as many core developers as underscores one want to use.
+Third, underscores give nice implicit feedback on whether one is doing wrong or getting overly complex with the naming.
+No underscore is just fine. One underscore is ok, if the information appended through the underscore is *really necessary* (see above).
+If one uses two or more underscores, one should think of a better naming or omit some information.
+Sure, although it is seldom, it might sometimes be necessary to use two underscores, but the usage of two underscores is still considered bad style.
+Using three or more underscores is not allowed unless an exhaustive reasoning has been issued and accepted by at least one core developer per underscore.
-In short the naming should *give a very, very rough idea* of the purpose of the argument,
+In short, the naming should *give a very, very rough idea* of the purpose of the argument,
 but not *explain* the usage or the purpose.
-It is not a shame to name a parameter just `n` or `alpha` etc. if for example the algorithm
-(from the paper etc.) name it alike.
+It is not a shame to name a parameter just `n` or `alpha` etc., if, for example, the algorithm
+(from the paper etc.) names it alike.
 ### Test Functions
...
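To make the naming guidance above concrete, here is a tiny illustrative sketch; the function and all of its parameters are invented for this example and are not part of SaQC:

```python
# Invented example -- not an actual SaQC function.

# Over-specified: the argument names try to replace the docstring.
def fitPoly(data, poly_cutoff_threshold, residue_eval_threshold):
    ...

# Preferred: short names; the details live in the docstring.
def fitPoly(data, cutoff, thresh):
    """Fit a polynomial to `data`.

    cutoff : cutoff point for the calculation on the polynomial
    thresh : threshold used when evaluating the fit
    """
```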
@@ -112,9 +112,14 @@ of the documentation.
 ## Changelog
 All notable changes to this project will be documented in [CHANGELOG.md](CHANGELOG.md).
-## Contributing
+## Get involved
+### Contributing
 You found a bug or you want to suggest some cool features? Please refer to our [contributing guidelines](CONTRIBUTING.md) to see how you can contribute to SaQC.
+### User support
+If you need help or have a question, you can use the SaQC user support mailing list: [saqc-support@ufz.de](mailto:saqc-support@ufz.de)
 ## Copyright and License
 Copyright(c) 2021, [Helmholtz-Zentrum für Umweltforschung GmbH -- UFZ](https://www.ufz.de). All rights reserved.
@@ -127,8 +132,9 @@ For full details, see [LICENSE](LICENSE.md).
 ...
 ## Publications
-...
+coming soon...
 ## How to cite SaQC
-...
+If SaQC is advancing your research, please cite as:
+> Schäfer, David; Palm, Bert; Lünenschloß, Peter. (2021). System for automated Quality Control - SaQC. Zenodo. https://doi.org/10.5281/zenodo.5888547
@@ -121,6 +121,6 @@ if __name__ == "__main__":
     # t1 = time.time()
     # print(t1-t0)
-    rr = [10 ** r for r in range(1, 6)]
+    rr = [10**r for r in range(1, 6)]
     c = range(10, 60, 10)
     gen_all(rr, c)
@@ -36,7 +36,7 @@ if __name__ == "__main__":
     )
     def f(s):
-        sec = 10 ** 9
+        sec = 10**9
         s.index = pd.to_datetime(s.index * sec)
         return s
...
@@ -72,7 +72,7 @@ def df_unaligned__():
 def dios_fuzzy__(nr_cols=None, mincol=0, maxcol=10, itype=None):
     nr_of_cols = nr_cols if nr_cols else randint(mincol, maxcol + 1)
-    ns = 10 ** 9
+    ns = 10**9
     sec_per_year = 31536000
     ITYPES = [IntItype, FloatItype, DtItype, ObjItype]
...
@@ -23,7 +23,7 @@ class Breaks:
         gap_window: str,
         group_window: str,
         flag: float = BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagIsolated", locals())
@@ -34,6 +34,6 @@ class Breaks:
         window: str,
         min_periods: int = 1,
         flag: float = BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagJumps", locals())
@@ -20,7 +20,7 @@ class Constants:
         maxna: int = None,
         maxna_group: int = None,
         flag: float = BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagByVariance", locals())
...
@@ -22,6 +22,6 @@ class Curvefit:
         window: Union[int, str],
         order: int,
         min_periods: int = 0,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("fitPolynomial", locals())
@@ -32,7 +32,7 @@ class Drift:
         / len(x),
         method: LinkageString = "single",
         flag: float = BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagDriftFromNorm", locals())
@@ -47,7 +47,7 @@ class Drift:
         )
         / len(x),
         flag: float = BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagDriftFromReference", locals())
@@ -57,7 +57,7 @@ class Drift:
         maintenance_field: str,
         model: Callable[..., float] | Literal["linear", "exponential"],
         cal_range: int = 5,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("correctDrift", locals())
@@ -68,7 +68,7 @@ class Drift:
         model: CurveFitter,
         tolerance: Optional[str] = None,
         epoch: bool = False,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("correctRegimeAnomaly", locals())
@@ -80,6 +80,6 @@ class Drift:
         window: str,
         min_periods: int,
         tolerance: Optional[str] = None,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("correctOffset", locals())
@@ -26,6 +26,6 @@ class Noise:
         sub_thresh: float = None,
         min_periods: int = None,
         flag: float = BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagByStatLowPass", locals())
@@ -77,9 +77,9 @@ class Outliers:
     def flagOffset(
         self,
         field: str,
-        thresh: float,
         tolerance: float,
         window: Union[int, str],
+        thresh: Optional[float] = None,
         thresh_relative: Optional[float] = None,
         flag: float = BAD,
         **kwargs,
...
@@ -20,6 +20,6 @@ class Pattern:
         normalize=True,
         plot=False,
         flag=BAD,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("flagPatternByDTW", locals())
@@ -24,7 +24,7 @@ class Residues:
         window: Union[str, int],
         order: int,
         min_periods: Optional[int] = 0,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("calculatePolynomialResidues", locals())
@@ -35,6 +35,6 @@ class Residues:
         func: Callable[[pd.Series], np.ndarray] = np.mean,
         min_periods: Optional[int] = 0,
         center: bool = True,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("calculateRollingResidues", locals())
@@ -20,6 +20,6 @@ class Transformation:
         field: str,
         func: Callable[[pd.Series], pd.Series],
         freq: Optional[Union[float, str]] = None,
-        **kwargs
+        **kwargs,
     ) -> saqc.SaQC:
         return self._defer("transform", locals())
@@ -102,7 +102,7 @@ class PositionalScheme(TranslationScheme):
         thist = flags.history[field].hist.replace(self._BACKWARD).astype(int)
         # concatenate the single flag values
         ncols = thist.shape[-1]
-        init = 9 * 10 ** ncols
+        init = 9 * 10**ncols
         bases = 10 ** np.arange(ncols - 1, -1, -1)
         tflags = init + (thist * bases).sum(axis=1)
...
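For intuition on the snippet above: each column of the translated flag history contributes one decimal digit to the positional flag, and `init` prepends a leading 9 so that leading zero digits survive. A minimal standalone sketch of the same arithmetic, with invented flag values:

```python
import numpy as np
import pandas as pd

# Invented translated flag history: two timestamps, three tests ran.
thist = pd.DataFrame([[0, 2, 0],
                      [1, 0, 2]])

ncols = thist.shape[-1]                       # 3 tests -> 3 digits
init = 9 * 10**ncols                          # 9000, the leading "9"
bases = 10 ** np.arange(ncols - 1, -1, -1)    # [100, 10, 1], one base per digit
tflags = init + (thist * bases).sum(axis=1)   # row-wise digit concatenation

print(tflags.tolist())  # [9020, 9102]
```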
@@ -33,7 +33,7 @@ def fitPolynomial(
     window: int | str,
     order: int,
     min_periods: int = 0,
-    **kwargs
+    **kwargs,
 ) -> Tuple[DictOfSeries, Flags]:
     """
     Fits a polynomial model to the data.
@@ -117,7 +117,7 @@ def _fitPolynomial(
     set_flags: bool = True,
     min_periods: int = 0,
     return_residues: bool = False,
-    **kwargs
+    **kwargs,
 ) -> Tuple[DictOfSeries, Flags]:
     # TODO: some (rather large) parts are functional similar to saqc.funcs.rolling.roll
...
@@ -243,12 +243,12 @@ def _evalStrayLabels(
         x = test_slice.index.values.astype(float)
         x_0 = x[0]
-        x = (x - x_0) / 10 ** 12
+        x = (x - x_0) / 10**12
         polyfitted = poly.polyfit(y=test_slice.values, x=x, deg=polydeg)
         testval = poly.polyval(
-            (float(index[1].to_numpy()) - x_0) / 10 ** 12, polyfitted
+            (float(index[1].to_numpy()) - x_0) / 10**12, polyfitted
         )
         testval = val_frame[var][index[1]] - testval
@@ -878,27 +878,29 @@ def flagOffset(
     data: DictOfSeries,
     field: str,
     flags: Flags,
-    thresh: float,
     tolerance: float,
     window: Union[int, str],
+    thresh: Optional[float] = None,
     thresh_relative: Optional[float] = None,
     flag: float = BAD,
     **kwargs,
 ) -> Tuple[DictOfSeries, Flags]:
     """
-    A basic outlier test that work on regular and irregular sampled data
+    A basic outlier test that works on regularly and irregularly sampled data.
     The test classifies values/value courses as outliers by detecting not only a rise
-    in value, but also, checking for a return to the initial value level.
+    in value, but also, by checking for a return to the initial value level.
     Values :math:`x_n, x_{n+1}, .... , x_{n+k}` of a timeseries :math:`x` with
     associated timestamps :math:`t_n, t_{n+1}, .... , t_{n+k}` are considered spikes, if
     1. :math:`|x_{n-1} - x_{n + s}| >` `thresh`, for all :math:`s \\in [0,1,2,...,k]`
-    2. :math:`|x_{n-1} - x_{n+k+1}| <` `tolerance`
+    2. :math:`(x_{n + s} - x_{n - 1}) / x_{n - 1} >` `thresh_relative`
+    3. :math:`|x_{n-1} - x_{n+k+1}| <` `tolerance`
-    3. :math:`|t_{n-1} - t_{n+k+1}| <` `window`
+    4. :math:`|t_{n-1} - t_{n+k+1}| <` `window`
     Note, that this definition of a "spike" not only includes one-value outliers, but
     also plateau-ish value courses.
@@ -911,15 +913,19 @@ def flagOffset(
         The field in data.
     flags : saqc.Flags
         Container to store flags of the data.
-    thresh : float
-        Minimum difference between to values, to consider the latter one as a spike. See condition (1)
     tolerance : float
-        Maximum difference between pre-spike and post-spike values. See condition (2)
+        Maximum difference allowed between the value directly preceding and the value directly succeeding an offset,
+        to trigger flagging of the values forming the offset.
+        See condition (3).
     window : {str, int}, default '15min'
-        Maximum length of "spiky" value courses. See condition (3). Integer defined window length are only allowed for
-        regularly sampled timeseries.
+        Maximum length allowed for offset value courses, to trigger flagging of the values forming the offset.
+        See condition (4). Integer defined window lengths are only allowed for regularly sampled timeseries.
+    thresh : {float, None}, default None
+        Minimum difference between a value and its successors, to consider the successors an anomalous offset group.
+        See condition (1). If None is passed, condition (1) is not tested.
     thresh_relative : {float, None}, default None
-        Relative threshold.
+        Minimum relative change between a value and its successors, to consider the successors an anomalous offset group.
+        See condition (2). If None is passed, condition (2) is not tested.
     flag : float, default BAD
         flag to set.
@@ -931,6 +937,99 @@ def flagOffset(
         The quality flags of data
         Flags values may have changed, relatively to the flags input.
+    Examples
+    --------
+
+    .. plot::
+       :context:
+       :include-source: False
+
+       import matplotlib
+       import numpy as np
+       import saqc
+       import pandas as pd
+       data = pd.DataFrame({'data':np.array([5,5,8,16,17,7,4,4,4,1,1,4])}, index=pd.date_range('2000',freq='1H', periods=12))
+
+    Let's generate a simple, regularly sampled timeseries with an hourly sampling rate and generate an
+    :py:class:`saqc.SaQC` instance from it.
+
+    .. doctest:: flagOffsetExample
+
+       >>> data = pd.DataFrame({'data':np.array([5,5,8,16,17,7,4,4,4,1,1,4])}, index=pd.date_range('2000',freq='1H', periods=12))
+       >>> data
+                            data
+       2000-01-01 00:00:00     5
+       2000-01-01 01:00:00     5
+       2000-01-01 02:00:00     8
+       2000-01-01 03:00:00    16
+       2000-01-01 04:00:00    17
+       2000-01-01 05:00:00     7
+       2000-01-01 06:00:00     4
+       2000-01-01 07:00:00     4
+       2000-01-01 08:00:00     4
+       2000-01-01 09:00:00     1
+       2000-01-01 10:00:00     1
+       2000-01-01 11:00:00     4
+       >>> qc = saqc.SaQC(data)
+
+    Now we apply :py:meth:`~saqc.SaQC.flagOffset`, trying to flag offset courses that don't extend longer than
+    *6 hours* in time (``window``), that have an initial value jump higher than *2* (``thresh``), and that return
+    to the initial value level within a tolerance of *1.5* (``tolerance``).
+
+    .. doctest:: flagOffsetExample
+
+       >>> qc = qc.flagOffset("data", thresh=2, tolerance=1.5, window='6H')
+       >>> qc.plot('data') # doctest:+SKIP
+
+    .. plot::
+       :context: close-figs
+       :include-source: False
+
+       >>> qc = saqc.SaQC(data)
+       >>> qc = qc.flagOffset("data", thresh=2, tolerance=1.5, window='6H')
+       >>> qc.plot('data')
+
+    Note that both negative and positive jumps are considered starting points of negative or positive offsets.
+    If you want to impose the additional condition that the initial value jump must exceed *+90%* of the value level,
+    you can additionally set the ``thresh_relative`` parameter:
+
+    .. doctest:: flagOffsetExample
+
+       >>> qc = qc.flagOffset("data", thresh=2, thresh_relative=.9, tolerance=1.5, window='6H')
+       >>> qc.plot('data') # doctest:+SKIP
+
+    .. plot::
+       :context: close-figs
+       :include-source: False
+
+       >>> qc = saqc.SaQC(data)
+       >>> qc = qc.flagOffset("data", thresh=2, thresh_relative=.9, tolerance=1.5, window='6H')
+       >>> qc.plot('data')
+
+    Now only positive jumps that exceed a value gain of *+90%* are considered starting points of offsets.
+    In the same way, you can aim for only negative offsets by setting a negative relative threshold. The below
+    example only flags offsets that fall off by at least *50%* in value, with an absolute value drop of at least *2*.
+
+    .. doctest:: flagOffsetExample
+
+       >>> qc = qc.flagOffset("data", thresh=2, thresh_relative=-.5, tolerance=1.5, window='6H')
+       >>> qc.plot('data') # doctest:+SKIP
+
+    .. plot::
+       :context: close-figs
+       :include-source: False
+
+       >>> qc = saqc.SaQC(data)
+       >>> qc = qc.flagOffset("data", thresh=2, thresh_relative=-.5, tolerance=1.5, window='6H')
+       >>> qc.plot('data')
     References
     ----------
     The implementation is a time-window based version of an outlier test from the UFZ Python library,
@@ -939,6 +1038,12 @@ def flagOffset(
     https://git.ufz.de/chs/python/blob/master/ufz/level1/spike.py
     """
+    if (thresh is None) and (thresh_relative is None):
+        raise ValueError(
+            "At least one of parameters 'thresh' and 'thresh_relative' has to be given. Got 'thresh'=None, "
+            "'thresh_relative'=None instead."
+        )
     dataseries = data[field].dropna()
     if dataseries.empty:
         return data, flags
@@ -954,19 +1059,19 @@ def flagOffset(
         window = delta * window
         if not delta:
             raise TypeError(
-                "Only offset string defined window sizes allowed for irrgegularily sampled timeseries"
+                "Only offset string defined window sizes allowed for timeseries not sampled regularly."
             )
     # get all the entries preceding a significant jump
-    if thresh:
+    if thresh is not None:
         post_jumps = dataseries.diff().abs() > thresh
-    if thresh_relative:
+    if thresh_relative is not None:
         s = np.sign(thresh_relative)
         rel_jumps = s * (dataseries.shift(1) - dataseries).div(dataseries.abs()) > abs(
             thresh_relative
         )
-        if thresh:
+        if thresh is not None:
             post_jumps = rel_jumps & post_jumps
         else:
            post_jumps = rel_jumps
@@ -982,11 +1087,13 @@ def flagOffset(
     ).dropna()
     to_roll = dataseries[to_roll]
-    if thresh_relative:
+    if thresh_relative is not None:
-        def spikeTester(chunk, thresh=abs(thresh_relative), tol=tolerance):
+        def spikeTester(
+            chunk, thresh_r=abs(thresh_relative), thresh_a=thresh or 0, tol=tolerance
+        ):
             jump = chunk[-2] - chunk[-1]
-            thresh = thresh * abs(jump)
+            thresh = max(thresh_r * abs(chunk[-1]), thresh_a)
             chunk_stair = (np.sign(jump) * (chunk - chunk[-1]) < thresh)[::-1].cumsum()
             initial = np.searchsorted(chunk_stair, 2)
             if initial == len(chunk):
...
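To summarize the behavioral change in this hunk: `thresh` is now optional, `thresh_relative` may be used on its own, and omitting both raises the new `ValueError`. A brief usage sketch under that assumption (the data and parameter values are invented):

```python
import pandas as pd
import saqc

# Invented hourly series containing one short offset course.
data = pd.DataFrame(
    {"x": [5.0, 5.0, 16.0, 17.0, 5.0, 5.0]},
    index=pd.date_range("2000", freq="1H", periods=6),
)
qc = saqc.SaQC(data)

# Absolute threshold only -- the classic call still works:
qc = qc.flagOffset("x", thresh=2, tolerance=1.5, window="6H")

# Relative threshold only -- newly possible, since `thresh` defaults to None:
qc = qc.flagOffset("x", thresh_relative=0.9, tolerance=1.5, window="6H")

# Passing neither threshold raises the ValueError introduced above:
# qc.flagOffset("x", tolerance=1.5, window="6H")
```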
@@ -5,8 +5,9 @@
 # SPDX-License-Identifier: GPL-3.0-or-later
 # -*- coding: utf-8 -*-
+from __future__ import annotations
-from typing import Optional, Tuple, Union
+from typing import Optional, Tuple
 from typing_extensions import Literal
 import numpy as np
@@ -264,7 +265,7 @@ def plot(
     flags: Flags,
     path: Optional[str] = None,
     max_gap: Optional[str] = None,
-    history: Optional[Literal["valid", "complete", "clear"]] = "valid",
+    history: Optional[Literal["valid", "complete"] | list] = "valid",
     xscope: Optional[slice] = None,
     phaseplot: Optional[str] = None,
     store_kwargs: Optional[dict] = None,
@@ -304,14 +305,14 @@ def plot(
         before plotting. If an offset string is passed, only points that have a distance
         below `max_gap` get connected via the plotting line.
-    history : {"valid", "complete", None}, default "valid"
+    history : {"valid", "complete", None, list of strings}, default "valid"
         Discriminate the plotted flags with respect to the tests they originate from.
         * "valid" - Only plot those flags, that do not get altered or "unflagged" by subsequent tests. Only list tests
           in the legend, that actually contributed flags to the overall result.
         * "complete" - plot all the flags set and list all the tests ran on a variable. Suitable for debugging/tracking.
-        * "clear" - clear plot from all the flagged values
         * None - just plot the resulting flags for one variable, without any historical meta information.
+        * list of strings - plot only flags set by those tests listed.
     xscope : slice or Offset, default None
         Parameter, that determines a chunk of the data to be plotted
@@ -328,7 +329,7 @@ def plot(
     ax_kwargs : dict, default {}
         Axis keywords. Change the axis labeling defaults. Most important keywords:
-        'x_label', 'y_label', 'title', 'fontsize'.
+        'x_label', 'y_label', 'title', 'fontsize', 'cycleskip'.
     """
...
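A short sketch of how the new list form of ``history`` might be used, assuming the ``SaQC.plot`` method wrapper exposes the same parameter as the function above (the data and the chosen tests are invented):

```python
import pandas as pd
import saqc

# Invented data, only to demonstrate filtering the plotted flag history.
data = pd.DataFrame(
    {"x": range(12)}, index=pd.date_range("2000", freq="1H", periods=12)
)
qc = saqc.SaQC(data)
qc = qc.flagRange("x", max=8)
qc = qc.flagOffset("x", thresh=2, tolerance=0.5, window="3H")

# Instead of "valid" or "complete", plot only the flags set by listed tests:
qc.plot("x", history=["flagOffset"])
```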