rdm-software / SaQC / Commits / 0a4a002f
Commit 0a4a002f authored 3 years ago by Bert Palm
docstrings in breaks.py
parent e22d0b13
7 merge requests: !685 Release 2.4, !684 Release 2.4, !567 Release 2.2.1, !566 Release 2.2, !501 Release 2.1, !372 fix doctest snippets, !355 docstring cleanup - part1
saqc/funcs/breaks.py
+66
-42
66 additions, 42 deletions
@@ -2,10 +2,12 @@
 # -*- coding: utf-8 -*-
-"""Detecting breakish changes in timeseries value courses.
+"""
+Detecting breaks in data.
-This module provides functions to detect and flag breakish changes in the data value course, like gaps
-(:py:func:`flagMissing`), jumps/drops (:py:func:`flagJumps`) or isolated values (:py:func:`flagIsolated`).
+This module provides functions to detect and flag breaks in data, for example temporal
+gaps (:py:func:`flagMissing`), jumps and drops (:py:func:`flagJumps`) or temporal
+isolated values (:py:func:`flagIsolated`).
 """
 from typing import Tuple
@@ -21,7 +23,6 @@ from saqc.lib.tools import groupConsecutives
 from saqc.lib.types import FreqString
 from saqc.funcs.changepoints import _assignChangePointCluster
 from saqc.core.flags import Flags
-from saqc.core.history import History
 from saqc.core.register import _isflagged, register, flagging
@@ -35,25 +36,34 @@ def flagMissing(
     **kwargs
 ) -> Tuple[DictOfSeries, Flags]:
     """
-    The function flags all values indicating missing data.
+    Flag NaNs in data.
+    By default only NaNs are flagged, that not already have a flag.
+    `to_mask` can be used to pass a flag that is used as threshold.
+    Each flag worse than the threshold is replaced by the function.
+    This is, because the data gets masked (with NaNs) before the
+    function evaluates the NaNs.
     Parameters
     ----------
     data : dios.DictOfSeries
-        A dictionary of pandas.Series, holding all the data.
+        The data container.
     field : str
-        The fieldname of the column, holding the data-to-be-flagged.
+        Column(s) in flags and data.
     flags : saqc.Flags
-        Container to store quality flags to data.
+        The flags container.
     flag : float, default BAD
-        flag to set.
+        Flag to set.
     Returns
     -------
     data : dios.DictOfSeries
-        A dictionary of pandas.Series, holding all the data.
+        Unmodified data container
     flags : saqc.Flags
-        The quality flags of data
+        The flags container
     """
     datacol = data[field]
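For readers unfamiliar with the behaviour the new `flagMissing` docstring describes ("only NaNs are flagged, that not already have a flag"), here is a minimal sketch in plain pandas. It does not use the SaQC API: the function name `flag_missing` and the constants `UNFLAGGED` and `BAD` below are stand-ins chosen for illustration, and the `to_mask` threshold mentioned in the docstring is not modelled.

```python
import numpy as np
import pandas as pd

UNFLAGGED = -np.inf  # assumption: stand-in for "no flag set yet"
BAD = 255.0          # assumption: stand-in for the BAD flag value


def flag_missing(values: pd.Series, flags: pd.Series, flag: float = BAD) -> pd.Series:
    """Return a copy of `flags` with `flag` set where `values` is NaN and unflagged."""
    out = flags.copy()
    # Only touch positions that are NaN in the data AND carry no flag so far.
    mask = values.isna() & (flags == UNFLAGGED)
    out[mask] = flag
    return out


values = pd.Series([1.0, np.nan, 3.0, np.nan])
flags = pd.Series([UNFLAGGED, UNFLAGGED, UNFLAGGED, 100.0])
result = flag_missing(values, flags)
```

The last value is NaN but already carries a flag (100.0), so it is left alone; only the second value receives the new flag.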
@@ -76,47 +86,53 @@ def flagIsolated(
     **kwargs
 ) -> Tuple[DictOfSeries, Flags]:
     """
-    The function flags arbitrary large groups of values, if they are surrounded by sufficiently
-    large data gaps.
+    Find and flag temporal isolated groups of data.
-    A gap is a timespan containing either no data or data invalid only (usually `nan`) .
+    The function flags arbitrary large groups of values, if they are surrounded by
+    sufficiently large data gaps. A gap is a timespan containing either no data at all
+    or NaNs only.
     Parameters
     ----------
     data : dios.DictOfSeries
-        A dictionary of pandas.Series, holding all the data.
+        The data container.
     field : str
-        The fieldname of the column, holding the data-to-be-flagged.
+        Column(s) in flags and data.
     flags : saqc.Flags
-        A flags object
+        The flags container.
     gap_window : str
-        The minimum size of the gap before and after a group of valid values, making this group considered an
-        isolated group. See condition (2) and (3)
+        Minimum gap size required before and after a data group to consider it
+        isolated. See condition (2) and (3)
     group_window : str
-        The maximum temporal extension allowed for a group that is isolated by gaps of size 'gap_window',
-        to be actually flagged as isolated group. See condition (1).
+        Maximum size of a data chunk to consider it a candidate for an isolated group.
+        Data chunks that are bigger than the ``group_window`` are ignored.
+        This does not include the possible gaps surrounding it.
+        See condition (1).
     flag : float, default BAD
-        flag to set.
+        Flag to set.
     Returns
     -------
     data : dios.DictOfSeries
-        A dictionary of pandas.Series, holding all the data.
+        Unmodified data container
     flags : saqc.Flags
-        The flags object, holding flags and additional information related to `data`.
+        The flags container
     Notes
     -----
-    A series of values :math:`x_k,x_{k+1},...,x_{k+n}`, with associated timestamps :math:`t_k,t_{k+1},...,t_{k+n}`,
-    is considered to be isolated, if:
+    A series of values :math:`x_k,x_{k+1},...,x_{k+n}`, with associated
+    timestamps :math:`t_k,t_{k+1},...,t_{k+n}`, is considered to be isolated, if:
     1. :math:`t_{k+1} - t_n <` `group_window`
-    2. None of the :math:`x_j` with :math:`0 < t_k - t_j <` `gap_window`, is valid (preceeding gap).
-    3. None of the :math:`x_j` with :math:`0 < t_j - t_(k+n) <` `gap_window`, is valid (succeding gap).
-    See Also
-    --------
-    :py:func:`flagMissing`
+    2. None of the :math:`x_j` with :math:`0 < t_k - t_j <` `gap_window`,
+       is valid (preceeding gap).
+    3. None of the :math:`x_j` with :math:`0 < t_j - t_(k+n) <` `gap_window`,
+       is valid (succeding gap).
     """
     gap_window = pd.tseries.frequencies.to_offset(gap_window)
     group_window = pd.tseries.frequencies.to_offset(group_window)
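The three conditions in the `flagIsolated` Notes section can be illustrated with a small pandas sketch. This is not the SaQC implementation. It makes several assumptions: the helper name `isolated_mask` is hypothetical, condition (1) is read as "the time span from the first to the last value of the group is smaller than `group_window`", the start and end of the series count as gaps, and the window strings are parsed with `pd.Timedelta` rather than `to_offset`.

```python
import pandas as pd


def isolated_mask(series: pd.Series, gap_window: str, group_window: str) -> pd.Series:
    """Boolean mask marking values that belong to an isolated group (illustrative)."""
    gap = pd.Timedelta(gap_window)
    group = pd.Timedelta(group_window)
    mask = pd.Series(False, index=series.index)
    valid = series.dropna()  # NaNs count as "no data"
    if valid.empty:
        return mask
    times = valid.index.to_series()
    # Conditions (2) and (3): a new group starts wherever the distance to the
    # previous valid value is at least `gap_window`.
    group_id = (times.diff() >= gap).cumsum()
    first = times.groupby(group_id).transform("min")
    last = times.groupby(group_id).transform("max")
    # Condition (1): the group itself must span less than `group_window`.
    span = last - first
    mask.loc[span.index[(span < group).to_numpy()]] = True
    return mask


idx = pd.to_datetime([
    "2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00", "2021-01-01 03:00",
    "2021-01-03 00:00",
    "2021-01-05 00:00", "2021-01-05 01:00", "2021-01-05 02:00", "2021-01-05 03:00",
])
series = pd.Series(range(9), index=idx, dtype=float)
mask = isolated_mask(series, gap_window="1D", group_window="2h")
```

In this example only the single value on 2021-01-03 is marked: it has gaps of more than one day on both sides, while the two four-value groups each span three hours and therefore exceed the `group_window` of two hours.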
@@ -156,26 +172,34 @@ def flagJumps(
     **kwargs
 ) -> Tuple[DictOfSeries, Flags]:
     """
-    Flag where the mean of the values significantly changes (the data "jumps").
+    Flag jumps and drops in data.
+    Flag data where the mean of its values significantly changes (the data "jumps").
     Parameters
     ----------
     data : dios.DictOfSeries
-        A dictionary of pandas.Series, holding all the data.
+        The data container.
     field : str
-        The reference variable, the deviation from wich determines the flagging.
+        Column(s) in flags and data.
     flags : saqc.Flags
-        A flags object, holding flags and additional informations related to `data`.
+        The flags container.
     thresh : float
-        The threshold, the mean of the values have to change by, to trigger flagging.
+        Threshold, the mean of data have to change to trigger flagging.
     window : str
-        The temporal extension, of the rolling windows, the mean values that are to be
-        compared, are obtained from.
+        Size of the moving window. This is the number of observations used
+        for calculating the statistic.
     min_periods : int, default 1
-        Minimum number of periods that have to be present in a window of size `window`,
-        so that the mean value obtained from that window is regarded valid.
+        Minimum number of observations in window required to calculate a valid
+        mean value.
     flag : float, default BAD
-        flag to set.
+        Flag to set.
     """
     return _assignChangePointCluster(
         data,
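The idea behind the `flagJumps` docstring, comparing the mean before and after each point against `thresh`, can be sketched as follows. This is an illustration only and not what `_assignChangePointCluster` does internally. The helper name `jump_mask` is hypothetical, and for simplicity `window` is an integer number of observations here, whereas the documented parameter is a string.

```python
import pandas as pd


def jump_mask(series: pd.Series, thresh: float, window: int, min_periods: int = 1) -> pd.Series:
    """Mark positions where the mean after differs from the mean before by more than `thresh`."""
    # Mean of up to `window` observations strictly before each position.
    before = series.shift(1).rolling(window, min_periods=min_periods).mean()
    # Mean of up to `window` observations starting at each position,
    # obtained by rolling over the reversed series and reversing back.
    after = series[::-1].rolling(window, min_periods=min_periods).mean()[::-1]
    # Comparisons involving NaN evaluate to False, so the first position is never marked.
    return (after - before).abs() > thresh


series = pd.Series([1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0])
mask = jump_mask(series, thresh=3.0, window=2)
```

With these inputs only the first value after the step (position 4) is marked, because that is the only place where the two window means differ by more than 3.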
David Schäfer @schaefed mentioned in commit 63b55c6d7dadb0e612b23a897f292d5ffc14cb52 · 2 years ago
David Schäfer @schaefed mentioned in commit 684dc8a0515470d644fc85fee95d07661c8dd572 · 2 years ago
David Schäfer @schaefed mentioned in commit 8f7a90e4aed61c79a9dc8d67541a46beba0907e8 · 1 year ago