rdm-software / SaQC
Commit 1a418f09, authored 4 years ago by Peter Lünenschloß
doc doc doc
parent a0076e67
3 merge requests: !193 Release 1.4, !188 Release 1.4, !138 WIP: Detect and reset offset
Pipeline #8967 passed with stage in 6 minutes and 6 seconds
Showing 3 changed files with 17 additions and 10 deletions:
saqc/funcs/breaks_detection.py: 9 additions, 7 deletions
saqc/funcs/modelling.py: 3 additions, 0 deletions
saqc/lib/tools.py: 5 additions, 3 deletions
saqc/funcs/breaks_detection.py (+9 −7) @ 1a418f09
@@ -14,12 +14,12 @@ from saqc.lib.tools import retrieveTrustworthyOriginal, detectDeviants
 @register(masking='all')
 def breaks_flagRegimeAnomaly(data, field, flagger, cluster_field, norm_spread,
                              metric=lambda x, y: np.abs(np.nanmean(x) - np.nanmean(y)),
-                             norm_frac=0.5, **kwargs):
+                             norm_frac=0.5, recluster=False, **kwargs):
     """
-    A function to flag values belonging to an anomalous regimes of field.
+    A function to flag values belonging to an anomalous regime of field.
     "Normality" is determined in terms of a maximum spreading distance, regimes must not exceed in respect
-    to a certain metric.
+    to a certain metric and linkage method.
     In addition, only a range of regimes is considered "normal", if it models more then `norm_frac` percentage of
     the valid samples in "field".
@@ -27,7 +27,7 @@ def breaks_flagRegimeAnomaly(data, field, flagger, cluster_field, norm_spread,
     Note, that you must detect the regime changepoints prior to calling this function.
     Note, that it is possible to perform hypothesis tests for regime equality by passing the metric
-    a function for p-value calculation.
+    a function for p-value calculation and selecting linkage method "complete".
     Parameters
     ----------
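The docstring note about hypothesis tests can be illustrated with a metric that derives a dissimilarity from a p-value. The following is a minimal standard-library sketch, not part of SaQC: `permutation_pvalue` and `pvalue_metric` are hypothetical names, a permutation test on the difference in means stands in for whatever test the author had in mind, and mapping the p-value to a dissimilarity as `1 - p` is an assumption the commit does not state.

```python
import math
import random
from statistics import fmean


def permutation_pvalue(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in means (NaNs are dropped)."""
    x = [v for v in x if not math.isnan(v)]
    y = [v for v in y if not math.isnan(v)]
    observed = abs(fmean(x) - fmean(y))
    pooled = x + y
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:len(x)]) - fmean(pooled[len(x):]))
        if diff >= observed:
            hits += 1
    # add-one correction so the estimate is never exactly zero
    return (hits + 1) / (n_perm + 1)


def pvalue_metric(x, y):
    """Assumed mapping: a small p-value (regimes differ) gives a large dissimilarity."""
    return 1.0 - permutation_pvalue(x, y)
```

Under that assumed mapping, a `norm_spread` of 0.95 together with "complete" linkage would keep two regimes in one group only if no pairwise test rejects equality at the 5% level.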
@@ -41,13 +41,15 @@ def breaks_flagRegimeAnomaly(data, field, flagger, cluster_field, norm_spread,
         The name of the column in data, holding the cluster labels for the samples in field. (has to be indexed
         equal to field)
     norm_spread : float
-        A threshold denoting the distance, members of the "normal" group must not exceed to each other (in terms of the
-        metric passed) to qualify their group as the "normal" group.
+        A threshold denoting the valuelevel, up to wich clusters a agglomerated.
     metric : Callable[[numpy.array, numpy.array], float], default lambda x, y: np.abs(np.nanmean(x) - np.nanmean(y))
         A metric function for calculating the dissimilarity between 2 regimes. Defaults to just the difference in mean.
     norm_frac : float
         Has to be in [0,1]. Determines the minimum percentage of samples,
         the "normal" group has to comprise to be the normal group actually.
+    recluster : bool, default False
+        If True,
     kwargs
     Returns
@@ -67,7 +69,7 @@ def breaks_flagRegimeAnomaly(data, field, flagger, cluster_field, norm_spread,
     plateaus = detectDeviants(cluster_dios, metric, norm_spread, norm_frac, 'single', 'samples')
     for p in plateaus:
-        flagger = flagger.setFlags(data.iloc[:, p].index, **kwargs)
+        flagger = flagger.setFlags(field, loc=cluster_dios.iloc[:, p].index, **kwargs)
     return data, flagger
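To make the roles of `metric`, `norm_spread` and the 'single' linkage argument concrete, here is a small standard-library sketch of grouping regimes. It is not the SaQC implementation: `single_linkage_groups`, `default_metric` and `nanmean` are hypothetical helpers, and the sketch relies on the fact that cutting a single-linkage hierarchy at a distance threshold yields the connected components of the graph that links every pair closer than that threshold.

```python
import math


def nanmean(values):
    """Mean of the non-NaN entries (plain-Python stand-in for np.nanmean)."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid)


def default_metric(x, y):
    """Absolute difference of the regime means, mirroring the default `metric`."""
    return abs(nanmean(x) - nanmean(y))


def single_linkage_groups(regimes, metric, norm_spread):
    """Label regimes so that two regimes share a label whenever a chain of
    pairwise distances <= norm_spread connects them."""
    parent = list(range(len(regimes)))

    def find(i):
        # union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(regimes)):
        for j in range(i + 1, len(regimes)):
            if metric(regimes[i], regimes[j]) <= norm_spread:
                parent[find(j)] = find(i)

    # relabel the roots as 0, 1, 2, ... in order of first appearance
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(regimes))]


regimes = [[1.0, 1.2, float("nan")], [1.1, 0.9], [5.0, 5.2], [1.3, 1.3]]
labels = single_linkage_groups(regimes, default_metric, norm_spread=0.5)
# labels == [0, 0, 1, 0]: the regime around 5 ends up in its own group
```

With single linkage, chaining is possible: regimes with means 0, 0.4 and 0.8 all fall into one group at `norm_spread=0.5`, although the outer two are 0.8 apart. "complete" linkage, which the updated docstring recommends for p-value metrics, would not merge them.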
saqc/funcs/modelling.py (+3 −0) @ 1a418f09
@@ -438,6 +438,9 @@ def modelling_clusterByChangePoints(data, field, flagger, stat_func, thresh_func
     generated by.
     The regime change points detection is based on a sliding window search.
+    Note, that the cluster labels will be stored to the `field` field of the input data, so that the data that is
+    clustered gets overridden.
     Parameters
     ----------
     data : dios.DictOfSeries
saqc/lib/tools.py (+5 −3) @ 1a418f09
@@ -532,7 +532,9 @@ def detectDeviants(data, metric, norm_spread, norm_frac, linkage_method='single'
     Helper function for carrying out the repeatedly upcoming task,
     of detecting variables a group of variables.
-    "Normality" is determined in terms of a maximum spreading distance, that members of a normal group must not exceed.
+    "Normality" is determined in terms of a maximum spreading distance, that members of a normal group must not exceed
+    in respect to a certain metric and linkage method.
     In addition, only a group is considered "normal" if it contains more then `norm_frac` percent of the
     variables in "fields".
@@ -560,7 +562,7 @@ def detectDeviants(data, metric, norm_spread, norm_frac, linkage_method='single'
     Returns
     -------
     deviants : List
-        A list containing the the column positions of deviant variables in the input frame/dios.
+        A list containing the column positions of deviant variables in the input frame/dios.
     """
     var_num = len(data.columns)
@@ -582,7 +584,7 @@ def detectDeviants(data, metric, norm_spread, norm_frac, linkage_method='single'
             counts[cluster[c]] += data.iloc[:, c].dropna().shape[0]
         pop_num = np.sum(list(counts.values()))
     else:
-        raise ValueError("Not a valid normality criteria keyword passed. pass either 'variables' or 'population'.")
+        raise ValueError("Not a valid normality criteria keyword passed. Pass either 'variables' or 'population'.")
     norm_cluster = -1
     for item in counts.items():
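The counting and `norm_frac` logic visible in this hunk can be sketched in plain Python as follows. This is an illustration under assumptions, not the code of `detectDeviants`: `pick_deviants` and its `population` parameter are hypothetical names, the accepted keywords 'variables' and 'samples' are inferred from the error message and from the call in `breaks_flagRegimeAnomaly`, and returning an empty list when no group exceeds `norm_frac` is assumed rather than shown in the diff.

```python
import math
from collections import Counter


def pick_deviants(columns, labels, norm_frac, population="variables"):
    """Return the positions of columns that lie outside the 'normal' group.

    columns: list of value lists, one per variable
    labels:  group label of each column (for example from a prior clustering step)
    """
    counts = Counter()
    if population == "variables":
        # every column counts once towards its group
        for lab in labels:
            counts[lab] += 1
    elif population == "samples":
        # every non-NaN value counts towards the group of its column
        for col, lab in zip(columns, labels):
            counts[lab] += sum(1 for v in col if not math.isnan(v))
    else:
        raise ValueError("population must be 'variables' or 'samples'")

    pop_num = sum(counts.values())
    # the first group holding more than norm_frac of the population is 'normal'
    norm_label = next(
        (lab for lab, n in counts.items() if n > norm_frac * pop_num), None
    )
    if norm_label is None:
        return []  # assumption: without a normal group nothing is reported
    return [i for i, lab in enumerate(labels) if lab != norm_label]


columns = [[1.0, 1.2, float("nan")], [1.1, 0.9], [5.0, 5.2], [1.3, 1.3]]
deviants = pick_deviants(columns, [0, 0, 1, 0], norm_frac=0.5, population="samples")
# deviants == [2]: group 0 holds 6 of the 8 valid samples, so column 2 deviates
```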