rdm-software / SaQC

Commit 0b428e98 authored 4 years ago by Peter Lünenschloß
soil moisture flagging module documented (all but random forest)

Parent: 572c401e
3 merge requests: !193 Release 1.4, !188 Release 1.4, !78 doc-string doc of test functionality

Pipeline #6114 passed in 12 minutes and 32 seconds
Changes: 1 changed file

saqc/funcs/soil_moisture_tests.py: 245 additions, 62 deletions
...
@@ -30,10 +30,55 @@ def sm_flagSpikes(
 ):
     """
-    The Function provides just a call to flagSpikes_spektrumBased, with parameter defaults, that refer to:
-    Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international
-    Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.
+    The function merely calls ``spikes_flagSpektrumBased``, with parameter defaults
+    that refer to reference [1].
+
+    Parameters
+    ----------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    field : str
+        The fieldname of the column, holding the data-to-be-flagged.
+    flagger : saqc.flagger
+        A flagger object, holding flags and additional information related to `data`.
+    raise_factor : float, default 0.15
+        Minimum relative value difference between two values to consider the latter as a spike candidate.
+        See condition (1) (or reference [2]).
+    deriv_factor : float, default 0.2
+        See condition (2) (or reference [2]).
+    noise_func : {'CoVar', 'rVar'}, default 'CoVar'
+        Function to calculate the noisiness of the data surrounding potential spikes.
+        ``'CoVar'``: Coefficient of Variation
+        ``'rVar'``: Relative Variance
+    noise_window : str, default '12h'
+        An offset string that determines the range of the time window of the "surrounding" data of a potential spike.
+        See condition (3) (or reference [2]).
+    noise_thresh : float, default 1
+        Upper threshold for the noisiness of data surrounding potential spikes. See condition (3) (or reference [2]).
+    smooth_window : {None, str}, default None
+        Size of the smoothing window of the Savitzky-Golay filter.
+        The default value ``None`` results in a window of two times the sampling rate (i.e. containing three values).
+    smooth_poly_deg : int, default 2
+        Degree of the polynomial used for fitting with the Savitzky-Golay filter.
+
+    Returns
+    -------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    flagger : saqc.flagger
+        The flagger object, holding flags and additional information related to `data`.
+        Flag values may have changed relative to the flagger input.
+
+    References
+    ----------
+    This function is a generalization of the spectrum-based spike flagging mechanism as presented in:
+
+    [1] Dorigo, W. et al.: Global Automated Quality Control of In Situ Soil Moisture
+        Data from the International Soil Moisture Network. Vadose Zone J., 2013.
+        doi:10.2136/vzj2012.0097.
+
+    [2] https://git.ufz.de/rdm-software/saqc/-/blob/testfuncDocs/docs/funcs/FormalDescriptions.md#spikes_flagspektrumbased
     """
     return spikes_flagSpektrumBased(
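The noise test controlled by `noise_func`, `noise_window` and `noise_thresh` (condition (3)) can be illustrated with a small sketch. It is not taken from saqc: the helper `is_quiet_surrounding` and the dictionary `NOISE_FUNCS` are hypothetical, "relative variance" is assumed here to mean variance divided by the mean, and the window is assumed to extend `noise_window` to each side of the candidate. The formal description in reference [2] may define both differently.

```python
import pandas as pd


def noise_covar(values: pd.Series) -> float:
    # Coefficient of variation: sample standard deviation divided by the mean.
    return values.std() / values.mean()


def noise_rvar(values: pd.Series) -> float:
    # ASSUMED definition of "relative variance": sample variance divided by the mean.
    return values.var() / values.mean()


# Hypothetical mapping from the documented option names to the functions above.
NOISE_FUNCS = {"CoVar": noise_covar, "rVar": noise_rvar}


def is_quiet_surrounding(series: pd.Series, spike_time: pd.Timestamp,
                         noise_window: str = "12h", noise_thresh: float = 1.0,
                         noise_func: str = "CoVar") -> bool:
    """Hypothetical helper: noise check around a single spike candidate."""
    window = pd.Timedelta(noise_window)
    idx = series.index
    # Values strictly before and strictly after the candidate, each within `noise_window`.
    before = series[(idx >= spike_time - window) & (idx < spike_time)]
    after = series[(idx > spike_time) & (idx <= spike_time + window)]
    func = NOISE_FUNCS[noise_func]
    # An empty neighbourhood yields NaN, which compares as False (treated as not quiet).
    return bool(func(before) < noise_thresh and func(after) < noise_thresh)
```

A candidate only keeps its spike status if both neighbourhoods are calm; an isolated jump in an otherwise constant series passes, while the same jump inside strongly oscillating data does not.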
...
@@ -69,10 +114,57 @@ def sm_flagBreaks(
 ):
     """
-    The Function provides just a call to flagBreaks_spektrumBased, with parameter defaults that refer to:
-    Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international
-    Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.
+    The function merely calls ``breaks_flagSpektrumBased``, with parameter defaults that refer to reference [1].
+
+    Parameters
+    ----------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    field : str
+        The fieldname of the column, holding the data-to-be-flagged.
+    flagger : saqc.flagger
+        A flagger object, holding flags and additional information related to `data`.
+    thresh_rel : float, default 0.1
+        Float in [0,1]. See condition (1) in reference [2].
+    thresh_abs : float, default 0.01
+        Float > 0. See condition (2) in reference [2].
+    first_der_factor : float, default 10
+        Float > 0. See condition (3) in reference [2].
+    first_der_window_range : str, default '12h'
+        Offset string. See condition (3) in reference [2].
+    scnd_der_ratio_margin_1 : float, default 0.05
+        Float in [0,1]. See condition (4) in reference [2].
+    scnd_der_ratio_margin_2 : float, default 10
+        Float in [0,1]. See condition (5) in reference [2].
+    smooth : bool, default True
+        Method for obtaining the data series' derivatives.
+        * False: Just take series step differences (default)
+        * True: Smooth data with a Savitzky-Golay filter before differentiating.
+    smooth_window : {None, str}, default 2
+        Effective only if `smooth` = True.
+        Offset string. Size of the filter window used to calculate the derivatives.
+    smooth_poly_deg : int, default 2
+        Effective only if `smooth` = True.
+        Polynomial order used for smoothing with the Savitzky-Golay filter.
+
+    Returns
+    -------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    flagger : saqc.flagger
+        The flagger object, holding flags and additional information related to `data`.
+        Flag values may have changed relative to the flagger input.
+
+    References
+    ----------
+    [1] Dorigo, W. et al.: Global Automated Quality Control of In Situ Soil Moisture
+        Data from the International Soil Moisture Network. Vadose Zone J., 2013.
+        doi:10.2136/vzj2012.0097.
+
+    Find a brief mathematical description of the function here:
+
+    [2] https://git.ufz.de/rdm-software/saqc/-/blob/testfuncDocs/docs/funcs/FormalDescriptions.md#breaks_flagspektrumbased
     """
     return breaks_flagSpektrumBased(
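The two derivative options documented for `smooth` can be contrasted in a short sketch. It is a hypothetical helper, not saqc code: it assumes an equidistant index, converts `smooth_window` into a sample count that is forced to be odd, and uses SciPy's `savgol_filter` with `deriv=1`, so both variants return derivatives per sample step.

```python
import numpy as np
import pandas as pd
from scipy.signal import savgol_filter


def first_derivative(series: pd.Series, smooth: bool, smooth_window: str = "3h",
                     smooth_poly_deg: int = 2) -> pd.Series:
    """Hypothetical helper: per-sample first derivative of an equidistant series."""
    if not smooth:
        # Plain step differences y_t - y_(t-1); the first value is NaN.
        return series.diff()
    # Convert the offset string into a number of samples (assumes a regular grid).
    step = series.index[1] - series.index[0]
    n = int(pd.Timedelta(smooth_window) / step)
    if n % 2 == 0:
        n += 1  # odd window length, as required by older SciPy versions
    deriv = savgol_filter(np.asarray(series, dtype=float), window_length=n,
                          polyorder=smooth_poly_deg, deriv=1)
    return pd.Series(deriv, index=series.index)
```

On a straight line both variants agree; on noisy data the Savitzky-Golay variant fits a local polynomial first, which damps the noise that plain differences amplify.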
...
@@ -95,28 +187,45 @@ def sm_flagBreaks(
 @register
 def sm_flagFrost(data, field, flagger, soil_temp_variable, window="1h", frost_thresh=0, **kwargs):
     """
-    This Function is an implementation of the soil temperature based Soil Moisture flagging, as presented in:
-    Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international
-    Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.
+    This function is an implementation of the soil-temperature-based soil moisture flagging, as presented in
+    reference [1].
     All parameters default to the values suggested in this publication.
     The function flags soil moisture measurements by evaluating the soil frost level at the moment of measurement.
     Soil temperatures below "frost_thresh" are regarded as denoting a frozen soil state.
-    :param data: The pandas dataframe holding the data-to-be flagged, as well as the reference
-        series. Data must be indexed by a datetime series.
-    :param field: Fieldname of the Soil moisture measurements field in data.
-    :param flagger: A flagger - object.
-        like thingies that refer to the data(including datestrings).
-    :param tolerated_deviation: Offset String. Denoting the maximal temporal deviation,
-        the soil frost states timestamp is allowed to have, relative to the
-        data point to-be-flagged.
-    :param soil_temp_reference: A STRING, denoting the fields name in data,
-        that holds the data series of soil temperature values,
+
+    Parameters
+    ----------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    field : str
+        The fieldname of the column, holding the data-to-be-flagged.
+    flagger : saqc.flagger
+        A flagger object, holding flags and additional information related to `data`.
+    soil_temp_variable : str
+        The fieldname of the column in `data` holding the soil temperature values
         the to-be-flagged values shall be checked against.
-    :param frost_level: Value level, the flagger shall check against, when evaluating soil frost level.
+    window : str, default '1h'
+        An offset string denoting the maximal temporal deviation the soil frost state's timestamp is allowed to have
+        relative to the data point to be flagged.
+    frost_thresh : float, default 0
+        Value level the flagger checks against when evaluating the soil frost level.
+
+    Returns
+    -------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    flagger : saqc.flagger
+        The flagger object, holding flags and additional information related to `data`.
+        Flag values may have changed relative to the flagger input.
+
+    References
+    ----------
+    [1] Dorigo, W. et al.: Global Automated Quality Control of In Situ Soil Moisture
+        Data from the International Soil Moisture Network. Vadose Zone J., 2013.
+        doi:10.2136/vzj2012.0097.
     """
     # retrieve reference series
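The frost rule documented for `sm_flagFrost` (a soil temperature below `frost_thresh` within `window` of the measurement) can be sketched with pandas alone. The helper `frost_mask` is hypothetical and not the saqc implementation: matching each soil moisture timestamp to the *nearest* temperature reading within `window` is an assumption, and measurements without any reading in range are left unflagged.

```python
import pandas as pd


def frost_mask(soil_moisture: pd.Series, soil_temp: pd.Series,
               window: str = "1h", frost_thresh: float = 0.0) -> pd.Series:
    """Hypothetical helper: True where the matched soil temperature indicates frost."""
    # ASSUMPTION: use the nearest temperature reading, but only if it lies within `window`.
    temp_at_sm = soil_temp.reindex(soil_moisture.index, method="nearest",
                                   tolerance=pd.Timedelta(window))
    # Temperatures below the threshold denote frozen soil; NaN (no match) compares as False.
    return temp_at_sm < frost_thresh
```

Both series need a monotonically increasing `DatetimeIndex` for `reindex(method="nearest")` to work.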
...
@@ -155,10 +264,9 @@ def sm_flagPrecipitation(
     **kwargs,
 ):
     """
-    This Function is an implementation of the precipitation based Soil Moisture flagging, as presented in:
-    Dorigo,W,.... Global Automated Quality Control of In Situ Soil Moisture Data from the international
-    Soil Moisture Network. 2013. Vadoze Zone J. doi:10.2136/vzj2012.0097.
+    This function is an implementation of the precipitation-based soil moisture flagging, as presented in
+    reference [1].
     All parameters default to the values suggested in this publication (excluding porosity, sensor accuracy and
     sensor depth).
...
@@ -172,9 +280,9 @@ def sm_flagPrecipitation(
     A data point y_t is flagged as an invalid soil moisture raise, if:
-    (1) y_t > y_(t-raise_window)
-    (2) y_t - y_(t-"std_factor_range") > "std_factor" * std(y_(t-"std_factor_range"),...,y_t)
-    (3) sum(prec(t-24h),...,prec(t)) > sensor_depth * sensor_accuracy * soil_porosity
+    (1) y_t > y_(t-`raise_window`)
+    (2) y_t - y_(t-`std_window`) > `std_factor` * std(y_(t-`std_window`),...,y_t)
+    (3) sum(prec(t-24h),...,prec(t)) > `sensor_depth` * `sensor_accuracy` * `soil_porosity`
     NOTE1: np.nan entries in the input precipitation series will be regarded as suspicious and the test will be
     omitted for every 24h interval including a np.nan entry in the original precipitation sampling rate.
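Conditions (1) to (3) and NOTE1 can be combined in a compact pandas sketch. This is an illustration under explicit assumptions, not the saqc implementation: the helper `precipitation_raise_mask` is hypothetical, both series are assumed to share one regular `DatetimeIndex`, and condition (3) is read as the precipitation that would *explain* a raise, so a raise is flagged only when the 24h sum does **not** exceed `sensor_depth * sensor_accuracy * soil_porosity`.

```python
import pandas as pd


def precipitation_raise_mask(sm: pd.Series, prec: pd.Series, raise_window: str,
                             std_window: str = "24h", std_factor: float = 2.0,
                             sensor_depth: float = 0.0, sensor_accuracy: float = 0.0,
                             soil_porosity: float = 0.0) -> pd.Series:
    """Hypothetical helper: flag soil moisture raises not explained by precipitation."""
    # (1) y_t > y_(t - raise_window)
    earlier = sm.shift(freq=pd.Timedelta(raise_window)).reindex(sm.index)
    cond1 = sm > earlier
    # (2) y_t - y_(t - std_window) > std_factor * std(y_(t - std_window), ..., y_t)
    start = sm.shift(freq=pd.Timedelta(std_window)).reindex(sm.index)
    rolling_std = sm.rolling(std_window, closed="both").std()
    cond2 = (sm - start) > std_factor * rolling_std
    # (3) ASSUMED reading: the raise is suspicious if the 24h precipitation sum
    # does not exceed sensor_depth * sensor_accuracy * soil_porosity.
    threshold = sensor_depth * sensor_accuracy * soil_porosity
    prec_sum = prec.rolling("24h", closed="both").sum().reindex(sm.index)
    dry = prec_sum <= threshold
    # NOTE1: skip every 24h interval that contains a NaN precipitation value.
    gaps = prec.isna().astype(float).rolling("24h", closed="both").sum().reindex(sm.index) > 0
    return cond1 & cond2 & dry & ~gaps
```

With a step from 0.2 to 0.4 in dry conditions only the step itself is flagged; a precipitation event in the preceding 24 hours, or a NaN in the precipitation record, suppresses the flag.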
...
@@ -183,27 +291,57 @@ def sm_flagPrecipitation(
     NOTE2: The function won't test any values that are flagged suspicious anyway - this may change in a future version.
-    :param data: The pandas dataframe holding the data-to-be flagged, as well as the reference
-        series. Data must be indexed by a datetime series and be harmonized onto a
-        time raster with seconds precision.
-    :param field: Fieldname of the Soil moisture measurements field in data.
-    :param flagger: A flagger - object. (saqc.flagger.X)
-    :param prec_variable: Fieldname of the precipitation meassurements column in data.
-    :param sensor_depth: Measurement depth of the soil moisture sensor, [m].
-    :param sensor_accuracy: Accuracy of the soil moisture sensor, [-].
-    :param soil_porosity: Porosity of moisture sensors surrounding soil, [-].
-    :param std_factor: The value determines by which rule it is decided, weather a raise in soil
-        moisture is significant enough to trigger the flag test or not:
-        Significants is assumed, if the raise is greater then "std_factor" multiplied
-        with the last 24 hours standart deviation.
-    :param std_factor_range: Offset String. Denotes the range over witch the standart deviation is obtained,
-        to test condition [2]. (Should be a multiple of the sampling rate)
-    :param raise_window: Offset String. Denotes the distance to the datapoint, relatively to witch
-        it is decided if the current datapoint is a raise or not. Equation [1].
-        It defaults to None. When None is passed, raise_window is just the sample
-        rate of the data. Any raise reference must be a multiple of the (intended)
-        sample rate and below std_factor_range.
-    :param ignore_missing:
+
+    Parameters
+    ----------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    field : str
+        The fieldname of the column, holding the data-to-be-flagged.
+    flagger : saqc.flagger
+        A flagger object, holding flags and additional information related to `data`.
+    prec_variable : str
+        Fieldname of the precipitation measurements column in data.
+    raise_window : {None, str}, default None
+        Denotes the distance to the data point relative to which
+        it is decided whether the current data point is a raise or not. See condition (1).
+        It defaults to None. When None is passed, raise_window is just the sample
+        rate of the data. Any raise reference must be a multiple of the (intended)
+        sample rate and below std_window.
+    sensor_depth : float, default 0
+        Measurement depth of the soil moisture sensor, [m].
+    sensor_accuracy : float, default 0
+        Accuracy of the soil moisture sensor, [-].
+    soil_porosity : float, default 0
+        Porosity of the soil surrounding the moisture sensor, [-].
+    std_factor : int, default 2
+        The value determines by which rule it is decided whether a raise in soil
+        moisture is significant enough to trigger the flag test or not:
+        Significance is assumed if the raise is greater than "std_factor" multiplied
+        with the standard deviation over the preceding `std_window` (24 hours by default).
+    std_window : str, default '24h'
+        An offset string that denotes the range over which the standard deviation is obtained
+        to test condition (2). (Should be a multiple of the sampling rate.)
+    ignore_missing : bool, default False
+
+    Returns
+    -------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    flagger : saqc.flagger
+        The flagger object, holding flags and additional information related to `data`.
+        Flag values may have changed relative to the flagger input.
+
+    References
+    ----------
+    [1] Dorigo, W. et al.: Global Automated Quality Control of In Situ Soil Moisture
+        Data from the International Soil Moisture Network. Vadose Zone J., 2013.
+        doi:10.2136/vzj2012.0097.
     """
     dataseries, moist_rate = retrieveTrustworthyOriginal(data, field, flagger)
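The rules stated for `raise_window` (default to the sampling rate, be a multiple of it, stay below `std_window`) can be checked up front. The helper `resolve_raise_window` below is hypothetical and not part of saqc; estimating the sampling rate as the median spacing of the timestamps is an assumption made for this sketch.

```python
import pandas as pd


def resolve_raise_window(index: pd.DatetimeIndex, raise_window=None,
                         std_window: str = "24h") -> pd.Timedelta:
    """Hypothetical helper: apply the documented default and constraints for raise_window."""
    # ASSUMPTION: the sampling rate is the median spacing of the timestamps.
    rate = index.to_series().diff().median()
    window = rate if raise_window is None else pd.Timedelta(raise_window)
    if window % rate != pd.Timedelta(0):
        raise ValueError("raise_window must be a multiple of the sampling rate")
    if window >= pd.Timedelta(std_window):
        raise ValueError("raise_window must be below std_window")
    return window
```

For a 30-minute series, `None` resolves to 30 minutes, `"2h"` is accepted, and `"45min"` or `"24h"` are rejected.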
...
@@ -245,7 +383,6 @@ def sm_flagPrecipitation(
     flagger = flagger.setFlags(field, loc=invalid_indices.index, **kwargs)
     return data, flagger
 @register
 def sm_flagConstants(
     data,
...
@@ -265,16 +402,62 @@ def sm_flagConstants(
 ):
     """
     This function flags plateaus/series of constant values in soil moisture data.
-    Note, function has to be harmonized to equidistant freq_grid
-    Note, in current implementation, it has to hold that: (rainfall_window_range >= plateau_window_min)
-    :param data: The pandas dataframe holding the data-to-be flagged.
-        Data must be indexed by a datetime series and be harmonized onto a
-        time raster with seconds precision (skips allowed).
-    :param field: Fieldname of the Soil moisture measurements field in data.
-    :param flagger: A flagger - object. (saqc.flagger.X)
+
+    Mentions of "conditions" in the following explanations refer to reference [2].
+
+    The function represents a stricter version of ``constants_flagVarianceBased``.
+
+    The additional constraints (3)-(5) are designed to match the special cases of constant
+    values in soil moisture measurements, namely preceding precipitation events
+    (conditions (3) and (4)) and a certain plateau level (condition (5)).
+
+    Parameters
+    ----------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    field : str
+        The fieldname of the column, holding the data-to-be-flagged.
+    flagger : saqc.flagger
+        A flagger object, holding flags and additional information related to `data`.
+    window : str, default '12h'
+        Minimum duration during which values need to be identical to become plateau candidates. See condition (1).
+    thresh : float, default 0.0005
+        Maximum variance of a group of values to still consider them constant. See condition (2).
+    precipitation_window : str, default '12h'
+        See conditions (3) and (4).
+    tolerance : float, default 0.95
+        Tolerance factor, see condition (5).
+    deriv_max : float, default 0
+        See condition (4).
+    deriv_min : float, default 0.0025
+        See condition (3).
+    max_missing : {None, int}, default None
+        Maximum number of missing values allowed in the window; by default, this condition is ignored.
+    max_consec_missing : {None, int}, default None
+        Maximum number of consecutive missing values allowed in the window; by default, this condition is ignored.
+    smooth_window : {None, str}, default None
+        Size of the smoothing window of the Savitzky-Golay filter. The default value None results in a window of two
+        times the sampling rate (i.e. three values).
+    smooth_poly_deg : int, default 2
+        Degree of the polynomial used for smoothing with the Savitzky-Golay filter.
+
+    Returns
+    -------
+    data : dios.DictOfSeries
+        A dictionary of pandas.Series, holding all the data.
+    flagger : saqc.flagger
+        The flagger object, holding flags and additional information related to `data`.
+        Flag values may have changed relative to the flagger input.
+
+    References
+    ----------
+    [1] Dorigo, W. et al.: Global Automated Quality Control of In Situ Soil Moisture
+        Data from the International Soil Moisture Network. Vadose Zone J., 2013.
+        doi:10.2136/vzj2012.0097.
+
+    [2] https://git.ufz.de/rdm-software/saqc/-/edit/testfuncDocs/docs/funcs/FormalDescriptions.md#sm_flagconstants
     """
     # get plateaus:
...
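Conditions (1) and (2) of `sm_flagConstants` (a stretch lasting at least `window` whose variance stays at or below `thresh`) can be sketched with a rolling variance. This hypothetical `plateau_mask` covers only those two conditions; the precipitation and plateau-level constraints (3)-(5), smoothing, and the missing-value limits are deliberately left out, and it is not the saqc implementation.

```python
import pandas as pd


def plateau_mask(series: pd.Series, window: str = "12h", thresh: float = 0.0005) -> pd.Series:
    """Hypothetical helper: flag stretches spanning `window` whose variance is <= `thresh`."""
    w = pd.Timedelta(window)
    # Sample variance of every window [t - window, t].
    var = series.rolling(window, closed="both").var()
    # Only windows that can span the full duration qualify as plateau candidates.
    full = (series.index - series.index[0]) >= w
    ends = series.index[(var <= thresh).to_numpy() & full]
    mask = pd.Series(False, index=series.index)
    for end in ends:
        # Every member of a qualifying window belongs to the plateau.
        mask.loc[(series.index >= end - w) & (series.index <= end)] = True
    return mask
```

Flagging all members of each qualifying window, rather than only its end point, means a plateau is flagged from its first constant value onward.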