flagging plateaus
Adding wavelet-based flagging of outlier-like plateaus.
Building on the idea of searching for Ricker-wavelet patterns at decomposition scales generated by the wavelets themselves, a quite powerful, parameter-minimal algorithm/workflow was sketched out and implemented.
Requiring only the minimal and maximal length of the plateaus to be detected, the algorithm seems to perform well on highly volatile and noisy test data.
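To make the Ricker-wavelet idea more concrete, here is a minimal NumPy sketch of a multi-scale Ricker response. It is an illustration only, not the implementation in this MR: the function names `ricker` and `plateau_response`, the geometric spacing of candidate widths, the median baseline removal and the `a = w / sqrt(2)` width-to-scale mapping are all assumptions. That mapping is chosen because, for a rectangular plateau of width `w`, the (L2-normalised) Ricker response at the plateau centre is maximal at that scale, so the best-scoring candidate width estimates the plateau length.

```python
import numpy as np


def ricker(half_width: int, a: float) -> np.ndarray:
    """Ricker ("Mexican hat") wavelet with scale `a`, sampled at offsets -half_width..half_width."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def plateau_response(values, min_len: int, max_len: int, n_scales: int = 8):
    """Hypothetical helper: per sample, the strongest wavelet response and the
    candidate plateau width (in samples) that produced it.

    Candidate widths are spaced geometrically between min_len and max_len.
    Only upward plateaus give positive responses; apply it to -values for dips.
    """
    x = np.asarray(values, dtype=float)
    x = x - np.median(x)  # crude baseline removal (assumption, not from the MR)
    widths = np.unique(np.round(np.geomspace(min_len, max_len, n_scales)).astype(int))
    best = np.full(x.shape, -np.inf)
    best_width = np.zeros(x.shape, dtype=int)
    for w in widths:
        # For a box of width w, the normalised Ricker response at its centre
        # peaks at scale a = w / sqrt(2).
        a = w / np.sqrt(2.0)
        # Truncate the kernel at ~4a, but keep it no longer than the signal so
        # that mode="same" returns an output of the signal's length.
        half = min(int(4 * a), (len(x) - 1) // 2)
        resp = np.convolve(x, ricker(half, a), mode="same")
        better = resp > best
        best[better] = resp[better]
        best_width[better] = w
    return best, best_width
```

For a single plateau of 60 samples in an otherwise flat series, the strongest response sits at the plateau centre and the winning candidate width is the one closest to 60. How the responses are thresholded into flags is deliberately left open here.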
Example
The example data set was generated by:
- generating a noisy baseline and adding offsets and outliers (base1),
- adding some cosine variance to the data (base2),
- adding highly volatile, real-world turbidity measurements to the data (base3).
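The first two generation steps could be reproduced roughly as follows. This is a hedged sketch: the helper name `make_example_data`, the 10-minute sampling, the offset positions, lengths and heights, the outlier count and the cosine period are all hypothetical choices, not the values used for the plots. The offset lengths (500 minutes to 6.25 days) were merely picked to fall inside the `100min`..`7d` range of the call below. The turbidity measurements behind base3 are real data and are not reproduced.

```python
import numpy as np
import pandas as pd


def make_example_data(n: int = 10_000, freq: str = "10min", seed: int = 0) -> pd.DataFrame:
    """Hypothetical re-creation of base1 and base2 (all parameters are assumptions)."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2021-01-01", periods=n, freq=freq)

    # base1: noisy baseline plus plateau-like offsets and isolated outliers.
    base1 = rng.normal(0.0, 1.0, n)
    for start, length, height in [(1_000, 50, 8.0), (4_000, 300, -6.0), (7_500, 900, 10.0)]:
        base1[start:start + length] += height
    spikes = rng.choice(n, size=20, replace=False)
    base1[spikes] += rng.choice([-15.0, 15.0], size=20)

    # base2: base1 with an added cosine component.
    t = np.arange(n)
    base2 = base1 + 3.0 * np.cos(2.0 * np.pi * t / 1_000)

    # base3 would add real turbidity measurements on top of base2;
    # that data is not included here.
    return pd.DataFrame({"base1": base1, "base2": base2}, index=index)
```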
The flagging result for the call
qc = qc.flagPlateau('base3', min_length='100min', max_length='7d')
looks as follows (overview):
Zoomed in on the flagged chunks:
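The call above specifies `min_length` and `max_length` as offset strings, while a wavelet search works on sample counts. One plausible conversion, assuming roughly regular sampling, divides the offset by the median sampling interval. The helper `length_to_samples` is hypothetical and not part of SaQC.

```python
import pandas as pd


def length_to_samples(index: pd.DatetimeIndex, length: str) -> int:
    """Hypothetical helper: convert an offset string such as '100min' or '7d'
    into a number of samples.

    Assumes (roughly) regular sampling and uses the median spacing of `index`
    as the sampling period.
    """
    step = pd.Series(index).diff().median()  # first diff is NaT and is skipped
    return max(1, int(pd.Timedelta(length) // step))
```

At a 10-minute sampling rate, `'100min'` corresponds to 10 samples and `'7d'` to 7 · 24 · 6 = 1008 samples.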
Activity
changed milestone to %Someday
added accepted, category: algorithms, and feature labels
assigned to @brunnerm
added test(s)-added✓ label and removed accepted label
removed test(s)-added✓ label
- Resolved by David Schäfer
Is the data available somewhere?
- Resolved by Peter Lünenschloß
I experimented a bit and found two performance bottlenecks when working with an artificial time series of length 25_000:
- The use of the SaQC object. Removing it reduced the runtime on my machine from 3:30 to 2:30.
- The rolling-apply of patternSearch. Commenting it out reduced the runtime to about 20 seconds, so I guess this should be our primary target for optimization.
As I am not a pandas rolling expert, I played around a bit with numpy's stride tricks, but wasn't too successful either... But I guess there is a way to speed this up significantly, ideally without using numba/cython.
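The general stride-trick pattern is to build all windows at once with `numpy.lib.stride_tricks.sliding_window_view` and express the per-window logic as array operations along the window axis, instead of one Python call per window in `rolling().apply()`. The actual computation inside patternSearch is not shown here, so this sketch assumes a hypothetical window statistic (a dot product with a fixed template); `score_rolling` and `score_strided` are illustrative names.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def score_rolling(values: pd.Series, template: np.ndarray) -> pd.Series:
    """Reference version: one Python-level call per window via rolling().apply()."""
    return values.rolling(len(template)).apply(lambda w: float(np.dot(w, template)), raw=True)


def score_strided(values: pd.Series, template: np.ndarray) -> pd.Series:
    """Same result, vectorised: all windows as a strided view, then one matrix-vector product."""
    m = len(template)
    arr = values.to_numpy(dtype=float)
    windows = sliding_window_view(arr, m)  # shape (n - m + 1, m), a view, no copy
    out = np.full(arr.shape, np.nan)
    out[m - 1:] = windows @ template  # right-aligned, like pandas' default rolling window
    return pd.Series(out, index=values.index)
```

The view itself costs no memory, but any intermediate that is materialised per window has size (n - m + 1) · m, which can get large for long windows. For a pure dot product, `np.correlate(arr, template, mode="valid")` gives the same values without the explicit view.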
Edited by David Schäfer
added 3 commits