Commit 11f1f411 authored by Peter Lünenschloß

merged develop

parents e703bb90 bb7194e5
1 merge request: !850 "Horizontal axis rolling"
Pipeline #207550 failed with stages in 1 minute and 11 seconds
Showing 131 additions and 105 deletions
@@ -29,7 +29,7 @@ jobs:
       fail-fast: false
       matrix:
         os: ["windows-latest", "ubuntu-latest", "macos-latest"]
-        python-version: ["3.9", "3.10", "3.11"]
+        python-version: ["3.9", "3.10", "3.11", "3.12"]
     defaults:
       run:
         # somehow this also works for windows O.o ??
......
@@ -30,7 +30,7 @@ stages:
   - deploy

 default:
-  image: python:3.10
+  image: python:3.11
   before_script:
     - pip install --upgrade pip
     - pip install -r requirements.txt

@@ -133,8 +133,23 @@ python311:
     reports:
       junit: report.xml

+python312:
+  stage: test
+  image: python:3.12
+  script:
+    - export DISPLAY=:99
+    - Xvfb :99 &
+    - pytest tests -Werror --junitxml=report.xml
+    - python -m saqc --config docs/resources/data/config.csv --data docs/resources/data/data.csv --outfile /tmp/test.csv
+  artifacts:
+    when: always
+    reports:
+      junit: report.xml
+
 doctest:
   stage: test
+  variables:
+    COLUMNS: 200
   script:
     - cd docs
     - pip install -r requirements.txt

@@ -180,6 +195,16 @@ wheel311:
     - pip install .
     - python -c 'import saqc; print(f"{saqc.__version__=}")'

+wheel312:
+  stage: build
+  image: python:3.12
+  variables:
+    PYPI_PKG_NAME: "saqc-dev"
+  script:
+    - pip install wheel
+    - pip wheel .
+    - pip install .
+    - python -c 'import saqc; print(f"{saqc.__version__=}")'

 # ===========================================================
 # Extra Pipeline (run with a successful run of all other jobs on develop)
......
@@ -6,38 +6,52 @@ SPDX-License-Identifier: GPL-3.0-or-later

 # Changelog

 ## Unreleased
-[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.5.0...develop)
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.6.0...develop)
+### Added
+### Changed
+### Removed
+### Fixed
+### Deprecated
+
+## [2.6.0](https://git.ufz.de/rdm-software/saqc/-/tags/v2.6.0) - 2024-04-15
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.5.0...v2.6.0)
 ### Added
 - `reindex`: base reindexer function
 - `flagGeneric`, `processGeneric`: target broadcasting and numpy array support
 - `SaQC`: automatic translation of incoming flags
 - Option to change the flagging scheme after initialization
 - `flagByClick`: manually assign flags using a graphical user interface
-- `SaQC`: support for selection, slicing and setting of items by use of subscription on SaQC objects (e.g. `qc[key]` and `qc[key] = value`).
-  Selection works with single keys, collections of keys and string slices (e.g. `qc["a":"f"]`). Values can be SaQC objects, pd.Series,
-  Iterable of Series and dict-like with series values.
+- `SaQC`: support for selection, slicing and setting of items by subscription on `SaQC` objects
 - `transferFlags` is a multivariate function
 - `plot`: added `yscope` keyword
 - `setFlags`: function to replace `flagManual`
-- `flagUniLOF`: added defaultly applied correction to mitigate phenomenon of overflagging at relatively steep data value slopes. (parameter `slope_correct`).
+- `flagUniLOF`: added parameter `slope_correct` to correct for overflagging at relatively steep data value slopes
 - `History`: added option to change aggregation behavior
 - "horizontal" axis / multivariate mode for `rolling`
+- Translation scheme `AnnotatedFloatScheme`
 ### Changed
-- `flagPattern` uses *fastdtw* package now to compute timeseries distances
+- `SaQC.flags` always returns a `DictOfSeries`
 ### Removed
+- `SaQC` methods deprecated in version 2.4: `interpolate`, `interpolateIndex`, `interpolateInvalid`, `roll`, `linear`, `shift`, `flagCrossStatistics`
+- Method `Flags.toDios`, deprecated in version 2.4
+- Method `DictOfSeries.index_of`, deprecated in version 2.4
+- Option `"complete"` for parameter `history` of method `plot`
+- Option `"cycleskip"` for parameter `ax_kwargs` of method `plot`
+- Parameter `phaseplot` from method `plot`
 ### Fixed
 - `flagConstants`: fixed flagging of rolling ramps
 - `Flags`: add meta entry to imported flags
 - group operations were overwriting existing flags
-- `SaQC._construct`: was not working for inherited classes (used hardcoded `SaQC` to construct a new instance)
+- `SaQC._construct`: was not working for inherited classes
 - `processGeneric`: improved numpy function compatibility
 ### Deprecated
 - `flagManual` in favor of `setFlags`
-- `inverse_` + methodstring options for `concatFlags` parameter `method` in favor of the `invert=True` setting
+- `inverse_**` options for `concatFlags` parameter `method` in favor of `invert=True`
 - `flagRaise` with delegation to better replacements `flagZScore`, `flagUniLOF`, `flagJumps` or `flagOffset`
 - `flagByGrubbs` with delegation to better replacements `flagZScore`, `flagUniLOF`
 - `flagMVScore` with delegation to manual application of the steps

-## [2.5.0](https://git.ufz.de/rdm-software/saqc/-/tags/v2.4.1) - 2023-06-22
+## [2.5.0](https://git.ufz.de/rdm-software/saqc/-/tags/v2.5.0) - 2023-09-05
 [List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.4.1...v2.5.0)
 ### Added
 - WMO standard mean aggregations
......
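The changelog entry on subscription support mentions string slices such as `qc["a":"f"]`. For context, pandas label slicing has the same inclusive-end semantics; a minimal sketch of that behavior in plain pandas (the DataFrame here is illustrative, not the saqc API):

```python
import pandas as pd

# Columns play the role of saqc variables here.
df = pd.DataFrame({k: [1, 2] for k in ["a", "b", "c", "f", "g"]})

# Label slicing is inclusive of both endpoints, so "a":"f"
# selects everything up to and including column "f".
subset = df.loc[:, "a":"f"]
print(list(subset.columns))  # → ['a', 'b', 'c', 'f']
```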
@@ -62,7 +62,7 @@ could look like [this](https://git.ufz.de/rdm-software/saqc/raw/develop/docs/res
 ```
 varname    ; test
 #----------; ---------------------------------------------------------------------
-SM2        ; shift(freq="15Min")
+SM2        ; align(freq="15Min")
 'SM(1|2)+' ; flagMissing()
 SM1        ; flagRange(min=10, max=60)
 SM2        ; flagRange(min=10, max=40)

@@ -103,7 +103,7 @@ data = pd.read_csv(
 qc = SaQC(data=data)
 qc = (qc
-    .shift("SM2", freq="15Min")
+    .align("SM2", freq="15Min")
     .flagMissing("SM(1|2)+", regex=True)
     .flagRange("SM1", min=10, max=60)
     .flagRange("SM2", min=10, max=40)
......
@@ -30,7 +30,7 @@ clean:

 # make documentation
 doc:
 	# generate environment table from dictionary
-	@$(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+	@ $(SPHINXBUILD) -M html "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

 # run tests
 test:
......
@@ -315,10 +315,10 @@ Aggregation

 If we want to comprise several values by aggregation and assign the result to the new regular timestamp, instead of
 selecting a single one, we can do this with the :py:meth:`~saqc.SaQC.resample` method.
 Let's resample the *SoilMoisture* data to a *20* minutes sample rate by aggregating every *20* minutes interval's
-content with the arithmetic mean (which is provided by the ``numpy.mean`` function, for example).
+content with the arithmetic mean.

    >>> import numpy as np
-   >>> qc = qc.resample('SoilMoisture', target='SoilMoisture_mean', freq='20min', method='bagg', func=np.mean)
+   >>> qc = qc.resample('SoilMoisture', target='SoilMoisture_mean', freq='20min', method='bagg', func="mean")
    >>> qc.data # doctest: +SKIP
    SoilMoisture                     | SoilMoisture_mean                     |
    ================================ | ===================================== |
......
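The hunk above swaps `func=np.mean` for the string `"mean"`, which lets the aggregation dispatch to pandas' optimized built-in. Stripped of the saqc wrapper, the 20-minute mean aggregation boils down to a pandas resample; a sketch with made-up data (bin alignment in saqc's `'bagg'` method may differ):

```python
import pandas as pd

# 10-minute raw data aggregated onto a 20-minute grid with the mean.
idx = pd.date_range("2021-01-01", periods=6, freq="10min")
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=idx)
out = s.resample("20min").mean()
print(out.tolist())  # → [1.5, 3.5, 5.5]
```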
@@ -140,7 +140,7 @@ Looking at the example data set more closely, we see that 2 of the 5 variables s

    qc.plot(variables, xscope=slice('2017-05', '2017-11'))

 Let's try to detect those drifts via saqc. The changes we observe in the data seem to develop significantly only in temporal spans over a month,
-so we go for ``"1M"`` as value for the
+so we go for ``"1ME"`` as value for the
 ``window`` parameter. We identified the majority group as a group containing three variables, whereby two variables
 seem to be scattered away, so that we can leave the ``frac`` value at its default ``.5`` level.
 The majority group seems on average not to be spread out more than 3 or 4 degrees. So, for the ``spread`` value

@@ -152,7 +152,7 @@ average in a month from any member of the majority group.

 .. doctest:: flagDriftFromNorm

    >>> variables = ['temp1 [degC]', 'temp2 [degC]', 'temp3 [degC]', 'temp4 [degC]', 'temp5 [degC]']
-   >>> qc = qc.flagDriftFromNorm(variables, window='1M', spread=3)
+   >>> qc = qc.flagDriftFromNorm(variables, window='1ME', spread=3)

 .. plot::
    :context: close-figs

@@ -160,7 +160,7 @@ average in a month from any member of the majority group.
    :class: center

    >>> variables = ['temp1 [degC]', 'temp2 [degC]', 'temp3 [degC]', 'temp4 [degC]', 'temp5 [degC]']
-   >>> qc = qc.flagDriftFromNorm(variables, window='1M', spread=3)
+   >>> qc = qc.flagDriftFromNorm(variables, window='1ME', spread=3)

 Let's check the results:
......
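The `"1M"` → `"1ME"` changes above track pandas' rename of the month-end offset alias (`"M"` was deprecated in favor of `"ME"` in pandas 2.2). A version-tolerant sketch of the alias:

```python
import pandas as pd

# Month-end alias: "ME" on pandas >= 2.2, "M" on older versions.
try:
    idx = pd.date_range("2017-05-01", periods=3, freq="ME")
except ValueError:  # pandas < 2.2 does not know "ME"
    idx = pd.date_range("2017-05-01", periods=3, freq="M")

print(list(idx.strftime("%Y-%m-%d")))  # → ['2017-05-31', '2017-06-30', '2017-07-31']
```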
@@ -147,19 +147,19 @@ Rolling Mean
 ^^^^^^^^^^^^

 Easiest thing to do would be to apply some rolling mean
-model via the method :py:meth:`saqc.SaQC.roll`.
+model via the method :py:meth:`saqc.SaQC.rolling`.

 .. doctest:: exampleOD

    >>> import numpy as np
-   >>> qc = qc.roll(field='incidents', target='incidents_mean', func=np.mean, window='13D')
+   >>> qc = qc.rolling(field='incidents', target='incidents_mean', func=np.mean, window='13D')

 .. plot::
    :context:
    :include-source: False

    import numpy as np
-   qc = qc.roll(field='incidents', target='incidents_mean', func=np.mean, window='13D')
+   qc = qc.rolling(field='incidents', target='incidents_mean', func=np.mean, window='13D')

 The ``field`` parameter is passed the variable name we want to calculate the rolling mean of.
 The ``target`` parameter holds the name we want to store the results of the calculation to.

@@ -174,13 +174,13 @@ under the name ``np.median``. We just calculate another model curve for the ``"i

 .. doctest:: exampleOD

-   >>> qc = qc.roll(field='incidents', target='incidents_median', func=np.median, window='13D')
+   >>> qc = qc.rolling(field='incidents', target='incidents_median', func=np.median, window='13D')

 .. plot::
    :context:
    :include-source: False

-   qc = qc.roll(field='incidents', target='incidents_median', func=np.median, window='13D')
+   qc = qc.rolling(field='incidents', target='incidents_median', func=np.median, window='13D')

 We chose another :py:attr:`target` value for the rolling *median* calculation, in order to not override our results from
 the previous rolling *mean* calculation.

@@ -318,18 +318,18 @@ for the point lying in the center of every window, we would define our function

    z_score = lambda D: abs((D[14] - np.mean(D)) / np.std(D))

-And subsequently, use the :py:meth:`~saqc.SaQC.roll` method to make a rolling window application with the scoring
+And subsequently, use the :py:meth:`~saqc.SaQC.rolling` method to make a rolling window application with the scoring
 function:

 .. doctest:: exampleOD

-   >>> qc = qc.roll(field='incidents_residuals', target='incidents_scores', func=z_score, window='27D', min_periods=27)
+   >>> qc = qc.rolling(field='incidents_residuals', target='incidents_scores', func=z_score, window='27D', min_periods=27)

 .. plot::
    :context: close-figs
    :include-source: False

-   qc = qc.roll(field='incidents_residuals', target='incidents_scores', func=z_score, window='27D', min_periods=27)
+   qc = qc.rolling(field='incidents_residuals', target='incidents_scores', func=z_score, window='27D', min_periods=27)

 Optimization by Decomposition
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -347,13 +347,13 @@ So the attempt works fine, only because our data set is small and strictly regul
 Meaning that it has constant temporal distances between subsequent measurements.

 In order to tweak our calculations and make them much more stable, it might be useful to decompose the scoring
-into separate calls to the :py:meth:`~saqc.SaQC.roll` function, by calculating the series of the
+into separate calls to the :py:meth:`~saqc.SaQC.rolling` function, by calculating the series of the
 residuals *mean* and *standard deviation* separately:

 .. doctest:: exampleOD

-   >>> qc = qc.roll(field='incidents_residuals', target='residuals_mean', window='27D', func=np.mean)
-   >>> qc = qc.roll(field='incidents_residuals', target='residuals_std', window='27D', func=np.std)
+   >>> qc = qc.rolling(field='incidents_residuals', target='residuals_mean', window='27D', func=np.mean)
+   >>> qc = qc.rolling(field='incidents_residuals', target='residuals_std', window='27D', func=np.std)
    >>> qc = qc.processGeneric(field=['incidents_scores', "residuals_mean", "residuals_std"], target="residuals_norm",
    ...                        func=lambda this, mean, std: (this - mean) / std)

@@ -362,15 +362,15 @@ residuals *mean* and *standard deviation* separately:
    :context: close-figs
    :include-source: False

-   qc = qc.roll(field='incidents_residuals', target='residuals_mean', window='27D', func=np.mean)
-   qc = qc.roll(field='incidents_residuals', target='residuals_std', window='27D', func=np.std)
+   qc = qc.rolling(field='incidents_residuals', target='residuals_mean', window='27D', func=np.mean)
+   qc = qc.rolling(field='incidents_residuals', target='residuals_std', window='27D', func=np.std)
    qc = qc.processGeneric(field=['incidents_scores', "residuals_mean", "residuals_std"], target="residuals_norm", func=lambda this, mean, std: (this - mean) / std)

 With huge datasets, this will be noticeably faster compared to the method presented :ref:`initially <cookbooks/ResidualOutlierDetection:Scores>`\ ,
 because ``saqc`` dispatches the rolling with the basic numpy statistic methods to an optimized pandas built-in.

-Also, as a result of the :py:meth:`~saqc.SaQC.roll` assigning its results to the center of every window,
+Also, as a result of the :py:meth:`~saqc.SaQC.rolling` assigning its results to the center of every window,
 all the values are centered and we don't have to care about window center indices when we are generating
 the *Z*\ -Scores from the two series.
......
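Outside of saqc, the decomposition described above corresponds to computing rolling mean and rolling std separately and combining them afterwards; a plain pandas sketch with synthetic data (right-aligned windows here, whereas the cookbook assigns results to the window center):

```python
import numpy as np
import pandas as pd

# Synthetic residuals on a daily index.
rng = np.random.default_rng(42)
s = pd.Series(rng.normal(size=60),
              index=pd.date_range("2020-01-01", periods=60, freq="D"))

# Built-in rolling aggregations dispatch to optimized pandas internals,
# unlike a python lambda applied to every window.
mean = s.rolling("27D").mean()
std = s.rolling("27D").std()
zscore = (s - mean) / std
```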
@@ -11,4 +11,3 @@ Gap filling
    :nosignatures:

    ~SaQC.interpolateByRolling
-   ~SaQC.interpolate

@@ -15,3 +15,5 @@ Flagtools
    ~SaQC.flagManual
    ~SaQC.flagDummy
    ~SaQC.transferFlags
+   ~SaQC.andGroup
+   ~SaQC.orGroup

@@ -13,6 +13,6 @@ Generic Functions
    ~SaQC.processGeneric
    ~SaQC.flagGeneric
-   ~SaQC.roll
-   ~SaQC.transform
-   ~SaQC.resample
+   ~SaQC.andGroup
+   ~SaQC.orGroup

@@ -12,7 +12,6 @@ Multivariate outlier detection.
 .. autosummary::
    :nosignatures:

-   ~SaQC.flagCrossStatistics
    ~SaQC.flagLOF
    ~SaQC.flagZScore

@@ -10,10 +10,7 @@ Sampling Alignment
 .. autosummary::
    :nosignatures:

-   ~SaQC.linear
-   ~SaQC.shift
    ~SaQC.align
    ~SaQC.concatFlags
-   ~SaQC.interpolateIndex
    ~SaQC.resample
    ~SaQC.reindex

@@ -15,3 +15,4 @@ Tools
    ~SaQC.renameField
    ~SaQC.selectTime
    ~SaQC.plot
@@ -50,7 +50,7 @@ with something more elaborate, is in fact a one line change. So let's start with

 from saqc import SaQC

 # we need some dummy data
-values = np.array([12, 24, 36, 33, 89, 87, 45, 31, 18, 99])
+values = np.array([12, 24, 36, 33, 89, 87, 45, 31, 18, 99], dtype="float")
 dates = pd.date_range(start="2020-01-01", periods=len(values), freq="D")
 data = pd.DataFrame({"a": values}, index=dates)
 # let's insert some constant values ...

@@ -103,7 +103,7 @@ number of different attributes, of which you likely might want to use the following

 .. doctest:: python

    >>> qc.data #doctest:+NORMALIZE_WHITESPACE
                    a |
    ================= |
    2020-01-01   12.0 |
    2020-01-02   24.0 |
    2020-01-03   36.0 |
    2020-01-04   47.4 |
    2020-01-05   47.4 |
    2020-01-06   47.4 |
    2020-01-07   45.0 |
    2020-01-08   31.0 |
    2020-01-09  175.0 |
    2020-01-10   99.0 |

    >>> qc.flags #doctest:+NORMALIZE_WHITESPACE
                        a |
    ===================== |
    2020-01-01        BAD |
    2020-01-02  UNFLAGGED |
    2020-01-03  UNFLAGGED |
    2020-01-04  UNFLAGGED |
    2020-01-05  UNFLAGGED |
    2020-01-06  UNFLAGGED |
    2020-01-07  UNFLAGGED |
    2020-01-08  UNFLAGGED |
    2020-01-09        BAD |
    2020-01-10        BAD |

 Putting it together - The complete workflow

@@ -142,7 +142,7 @@ The snippet below provides you with a complete example from the things we have seen

 from saqc import SaQC

 # we need some dummy data
-values = np.random.randint(low=0, high=100, size=100)
+values = np.random.randint(low=0, high=100, size=100).astype(float)
 dates = pd.date_range(start="2020-01-01", periods=len(values), freq="D")
 data = pd.DataFrame({"a": values}, index=dates)
 # let's insert some constant values ...
......
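Both `dtype` changes above make the dummy data float from the start. The likely motivation (an assumption, not stated in the diff): integer series cannot represent NaN, so missing values introduced later in the pipeline would force an upcast anyway. A sketch:

```python
import numpy as np
import pandas as pd

values = np.array([12, 24, 36, 33, 89], dtype="float")
s = pd.Series(values, index=pd.date_range("2020-01-01", periods=5, freq="D"))

# Assigning NaN into a float series is a no-op dtype-wise; on an
# int64 series it would trigger an upcast (or, in future pandas
# versions, an error).
s.iloc[2] = np.nan
print(s.dtype)  # → float64
```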
@@ -4,6 +4,7 @@
 Click==8.1.7
 docstring_parser==0.16
+fancy-collections==0.3.0
 fastdtw==0.3.4
 matplotlib==3.8.3
 numpy==1.26.4

@@ -13,4 +14,3 @@ pandas==2.2.1
 scikit-learn==1.4.1.post1
 scipy==1.12.0
 typing_extensions==4.10.0
-fancy-collections==0.2.1
@@ -10,7 +10,13 @@
 from saqc.constants import BAD, DOUBTFUL, FILTER_ALL, FILTER_NONE, GOOD, UNFLAGGED
 from saqc.exceptions import ParsingError
 from saqc.core import Flags, DictOfSeries, SaQC
-from saqc.core.translation import DmpScheme, FloatScheme, PositionalScheme, SimpleScheme
+from saqc.core.translation import (
+    DmpScheme,
+    FloatScheme,
+    PositionalScheme,
+    SimpleScheme,
+    AnnotatedFloatScheme,
+)
 from saqc.parsing.reader import fromConfig
 from saqc.version import __version__
......
@@ -8,7 +8,6 @@

 from __future__ import annotations

-import json
 import logging
 from functools import partial
 from pathlib import Path

@@ -146,27 +145,27 @@ def main(
     saqc = cr.run()

-    data_result = saqc.data.to_pandas()
+    data_result = saqc.data
     flags_result = saqc.flags
-    if isinstance(flags_result, DictOfSeries):
-        flags_result = flags_result.to_pandas()

     if outfile:
-        data_result.columns = pd.MultiIndex.from_product(
-            [data_result.columns.tolist(), ["data"]]
-        )
-        if not isinstance(flags_result.columns, pd.MultiIndex):
-            flags_result.columns = pd.MultiIndex.from_product(
-                [flags_result.columns.tolist(), ["flags"]]
-            )
-        out = pd.concat([data_result, flags_result], axis=1).sort_index(
-            axis=1, level=0, sort_remaining=False
-        )
-        writeData(writer, out, outfile)
+        out = DictOfSeries()
+        for k in data_result.keys():
+            flagscol = flags_result[k]
+            if isinstance(flagscol, pd.Series):
+                flagscol = flagscol.rename("flags")
+            out[k] = pd.concat([data_result[k].rename("data"), flagscol], axis=1)
+        writeData(
+            writer,
+            out.to_pandas(
+                fill_value=-9999 if scheme == "positional" else np.nan,
+                multiindex=True,
+            ),
+            outfile,
+        )

 if __name__ == "__main__":
     main()
@@ -22,6 +22,7 @@ from saqc.core.frame import DictOfSeries
 from saqc.core.history import History
 from saqc.core.register import FUNC_MAP
 from saqc.core.translation import (
+    AnnotatedFloatScheme,
     DmpScheme,
     FloatScheme,
     PositionalScheme,

@@ -41,6 +42,7 @@ TRANSLATION_SCHEMES = {
     "float": FloatScheme,
     "dmp": DmpScheme,
     "positional": PositionalScheme,
+    "annotated-float": AnnotatedFloatScheme,
 }

@@ -118,13 +120,13 @@ class SaQC(FunctionsMixin):
         self._attrs = dict(value)

     @property
-    def data(self) -> MutableMapping[str, pd.Series]:
+    def data(self) -> DictOfSeries:
         data = self._data
         data.attrs = self._attrs.copy()
         return data

     @property
-    def flags(self) -> MutableMapping[str, pd.Series]:
+    def flags(self) -> DictOfSeries:
         flags = self._scheme.toExternal(self._flags, attrs=self._attrs)
         flags.attrs = self._attrs.copy()
         return flags
......
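The `TRANSLATION_SCHEMES` addition above registers the new scheme under the string key `"annotated-float"`. A minimal sketch of how such a string-to-class registry is typically consumed (the `make_scheme` helper is hypothetical, not saqc API, and the scheme classes are placeholders):

```python
class FloatScheme:
    """Placeholder standing in for saqc's FloatScheme."""

class AnnotatedFloatScheme:
    """Placeholder standing in for saqc's AnnotatedFloatScheme."""

TRANSLATION_SCHEMES = {
    "float": FloatScheme,
    "annotated-float": AnnotatedFloatScheme,
}

def make_scheme(name: str):
    # Hypothetical helper: resolve a user-supplied name to a scheme instance.
    try:
        return TRANSLATION_SCHEMES[name]()
    except KeyError:
        raise ValueError(f"unknown translation scheme: {name!r}") from None

print(type(make_scheme("annotated-float")).__name__)  # → AnnotatedFloatScheme
```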
@@ -474,24 +474,6 @@ class Flags:

     # ----------------------------------------------------------------------
     # transformation and representation

-    def toDios(self) -> DictOfSeries:
-        """
-        Transform the flags container to a ``DictOfSeries``.
-
-        .. deprecated:: 2.4
-           use `saqc.DictOfSeries(obj)` instead.
-
-        Returns
-        -------
-        DictOfSeries
-        """
-        warnings.warn(
-            "toDios is deprecated, use `saqc.DictOfSeries(obj)` instead.",
-            category=DeprecationWarning,
-        )
-        return DictOfSeries(self).copy()
-
     def toFrame(self) -> pd.DataFrame:
         """
         Transform the flags container to a ``pd.DataFrame``.
......