Compare revisions (rdm-software/saqc)

Changes are shown as if the source revision was being merged into the target revision.

Commits on Source (25)

Showing with 347 additions and 198 deletions
@@ -29,7 +29,7 @@ jobs:
       fail-fast: false
       matrix:
         os: ["windows-latest", "ubuntu-latest", "macos-latest"]
-        python-version: ["3.7", "3.8", "3.9", "3.10"]
+        python-version: ["3.8", "3.9", "3.10"]
     defaults:
       run:
         # somehow this also works for windows O.o ??
@@ -61,11 +61,11 @@ jobs:
           pytest tests dios/test -Werror
           python -m saqc --config docs/resources/data/config.csv --data docs/resources/data/data.csv --outfile /tmp/test.csv
-      - name: run doc tests
-        run: |
-          cd docs
-          pip install -r requirements.txt
-          make doc
-          make test
+      # - name: run doc tests
+      #   run: |
+      #     cd docs
+      #     pip install -r requirements.txt
+      #     make doc
+      #     make test
@@ -75,20 +75,6 @@ coverage:
     path: coverage.xml

-# test saqc with python 3.7
-python37:
-  stage: test
-  image: python:3.7
-  script:
-    - pytest tests dios/test -Werror --junitxml=report.xml
-    - python -m saqc --config docs/resources/data/config.csv --data docs/resources/data/data.csv --outfile /tmp/test.csv
-  artifacts:
-    when: always
-    reports:
-      junit: report.xml
-
-# test saqc with python 3.8
 python38:
   stage: test
   script:
@@ -100,7 +86,6 @@ python38:
       junit: report.xml

-# test saqc with python 3.9
 python39:
   stage: test
   image: python:3.9
@@ -113,7 +98,6 @@ python39:
       junit: report.xml

-# test saqc with python 3.10
 python310:
   stage: test
   image: python:3.10
@@ -125,7 +109,6 @@ python310:
     reports:
       junit: report.xml

 doctest:
   stage: test
   script:
......
@@ -7,12 +7,30 @@ SPDX-License-Identifier: GPL-3.0-or-later
 # Changelog

 ## Unreleased
-[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.2.1...develop)
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.3.0...develop)
 ### Added
 ### Changed
 ### Removed
 ### Fixed
+
+## [2.3.0](https://git.ufz.de/rdm-software/saqc/-/tags/v2.3.0) - 2023-01-17
+[List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.2.1...v2.3.0)
+### Added
+- add option to not overwrite existing flags to `concatFlags`
+- add option to pass existing axis object to `plot`
+- python 3.11 support
+### Changed
+- Remove all flag value restrictions from the default flagging scheme `FloatTranslator`
+- Renamed `TranslationScheme.forward` to `TranslationScheme.toInternal`
+- Renamed `TranslationScheme.backward` to `TranslationScheme.toExternal`
+- Changed default value of the parameter `limit` for `SaQC.interpolateIndex` and `SaQC.interpolateInvalid` to ``None``
+- Changed default value of the parameter ``overwrite`` for ``concatFlags`` to ``False``
+- Deprecate ``transferFlags`` in favor of ``concatFlags``
+### Removed
+- python 3.7 support
+### Fixed
+- Error for interpolations with limits set to be greater than 2 (`interpolateNANs`)
 ## [2.2.1](https://git.ufz.de/rdm-software/saqc/-/tags/v2.2.1) - 2022-10-29
 [List of commits](https://git.ufz.de/rdm-software/saqc/-/compare/v2.2.0...v2.2.1)
 ### Added
@@ -29,7 +47,7 @@ SPDX-License-Identifier: GPL-3.0-or-later
 - translation of `dfilter`
 - new generic function `clip`
 - parameter `min_periods` to `SaQC.flagConstants`
-- function `fitButterworth`
+- function `fitLowpassFilter`
 - tracking interpolation routines in `History`
 ### Changed
 - test function interface changed to `func(saqc: SaQC, field: str | Sequence[str], *args, **kwargs)`
......
@@ -18,9 +18,6 @@ help:
 .PHONY: help Makefile clean

-test:
-	for k in $(MDLIST); do echo docs/"$$k"; done
-
 # clean sphinx generated stuff
 clean:
 	rm -rf _build _static _api
......
@@ -347,7 +347,7 @@ correlated with relatively high *kNNscores*, we could try to calculate a thresho
 `STRAY <https://arxiv.org/pdf/1908.04000.pdf>`_ algorithm, which is available as the method:
 :py:meth:`~saqc.SaQC.flagByStray`. This method will mark some samples of the `kNNscore` variable as anomaly.
 Subsequently we project this marks (or *flags*) on to the *sac* variable with a call to
-:py:meth:`~saqc.SaQC.transferFlags`. For the sake of demonstration, we also project the flags
+:py:meth:`~saqc.SaQC.concatFlags`. For the sake of demonstration, we also project the flags
 on the normalized *sac* and plot the flagged values in the *sac254_norm* - *level_norm* feature space.
@@ -355,8 +355,8 @@ on the normalized *sac* and plot the flagged values in the *sac254_norm* - *leve
 .. doctest:: exampleMV

    >>> qc = qc.flagByStray(field='kNNscores', freq='30D', alpha=.3)
-   >>> qc = qc.transferFlags(field='kNNscores', target='sac254_corrected', label='STRAY')
-   >>> qc = qc.transferFlags(field='kNNscores', target='sac254_norm', label='STRAY')
+   >>> qc = qc.concatFlags(field='kNNscores', target='sac254_corrected', label='STRAY')
+   >>> qc = qc.concatFlags(field='kNNscores', target='sac254_norm', label='STRAY')
    >>> qc.plot('sac254_corrected', xscope='2016-11') # doctest:+SKIP
    >>> qc.plot('sac254_norm', phaseplot='level_norm', xscope='2016-11') # doctest:+SKIP
@@ -365,8 +365,8 @@ on the normalized *sac* and plot the flagged values in the *sac254_norm* - *leve
    :include-source: False

    qc = qc.flagByStray(field='kNNscores', freq='30D', alpha=.3)
-   qc = qc.transferFlags(field='kNNscores', target='sac254_corrected', label='STRAY')
-   qc = qc.transferFlags(field='kNNscores', target='sac254_norm', label='STRAY')
+   qc = qc.concatFlags(field='kNNscores', target='sac254_corrected', label='STRAY')
+   qc = qc.concatFlags(field='kNNscores', target='sac254_norm', label='STRAY')

 .. plot::
    :context: close-figs
......
@@ -273,7 +273,7 @@ To see all the results obtained so far, plotted in one figure window, we make us
 .. doctest:: exampleOD

    >>> data.to_df().plot()
-   <AxesSubplot:>
+   <AxesSubplot: >
 .. plot::
    :context:
......
@@ -3,11 +3,10 @@
 # SPDX-License-Identifier: GPL-3.0-or-later

 recommonmark==0.7.1
-sphinx<6
+sphinx<7
 sphinx-automodapi==0.14.1
 sphinxcontrib-fulltoc==1.2.0
 sphinx-markdown-tables==0.0.17
-m2r==0.2.1
 jupyter-sphinx==0.3.2
 sphinx_autodoc_typehints==1.18.2
 sphinx-tabs==3.4.1
@@ -16,6 +16,6 @@ water_z ; transform(field=['water_temp_raw'], func=zScore(x), freq='20D')
 sac_z ; transform(field=['sac254_raw'], func=zScore(x), freq='20D')
 kNN_scores ; assignKNNScore(field=['level_z', 'water_z', 'sac_z'], freq='20D')
 kNN_scores ; flagByStray(freq='20D')
-level_raw ; transferFlags(field=['kNN_scores'], label='STRAY')
-sac254_corr ; transferFlags(field=['kNN_scores'], label='STRAY')
-water_temp_raw ; transferFlags(field=['kNN_scores'], label='STRAY')
+level_raw ; concatFlags(field=['kNN_scores'], label='STRAY')
+sac254_corr ; concatFlags(field=['kNN_scores'], label='STRAY')
+water_temp_raw ; concatFlags(field=['kNN_scores'], label='STRAY')
\ No newline at end of file
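The rows above follow saqc's `variable ; function-expression` CSV layout. As a rough illustration of how such a row splits into a field name and a call expression (a hypothetical minimal parser for demonstration, not saqc's actual config reader):

```python
# Hypothetical sketch: split one saqc-style config row into field name and expression.
# This is NOT saqc's actual parser, only an illustration of the "var ; call(...)" layout.
def parse_config_row(line: str) -> tuple[str, str]:
    field, expr = line.split(";", 1)
    return field.strip(), expr.strip()

row = "level_raw   ; concatFlags(field=['kNN_scores'], label='STRAY')"
print(parse_config_row(row))
# ('level_raw', "concatFlags(field=['kNN_scores'], label='STRAY')")
```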
docs/resources/temp/SM1processingResults.png (58.8 KiB)
docs/resources/temp/SM2processingResults.png (147 KiB)
@@ -4,13 +4,12 @@
 Click==8.1.3
 dtw==1.4.0
-hypothesis==6.55.0
-matplotlib==3.5.3
-numba==0.56.3
-numpy==1.21.6
+matplotlib==3.6.2
+numba==0.56.4
+numpy==1.23.5
 outlier-utils==0.0.3
-pyarrow==9.0.0
+pyarrow==10.0.1
 pandas==1.3.5
-scikit-learn==1.0.2
-scipy==1.7.3
-typing_extensions==4.3.0
+scikit-learn==1.2.0
+scipy==1.10.0
+typing_extensions==4.4.0
@@ -110,7 +110,7 @@ class SaQC(FunctionsMixin):
     @property
     def flags(self) -> MutableMapping:
-        flags = self._scheme.backward(self._flags, attrs=self._attrs, raw=True)
+        flags = self._scheme.toExternal(self._flags, attrs=self._attrs)
         flags.attrs = self._attrs.copy()
         return flags
......
@@ -6,7 +6,7 @@
 from __future__ import annotations

-from typing import DefaultDict, Dict, Iterable, Mapping, Optional, Tuple, Type, Union
+from typing import DefaultDict, Dict, Iterable, Mapping, Tuple, Type, Union

 import numpy as np
 import pandas as pd
@@ -147,7 +147,7 @@ class Flags:
     0     True
     1    False
     2     True
-    Name: 2, dtype: bool
+    dtype: bool

     .. doctest:: exampleFlags
@@ -191,9 +191,7 @@ class Flags:
     2  -inf  25.0  25.0  0.0  99.0
     """

-    def __init__(
-        self, raw_data: Optional[Union[DictLike, Flags]] = None, copy: bool = False
-    ):
+    def __init__(self, raw_data: DictLike | Flags | None = None, copy: bool = False):
         self._data: dict[str, History]
......
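The `Flags.__init__` change above swaps `Optional[Union[...]]` for PEP 604 union syntax, which stays importable on pre-3.10 interpreters because `from __future__ import annotations` defers annotation evaluation. A minimal self-contained sketch of the equivalence:

```python
from __future__ import annotations  # annotations stay strings, so `X | Y` is never evaluated at runtime

from typing import Optional, Union

# Both spellings describe the same type to a checker; only the surface syntax differs.
def old_style(raw_data: Optional[Union[dict, list]] = None) -> bool:
    return raw_data is None

def new_style(raw_data: dict | list | None = None) -> bool:
    return raw_data is None

print(old_style(), new_style())  # True True
```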
@@ -8,10 +8,11 @@ from __future__ import annotations

 from copy import copy as shallowcopy
 from copy import deepcopy
-from typing import Any, Callable, Dict, List, Tuple, Union
+from typing import Any, Callable, Dict, List, Tuple

 import numpy as np
 import pandas as pd
+from pandas.api.types import is_categorical_dtype, is_float_dtype

 from saqc.constants import UNFLAGGED
@@ -45,8 +46,32 @@ class History:

     def __init__(self, index: pd.Index | None):
-        self.hist = pd.DataFrame(index=index)
-        self.meta = []
+        self._hist = pd.DataFrame(index=index)
+        self._meta = []
+
+    @property
+    def hist(self):
+        return self._hist.astype(float, copy=True)
+
+    @hist.setter
+    def hist(self, value: pd.DataFrame) -> None:
+        self._validateHist(value)
+        if len(value.columns) != len(self._meta):
+            raise ValueError(
+                "passed history does not match existing meta. "
+                "To use a new `hist` with new `meta` use "
+                "'History.createFromData(new_hist, new_meta)'"
+            )
+        self._hist = value.astype("category", copy=True)
+
+    @property
+    def meta(self) -> list[dict[str, Any]]:
+        return list(self._meta)
+
+    @meta.setter
+    def meta(self, value: list[dict[str, Any]]) -> None:
+        self._validateMetaList(value, self._hist)
+        self._meta = deepcopy(value)

     @property
     def index(self) -> pd.Index:
@@ -66,7 +91,7 @@ class History:
         -------
         index : pd.Index
         """
-        return self.hist.index
+        return self._hist.index

     @property
     def columns(self) -> pd.Index:
@@ -80,7 +105,7 @@ class History:
         -------
         columns : pd.Index
         """
-        return self.hist.columns
+        return self._hist.columns

     @property
     def empty(self) -> bool:
@@ -118,15 +143,11 @@ class History:
         # all following code must handle a passed empty series

         # ensure continuous increasing columns
-        assert 0 <= pos <= len(self)
-
-        self.hist[pos] = s.astype("category")
+        assert 0 <= pos <= len(self.columns)
+        self._hist[pos] = s.astype("category")

         return self

-    def append(
-        self, value: Union[pd.Series, History], meta: dict | None = None
-    ) -> History:
+    def append(self, value: pd.Series | History, meta: dict | None = None) -> History:
         """
         Create a new FH column and insert given pd.Series to it.
@@ -157,8 +178,7 @@ class History:
         if meta is None:
             meta = {}
-
-        if not isinstance(meta, dict):
+        elif not isinstance(meta, dict):
             raise TypeError("'meta' must be of type None or dict")

         val = self._validateValue(value)
@@ -166,10 +186,10 @@ class History:
             raise ValueError("Index does not match")

         self._insert(val, pos=len(self))
-        self.meta.append(meta.copy())
+        self._meta.append(meta.copy())
         return self

-    def _appendHistory(self, value: History):
+    def _appendHistory(self, value: History) -> History:
         """
         Append multiple columns of a history to self.
@@ -190,44 +210,107 @@ class History:
         -----
         This ignores the column names of the passed History.
         """
-        self._validate(value.hist, value.meta)
+        self._validate(value._hist, value._meta)
         if not value.index.equals(self.index):
             raise ValueError("Index does not match")

         # we copy shallow because we only want to set new columns
         # the actual data copy happens in calls to astype
-        value_hist = value.hist.copy(deep=False)
-        value_meta = value.meta.copy()
+        value_hist = value._hist.copy(deep=False)
+        value_meta = value._meta.copy()

         # rename columns, to avoid ``pd.DataFrame.loc`` become confused
         n = len(self.columns)
         columns = pd.Index(range(n, n + len(value_hist.columns)))
         value_hist.columns = columns

-        hist = self.hist.astype(float)
+        hist = self._hist.astype(float)
         hist.loc[:, columns] = value_hist.astype(float)
-        self.hist = hist.astype("category")
-        self.meta += value_meta
+        self._hist = hist.astype("category")
+        self._meta += value_meta
         return self

-    def squeeze(self, raw=False) -> pd.Series:
+    def squeeze(
+        self, raw: bool = False, start: int | None = None, end: int | None = None
+    ) -> pd.Series:
         """
-        Get the last flag value per row of the FH.
+        Reduce history to a series, by taking the last set value per row.
+
+        By passing `start` and/or `end` only a slice of the history is used.
+        This can be used to get the values of an earlier test. See the
+        Examples.
+
+        Parameters
+        ----------
+        raw : bool, default False
+            If True, 'unset' values are represented by `nan`,
+            otherwise, 'unset' values are represented by the
+            `UNFLAGGED` (`-inf`) constant
+
+        start : int, default None
+            The first history column to use (inclusive).
+
+        end : int, default None
+            The last history column to use (exclusive).

         Returns
         -------
-        pd.Series
+        pandas.Series
+
+        Examples
+        --------
+        >>> from saqc.core.history import History
+        >>> s0 = pd.Series([np.nan, np.nan, 99.])
+        >>> s1 = pd.Series([1., 1., np.nan])
+        >>> s2 = pd.Series([2., np.nan, 2.])
+        >>> h = History(pd.Index([0,1,2])).append(s0).append(s1).append(s2)
+        >>> h
+              0     1     2
+        0   nan   1.0   2.0
+        1   nan   1.0   nan
+        2  99.0   nan   2.0
+
+        Get current flags.
+
+        >>> h.squeeze()
+        0    2.0
+        1    1.0
+        2    2.0
+        dtype: float64
+
+        Get only the flags that the last function had set:
+
+        >>> h.squeeze(start=-1)
+        0    2.0
+        1   -inf
+        2    2.0
+        dtype: float64
+
+        Get the flags before the last function run:
+
+        >>> h.squeeze(end=-1)
+        0     1.0
+        1     1.0
+        2    99.0
+        dtype: float64
+
+        Get only the flags that the 2nd function had set:
+
+        >>> h.squeeze(start=1, end=2)
+        0    1.0
+        1    1.0
+        2   -inf
+        dtype: float64
         """
-        result = self.hist.astype(float)
-        if result.empty:
-            result = pd.DataFrame(data=np.nan, index=self.hist.index, columns=[0])
-        result = result.ffill(axis=1).iloc[:, -1]
-        if raw:
-            return result
+        hist = self._hist.iloc[:, slice(start, end)].astype(float)
+        if hist.empty:
+            result = pd.Series(data=np.nan, index=self._hist.index, dtype=float)
         else:
-            return result.fillna(UNFLAGGED)
+            result = hist.ffill(axis=1).iloc[:, -1]
+        if not raw:
+            result = result.fillna(UNFLAGGED)
+        result.name = None
+        return result

     def reindex(
         self, index: pd.Index, fill_value_last: float = UNFLAGGED, copy: bool = True
@@ -251,17 +334,11 @@ class History:
         -------
         History
         """
+        # Note: code must handle empty frames
         out = self.copy() if copy else self
-
-        hist = out.hist.astype(float).reindex(
-            index=index, copy=False, fill_value=np.nan
-        )
-
-        # Note: all following code must handle empty frames
+        hist = out._hist.astype(float).reindex(index=index, copy=False)
         hist.iloc[:, -1:] = hist.iloc[:, -1:].fillna(fill_value_last)
-
-        out.hist = hist.astype("category")
+        out._hist = hist.astype("category")
         return out

     def apply(
@@ -271,7 +348,7 @@ class History:
         func_kws: dict,
         func_handle_df: bool = False,
         copy: bool = True,
-    ):
+    ) -> History:
         """
         Apply a function on each column in history.
@@ -309,27 +386,31 @@ class History:
         Returns
         -------
-        history with altered columns
+        History with altered columns
         """
         hist = pd.DataFrame(index=index)

-        # implicit copy by astype
-        # convert data to floats as functions may fail with categoricals
+        # convert data to floats as functions may fail with categorical dtype
         if func_handle_df:
-            hist = func(self.hist.astype(float), **func_kws)
+            hist = func(self._hist.astype(float, copy=True), **func_kws)
         else:
             for pos in self.columns:
-                hist[pos] = func(self.hist[pos].astype(float), **func_kws)
+                hist[pos] = func(self._hist[pos].astype(float, copy=True), **func_kws)

-        History._validate(hist, self.meta)
+        try:
+            self._validate(hist, self._meta)
+        except Exception as e:
+            raise ValueError(
+                f"result from applied function is not a valid History, because {e}"
+            ) from e

         if copy:
             history = History(index=None)  # noqa
-            history.meta = self.meta.copy()
+            history._meta = self._meta.copy()
         else:
             history = self

-        history.hist = hist.astype("category")
+        history._hist = hist.astype("category")

         return history
@@ -350,8 +431,8 @@ class History:
         """
         copyfunc = deepcopy if deep else shallowcopy
         new = History(self.index)
-        new.hist = self.hist.copy(deep)
-        new.meta = copyfunc(self.meta)
+        new._hist = self._hist.copy(deep)
+        new._meta = copyfunc(self._meta)
         return new

     def __copy__(self):
@@ -367,14 +448,14 @@ class History:
         return self.copy(deep=True)

     def __len__(self) -> int:
-        return len(self.hist.columns)
+        return len(self._hist.columns)

     def __repr__(self):

         if self.empty:
-            return str(self.hist).replace("DataFrame", "History")
+            return str(self._hist).replace("DataFrame", "History")

-        r = self.hist.astype(str)
+        r = self._hist.astype(str)

         return str(r)[1:]
@@ -382,51 +463,62 @@ class History:
     # validation
     #

-    @staticmethod
-    def _validate(hist: pd.DataFrame, meta: List[Any]) -> Tuple[pd.DataFrame, List]:
+    @classmethod
+    def _validate(
+        cls, hist: pd.DataFrame, meta: List[Any]
+    ) -> Tuple[pd.DataFrame, List]:
         """
         check type, columns, index, dtype of hist and if the meta fits also
         """
+        cls._validateHist(hist)
+        cls._validateMetaList(meta, hist)
+        return hist, meta

-        # check hist
-        if not isinstance(hist, pd.DataFrame):
+    @classmethod
+    def _validateHist(cls, obj):
+        if not isinstance(obj, pd.DataFrame):
             raise TypeError(
-                f"'hist' must be of type pd.DataFrame, but is of type {type(hist).__name__}"
+                f"'hist' must be of type pd.DataFrame, "
+                f"but is of type {type(obj).__name__}"
             )
-        # isin([float, ..]) does not work !
-        if not (
-            (hist.dtypes == float)
-            | (hist.dtypes == np.float32)
-            | (hist.dtypes == np.float64)
-            | (hist.dtypes == "category")
-        ).all():
+        if not obj.columns.equals(pd.RangeIndex(len(obj.columns))):
             raise ValueError(
-                "dtype of all columns in hist must be float or categorical"
-            )
-
-        if not hist.empty and (
-            not hist.columns.equals(pd.Index(range(len(hist.columns))))
-            or not np.issubdtype(hist.columns.dtype, np.integer)
-        ):
-            raise ValueError(
-                "column names must be continuous increasing int's, starting with 0."
+                "Columns of 'hist' must consist of "
+                "continuous increasing integers, "
+                "starting with 0."
             )
+        for c in obj.columns:
+            try:
+                cls._validateValue(obj[c])
+            except Exception as e:
+                raise ValueError(f"Bad column in hist. column '{c}': {e}") from None
+        return obj

-        # check meta
-        if not isinstance(meta, list):
+    @classmethod
+    def _validateMetaList(cls, obj, hist=None):
+        if not isinstance(obj, list):
             raise TypeError(
-                f"'meta' must be of type list, but is of type {type(meta).__name__}"
+                f"'meta' must be of type list, got type {type(obj).__name__}"
             )
-        if not all([isinstance(e, dict) for e in meta]):
-            raise TypeError("All elements in meta must be of type 'dict'")
-
-        # check combinations of hist and meta
-        if not len(hist.columns) == len(meta):
-            raise ValueError(
-                "'meta' must have as many entries as columns exist in hist"
-            )
+        if hist is not None:
+            if not len(obj) == len(hist.columns):
+                raise ValueError(
+                    "'meta' must have as many entries as columns in 'hist'"
+                )
+        for i, item in enumerate(obj):
+            try:
+                cls._validateMetaDict(item)
+            except Exception as e:
+                raise ValueError(f"Bad meta. item {i}: {e}") from None
+        return obj

-        return hist, meta
+    @staticmethod
+    def _validateMetaDict(obj):
+        if not isinstance(obj, dict):
+            raise TypeError("obj must be dict")
+        if not all(isinstance(k, str) for k in obj.keys()):
+            raise ValueError("all keys in dict must be strings")
+        return obj

     @staticmethod
     def _validateValue(obj: pd.Series) -> pd.Series:
@@ -435,14 +527,52 @@ class History:
         """
         if not isinstance(obj, pd.Series):
             raise TypeError(
-                f"value must be of type pd.Series, but {type(obj).__name__} was given"
+                f"value must be of type pd.Series, got type {type(obj).__name__}"
            )
-        if not ((obj.dtype == float) or isinstance(obj.dtype, pd.CategoricalDtype)):
+        if not is_float_dtype(obj.dtype) and not is_categorical_dtype(obj.dtype):
             raise ValueError("dtype must be float or categorical")
         return obj
+    @classmethod
+    def createFromData(cls, hist: pd.DataFrame, meta: List[Dict], copy: bool = False):
+        """
+        Create a History from existing data.
+
+        Parameters
+        ----------
+        hist : pd.Dataframe
+            Data that define the flags of the history.
+
+        meta : List of dict
+            A list holding meta information for each column, therefore it must
+            have the same number of entries as columns exist in `hist`.
+
+        copy : bool, default False
+            If `True`, the input data is copied, otherwise not.
+
+        Notes
+        -----
+        To create a very simple History from a flags dataframe ``f`` use
+        ``mask = pd.DataFrame(True, index=f.index, columns=f.columns)``
+        and
+        ``meta = [{}] * len(f.columns)``.
+
+        Returns
+        -------
+        History
+        """
+        cls._validate(hist, meta)
+
+        if copy:
+            hist = hist.copy()
+            meta = deepcopy(meta)
+
+        history = cls(index=None)  # noqa
+        history._hist = hist.astype("category", copy=False)
+        history._meta = meta
+        return history

 def createHistoryFromData(
     hist: pd.DataFrame,
@@ -476,13 +606,10 @@ def createHistoryFromData(
     -------
     History
     """
-    History._validate(hist, meta)
-
-    if copy:
-        hist = hist.copy()
-        meta = deepcopy(meta)
-
-    history = History(index=None)  # noqa
-    history.hist = hist.astype("category", copy=False)
-    history.meta = meta
-    return history
+    # todo: expose History, enable this warning
+    # warnings.warn(
+    #     "saqc.createHistoryFromData() will be deprecated soon. "
+    #     "Please use saqc.History.createFromData() instead.",
+    #     category=FutureWarning,
+    # )
+    return History.createFromData(hist, meta, copy)
@@ -147,20 +147,12 @@ def _squeezeFlags(old_flags, new_flags: Flags, columns: pd.Index, meta) -> Flags
         # function call. If no such columns exist, we end up with an empty
         # new_history.
         start = len(old_history.columns)
-        new_history = _sliceHistory(new_history, slice(start, None))
-
-        squeezed = new_history.squeeze(raw=True)
+        squeezed = new_history.squeeze(raw=True, start=start)
         out.history[col] = out.history[col].append(squeezed, meta=meta)

     return out

-def _sliceHistory(history: History, sl: slice) -> History:
-    history.hist = history.hist.iloc[:, sl]
-    history.meta = history.meta[sl]
-    return history

 def _maskData(
     data: dios.DictOfSeries, flags: Flags, columns: Sequence[str], thresh: float
 ) -> Tuple[dios.DictOfSeries, dios.DictOfSeries]:
......
@@ -7,6 +7,7 @@
 # -*- coding: utf-8 -*-

 from saqc.core.translation.basescheme import (
     FloatScheme,
+    MappingScheme,
     SimpleScheme,
     TranslationScheme,
 )
......
...@@ -8,6 +8,7 @@ ...@@ -8,6 +8,7 @@
from __future__ import annotations from __future__ import annotations
from abc import abstractmethod, abstractproperty
from typing import Any, Dict from typing import Any, Dict
import numpy as np import numpy as np
...@@ -22,7 +23,26 @@ ForwardMap = Dict[ExternalFlag, float] ...@@ -22,7 +23,26 @@ ForwardMap = Dict[ExternalFlag, float]
BackwardMap = Dict[float, ExternalFlag] BackwardMap = Dict[float, ExternalFlag]
class TranslationScheme: class TranslationScheme: # pragma: no cover
@property
@abstractmethod
def DFILTER_DEFAULT(self):
pass
@abstractmethod
def __call__(self, flag: ExternalFlag) -> float:
pass
@abstractmethod
def toInternal(self, flags: pd.DataFrame | DictOfSeries) -> Flags:
pass
@abstractmethod
def toExternal(self, flags: Flags, attrs: dict | None = None) -> DictOfSeries:
pass
class MappingScheme(TranslationScheme):
""" """
This class provides the basic translation mechanism and should serve as This class provides the basic translation mechanism and should serve as
a base class for every other translation scheme. a base class for every other translation scheme.
...@@ -81,7 +101,7 @@ class TranslationScheme: ...@@ -81,7 +101,7 @@ class TranslationScheme:
@staticmethod @staticmethod
def _translate( def _translate(
flags: Flags | pd.DataFrame | pd.Series, flags: Flags | pd.DataFrame | pd.Series | DictOfSeries,
trans_map: ForwardMap | BackwardMap, trans_map: ForwardMap | BackwardMap,
) -> DictOfSeries: ) -> DictOfSeries:
""" """
...@@ -95,7 +115,7 @@ class TranslationScheme: ...@@ -95,7 +115,7 @@ class TranslationScheme:
Returns Returns
------- -------
pd.DataFrame, Flags DictOfSeries
""" """
if isinstance(flags, pd.Series): if isinstance(flags, pd.Series):
flags = flags.to_frame() flags = flags.to_frame()
...@@ -128,9 +148,9 @@ class TranslationScheme: ...@@ -128,9 +148,9 @@ class TranslationScheme:
if flag not in self._backward: if flag not in self._backward:
raise ValueError(f"invalid flag: {flag}") raise ValueError(f"invalid flag: {flag}")
return float(flag) return float(flag)
return self._forward[flag] return float(self._forward[flag])
def forward(self, flags: pd.DataFrame) -> Flags: def toInternal(self, flags: pd.DataFrame | DictOfSeries | pd.Series) -> Flags:
""" """
Translate from 'external flags' to 'internal flags' Translate from 'external flags' to 'internal flags'
...@@ -145,13 +165,11 @@ class TranslationScheme: ...@@ -145,13 +165,11 @@ class TranslationScheme:
""" """
return Flags(self._translate(flags, self._forward)) return Flags(self._translate(flags, self._forward))
-    def backward(
+    def toExternal(
        self,
        flags: Flags,
-        raw: bool = False,
        attrs: dict | None = None,
-        **kwargs,
-    ) -> pd.DataFrame | DictOfSeries:
+    ) -> DictOfSeries:
        """
        Translate from 'internal flags' to 'external flags'
@@ -160,9 +178,6 @@ class TranslationScheme:
        flags : pd.DataFrame
            The external flags to translate
-        raw: bool, default False
-            if True return data as DictOfSeries, otherwise as pandas DataFrame.
        attrs : dict or None, default None
            global meta information of saqc-object
@@ -172,8 +187,6 @@ class TranslationScheme:
        """
        out = self._translate(flags, self._backward)
        out.attrs = attrs or {}
-        if not raw:
-            out = out.to_df()
        return out
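Both `toInternal` and `toExternal` funnel through `_translate`, which walks the columns and substitutes every flag via the given forward or backward mapping. The column-wise substitution idiom, sketched with plain dicts and lists rather than saqc's container types (`translate` and the mapping values are illustrative, not the library's actual API):

```python
def translate(columns, trans_map):
    """Replace every flag in every column via trans_map; reject unknowns."""
    out = {}
    for name, values in columns.items():
        unknown = {v for v in values if v not in trans_map}
        if unknown:
            raise ValueError(f"flags not in mapping: {sorted(unknown)}")
        out[name] = [trans_map[v] for v in values]
    return out


# a forward map: external labels -> internal floats
forward = {"OK": 0.0, "DOUBTFUL": 25.0, "BAD": 255.0}
translated = translate({"x": ["OK", "BAD"], "y": ["DOUBTFUL"]}, forward)
```

Because the backward direction is just another mapping (floats back to labels), one helper serves both translations.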
@@ -184,16 +197,30 @@ class FloatScheme(TranslationScheme):
    internal float flags
    """

-    _MAP = {
-        -np.inf: -np.inf,
-        **{k: k for k in np.arange(0, 256, dtype=float)},
-    }
-
-    def __init__(self):
-        super().__init__(self._MAP, self._MAP)
+    DFILTER_DEFAULT: float = FILTER_ALL
+
+    def __call__(self, flag: float | int) -> float:
+        try:
+            return float(flag)
+        except (TypeError, ValueError, OverflowError):
+            raise ValueError(f"invalid flag, expected a numerical value, got: {flag}")
+
+    def toInternal(self, flags: pd.DataFrame | DictOfSeries) -> Flags:
+        try:
+            return Flags(flags.astype(float))
+        except (TypeError, ValueError, OverflowError):
+            raise ValueError(
+                f"invalid flag(s), expected a collection of numerical values, got: {flags}"
+            )
+
+    def toExternal(self, flags: Flags, attrs: dict | None = None) -> DictOfSeries:
+        out = flags.toDios()
+        out.attrs = attrs or {}
+        return out
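With this change `FloatScheme` no longer validates against a fixed lookup table of 256 values; it simply coerces to `float` and normalizes any failure to a `ValueError`. The coercion pattern in isolation (a stand-alone sketch, not the saqc class itself):

```python
def to_float_flag(flag):
    """Coerce a flag to float; raise ValueError on anything non-numerical."""
    try:
        return float(flag)
    except (TypeError, ValueError, OverflowError):
        # TypeError: e.g. None; ValueError: e.g. "foo"; OverflowError: huge ints
        raise ValueError(
            f"invalid flag, expected a numerical value, got: {flag!r}"
        ) from None


numeric = to_float_flag("25")          # numeric strings are accepted
unflagged = to_float_flag(-float("inf"))  # -inf marks "unflagged" in saqc
```

Catching all three exception types and re-raising a single `ValueError` gives callers one error contract instead of three.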
-class SimpleScheme(TranslationScheme):
+class SimpleScheme(MappingScheme):
    """
    Acts as the default Translator, provides a changeable subset of the
...
@@ -17,7 +17,7 @@ import pandas as pd
from saqc.constants import BAD, DOUBTFUL, GOOD, UNFLAGGED
from saqc.core.flags import Flags
from saqc.core.history import History
-from saqc.core.translation.basescheme import BackwardMap, ForwardMap, TranslationScheme
+from saqc.core.translation.basescheme import BackwardMap, ForwardMap, MappingScheme

_QUALITY_CAUSES = [
    "",
@@ -40,7 +40,7 @@ _QUALITY_LABELS = [
]
-class DmpScheme(TranslationScheme):
+class DmpScheme(MappingScheme):
    """
    Implements the translation from and to the flagging scheme implemented in
@@ -91,7 +91,7 @@ class DmpScheme(TranslationScheme):
        field_history.append(histcol, meta=meta)
        return field_history
-    def forward(self, df: pd.DataFrame) -> Flags:
+    def toInternal(self, df: pd.DataFrame) -> Flags:
        """
        Translate from 'external flags' to 'internal flags'
@@ -114,7 +114,7 @@ class DmpScheme(TranslationScheme):
        return Flags(data)
-    def backward(
+    def toExternal(
        self, flags: Flags, attrs: dict | None = None, **kwargs
    ) -> pd.DataFrame:
        """
@@ -131,7 +131,7 @@ class DmpScheme(TranslationScheme):
        -------
        translated flags
        """
-        tflags = super().backward(flags, raw=True, attrs=attrs)
+        tflags = super().toExternal(flags, attrs=attrs)
        out = pd.DataFrame(
            index=reduce(lambda x, y: x.union(y), tflags.indexes).sort_values(),
...
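`toExternal` above assembles one frame whose index is the sorted union of all per-column indexes, built with `reduce`. The same union-by-`reduce` idiom, sketched with plain sets instead of pandas indexes:

```python
from functools import reduce

# per-column indexes of differing lengths (illustrative values)
indexes = [{1, 2, 3}, {2, 4}, {0, 3}]

# fold pairwise unions over the whole list, then sort for a stable order
union = sorted(reduce(lambda x, y: x.union(y), indexes))
# union == [0, 1, 2, 3, 4]
```

With pandas, `Index.union` plays the same role as `set.union`, and `.sort_values()` replaces `sorted`.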
@@ -12,10 +12,10 @@ import pandas as pd
from saqc.constants import BAD, DOUBTFUL, GOOD, UNFLAGGED
from saqc.core.flags import Flags, History
-from saqc.core.translation.basescheme import BackwardMap, ForwardMap, TranslationScheme
+from saqc.core.translation.basescheme import BackwardMap, ForwardMap, MappingScheme


-class PositionalScheme(TranslationScheme):
+class PositionalScheme(MappingScheme):
    """
    Implements the translation from and to the flagging scheme implemented by CHS
@@ -43,7 +43,7 @@ class PositionalScheme(TranslationScheme):
    def __init__(self):
        super().__init__(forward=self._FORWARD, backward=self._BACKWARD)
-    def forward(self, flags: pd.DataFrame) -> Flags:
+    def toInternal(self, flags: pd.DataFrame) -> Flags:
        """
        Translate from 'external flags' to 'internal flags'
@@ -75,7 +75,7 @@ class PositionalScheme(TranslationScheme):
        return Flags(data)

-    def backward(self, flags: Flags, **kwargs) -> pd.DataFrame:
+    def toExternal(self, flags: Flags, **kwargs) -> pd.DataFrame:
        """
        Translate from 'internal flags' to 'external flags'
...
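The positional scheme encodes flags as integers whose digits carry one test result each. A rough sketch of packing and unpacking such digit strings; the leading marker digit and the per-digit semantics here are illustrative assumptions, not CHS's exact coding:

```python
def pack(digits):
    """Pack per-test digits into one integer, behind a leading marker digit."""
    # assumed convention: a constant '9' marks the start of the digit string
    return int("9" + "".join(str(d) for d in digits))


def unpack(value):
    """Recover the per-test digits, dropping the leading marker digit."""
    return [int(c) for c in str(value)[1:]]


code = pack([0, 2, 1])  # three tests: passed, bad, doubtful (assumed meanings)
tests = unpack(code)
```

Each new test appends one digit, so the flag's history stays readable from the number itself.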
@@ -396,7 +396,15 @@ class FlagtoolsMixin:
        0   -inf   -inf   -inf
        1  255.0  255.0  255.0
        """
+        import warnings
+
+        warnings.warn(
+            f"""The method 'transferFlags' is deprecated and
+            will be removed in version 2.5 of SaQC. Please use
+            'SaQC.concatFlags(field={field}, target={target}, method="match", squeeze=False)'
+            instead""",
+            DeprecationWarning,
+        )
        return self.concatFlags(field, target=target, method="match", squeeze=False)

    @flagging()
...
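The deprecation shim added to `transferFlags` follows the usual pattern: emit a `DeprecationWarning`, then delegate to the replacement API. A self-contained reproduction of the pattern with toy functions (`transfer_flags`/`concat_flags` are stand-ins, not saqc's methods):

```python
import warnings


def concat_flags(field, target=None, method="match", squeeze=False):
    # toy replacement API; just echoes its arguments
    return (field, target, method, squeeze)


def transfer_flags(field, target):
    warnings.warn(
        "'transfer_flags' is deprecated, use "
        "'concat_flags(..., method=\"match\", squeeze=False)' instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not at this shim
    )
    return concat_flags(field, target=target, method="match", squeeze=False)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = transfer_flags("a", "b")
# result == ("a", "b", "match", False); caught[0].category is DeprecationWarning
```

Delegating keeps old call sites working for a release cycle while the warning steers users to the new entry point.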