Skip to content
GitLab
Explore
Sign in
Primary navigation
Search or go to…
Project
SaQC
Manage
Activity
Members
Labels
Plan
Issues
35
Issue boards
Milestones
Wiki
Code
Merge requests
7
Repository
Branches
Commits
Tags
Repository graph
Compare revisions
Snippets
Build
Pipelines
Jobs
Pipeline schedules
Artifacts
Deploy
Releases
Container Registry
Model registry
Operate
Environments
Monitor
Incidents
Service Desk
Analyze
Value stream analytics
Contributor analytics
CI/CD analytics
Repository analytics
Model experiments
Help
Help
Support
GitLab documentation
Compare GitLab plans
Community forum
Contribute to GitLab
Provide feedback
Terms and privacy
Keyboard shortcuts
?
Snippets
Groups
Projects
Show more breadcrumbs
rdm-software
SaQC
Commits
82c3725e
Commit
82c3725e
authored
3 years ago
by
Bert Palm
🎇
Browse files
Options
Downloads
Patches
Plain Diff
Flags docu
parent
f3efc1c0
No related branches found
Branches containing commit
No related tags found
Tags containing commit
1 merge request
!231
new Flagger and Stuff
Changes
2
Hide whitespace changes
Inline
Side-by-side
Showing
2 changed files
saqc/core/flags.py
+176
-21
176 additions, 21 deletions
saqc/core/flags.py
saqc/core/history.py
+3
-3
3 additions, 3 deletions
saqc/core/history.py
with
179 additions
and
24 deletions
saqc/core/flags.py
+
176
−
21
View file @
82c3725e
...
...
@@ -50,27 +50,116 @@ class _HistAccess:
class
Flags
:
"""
flags manipulation
------------------
insert new -> flags[
'
new
'
] = pd.Series(...)
set items -> flags[
'
v1
'
] = pd.Series(...)
get items -> v0 = flags[
'
v0
'
]
delete items -> del flags[
'
v0
'
] / drop(
'
v0
'
)
metadata
--------
reading columns -> flags.columns
renaming column(s) -> flags.columns = pd.Index([
'
a
'
,
'
b
'
,
'
c
'
])
Saqc
'
s flags container.
history
-------
get history -> flags.history[
'
v0
'
]
set history -> flags.history[
'
v0
'
] = History(...)
This container class holds the quality flags associated with the data. It hold key-value pairs, where
the key is the name of the column and the value is a ``pandas.Series`` of flags. The index of the series
and the key-value pair can be assumed to be immutable, which means, only the *values* of the series can
be change, once the series exist.
In other words: **an existing column can not be overwritten by a column with a different index.**
conversion
----------
make a dios -> flags.toDios()
make a df -> flags.toFrame()
The flags can be accessed via ``__getitem__`` and ``__setitem__``, in real life known as the `[]`-operator.
For the curious:
Under the hood, the series are stored in a `history`, which allows the advanced user to retrieve all flags
once was set in this object, but in the most cases this is irrelevant. For simplicity one can safely assume,
that this class works just stores the flag-series one sets.
See Also
--------
initFlagsLike : create a Flags instance, with same dimensions as a reference object.
History : class that actually store the flags
Examples
--------
We create an empty instance, by calling ``Flags`` without any arguments and then add a column to it.
>>>
from
saqc.constants
import
UNFLAGGED
,
BAD
,
DOUBT
,
UNTOUCHED
>>>
flags
=
Flags
()
>>>
flags
Empty
Flags
Columns
:
[]
>>>
flags
[
'
v0
'
]
=
pd
.
Series
([
BAD
,
BAD
,
UNFLAGGED
],
dtype
=
float
)
>>>
flags
v0
|
========
|
0
255.0
|
1
255.0
|
2
-
inf
|
Once the column exist, we cannot overwrite it anymore, with a different series.
>>>
flags
[
'
v0
'
]
=
pd
.
Series
([
666.
],
dtype
=
float
)
Traceback
(
most
recent
call
last
):
some
file
path
...
ValueError
:
Index
does
not
match
But if we pass a series, which index match it will work,
because the series now is interpreted as value-to-set.
>>>
flags
[
'
v0
'
]
=
pd
.
Series
([
DOUBT
,
UNTOUCHED
,
DOUBT
],
dtype
=
float
)
>>>
flags
v0
|
========
|
0
25.0
|
1
255.0
|
2
25.0
|
As we see above, the column now holds a combination from the values from the
first and the second set. This is, because the special constant ``UNTOUCHED``,
an alias for ``numpy.nan`` was used. We can inspect all the updates that was
made by looking in the history.
>>>
flags
.
history
[
'
v0
'
]
0
1
0
(
255.0
)
25.0
1
255.0
nan
2
(
-
inf
)
25.0
As we see now, the second call sets ``25.0`` and shadows (represented by the parentheses) ``(255.0)`` in the
first row and ``(-inf)`` in the last, but in the second row ``255.0`` still is valid, because it was
`not touched` by the set.
It is also possible to set values by a mask, which can be interpreted as condidional setting.
Imagine we want to `reset` all flags to ``0.`` if the existing flags are lower that ``255.``.
>>>
mask
=
flags
[
'
v0
'
]
<
BAD
>>>
mask
0
True
1
False
2
True
dtype
:
bool
>>>
flags
[
mask
,
'
v0
'
]
=
0
>>>
flags
v0
|
========
|
0
0.0
|
1
255.0
|
2
0.0
|
The objects you can pass as a row selector (``flags[rows, column]``) are:
- boolen arraylike, with or without index. Must have same length than the undeliing series.
- slices working on the index
- ``pd.Index``, which must be a subset of the existing index
For example, to set `all` values to a scalar value, use a Null-slice:
>>>
flags
[:,
'
v0
'
]
=
99.0
>>>
flags
v0
|
=======
|
0
99.0
|
1
99.0
|
2
99.0
|
After all calls presented here, the history look like this:
>>>
flags
.
history
[
'
v0
'
]
0
1
2
3
0
(
255.0
)
(
25.0
)
(
0.0
)
99.0
1
(
255.0
)
(
nan
)
(
nan
)
99.0
2
(
-
inf
)
(
25.0
)
(
0.0
)
99.0
"""
def
__init__
(
self
,
raw_data
:
Optional
[
Union
[
DictLike
,
Flags
]]
=
None
,
copy
:
bool
=
False
):
...
...
@@ -128,10 +217,26 @@ class Flags:
@property
def
columns
(
self
)
->
pd
.
Index
:
"""
Column index of the flags container
Returns
-------
columns: pd.Index
The columns index
"""
return
pd
.
Index
(
self
.
_data
.
keys
())
@columns.setter
def
columns
(
self
,
value
:
pd
.
Index
):
"""
Set new columns names.
Parameters
----------
value : pd.Index
New column names
"""
if
not
isinstance
(
value
,
pd
.
Index
):
value
=
pd
.
Index
(
value
)
...
...
@@ -157,6 +262,14 @@ class Flags:
@property
def
empty
(
self
)
->
bool
:
"""
True if flags has no columns.
Returns
-------
bool
``True`` if the container has no columns, otherwise ``False``.
"""
return
len
(
self
.
_data
)
==
0
def
__len__
(
self
)
->
int
:
...
...
@@ -231,8 +344,7 @@ class Flags:
Returns
-------
Flags
the same flags object with dropeed column, no copy
flags object with dropped column, not a copy
"""
self
.
__delitem__
(
key
)
...
...
@@ -241,12 +353,41 @@ class Flags:
@property
def
history
(
self
)
->
_HistAccess
:
"""
Accessor for the flags history.
To get a copy of the current history use ``flags.history[
'
var
'
]``.
To set a new history use ``flags.history[
'
var
'
] = value``.
The passed value must be a instance of History or must be convertible to a history.
Returns
-------
history : History
Accessor for the flags history
See Also
--------
saqc.core.History : History storage class.
"""
return
_HistAccess
(
self
)
# ----------------------------------------------------------------------
# copy
def
copy
(
self
,
deep
=
True
):
"""
Copy the flags container.
Parameters
----------
deep : bool, default True
If False, a new reference to the Flags container is returned,
otherwise the underlying data is also copied.
Returns
-------
copy of flags
"""
return
self
.
_constructor
(
self
,
copy
=
deep
)
def
__copy__
(
self
,
deep
=
True
):
...
...
@@ -265,6 +406,13 @@ class Flags:
# transformation and representation
def
toDios
(
self
)
->
dios
.
DictOfSeries
:
"""
Transform the flags container to a ``dios.DictOfSeries``.
Returns
-------
dios.DictOfSeries
"""
di
=
dios
.
DictOfSeries
(
columns
=
self
.
columns
)
for
k
,
v
in
self
.
_data
.
items
():
...
...
@@ -273,6 +421,13 @@ class Flags:
return
di
.
copy
()
def
toFrame
(
self
)
->
pd
.
DataFrame
:
"""
Transform the flags container to a ``pd.DataFrame``.
Returns
-------
pd.DataFrame
"""
return
self
.
toDios
().
to_df
()
def
__repr__
(
self
)
->
str
:
...
...
This diff is collapsed.
Click to expand it.
saqc/core/history.py
+
3
−
3
View file @
82c3725e
...
...
@@ -202,15 +202,15 @@ class History:
Raises
------
ValueError: on index miss-match or wrong dtype
TypeError: if value is not pd.Series
ValueError: on index miss-match or wrong dtype
"""
if
isinstance
(
value
,
History
):
return
self
.
_appendHistory
(
value
,
force
=
force
)
value
=
self
.
_validateValue
(
value
)
if
len
(
self
)
>
0
and
not
value
.
index
.
equals
(
self
.
index
):
raise
ValueError
(
"
Index
must be equal to history index
"
)
raise
ValueError
(
"
Index
does not match
"
)
self
.
_insert
(
value
,
pos
=
len
(
self
),
force
=
force
)
return
self
...
...
@@ -242,7 +242,7 @@ class History:
"""
self
.
_validateHistWithMask
(
value
.
hist
,
value
.
mask
)
if
len
(
self
)
>
0
and
not
value
.
index
.
equals
(
self
.
index
):
raise
ValueError
(
"
Index
must be equal to history index
"
)
raise
ValueError
(
"
Index
does not match
"
)
n
=
len
(
self
.
columns
)
value_hist
=
value
.
hist
...
...
This diff is collapsed.
Click to expand it.
Preview
0%
Loading
Try again
or
attach a new file
.
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Save comment
Cancel
Please
register
or
sign in
to comment