Commit 5227acc8 authored by David Schäfer

Merge branch 'docs-improve' into 'develop'

fixed custom function section

Closes #387 and #462

See merge request !786
parents 86766227 eec89826
1 merge request: !786 (fixed custom function section)
Pipeline #197990 passed with stages in 4 minutes and 59 seconds
@@ -5,88 +5,136 @@
Customizations
==============
SaQC comes with a continuously growing number of pre-implemented
quality checking and processing routines as well as flagging schemes.
For any sufficiently large use case however, it is very likely that the
functions provided won't fulfill all your needs and requirements.
Acknowledging the impossibility to address all imaginable use cases, we
designed the system to allow for extensions and costumizations. The main extensions options, namely
SaQC comes with a continuously growing number of pre-implemented quality-checking and processing
routines as well as flagging schemes. For a sufficiently large use case, however, it might be
necessary to extend the system anyhow. The main extension options, namely
:ref:`quality check routines <documentation/Customizations:custom quality check routines>`
and the :ref:`flagging scheme <documentation/Customizations:custom flagging schemes>`
are described within this documents.
and the :ref:`flagging scheme <documentation/Customizations:custom flagging schemes>`.
Both of these mechanisms are described within this document.
Custom quality check routines
Custom Quality Check Routines
-----------------------------
In case you are missing quality check routines, you are of course very
welcome to file a feature request issue on the project's
`gitlab repository <https://git.ufz.de/rdm-software/saqc>`_. However, if
you are more the "I-get-this-done-by-myself" type of person,
SaQC provides two ways to integrate custom routines into the system:
In case you are missing quality check routines, you are, of course, very welcome to file a feature request issue on the project's `GitLab repository <https://git.ufz.de/rdm-software/saqc>`_. However, if you are more the "I-get-this-done-by-myself" type of person, SaQC offers the possibility to directly extend its functionality using its interface to the evaluation machinery.
#. The :ref:`extension language <documentation/GenericFunctions:Generic Functions>`
#. An :ref:`interface <documentation/Customizations:interface>` to the evaluation machinery
In order to make a function usable within the evaluation framework of SaQC, it needs to implement the following function interface:
Interface
^^^^^^^^^
In order to make a function usable within the evaluation framework of SaQC, it needs to
implement the following function interface
.. code-block:: python
import pandas
import saqc
def yourTestFunction(
saqc: SaQC
field: str,
*args,
**kwargs
) -> SaQC
def yourTestFunction(qc: SaQC, field: str | list[str], *args, **kwargs) -> SaQC:
    # your code
    return qc
Argument Descriptions
~~~~~~~~~~~~~~~~~~~~~
with the following parameters
.. list-table::
:header-rows: 1
* - Name
- Description
* - ``data``
- The actual dataset, an instance of ``saqc.DictOfSeries``.
* - ``qc``
- An instance of ``SaQC``
* - ``field``
- The field/column within ``data``, that function is processing.
* - ``flags``
- An instance of saqc.Flags, responsible for the translation of test results into quality attributes.
- The field(s)/column(s) of ``data`` the function is processing/flagging.
* - ``args``
- Any other arguments needed to parameterize the function.
- Any number of positional arguments needed to parameterize the function.
* - ``kwargs``
- Any other keyword arguments needed to parameterize the function.
- Any number of keyword arguments needed to parameterize the function. ``kwargs``
needs to be present, even if the function needs no keyword arguments at all.
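The shape of this interface can be tried out without SaQC installed. The following is a minimal sketch, assuming a hypothetical stand-in class ``ToyQC`` (not part of SaQC) that holds data and flags per field; the function ``flagAbove`` and its ``limit`` parameter are likewise made up for illustration. It only shows the calling convention: take the object and a field, accept ``**kwargs``, and return the object.

```python
from dataclasses import dataclass, field as dc_field


@dataclass
class ToyQC:
    """Hypothetical stand-in for a SaQC object (not the real class)."""

    data: dict[str, list[float]]
    flags: dict[str, list[bool]] = dc_field(default_factory=dict)


def flagAbove(qc: ToyQC, field: str, limit: float = 0.0, **kwargs) -> ToyQC:
    """Flag every value of `field` that exceeds `limit`."""
    values = qc.data[field]
    qc.flags[field] = [value > limit for value in values]
    # Return the object so that calls can be chained
    return qc


qc = ToyQC(data={"temperature": [12.5, 99.0, 13.1]})
qc = flagAbove(qc, "temperature", limit=50.0)
print(qc.flags["temperature"])
```

Only the second value exceeds the limit, so only that value ends up flagged. Unknown keyword arguments are swallowed by ``**kwargs``, which is why the parameter has to be present in the signature.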
Integrate into SaQC
^^^^^^^^^^^^^^^^^^^
In order make your function available to the system it needs to be registered. We provide a decorator
`\ ``flagging`` <saqc/functions/register.py>`_ with saqc, to integrate your
test functions into SaQC. Here is a complete dummy example:
SaQC provides two decorators, :py:func:`@flagging` and :py:func:`@register`, to integrate custom functions
into its workflow. The choice between them depends on the nature of your algorithm. :py:func:`@register`
is a more versatile decorator, allowing you to handle masking, demasking, and squeezing of data and flags, while
:py:func:`@flagging` is simpler and suitable for univariate flagging functions without the need for complex
data manipulations.
Use :py:func:`@flagging` when your algorithm operates on a single column and needs no explicit
control over masking, demasking, or squeezing of data and flags.
.. code-block:: python
from saqc import register
from saqc import SaQC
from saqc.core.register import flagging
@flagging()
def yourTestFunction(saqc: SaQC, field: str, *args, **kwargs):
def simpleFlagging(saqc: SaQC, field: str | list[str], param1: ..., param2: ..., **kwargs) -> SaQC:
    """
    Your simple univariate flagging logic goes here.

    Parameters
    ----------
    saqc : SaQC
        The SaQC instance.
    field : str
        The field or fields on which to apply anomaly detection.
    param1 : ...
        Additional parameters needed for your algorithm.
    param2 : ...
        Additional parameters needed for your algorithm.

    Returns
    -------
    SaQC
        The modified SaQC instance.
    """
    # Your flagging logic here
    # Modify saqc._flags as needed
    return saqc
Use :py:func:`@register` when your algorithm needs to handle multiple columns simultaneously (``multivariate=True``)
and/or you need explicit control over masking, demasking, and squeezing of data and flags.
:py:func:`@register` is especially suited to complex algorithms that involve interactions between different columns.
.. code-block:: python
from saqc import SaQC
from saqc.core.register import register
@register(
    mask=["field"], # Parameter(s) of the decorated functions giving the names of columns in SaQC._data to mask before the call
    demask=["field"], # Parameter(s) of the decorated functions giving the names of columns in SaQC._data to unmask after the call
    squeeze=["field"], # Parameter(s) of the decorated functions giving the names of columns in SaQC._flags to squeeze into a single flags column after the call
    multivariate=True, # Set to True to handle multiple columns
    handles_target=False,
)
def complexAlgorithm(
    saqc: SaQC, field: str | list[str], param1: ..., param2: ..., **kwargs
) -> SaQC:
    """
    Your custom anomaly detection logic goes here.

    Parameters
    ----------
    saqc : SaQC
        The SaQC instance.
    field : str or list of str
        The field or fields on which to apply anomaly detection.
    param1 : ...
        Additional parameters needed for your algorithm.
    param2 : ...
        Additional parameters needed for your algorithm.

    Returns
    -------
    SaQC
        The modified SaQC instance.
    """
    # Your anomaly detection logic here
    # Modify saqc._flags and saqc._data as needed
    return saqc
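The idea behind masking and demasking can be illustrated with a toy decorator. The sketch below is not SaQC's implementation: it assumes that masking means temporarily hiding already-flagged values (here by replacing them with NaN) and that demasking restores them afterwards. The names ``toy_register`` and ``flagAboveMean``, as well as the plain ``(data, flags, field)`` signature, are hypothetical and chosen only to keep the example free of dependencies.

```python
import functools
import math


def toy_register(mask: bool = True):
    """Hypothetical decorator: hide already-flagged values during the call."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(data, flags, field, **kwargs):
            original = list(data[field])
            if mask:
                # Masking: replace flagged values with NaN before the call
                data[field] = [
                    math.nan if flagged else value
                    for value, flagged in zip(original, flags[field])
                ]
            data, flags = func(data, flags, field, **kwargs)
            if mask:
                # Demasking: restore the original values after the call
                data[field] = original
            return data, flags

        return wrapper

    return decorator


@toy_register(mask=True)
def flagAboveMean(data, flags, field, **kwargs):
    """Flag values above the mean of the values that are not masked."""
    visible = [v for v in data[field] if not math.isnan(v)]
    mean = sum(visible) / len(visible)
    new = [(not math.isnan(v)) and v > mean for v in data[field]]
    # Keep earlier flags and add the new ones
    flags[field] = [old or n for old, n in zip(flags[field], new)]
    return data, flags


data = {"x": [1.0, 2.0, 6.0, 1000.0]}
flags = {"x": [False, False, False, True]}  # 1000.0 was flagged earlier
data, flags = flagAboveMean(data, flags, "x")
print(flags["x"])
print(data["x"])
```

With masking enabled, the previously flagged ``1000.0`` does not distort the mean, so ``6.0`` is flagged as well, and the data come back unchanged. With ``mask=False`` the mean would be dominated by the outlier and ``6.0`` would stay unflagged.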
Example
^^^^^^^
The function `\ ``flagRange`` <saqc/funcs/outliers.py>`_ provides a simple, yet complete implementation of
a quality check routine. You might want to look into its implementation as an example.
Custom flagging schemes
-----------------------
......