Programming interface

Currently, saqc is not convenient to use programmatically, i.e. from code rather than through a dedicated configuration file. To prepare the implementation of a real API, I would like to discuss a few ideas here.

A couple of requirements off the top of my head, formulated as user stories:

As a user, I can add functions (qc-tests, harmonization, processing) to certain variables
As a user, I want to add multiple functions and/or multiple parametrizations to a certain variable
As a user, I want to be able to replace certain functions or their arguments
As a user, I do not want to pass the same arguments (data, flagger, field) to functions over and over again
As a user, I would like a possibility to run tests in parallel without the need to consider inter-test dependencies
As a user, I want the same possibilities (e.g. variable wild cards) as through the CLI
As a user, I want to use the same test names as through the CLI
As a user, I want to make my runs reproducible, i.e. record the test functions and the passed parameters
As a user, I want the option to evaluate the configuration lazily, i.e. first define the configuration in code and then run the system with a dedicated command
As a user, I want to be able to use the same configuration for multiple datasets.
As a user, I would like to change the name of the variable to which a test is applied, in order to make configuration reuse practical.
As a user, I would like to change the test/flagger order of an existing configuration.
As a user, I would like to generate configurations (semi-)automatically from a set of given parameters.
As a user, I would like to specify the variables that should be checked not only by name but also by column number/index.
As a developer, I want a construct which can also be used for the CLI, i.e. no separate machineries for the CLI and the new API

A few code snippets as a base for discussion:

At the moment I am thinking of an object-oriented interface, so everything starts with creating an object:
```
config = SaQC()
```

Then we need to add some tests.

A more or less literal translation of the configuration file could look like this:

```
config.var("x").range(min=0, max=100)
# that way we could reproduce more complex patterns like
config.var("temp[0-9]+").range(min=0, max=100)
# to make things less verbose, maybe we could allow things like
config.y.range(min=0, max=100)
# not yet sure how to realize them, but generics could look like this
config.var("y").generic(lambda d: (d < 100) & (d >= y/2))
```
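To make the `config.var(...)` and `config.y` spellings a bit more concrete, here is a minimal sketch of how they could be wired up with `__getattr__` and regular expressions. It only records calls and executes nothing. All names (`SketchConfig`, `VarSelector`, `calls`, `resolve`) are hypothetical and not part of saqc.

```python
import re


class VarSelector:
    """Hypothetical helper: remembers a variable pattern and records tests for it."""

    def __init__(self, config, pattern):
        self._config = config
        self._pattern = pattern

    def __getattr__(self, test_name):
        # Any unknown public attribute is treated as a test name, e.g. `.range(...)`.
        if test_name.startswith("_"):
            raise AttributeError(test_name)

        def record(**kwargs):
            self._config.calls.append((self._pattern, test_name, kwargs))
            return self._config

        return record


class SketchConfig:
    """Hypothetical stand-in for the proposed SaQC object; it only records calls."""

    def __init__(self):
        self.calls = []

    def var(self, pattern):
        return VarSelector(self, pattern)

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so `config.y` means `config.var("y")`.
        if name.startswith("_"):
            raise AttributeError(name)
        return self.var(name)

    def resolve(self, columns):
        """Expand each recorded pattern to the column names it fully matches."""
        return [
            (col, test, kwargs)
            for pattern, test, kwargs in self.calls
            for col in columns
            if re.fullmatch(pattern, col)
        ]


config = SketchConfig()
config.var("temp[0-9]+").range(min=0, max=100)
config.y.range(min=0, max=100)
resolved = config.resolve(["temp1", "temp22", "y", "x"])
```

One drawback of the attribute shortcut in this sketch: a variable whose name collides with a real attribute (here `var`, `calls` or `resolve`) can only be reached through `config.var("...")`.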

We could also make things more explicit

```
config.var("x").qc.range(min=10, max=100)
config.var("temp[0-9]+").harm.shift2Grid(freq="10Min")
```

Or maybe less 'cody':

```
config.var("x").qc("range", min=10, max=100)
```

or even:

```
config.add("x", "range", min=10, max=100)
```
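The `add` variant maps quite directly onto a lazy, recordable configuration. As a rough sketch, assuming each call is stored as plain data, the reproducibility and variable-renaming stories could look like the following. `LazyConfig`, `RecordedCall`, `rename`, `to_json` and `from_json` are made-up names for illustration, not existing saqc API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RecordedCall:
    """One recorded test: target variable, test name, keyword arguments."""
    var: str
    test: str
    kwargs: dict = field(default_factory=dict)


class LazyConfig:
    """Hypothetical recorder: `add` stores calls, nothing is executed yet."""

    def __init__(self, calls=None):
        self.calls = list(calls or [])

    def add(self, var, test, **kwargs):
        self.calls.append(RecordedCall(var, test, kwargs))
        return self  # allow chaining

    def rename(self, old, new):
        """Return a new config in which variable `old` is replaced by `new`."""
        return LazyConfig(
            RecordedCall(new if c.var == old else c.var, c.test, dict(c.kwargs))
            for c in self.calls
        )

    def to_json(self):
        # Plain data makes the run easy to log and to replay later.
        return json.dumps([asdict(c) for c in self.calls], sort_keys=True)

    @classmethod
    def from_json(cls, text):
        return cls(RecordedCall(**item) for item in json.loads(text))


config = LazyConfig().add("x", "range", min=10, max=100)
restored = LazyConfig.from_json(config.to_json())
renamed = config.rename("x", "z")
```

Because the configuration is just a list of records, reordering tests or generating configurations from a parameter set would reduce to ordinary list operations.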

and finally run the system

```
data, flagger = saqc.run(data, BaseFlagger())
```
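For discussion, a heavily simplified sketch of what such a run step might do: look up each recorded test by its name in a registry and apply it to the selected variable, so that the user never passes data or field by hand. Everything here is an assumption: `REGISTRY`, `register`, `flag_range` and this `run` are hypothetical, a plain dict of lists stands in for the data, and a dict of booleans stands in for the flagger.

```python
REGISTRY = {}


def register(name):
    """Decorator storing a test function under the name used in the configuration."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("range")
def flag_range(values, min, max):
    """Flag every value outside the closed interval [min, max].

    The parameter names shadow the builtins on purpose, to mirror
    the `range(min=..., max=...)` spelling from the proposal.
    """
    return [not (min <= v <= max) for v in values]


def run(calls, data):
    """Apply recorded (var, test, kwargs) calls to `data`, a dict of column -> list.

    Returns a dict of column -> list of booleans, True meaning 'flagged'.
    """
    flags = {col: [False] * len(vals) for col, vals in data.items()}
    for var, test, kwargs in calls:
        result = REGISTRY[test](data[var], **kwargs)
        # A value stays flagged once any test has flagged it.
        flags[var] = [old or new for old, new in zip(flags[var], result)]
    return flags


calls = [("x", "range", {"min": 10, "max": 100})]
data = {"x": [5, 50, 150], "y": [1, 2, 3]}
flags = run(calls, data)
```

A name-based registry like this could in principle be shared between the CLI and the API, which would keep the test names identical in both.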

Any comments, suggestions or alternative ideas, @palmb, @luenensc, @schmidle? I would like to collect ideas on what the interface should look like, to make using saqc in code easier and more convenient, and maybe even make the system usable in other contexts (like the implementation and testing phase of new algorithms/methods, or ML training). So any ideas are welcome, no matter how hard or easy they might be to realize.

Edited Mar 17, 2020 by David Schäfer