Programming interface
Currently it is not convenient to use saqc programmatically, i.e. in code rather than through a dedicated configuration file. To prepare the implementation of a real API, I would like to discuss a few ideas here.
A couple of requirements off the top of my head, formulated as user stories:
- As a user, I can add functions (qc-tests, harmonization, processing) to certain variables
- As a user, I want to add multiple functions and/or multiple parametrizations to a certain variable
- As a user, I want to be able to replace certain functions or their arguments
- As a user, I do not want to pass the same arguments (data, flagger, field) to functions over and over again
- As a user, I would like to be able to run tests in parallel without having to consider inter-test dependencies
- As a user, I want the same possibilities (e.g. variable wild cards) as through the CLI
- As a user, I want to use the same test names as through the CLI
- As a user, I want to make my runs reproducible, i.e. record the test functions and the passed parameters
- As a user, I want the option to evaluate the configuration lazily, i.e. first define the configuration in code and then run the system with a dedicated command
- As a user, I want to be able to use the same configuration for multiple datasets.
- As a user, I would like to change the name of the variable to which a test is applied, in order to make configuration reuse practical.
- As a user, I would like to change the test/flagger order of an existing configuration.
- As a user, I would like to generate configurations (semi-)automatically from a set of given parameters.
- As a user, I would like to specify the variables that should be checked not only by name but also by column number/index.
- As a developer, I want a construct which can also be used for the CLI, i.e. no separate machineries for the CLI and the new API
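Several of these stories (reproducibility, reuse across datasets, renaming variables, reordering, semi-automatic generation) become simple if a configuration is nothing more than an ordered list of plain records. The following is only a rough sketch of that idea; `RecordedCall`, `generate`, `rename` and `dump` are hypothetical names and not part of saqc.

```python
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass(frozen=True)
class RecordedCall:
    """Hypothetical record: one test applied to one variable."""
    variable: str
    test: str
    params: dict = field(default_factory=dict)


def generate(variables, test, **params):
    """Build a configuration from a set of given parameters."""
    return [RecordedCall(v, test, dict(params)) for v in variables]


def rename(calls, old, new):
    """Return a copy of the configuration with variable `old` renamed to `new`."""
    return [replace(c, variable=new) if c.variable == old else c for c in calls]


def dump(calls):
    """Serialize the configuration so that a run can be reproduced later."""
    return json.dumps([asdict(c) for c in calls], sort_keys=True)


calls = generate(["temp1", "temp2"], "range", min=0, max=100)
calls = rename(calls, "temp2", "air_temp")  # reuse for another dataset
calls = list(reversed(calls))               # change the test order
print(dump(calls))
```

Because the configuration is an ordinary list, reordering and filtering need no dedicated API, and the same structure could in principle be produced by the configuration file parser of the CLI.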
A few code snippets as a base for discussion:
- At the moment I am thinking of an object-oriented interface, so everything starts with an object creation:
config = SaQC()
- Then we need to add some tests.
A more or less literal translation could look like:
config.var("x").range(min=0, max=100)
# that way we could reproduce more complex patterns like
config.var("temp[0-9]+").range(min=0, max=100)
# to make things less verbose, maybe we could support things like
config.y.range(min=0, max=100)
# not yet sure how to realize them, but generics could look like this
config.var("y").generic(lambda d: (d < 100) & (d >= y/2))
We could also make things more explicit
config.var("x").qc.range(min=10, max=100)
config.var("temp[0-9]+").harm.shift2Grid(freq="10Min")
Or maybe less 'cody':
config.var("x").qc("range", min=10, max=100)
or even:
config.add("x", "range", min=10, max=100)
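To make the discussion a bit more concrete: the attribute-style variants could all be thin wrappers around the plain `add` call, using Python's `__getattr__` hook. This is only a sketch under that assumption; the `SaQC` class here is a stand-in written for illustration, `VarProxy` is a hypothetical helper, and nothing is executed yet, calls are only recorded.

```python
class VarProxy:
    """Hypothetical helper: records test calls for one variable pattern."""

    def __init__(self, config, pattern):
        self._config = config
        self._pattern = pattern

    def __getattr__(self, test_name):
        # Only reached when normal lookup fails, so any unknown public
        # attribute such as `.range` is treated as a test name.
        if test_name.startswith("_"):
            raise AttributeError(test_name)

        def record(**params):
            self._config.add(self._pattern, test_name, **params)
            return self  # allows several tests on the same variable

        return record


class SaQC:
    """Stand-in for the proposed configuration object (illustration only)."""

    def __init__(self):
        self.calls = []  # recorded (pattern, test name, parameters) tuples

    def add(self, pattern, test, /, **params):
        self.calls.append((pattern, test, params))
        return self

    def var(self, pattern):
        return VarProxy(self, pattern)

    def __getattr__(self, name):
        # `config.y` becomes shorthand for `config.var("y")`.
        if name.startswith("_"):
            raise AttributeError(name)
        return self.var(name)


config = SaQC()
config.var("temp[0-9]+").range(min=0, max=100)
config.y.range(min=0, max=100)
config.add("x", "range", min=10, max=100)
print(config.calls)
```

One drawback of the `config.y` shorthand in this sketch is that variables named like real attributes (here `calls`, `add` or `var`) cannot be reached that way and would still need `config.var(...)`.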
- And finally run the system:
data, flagger = saqc.run(data, BaseFlagger())
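A rough sketch of what such a deferred run step might do with the recorded calls: look up each test by name in a registry, expand variable wildcards against the dataset, and only then execute. Everything here is an assumption made for illustration: the `TESTS` registry, the `register` decorator, plain dicts of lists in place of real data, and a dict of boolean lists in place of a flagger.

```python
import re

TESTS = {}  # hypothetical registry: test name -> function


def register(name):
    """Make a function available under the name used in configurations."""
    def decorator(func):
        TESTS[name] = func
        return func
    return decorator


@register("range")
def range_test(values, min, max):
    # True marks a value outside the closed interval [min, max].
    return [not (min <= v <= max) for v in values]


def run(calls, data):
    """Apply recorded (pattern, test, params) calls to a dataset."""
    flags = {name: [False] * len(values) for name, values in data.items()}
    for pattern, test, params in calls:
        for name, values in data.items():
            # Wildcards are treated as regular expressions on the full name.
            if re.fullmatch(pattern, name):
                result = TESTS[test](values, **params)
                flags[name] = [a or b for a, b in zip(flags[name], result)]
    return data, flags


calls = [("temp[0-9]+", "range", {"min": 0, "max": 100})]

# The same configuration can be evaluated against several datasets.
_, flags_a = run(calls, {"temp1": [5, 120], "temp2": [-3, 40], "x": [999]})
_, flags_b = run(calls, {"temp7": [50]})
print(flags_a)  # {'temp1': [False, True], 'temp2': [True, False], 'x': [False]}
print(flags_b)  # {'temp7': [False]}
```

Since the registry is keyed by test name, the CLI and the API could share the same names and the same execution path; only the way the list of calls is produced would differ.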
Any comments, suggestions or alternative ideas @palmb, @luenensc, @schmidle? I would like to collect ideas on what the interface should look like, to make the usage of saqc in code easier and more convenient, and maybe even make the system usable in other contexts (like the implementation and testing phase of new algorithms/methods, or ML training). So any ideas are welcome, no matter how hard or easy their realization might be.