Draft: Optimizer saqc
draft status:
- currently failing some indentation tests for the docstrings
- tests and exhaustive documentation will be implemented after the general architecture has been approved
Changes
This MR implements fitting of parameters of SaQC pipelines (= series of SaQC function calls) to targets (= series of data and flags).
- adds a module of fitting problems that fit different SaQC methods (saqc.lib.problems)
- adds the functions module optisaqc, implementing the methods supervise, optPipe and applyConfig on the SaQC class
- adds helper functions to derive Histories from SaQC instances to saqc.lib.tools
- slightly modifies flagByClick so that the GUI title can be determined in the call
- splits up flagUniLOF into 2 parts, so that the scoring part of the algorithm can be cached during optimisation
Usage
With PATH pointing to this example resource in the documentation or to this file supervisedData.csv:
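A minimal sketch of the missing assignment, assuming the example CSV was downloaded into the working directory (the path below is illustrative):
PATH = 'supervisedData.csv'  # hypothetical local path to the downloaded example data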
0. Load Data
An example qc object with some flags associated with different anomaly types can be generated as follows:
import saqc
import pandas as pd
import os
data = pd.read_csv(PATH, index_col=0, parse_dates=[0])
dat = data.iloc[:, 0].rename('data')
noise = data.iloc[:, 1]
outlier = data.iloc[:, 2]
constant = data.iloc[:, 3]
qc = saqc.SaQC(dat)
qc = qc.copyField('data', 'data_sv')
qc = qc.setFlags('data_sv', data=constant[constant].index, label='constant')
qc = qc.setFlags('data_sv', data=noise[noise].index, label='noise')
qc = qc.setFlags('data_sv', data=outlier[outlier].index, label='outlier')
The data now contains flags labeled "noise", "outlier" and "constant" (run qc.plot('data') for an overview).
To control the runtime, set the following variables:
POP_SIZE = 50
N_EVALS = 1000
1. Running Single Problems
Assign problems and their targets
To optimize "data", problems and their targets (problem_labels) have to be defined.
problems = ['FitUniLOF']
problem_labels = ['outlier']
Here, the "outlier" flags are targeted by an optimisation problem that fits the flagUniLOF method to them.
Supervise data
Supervision modifies the meta of targeted labels and removes (if override=True) history columns that are not targeted. If a member of problem_labels is not a label in the field's history, a GUI will pop up so one can assign flags to this target.
_qc = qc.supervise('data', target='data_sv', problem_labels=problem_labels, pop_size=POP_SIZE, termination=('n_evals', N_EVALS))
Fit Parameters
Run the problem sequence against the supervised data.
The name parameter determines the name under which the resulting pipeline is added to the SaQC class's methods.
_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', pop_size=POP_SIZE, termination=('n_evals', N_EVALS))
To modify the run length and population size, assign to the pop_size and termination parameters. For test runs it can be useful to cap the number of evaluations; the default termination (None) results in an exhaustive and prolonged search.
If enough compute is available, increasing the number of individuals (pop_size) may lead to better results and helps evade local minima, as in the sketch below.
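For illustration, a hedged sketch of a more exhaustive run with a larger population (the values are placeholders, not tuned recommendations):
_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', pop_size=200, termination=('n_evals', 10000))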
Access results
The fitted pipeline can be called via the name specified in the optPipe call's name parameter.
The pipeline is now registered as a flagging method.
_qc = _qc.qcPipeline('data')
Results should be properly labeled:
_qc.plot('data')
2. Sequential Problem Pipeline
Define a pipeline of a sequence of problems that covers all the anomaly types indicated by the flag names:
problems = ['FitUniLOF', 'FitConstants', 'FitScatterLowPass']
problem_labels = ['outlier', 'constant', 'noise']
Running the pipeline and obtaining the results (log_path should point to an existing folder; see section 3):
_qc = qc.supervise('data_sv', problem_labels=problem_labels)
_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', log_path=log_path, logs=['config'], pop_size=POP_SIZE, termination=('n_evals', N_EVALS))
_qc = _qc.qcPipeline('data')
_qc.plot('data')
3. Generate config files representing the optimum and load/apply config
Define a somewhat more complex problem chain: fitting a lowpass filter and applying a ZScore cutoff to the residuals. The residual-generating problem and the flagging problem have to be merged. For the sake of generality and demonstration, append a constants detection problem.
problems = [['LowpassResids', 'FitZScore'], 'FitConstants']
problem_labels = ['outlier', 'constant']
_qc = qc.supervise('data_sv', problem_labels=problem_labels)
With OUT_PATH being an existing folder on your system, config results from the optimisation can be stored by setting log_path.
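A minimal setup sketch, assuming OUT_PATH is a hypothetical folder name that is created if missing:
import os
OUT_PATH = 'optimisation_results'  # hypothetical output folder for the optimisation logs
os.makedirs(OUT_PATH, exist_ok=True)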
_qc = _qc.optPipe('data_sv', problems=problems, log_path=OUT_PATH, pop_size=POP_SIZE, termination=('n_evals', N_EVALS))
Reloading the results to apply them, or to instantiate a SaQC method with the same parameterisation, is done via the applyConfig method:
_qc = _qc.applyConfig('data', path=os.path.join(OUT_PATH, 'config.csv'))
_qc.plot('data')
4. GUI Selection of targets
Assigning labels as targets that are not present in the variable's history causes a GUI to pop up, where one can assign target flags.
problems = ['FitZScore']
problem_labels = ['myFlags']
Running the supervision on those labels will cause the selection GUI to pop up:
_qc = qc.supervise('data_sv', problem_labels=problem_labels)
The selection is added to the targets and can be fitted to by calling optPipe:
_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', pop_size=POP_SIZE, termination=('n_evals', N_EVALS))