Draft: Optimiser saqc

draft status:

  • currently failing some indentation tests for the docstrings
  • tests and exhaustive documentation will be implemented after the general architecture has been approved

Changes

This MR implements the fitting of parameters of SaQC pipelines (= series of SaQC function calls) to targets (= series of data and flags).

  • adds a module of fitting problems that fit different SaQC methods (saqc.lib.problems)
  • adds the functions module optisaqc, implementing the methods supervise, optPipe and applyConfig on the SaQC class
  • adds helper functions to saqc.lib.tools for deriving Histories from SaQC instances
  • slightly modifies flagByClick so that the GUI title can be set in the call
  • splits flagUniLOF into two parts, so that the scoring part of the algorithm can be cached during optimisation
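Conceptually, each fitting problem searches for parameters that minimise the disagreement between the flags a candidate pipeline produces and the supervised target flags. A minimal, self-contained sketch of such an objective (purely illustrative; the names and the scoring are assumptions, not the MR's actual implementation):

```python
def flag_mismatch(predicted, target):
    """Hypothetical objective: fraction of timestamps on which the candidate
    pipeline's flags disagree with the supervised target flags."""
    assert len(predicted) == len(target)
    disagreements = sum(p != t for p, t in zip(predicted, target))
    return disagreements / len(target)

# candidate flags vs. target flags over 5 timestamps -> mismatch of 0.2
candidate = [True, False, True, False, False]
target = [True, True, True, False, False]
score = flag_mismatch(candidate, target)
```

A population-based search (as configured via pop_size and termination below) would then minimise such a score over the pipeline's parameter space.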

Usage

With PATH pointing to this example resource in the documentation or to this file supervisedData.csv:

0. Load Data

An example qc object with flags associated with different anomaly types can be generated as follows:

import saqc
import pandas as pd
import os

data = pd.read_csv(PATH, index_col=0, parse_dates=[0])
dat = data.iloc[:, 0].rename('data')
noise = data.iloc[:, 1]
outlier = data.iloc[:, 2]
constant = data.iloc[:, 3]
qc = saqc.SaQC(dat)
qc = qc.copyField('data', 'data_sv')
qc = qc.setFlags('data_sv', data=constant[constant].index, label='constant')
qc = qc.setFlags('data_sv', data=noise[noise].index, label='noise')
qc = qc.setFlags('data_sv', data=outlier[outlier].index, label='outlier')

The data now carries flags labeled "noise", "outlier" and "constant" (run qc.plot('data_sv') for an overview).

To control the runtime, set the following variables:

POP_SIZE = 50
TERMINATION = 1000

1. Running Single Problems

Assign problems and their targets

To optimise "data", problems and their targets (problem_labels) have to be defined:

problems = ['FitUniLOF']
problem_labels = ['outlier']

Here, the "outlier" flags are targeted by an optimisation problem that fits the flagUniLOF method to them.

Supervise data

Supervision modifies the meta of targeted labels and removes (if override=True) history columns that are not targeted. If a member of `problem_labels` is not a label in the field's history, a GUI pops up in which flags can be assigned to this target.

_qc = qc.supervise('data', target='data_sv', problem_labels=problem_labels, pop_size=POP_SIZE, termination=('n_evals', TERMINATION))
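A rough, self-contained sketch of the column filtering described above, with a plain dict standing in for a variable's flags history (label -> flag column); this is an illustration, not the actual implementation:

```python
# hypothetical flags history of a variable: one column per label
history = {
    'outlier': [True, False, True],
    'noise': [False, True, False],
    'constant': [False, False, False],
}
problem_labels = ['outlier']

# with override=True, columns whose label is not targeted are dropped
supervised = {label: col for label, col in history.items() if label in problem_labels}
# only the targeted 'outlier' column remains
```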

Fit Parameters

Run the problem sequence against the supervised data. The name parameter determines the name under which the resulting pipeline is added to the SaQC class's methods:

_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', pop_size=POP_SIZE, termination=('n_evals', TERMINATION))

To modify the run length and population size, assign to the pop_size and termination parameters. For test runs it can be useful to cap the number of evaluations. The default termination (None) results in an exhaustive, prolonged search. If enough compute is available, increasing the number of individuals (pop_size) may lead to better results and helps to evade local minima.
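To illustrate what an ('n_evals', N) budget means, here is a self-contained sketch of a search loop that stops once the evaluation budget is exhausted (hypothetical names; the actual optimiser is population-based, not this greedy scan):

```python
def capped_search(evaluate, candidates, termination=('n_evals', 1000)):
    """Evaluate candidates until the ('n_evals', budget) criterion is hit."""
    kind, budget = termination
    assert kind == 'n_evals'
    best, best_score = None, float('inf')
    for n_evals, candidate in enumerate(candidates):
        if n_evals >= budget:  # budget exhausted -> terminate early
            break
        score = evaluate(candidate)
        if score < best_score:
            best, best_score = candidate, score
    return best

# with a budget of 2 evaluations, only the first two candidates are scored
best = capped_search(abs, [3, -1, -7, 0], termination=('n_evals', 2))
```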

Access results

The fitted pipeline can be called via the name specified in the name parameter of the optPipe call. The pipeline is now registered as a flagging method:

_qc = _qc.qcPipeline('data')

The results should be properly labeled:

_qc.plot('data')

2. Sequential Problem Pipeline

Define a pipeline as a sequence of problems that covers all the anomaly types indicated by the flag names:

problems = ['FitUniLOF', 'FitConstants', 'FitScatterLowPass']
problem_labels = ['outlier', 'constant', 'noise']

Running the pipeline and obtaining the results:

_qc = qc.supervise('data_sv', problem_labels=problem_labels)
_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', pop_size=POP_SIZE, termination=('n_evals', TERMINATION))
_qc = _qc.qcPipeline('data')
_qc.plot('data')

3. Generate config files representing optimum and load/apply config

Define a somewhat more complex problem chain: fit a lowpass filter and apply a ZScore cutoff to the residuals. For this, the residual-generating problem and the flagging problem have to be merged. For the sake of generality and demonstration, a constants detection problem is appended.

problems = [['LowpassResids', 'FitZScore'], 'FitConstants']
problem_labels = ['outlier', 'constant']
_qc = qc.supervise('data_sv', problem_labels=problem_labels)
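The merged residual-plus-cutoff idea can be illustrated with a self-contained toy (plain Python, hypothetical helper names; the actual problems operate on SaQC pipelines): smooth the series as a lowpass stand-in, take residuals, then flag where the residual z-score exceeds a cutoff.

```python
import statistics

def moving_mean(xs, w=3):
    # simple centered moving mean as a stand-in for a lowpass filter
    half = w // 2
    return [statistics.mean(xs[max(0, i - half):i + half + 1]) for i in range(len(xs))]

def zscore_flags(xs, cutoff=2.0):
    # flag values whose z-score exceeds the cutoff
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [abs(x - mu) / sd > cutoff for x in xs]

series = [1.0, 1.1, 0.9, 8.0, 1.0, 1.05, 0.95]
resids = [x - m for x, m in zip(series, moving_mean(series))]
flags = zscore_flags(resids, cutoff=2.0)  # only the spike at index 3 is flagged
```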

With OUT_PATH being an existing folder on your system, the config resulting from the optimisation can be stored by setting log_path:

_qc = _qc.optPipe('data_sv', problems=problems, log_path=OUT_PATH, pop_size=POP_SIZE, termination=('n_evals', TERMINATION))
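For orientation, SaQC config files are semicolon-separated tables of variable names and function calls. A purely illustrative example of what a stored config.csv could contain (the actual function names and parameter values are produced by the optimisation):

```
varname ; test
data    ; flagZScore(window="1D", thresh=3.5)
data    ; flagConstants(thresh=0.1, window="2h")
```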

Reloading the result, either to apply it or to instantiate a SaQC method with the same parameterisation, is done via the applyConfig method:

_qc = _qc.applyConfig('data', path=os.path.join(OUT_PATH, 'config.csv'))
_qc.plot('data')

4. GUI Selection of targets

Assigning labels as targets that are not present in the variable's history causes a GUI to pop up, in which target flags can be assigned.

problems = ['FitZScore']
problem_labels = ['myFlags']

Running the supervision on these labels opens the selection GUI:

_qc = qc.supervise('data_sv', problem_labels=problem_labels)

The selection is added to the targets and can be fitted by calling optPipe:

_qc = _qc.optPipe('data_sv', problems=problems, name='qcPipeline', pop_size=POP_SIZE, termination=('n_evals', TERMINATION))
Edited by Peter Lünenschloß
