We have a problem with fitPolynomial, because it assigns the worst flag to each residual. If no target is given, the data in field is overwritten. If we want to comply with the policy, we might end up with flags that are not the worst, contrary to what the docstring states.
I don't think we need to deviate from the policy here. In the case of fitPolynomial we can use the same solution as in !580 (merged). As a processing function, it should be free to decide what flags to set; if it wants to set the worst flag for a given timestamp/interval/whatever, so be it.
The only thing the implementation needs to make sure of is that it doesn't overwrite existing flags!
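A minimal sketch of that constraint, assuming SaQC's numeric flag scheme (UNFLAGGED = -inf, numerically higher = worse); `set_without_overwrite` is a hypothetical helper playing the role that `lib.isflagged` fills in the actual code base:

```python
import numpy as np
import pandas as pd

# illustrative flag levels, mirroring SaQC's float scheme (higher = worse)
UNFLAGGED, DOUBTFUL, BAD = -np.inf, 25.0, 255.0

def set_without_overwrite(existing, proposed):
    """Keep any flag that is already set; only write proposed flags
    where the existing flag is UNFLAGGED (a sketch, not SaQC's API)."""
    out = existing.copy()
    unset = out == UNFLAGGED
    out[unset] = proposed[unset]
    return out

existing = pd.Series([UNFLAGGED, BAD, UNFLAGGED])
proposed = pd.Series([DOUBTFUL, DOUBTFUL, UNFLAGGED])
print(set_without_overwrite(existing, proposed).tolist())  # -> [25.0, 255.0, -inf]
```

The pre-existing BAD at index 1 survives, while the unset positions take the proposed flags.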
IMO, the only thing worth a discussion is whether fitPolynomial should set flags at all, or rather set everything to UNFLAGGED. In the end, it manipulates values, like an interpolation does, and might therefore yield values that don't fulfill the prior conditions that led to a flag in the first place. (See the ubiquitous flagRange example, where flagRange(0, 100) flags a value of 100.1 and the polynomial fitting then changes that value to 99.9.)
I also think this case does not really relate to the policy case of:
Existing flags won't be flagged again, unless the dfilter allows it. Use lib.isflagged to detect existing flags.
Because the whole process of fitting a curve to data and then assigning it a quality flag is, in my opinion, not a process of flagging existing data again - it is generating new data, calculating flags for it, and then assigning it to a target field.
regarding:
IMO, the only thing worth a discussion is whether fitPolynomial should set flags at all, or rather set everything to UNFLAGGED
I agree.
I think it should set flags, because:
If fitting functions don't set flags, the dFilter keyword becomes essentially useless for any value other than BAD: all points generated from a fitting window that includes a doubtful point are, for simplicity's sake, at least doubtful themselves. So you lose that valuable information, and you cannot easily regain it by transferring the flags from the original data to the generated fit, because for that you would again need to know which points contributed to the generation of which points in the fitting result.
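The "at least doubtful" propagation can be sketched with a rolling maximum, assuming numerically higher flags mean worse flags (as in SaQC's float flag scheme); the window size and flag constants here are purely illustrative:

```python
import numpy as np
import pandas as pd

UNFLAGGED, DOUBTFUL, BAD = -np.inf, 25.0, 255.0  # illustrative levels

flags = pd.Series([UNFLAGGED, DOUBTFUL, UNFLAGGED, UNFLAGGED, BAD])

# every fitted value inherits the worst (numerically highest) flag among
# the points in its (here: centered, size-3) fitting window
worst = flags.rolling(window=3, center=True, min_periods=1).max()
print(worst.tolist())  # -> [25.0, 25.0, 25.0, 255.0, 255.0]
```

A single DOUBTFUL or BAD input flag spreads to every fitted point whose window touched it, which is exactly the information that would be lost if the fit set everything to UNFLAGGED.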
In the end, it manipulates values, like an interpolation does
The difference is that our interpolation (like pandas's) does not really manipulate values: it just fills gaps. But there is a slight overlap, since it also fills gaps that come from BAD-flagged values being removed.
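That distinction shows up already in plain pandas (this is not SaQC code): interpolation only touches missing values, but masking out BAD-flagged values first creates new gaps that the interpolation then fills as well.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 100.1, 5.0])

# plain interpolation only fills the pre-existing gap ...
print(s.interpolate().tolist())  # -> [1.0, 2.0, 3.0, 100.1, 5.0]

# ... but if the BAD-flagged value at index 3 is masked out first,
# interpolation also fills the gap created by that removal
bad = pd.Series([False, False, False, True, False])
print(s.mask(bad).interpolate().tolist())  # -> [1.0, 2.0, 3.0, 4.0, 5.0]
```

In the first call the out-of-range 100.1 survives untouched; in the second it is replaced by an interpolated value, which is the overlap with value-manipulating functions mentioned above.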
I can't really follow the discussion... maybe we should talk about this again in real life or via VC.
I would think that functions that assign a whole new data set, as all the assignSomething functions do (and we do have some of them), should
- create a whole new dataset with a new, empty history (all unflagged)
- ideally have a mandatory target parameter, to avoid implicitly overriding field (and its flags)

Functions that return data that is somewhat similar to the data before (like fitting a polynomial to the data or an interpolation of gaps) should keep the existing flags and might also set new flags according to the policy (even if this might result in somewhat wrong flags, like the flagRange example @schaefed just gave).
With these two rules we could have a straightforward, always-applying policy without any exceptions.
I generally agree, and as far as I understood, so does @luenensc. Two questions remain for me:
1. Should the History really be empty afterwards, i.e. only have one column, or should we 'blank out' the existing History with an additional UNFLAGGED column? (I prefer the former.)
2. Where to draw the line between categories 1 and 2, i.e. how do we decide whether a target is positional without the decision appearing random? I think that ideally all data-changing functions should require target, but that would be a rather incompatible change.
To 1., I also vote for an empty (fresh and new) History.
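To make question 1 concrete, here is a plain-DataFrame stand-in for the flag History (the real SaQC History class has its own API; "a later column supersedes earlier ones" is an assumption of this sketch):

```python
import numpy as np
import pandas as pd

UNFLAGGED = -np.inf

# stand-in for an existing History: one column per flagging step,
# where a later column supersedes earlier ones (sketch assumption)
old_history = pd.DataFrame({0: [255.0, 25.0], 1: [255.0, UNFLAGGED]})

# option 1: a fresh History with a single all-UNFLAGGED column
fresh = pd.DataFrame({0: [UNFLAGGED, UNFLAGGED]})

# option 2: keep the old columns and 'blank' them by appending an
# UNFLAGGED column that supersedes them
blanked = old_history.copy()
blanked[len(blanked.columns)] = UNFLAGGED

print(fresh.shape[1], blanked.shape[1])  # -> 1 3
```

Both options yield UNFLAGGED as the current flag, but option 2 drags the now-meaningless old columns along, which is one argument for the fresh History.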
Where to draw the line between categories 1 and 2 [...]?
I would say by the intention of the function. Sure, I can misconfigure fitPolynomial so that its result has nothing to do with the original data, but that's not the intention; in contrast, everything in scores.py and residuals.py (e.g. assignSomething and calcSomethingResiduals) clearly returns new data (even if I might misconfigure it so that it returns nearly unchanged data, I guess).
Okay, I see where this is going, and well, why not. But I am actually not sure whether, following this argument, fitPolynomial should then have a required target parameter. Like the other functions in curvefit, the idea of fitPolynomial is to, well, fit a model to the data, which usually means that almost all values differ between the source and the result. And while the result of fitPolynomial is an approximation of the original data, one could (and I would) argue that the resulting time series has nothing to do with the original.
So, unfortunately, this leaves room for discussion... Should we add another point to our policy?
All functions producing new data need a positional target parameter