We have a problem with fitPolynomial, because it assigns the worst flag to each residual. If no target is given, the data in field is overwritten. If we want to comply with the policy, we might end up with flags that are not the worst, contrary to what the docstring states.
I don't think we need to deviate from the policy here. In the case of fitPolynomial we can use the same solution as in !580 (merged). As a processing function, it should be free to decide what flags to set; if it wants to set the worst flag for a given timestamp/interval/whatever, so be it.
The only thing the implementation needs to make sure of is that it doesn't overwrite existing flags!
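A minimal sketch of that constraint, assuming SaQC's numeric flag scheme (UNFLAGGED = -inf, numerically higher = worse); `set_without_overwrite` is a hypothetical helper playing the role that `lib.isflagged` fills in the actual code base:

```python
import numpy as np
import pandas as pd

# illustrative flag levels, mirroring SaQC's float scheme (higher = worse)
UNFLAGGED, DOUBTFUL, BAD = -np.inf, 25.0, 255.0

def set_without_overwrite(existing, proposed):
    """Keep any flag that is already set; only write proposed flags
    where the existing flag is UNFLAGGED (a sketch, not SaQC's API)."""
    out = existing.copy()
    unset = out == UNFLAGGED
    out[unset] = proposed[unset]
    return out

existing = pd.Series([UNFLAGGED, BAD, UNFLAGGED])
proposed = pd.Series([DOUBTFUL, DOUBTFUL, UNFLAGGED])
print(set_without_overwrite(existing, proposed).tolist())  # -> [25.0, 255.0, -inf]
```

The pre-existing BAD at index 1 survives, while the unset positions take the proposed flags.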
IMO, the only thing worth a discussion is whether fitPolynomial should set flags at all, or rather set everything to UNFLAGGED. In the end, it manipulates values, like an interpolation does, and might therefore yield values that don't fulfill the prior conditions that led to a flag in the first place. (See the ubiquitous flagRange example, where flagRange(0, 100) flags a value of 100.1 and the polynomial fitting then changes that value to 99.9.)
I also think this case does not really relate to the policy case of:
Existing flags won't be flagged again, unless the dfilter allows it. Use lib.isflagged to detect existing flags.
Because the whole process of fitting a curve to data and then assigning it a quality flag is, in my opinion, not a process of flagging existing data again - it is generating new data, calculating flags for it, and then assigning it to a target field.
regarding:
IMO, the only thing worth a discussion is whether fitPolynomial should set flags at all, or rather set everything to UNFLAGGED
I agree.
I think it should set flags, because:
If fitting functions don't set flags, the dFilter keyword becomes essentially useless for any value other than BAD: all points generated from a fitting window that includes a doubtful point are, for simplicity's sake, at least doubtful themselves. So you lose that valuable information, and you cannot easily regain it by transferring the flags from the original data to the generated fit, because for that you would again need to know which points contributed to the generation of which points in the fitting result.
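The "at least doubtful" propagation can be sketched with a rolling maximum, assuming numerically higher flags mean worse flags (as in SaQC's float flag scheme); the window size and flag constants here are purely illustrative:

```python
import numpy as np
import pandas as pd

UNFLAGGED, DOUBTFUL, BAD = -np.inf, 25.0, 255.0  # illustrative levels

flags = pd.Series([UNFLAGGED, DOUBTFUL, UNFLAGGED, UNFLAGGED, BAD])

# every fitted value inherits the worst (numerically highest) flag among
# the points in its (here: centered, size-3) fitting window
worst = flags.rolling(window=3, center=True, min_periods=1).max()
print(worst.tolist())  # -> [25.0, 25.0, 25.0, 255.0, 255.0]
```

A single DOUBTFUL or BAD input flag spreads to every fitted point whose window touched it, which is exactly the information that would be lost if the fit set everything to UNFLAGGED.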
In the end, it manipulates values, like an interpolation does
The difference is that our interpolation (like pandas's) does not really manipulate values: it just fills gaps. But there is a slight overlap, since it also fills gaps that come from BAD-flagged values being removed.
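That distinction shows up already in plain pandas (this is not SaQC code): interpolation only touches missing values, but masking out BAD-flagged values first creates new gaps that the interpolation then fills as well.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, 100.1, 5.0])

# plain interpolation only fills the pre-existing gap ...
print(s.interpolate().tolist())  # -> [1.0, 2.0, 3.0, 100.1, 5.0]

# ... but if the BAD-flagged value at index 3 is masked out first,
# interpolation also fills the gap created by that removal
bad = pd.Series([False, False, False, True, False])
print(s.mask(bad).interpolate().tolist())  # -> [1.0, 2.0, 3.0, 4.0, 5.0]
```

In the first call the out-of-range 100.1 survives untouched; in the second it is replaced by an interpolated value, which is the overlap with value-manipulating functions mentioned above.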
I can't really follow the discussion... maybe we should talk about this again in real life or via VC.
I would think that functions that assign a whole new data set, as all the assignSomething functions do (and we do have some of them), should
- create a whole new dataset with a new, empty history (all unflagged)
- ideally have a mandatory target parameter, to avoid implicitly overriding field (and its flags)

Functions that return data that is somewhat similar to the data before (like fitting a polynomial to the data or an interpolation of gaps) should keep the existing flags and might also set new flags according to the policy (even if this might result in somewhat wrong flags, like the flagRange example @schaefed just gave).
With these two rules we could have a straightforward, always-applying policy without any exceptions.
I generally agree, and as far as I understood, so does @luenensc. Two questions remain for me:
1. Should the History really be empty afterwards, i.e. only have one column, or should we 'blank out' the existing History with an additional UNFLAGGED column? (I prefer the former.)
2. Where to draw the line between categories 1 and 2, i.e. how do we decide whether a target is positional without the decision appearing random? I think that ideally all data-changing functions should require target, but that would be a rather incompatible change.
To 1., I also vote for an empty (fresh and new) History.
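To make question 1 concrete, here is a plain-DataFrame stand-in for the flag History (the real SaQC History class has its own API; "a later column supersedes earlier ones" is an assumption of this sketch):

```python
import numpy as np
import pandas as pd

UNFLAGGED = -np.inf

# stand-in for an existing History: one column per flagging step,
# where a later column supersedes earlier ones (sketch assumption)
old_history = pd.DataFrame({0: [255.0, 25.0], 1: [255.0, UNFLAGGED]})

# option 1: a fresh History with a single all-UNFLAGGED column
fresh = pd.DataFrame({0: [UNFLAGGED, UNFLAGGED]})

# option 2: keep the old columns and 'blank' them by appending an
# UNFLAGGED column that supersedes them
blanked = old_history.copy()
blanked[len(blanked.columns)] = UNFLAGGED

print(fresh.shape[1], blanked.shape[1])  # -> 1 3
```

Both options yield UNFLAGGED as the current flag, but option 2 drags the now-meaningless old columns along, which is one argument for the fresh History.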
Where to draw the line between categories 1 and 2 [...]?
I would say by the intention of the function. Sure, I can misconfigure fitPolynomial so that its result has nothing to do with the original data, but that's not the intention; in contrast, everything in scores.py and residuals.py (e.g. assignSomething and calcSomethingResiduals) clearly returns new data (even if I might misconfigure it so that it returns nearly unchanged data, I guess).
Okay, I see where this is going, and well, why not. But I am actually not sure whether, following this argument, fitPolynomial should then have a required target parameter. Like the other functions in curvefit, the idea of fitPolynomial is to, well, fit a model to the data, which usually means that almost all values differ between the source and the result. And while the result of fitPolynomial is an approximation of the original data, one could (and I would) argue that the resulting time series has nothing to do with the original.
So, unfortunately, this leaves room for discussion... Should we add another point to our policy?
All functions producing new data need a positional target parameter