
Dataprocessing features

Peter Lünenschloß requested to merge dataprocessing_features into develop

Major changes:

HARMONIZATION

  • no more heap
  • The harmonization module is now a pure wrapper containing module functions that deliver the classic harmonization look and feel. The harm_something2grid operators are available under their old names and signatures. Additionally, there is the harm_deharmonize wrapper.
  • The wrappers are composed as I recently suggested in the small moduling review session. Although the composed elements mainly do bookkeeping tasks, I hope the better level of transparency and intuitiveness is already visible.
  • All config fields set up with the old harmonization functions should just continue to work.

DATAPROCESSING MODULE / TS OPERATORS

  • The dataprocessing module now gathers functions used to "process" data. This includes interpolation, resampling, transformation, projection, dropping, ...
  • One major improvement/change is that every kind of aggregation and resampling (which includes harmonization), of both data and flags series, is now delegated to one function (aggregate2freq) in the ts_operators module, and both data and flags are resampled/aggregated in that function by the very same mechanism. So there is one and only one central point where, for example, changes to (or parameters controlling) the validation and aggregation behavior would have to take effect.
  • The same holds for all kinds of interpolation: all interpolation (mainly interpolation of inserted frequency grids and of NaN values already present in the data) is done by the interpolateNANs method in ts_operators.
  • To tackle the upcoming tasks of counting NaNs and/or BAD flagged values per aggregation/resampling interval, and of making the resampling/aggregation depend on the results of that validation, there are now the parameters max_total_valid and max_consec_valid in proc_resample to control this behavior. Passing a numpy nanfunc or a pandas func will no longer give differing results, because all funcs get passed only valid values, and the whole masking/validation is done in a separate processing step inside aggregate2freq, by a call to this little fella living in ts_operators. So validation granularity is increased and ambiguity is mitigated.
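
To illustrate the last point, here is a minimal sketch of the idea, not the actual aggregate2freq code. The names aggregate_valid, longest_nan_run, max_invalid_total and max_invalid_consec are hypothetical stand-ins, and their semantics are an assumption for illustration only. NaN stands for "invalid" here, assuming BAD flagged values were masked to NaN beforehand. Because validation happens first and the aggregation function only ever sees valid values, np.mean and np.nanmean give the same result.

```python
import numpy as np
import pandas as pd


def longest_nan_run(values):
    """Length of the longest run of consecutive NaNs in `values`."""
    longest = current = 0
    for is_missing in pd.isna(values):
        current = current + 1 if is_missing else 0
        longest = max(longest, current)
    return longest


def aggregate_valid(series, freq, func, max_invalid_total=None, max_invalid_consec=None):
    """Aggregate `series` per `freq` interval, passing only valid values to `func`.

    Hypothetical sketch: an interval yields NaN if it holds no valid value,
    more than `max_invalid_total` NaNs, or a NaN run longer than
    `max_invalid_consec`. A limit of None disables the respective check.
    """
    rows = {}
    # group the samples by the start of the interval they fall into
    for interval_start, chunk in series.groupby(series.index.floor(freq)):
        valid = chunk.dropna()
        n_invalid = len(chunk) - len(valid)
        too_many = max_invalid_total is not None and n_invalid > max_invalid_total
        too_long = (
            max_invalid_consec is not None
            and longest_nan_run(chunk) > max_invalid_consec
        )
        if valid.empty or too_many or too_long:
            rows[interval_start] = np.nan
        else:
            # validation is done, so func never sees a NaN
            rows[interval_start] = func(valid.to_numpy())
    return pd.Series(rows, dtype=float)


index = pd.date_range("2020-01-01 00:00", periods=6, freq="5min")
data = pd.Series([1.0, np.nan, 3.0, 5.0, np.nan, np.nan], index=index)

# a plain func and a nanfunc agree, since masking happens before aggregation
plain = aggregate_valid(data, "10min", np.mean)
nanaware = aggregate_valid(data, "10min", np.nanmean)

# allowing no invalid value at all also rejects the first interval
strict = aggregate_valid(data, "10min", np.mean, max_invalid_total=0)
```

In this sketch the decision whether an interval is usable is made in exactly one place, which is the property the bullet above describes for aggregate2freq.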

TESTING / DOCUMENTATION

  • I rewrote the old harmonization tests so that they would apply to the wrapper. As a result, the new harmonization mechanism is tested at the same level as the old.
  • dataprocessing is tested.
  • I added documentation to the data processing functions in a syntax I copied from the dios project.

QUESTIONS / DISCUSSION

  • @schaefed - I wanted to make the ts_operators last, first and count available in the config file. They are mainly dummies to trigger resample.count(), resample.first(), when passed to proc_resample, so I added them to the visitor's environment. Uhm - that is the right place to do that, isn't it? Somehow I wasn't sure.
Edited by Peter Lünenschloß
