Perf improvements

changed the description

Tried your solution. It does not speed things up enough: a simple range test still needs >2 sec. The main problem is that the dataset has >300 columns. The problem is not the copying, it's the access to and alteration of every single column during masking and unmasking.

I could pass to_mask=[] to every test, but that seems like quite a bad idea to me; still, it would work as a temporary solution.
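To illustrate why the number of masked columns matters, here is a minimal sketch that masks only the columns named in to_mask and leaves all others untouched. It assumes a plain dict of pandas Series as a stand-in for the dios container; mask_columns and flagged are hypothetical names, and the real semantics of to_mask in saqc may differ.

```python
import pandas as pd


def mask_columns(data, flagged, to_mask):
    """Mask only the columns listed in `to_mask` (hypothetical helper).

    data:    dict mapping column name -> pd.Series of values
    flagged: dict mapping column name -> boolean pd.Series (True = mask it)
    """
    out = dict(data)  # shallow copy: untouched columns are shared, not copied
    for col in to_mask:
        # Series.mask sets the positions where the condition is True to NaN
        out[col] = data[col].mask(flagged[col])
    return out


# 300 columns, as in the dataset described above
data = {f"var{i}": pd.Series([1.0, 2.0, 3.0]) for i in range(300)}
flagged = {name: pd.Series([False, True, False]) for name in data}

# Only one column is accessed and altered, the other 299 are passed through.
masked = mask_columns(data, flagged, to_mask=["var0"])
```

With to_mask=[] the loop body never runs, which is why that workaround is fast but also disables masking entirely.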

mentioned in issue #99 (closed)

"a simple range test still needs >2 sec"

how long did it take before?

~4

So something like a 50% speed bump? The GCEF-pipeline in 8.5 instead of 17 hours? Not great, but certainly not bad either...

Another 0.5 speed improvement (so about 0.25 of the original runtime??) could be gained by improving the unmasking: construct the result from the original data and the (masked) data returned by the flag function. Take only the written and newly created columns from the latter and all other columns from the former. See $67
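A rough sketch of that idea, again assuming a dict of pandas Series in place of dios. The function name rebuild_result and the explicit written argument are assumptions for illustration; how the written columns are actually detected is not specified here.

```python
import numpy as np
import pandas as pd


def rebuild_result(old_data, new_data, written):
    """Combine original and returned data column by column (hypothetical helper).

    old_data: dict of pd.Series, the unmasked data before the call
    new_data: dict of pd.Series, the (masked) data returned by the flag function
    written:  set of column names the flag function wrote to (assumed to be known)
    """
    out = dict(old_data)  # start from the original, unmasked columns
    for col, series in new_data.items():
        if col in written or col not in old_data:
            out[col] = series  # written or newly created column
    return out


old = {"a": pd.Series([1.0, 2.0]), "b": pd.Series([3.0, 4.0])}
new = {
    "a": pd.Series([1.0, np.nan]),  # only masked, not written
    "b": pd.Series([9.0, 9.0]),     # written by the function
    "c": pd.Series([0.0, 0.0]),     # newly created
}

result = rebuild_result(old, new, written={"b"})
```

Columns that were only masked come straight from the original data, so they need no per-value unmasking at all. The written columns would still need their masked values restored.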

I ran the test suite locally and it passed without problems. No idea why it failed in the CI.

Please also update dios to the latest version. I optimized copy and copy_empty, so we also gain some performance improvement from that.

resolved all threads

added 1 commit

ffe3633c - Apply 1 suggestion(s) to 1 file(s)

added 5 commits

f92fba5d - data is reduced to the fields needed by a test
c9c717ac - register takes the optional parameter all_data now
19347f07 - separeted the masking tests from test_core into new file
3b3b4daa - convert numpy arrays to pandas Series as assigning numpy arrays to
a2f3d555 - Merge branch 'perf_improvements' of https://git.ufz.de/rdm-software/saqc into perf_improvements

added 1 commit

062db908 - WIP - rework the register and saqcFunc calling machinery

marked as a Work In Progress from 062db908

added 1 commit

af4051ec - WIP - rework the register and saqcFunc calling machinery

added 1 commit

40a76017 - WIP - func_dump

added 2 commits

8c235bda - fixed everything to make it run
013a83ef - clean up register and imports

TODO: implement fix for masking/unmasking

take the data from new_data

unmask only the subset given by columns (those were masked, the others were not)

for c in columns:
    if index changed:
        continue
    else:
        take old_data values at NaN positions, i.e. where: wasmasked & ismasked & isna
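The TODO above could look roughly like the following in pandas. This is only a sketch under assumptions: dicts of Series stand in for dios, was_masked and is_masked are taken to be boolean Series per column (the mask before the call and the mask derived from the flags afterwards), and the function name unmask is hypothetical.

```python
import numpy as np
import pandas as pd


def unmask(old_data, new_data, was_masked, is_masked, columns):
    """Restore masked values in `columns` only (hypothetical helper)."""
    out = dict(new_data)  # take the data from new_data
    for col in columns:
        old, new = old_data[col], new_data[col]
        if not old.index.equals(new.index):
            continue  # index changed: old positions no longer line up
        # positions that were masked, are still masked and are still NaN
        restore = was_masked[col] & is_masked[col] & new.isna()
        # keep `new` where nothing is to be restored, else take the old value
        out[col] = new.where(~restore, old)
    return out


old_data = {
    "x": pd.Series([1.0, 2.0, 3.0]),
    "y": pd.Series([4.0, 5.0, 6.0]),
}
new_data = {
    "x": pd.Series([1.0, np.nan, 30.0]),       # position 2 was overwritten
    "y": pd.Series([4.0, np.nan], index=[0, 5]),  # index changed
}
mask = {
    "x": pd.Series([False, True, True]),
    "y": pd.Series([False, True, False]),
}

unmasked = unmask(old_data, new_data, mask, mask, columns=["x", "y"])
```

In this example only position 1 of x is restored from old_data: position 2 was masked but has been overwritten with a real value, and y is skipped because its index changed.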
