Several fixes
Merge request reports
Activity
added 1 commit
- 0ffd8c04 - [FIX] reduce runtime by ~45% (!) through the removal of builtins.any
added 1 commit
- 9e1d6d25 - [FIX] reduce the memory consumption of SaQC by >50% through Histories of type pd.Categorical
added 1 commit
- ea4d0b06 - [FIX] reduce the memory consumption of SaQC by >50% through Histories of type pd.Categorical
- Resolved by David Schäfer
- Resolved by David Schäfer
```diff
         hist = hist.copy()
         mask = mask.copy()

-        self.hist = hist
+        self.hist = hist.astype("category")
```
- changed this line in version 6 of the diff
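The memory win from the categorical cast comes from the shape of a flags history: a few distinct float flag values repeated over many rows. A minimal sketch (the `hist` frame below is a hypothetical stand-in, not SaQC's actual `History` class) shows the effect:

```python
import numpy as np
import pandas as pd

# Hypothetical flags history: few distinct float values repeated over many
# rows, which is exactly the case where a categorical cast pays off.
hist = pd.DataFrame(
    {
        0: np.full(100_000, -np.inf),
        1: np.random.choice([np.nan, 255.0], size=100_000),
    }
)

dense_bytes = hist.memory_usage(deep=True).sum()
cat_bytes = hist.astype("category").memory_usage(deep=True).sum()

# With few distinct values, the categorical representation stores one small
# integer code per cell plus a tiny categories array, instead of 8 bytes per
# cell for the dense float64 frame.
print(f"dense: {dense_bytes} bytes, categorical: {cat_bytes} bytes")
```

The actual savings depend on how many distinct flag values occur per column; with many distinct values per column the categories array grows and the benefit shrinks.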
Maybe one more general remark about the somewhat random collection of commits in this MR:
All changes are fixes to issues I had when porting the SEEFO-pipeline to the latest develop (through !260 (merged)). Every commit fixes one issue I encountered (I tried hard not to intermingle changes), in chronological order. With the exception of 1da172e8, which corrects faulty behavior without breakage, every change fixes an issue that broke the formerly running pipeline. So I had to come up with some sort of solution to the underlying problems in order to get things back into their former working state. Are the proposed changes the only solution to the underlying problems? Likely not! Are these changes the best solution to the underlying problems? Probably also not...
Not sure if this comment helps or not; I just wanted to make clear that there are no simple refactorings incorporated. I chose the commit message prefix
[FIX]
as, from the point of view of the SEEFO-pipeline, these changes fix regressions. With that out of the way, I am happy to discuss any objections/suggestions on any suitable channel. Cheers!
added 1 commit
- 686065dc - [FIX] reduce the memory consumption of SaQC by >50% through Histories of type pd.Categorical
mentioned in merge request !260 (merged)
Closing, as all commits will go in through !260 (merged)
From some brief benchmarking, I found that casting via
df.astype(pd.SparseDtype('float', np.nan))
instead of df.astype('category') is faster in casting (around 30 percent), uses less memory (factor 1-10; without the initial unflagged column: factor 2-20), and is faster in column and row access as well as in row-wise max calculation. So, since integrating it would just mean replacing the category cast with a sparse cast, maybe we should give it a try?
Sure! But let's please do it after !260 (merged) was merged.
!260 (merged) is in now, so feel free to sparsify.