numba riddance

Getting rid of the numba dependency seems desirable because of its potential for conflicts.

I went through all our uses of numba and checked the performance gain it provides, along with alternatives/workarounds. I think removal is doable without significant performance loss (in the core use cases):

flagzScore:

The numba engine is required when rolling with multicolumn windows to get multicolumn sample statistics/scores (axis=1):

In !700 (merged), I replaced the calls to .rolling with the numba engine by a custom numpy-based roller that is at least as efficient (no jitting overhead) for harmonized data.
For non-harmonized data and sample sizes > 500.000, the performance loss is about a factor of 20.
Since the use case with axis=1 and non-harmonized data is extremely rare, if it is used at all, I think the performance loss is acceptable.
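To illustrate the kind of numpy-based roller meant above, here is a minimal sketch. It is not the code from !700; the function name `rolling_multicolumn_zscore`, the trailing fixed-size window, and the pooled mean/std are assumptions made for the example. It assumes harmonized data, i.e. rows on a regular grid, so that a window is a fixed number of rows.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def rolling_multicolumn_zscore(data: np.ndarray, window: int) -> np.ndarray:
    """Score each row against the pooled statistics of a trailing window.

    data: 2D array, rows = timestamps on a regular grid, columns = variables.
    Returns an array of the same shape; the first ``window - 1`` rows are NaN.
    (Hypothetical helper for illustration only.)
    """
    n_rows, _ = data.shape
    scores = np.full(data.shape, np.nan)
    if n_rows < window:
        return scores
    # Shape (n_rows - window + 1, n_cols, window); this is a view, not a copy.
    windows = sliding_window_view(data, window_shape=window, axis=0)
    # Pool all columns and all rows of each window into one sample.
    mean = np.nanmean(windows, axis=(1, 2))
    std = np.nanstd(windows, axis=(1, 2))
    std[std == 0] = np.nan  # avoid division by zero for constant windows
    scores[window - 1:] = (data[window - 1:] - mean[:, None]) / std[:, None]
    return scores
```

Since everything is expressed as array operations on a strided view, there is no compilation step, which is where the "no jitting overhead" comes from.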

fitPolynomial

The performance loss from not using numba grows with the sample size:

series length	~factor	total time (with numba / without numba)
100.000	1
1.000.000	~5	16s/90s
10.000.000	~11	67s/918s

The numba boost only kicks in at around 200.000 samples and grows slowly (in terms of the factor).
For modelling data, the spectral-based fit (lowPassFilter) is easier to parametrize and much faster, so we could just make that the main modelling function and no longer support optimized polynomial fitting.
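As a rough illustration of what a spectral-based fit does, the sketch below applies a hard frequency cutoff with numpy's FFT. This is not the implementation behind lowPassFilter; the function name `lowpass_fft` and its parameters are hypothetical, and a real filter would usually use a smoother roll-off.

```python
import numpy as np


def lowpass_fft(values, cutoff, sample_spacing=1.0):
    """Remove all frequency components above ``cutoff`` (hypothetical helper).

    values: 1D, regularly sampled, NaN-free series.
    cutoff: frequency limit in cycles per unit of ``sample_spacing``.
    """
    values = np.asarray(values, dtype=float)
    spectrum = np.fft.rfft(values)
    freqs = np.fft.rfftfreq(values.size, d=sample_spacing)
    spectrum[freqs > cutoff] = 0.0  # hard cutoff, chosen for simplicity
    return np.fft.irfft(spectrum, n=values.size)
```

The only parameter to tune is the cutoff frequency, compared to window size and polynomial order for a rolling polynomial fit.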

flagChangepoints

I checked this for the detection of jumps:

series length	~factor	total time (with numba / without numba)
100.000	1
1.000.000	10	2s/12s
10.000.000	10	15s/113s

The numba boost seems to be capped at a factor of 10 and kicks in at about 200.000 samples.
Since flagging jumps (and other basic changepoint tasks) just compares means (or other basic statistics), those calls could be dispatched to built-in numpy functions.
A performance loss only remains for changepoint tasks that compare statistics which are not basic enough to be built in, but also not too complex/exotic to be jit-able.
So the loss from removing numba is negligible.
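A minimal sketch of such a dispatch for the mean-comparison case could look like the following. It is not the flagChangepoints implementation; the name `jump_candidates`, the fixed-size windows in samples, and the absolute-difference criterion are assumptions for the example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def jump_candidates(values, window, threshold):
    """Indices i where the mean of values[i:i+window] differs from the
    mean of values[i-window:i] by more than ``threshold``.
    (Hypothetical helper for illustration only.)
    """
    values = np.asarray(values, dtype=float)
    if values.size < 2 * window:
        return np.array([], dtype=int)
    # means[k] is the mean of values[k:k+window], computed without a Python loop.
    means = sliding_window_view(values, window).mean(axis=1)
    backward = means[:-window]  # windows ending right before index k + window
    forward = means[window:]    # windows starting at index k + window
    diff = np.abs(forward - backward)
    return np.flatnonzero(diff > threshold) + window
```

Other built-in statistics (median, std, ...) can be swapped in for `.mean(axis=1)` in the same way; only custom statistics would still need a Python-level loop.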

flagRaise

I didn't check the performance loss for flagRaise explicitly.

The function is never used; I implemented it for a demand that came up in an early GCEF flagging session.
flagUniLOF is more reliable at achieving what flagRaise tries to do, and is also much easier to parametrize.
So I would be OK with removing numba from flagRaise, or even deprecating the function altogether (to keep the toolbox slim).

_exceedConsecutiveNanLimit

A helper function containing a loop to count consecutive NaNs. The jitted loop can be replaced by an equally performant call to np.lib.stride_tricks.sliding_window_view:

np.isnan(np.lib.stride_tricks.sliding_window_view(arr, window_shape=max_consec)).all(axis=1).any()
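Wrapped into a function, the one-liner could look roughly like this. The name `has_nan_run` and the handling of short inputs are assumptions for the sketch, not the existing helper's behavior. Note that `sliding_window_view` raises a `ValueError` if the window is larger than the array, so that case is handled up front.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def has_nan_run(arr, max_consec):
    """True if ``arr`` contains at least ``max_consec`` consecutive NaNs.
    (Hypothetical helper for illustration only.)
    """
    arr = np.asarray(arr, dtype=float)
    if max_consec < 1:
        raise ValueError("max_consec must be a positive integer")
    if arr.size < max_consec:
        # Too short to contain a run of that length.
        return False
    windows = sliding_window_view(arr, window_shape=max_consec)
    # A window consisting only of NaNs marks a run of length >= max_consec.
    return bool(np.isnan(windows).all(axis=1).any())
```

As written, a run of exactly `max_consec` NaNs already returns True; if only strictly longer runs should count, the caller would pass `max_consec + 1`.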

Edited 1 year ago
