Memory & GC
Problem
We bought flexibility regarding time steps, time interpolation, etc. at the cost of memory consumption and the instantiation of many arrays.
Currently, outputs accept data arrays only as "owned" (in Rust terms), i.e. components or adapters must not modify pushed data afterwards. The entire data forwarding is based on passing these arrays through the chain, and on creating new arrays for calculations (e.g. time interpolation).
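To make the pattern concrete, here is a minimal sketch of such a push/pull chain. The class names (`Output`, `LinearTimeInterpolation`) are invented for illustration and are not the framework's actual API:

```python
import numpy as np

class Output:
    """Illustrative output: stores pushed arrays as 'owned' data."""

    def __init__(self):
        self._buffer = []  # (time, array) pairs; no copies are made

    def push(self, time, data):
        # The caller must not modify `data` after pushing it.
        self._buffer.append((time, data))

class LinearTimeInterpolation:
    """Illustrative time adapter: allocates a new array per pull."""

    def __init__(self, output):
        self._output = output

    def pull(self, time):
        (t0, a0), (t1, a1) = self._output._buffer[-2:]
        w = (time - t0) / (t1 - t0)
        # A fresh array is allocated for every interpolated pull.
        return (1.0 - w) * a0 + w * a1

out = Output()
out.push(0.0, np.zeros((860, 630)))
out.push(1.0, np.ones((860, 630)))
mid = LinearTimeInterpolation(out).pull(0.5)  # another ~4.3 MB array
```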
Also, depending on the pull interval, outputs or time adapters may hold a considerable number of pushed arrays.
This can lead to high memory consumption and creates work for the garbage collector.
The new scheduling algorithm makes the issue worse, as it is no longer guaranteed that a pull falls within the range of the last two pushes.
Is this really a problem?
Examples
- Germany, 1 km resolution -> 860x630 cells -> 4.4 MB with numpy.float64
  - 30 grids -> 130 MB
  - 365 grids -> 1.6 GB
- EU, 4 km resolution -> 1875x1375 cells -> 20.6 MB with numpy.float64
  - 30 grids -> 620 MB
  - 365 grids -> 7.5 GB
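These figures are just the cell count times the 8-byte itemsize of numpy.float64; a quick check:

```python
import numpy as np

def grid_mb(rows, cols, dtype=np.float64):
    """Size of one grid in MB (10**6 bytes)."""
    return rows * cols * np.dtype(dtype).itemsize / 1e6

for name, rows, cols in [("Germany", 860, 630), ("EU", 1875, 1375)]:
    mb = grid_mb(rows, cols)
    print(f"{name}: {mb:.1f} MB per grid, "
          f"30 grids: {30 * mb:.0f} MB, "
          f"365 grids: {365 * mb / 1e3:.1f} GB")
```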
Possible solutions
- Ignore it until it really becomes a problem
- Re-think the entire timing/step size stuff and be much more restrictive (unlikely)
- Store large stacks of rasters in files instead of RAM (using Dask or np.memmap?) (see !238 (merged) for a draft implementation, and the memmap sketch after this list)
- Some kinds of coupling, e.g. with an equal time step and no adapters, could use a more memory-conserving method that writes to and reads from a shared array (see the shared-memory sketch after this list)
- Allow linkage over MPI and distribute memory over multiple nodes
- Inform time adapters about the next expected pull -- this would allow aggregating data in place and keeping only data after the next pull (which may happen due to dependency scheduling)
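For the file-backed option, a minimal sketch using np.memmap; the file name and stack shape are illustrative assumptions, and the draft in !238 may differ:

```python
import numpy as np

# One year of daily Germany-sized grids: ~1.6 GB on disk instead of in RAM.
# File name and shape are illustrative, not taken from !238.
stack = np.memmap("raster_stack.dat", dtype=np.float64,
                  mode="w+", shape=(365, 860, 630))

stack[0] = 1.0              # writes go through the OS page cache to disk
day_mean = stack[0].mean()  # only the pages actually touched are loaded
stack.flush()               # persist dirty pages
```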
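And a rough sketch of the shared-array idea for the equal-time-step, no-adapter case, using Python's multiprocessing.shared_memory; again purely illustrative, not an existing framework feature:

```python
import numpy as np
from multiprocessing import shared_memory

shape, dtype = (860, 630), np.dtype(np.float64)
nbytes = int(np.prod(shape)) * dtype.itemsize

# Writer side: allocate one buffer; the component overwrites it each step
# instead of pushing a new array.
shm = shared_memory.SharedMemory(create=True, size=nbytes)
src = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
src[:] = 0.0

# Reader side (possibly another process): attach by name, no copy is made,
# so there is nothing for the garbage collector to reclaim per step.
shm_r = shared_memory.SharedMemory(name=shm.name)
dst = np.ndarray(shape, dtype=dtype, buffer=shm_r.buf)
assert dst[0, 0] == 0.0

shm_r.close()
shm.close()
shm.unlink()
```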