# MPI communication layout
Spawning MPI processes at runtime is probably not supported on EVE. We thus need to find ways to connect and distribute pre-allocated MPI processes.
## Initialization
Using pre-allocated nodes requires that Finam itself is started on each of them. Based on the process/rank it runs on, each instance must then decide whether to run the normal Finam scheduler or a model's workers. It is unclear how this can be managed; ideally, the model wrappers would handle it themselves, without requiring prior knowledge from the user or the framework.
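The rank-based decision could be sketched as a pure function. All names here (`assign_role`, `model_sizes`) are hypothetical; in a real setup the rank would come from `MPI.COMM_WORLD` (e.g. via mpi4py), and the per-model process counts would come from the composition setup:

```python
# Hypothetical sketch: decide each pre-allocated process's role from its
# world rank. Ranks are handed out contiguously after the master (rank 0).

def assign_role(world_rank, model_sizes):
    """Return ("scheduler", None) for the master process (rank 0),
    or ("worker", model_name) for a model worker process.

    model_sizes: ordered mapping of model name -> number of assigned
    processes.
    """
    if world_rank == 0:
        return ("scheduler", None)  # rank 0 runs the normal Finam scheduler
    offset = 1  # first rank after the master
    for name, size in model_sizes.items():
        if world_rank < offset + size:
            return ("worker", name)
        offset += size
    raise ValueError(f"rank {world_rank} has no assigned model")


# Example matching the diagrams below: 1 master + 4 Formind + 3 OGS ranks
sizes = {"Formind": 4, "OGS": 3}
print(assign_role(0, sizes))  # ('scheduler', None)
print(assign_role(5, sizes))  # ('worker', 'OGS')
```

With such a convention, no extra configuration is needed: every process can derive its role from its rank alone.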
## Possible layouts
### Group rank 0 communicates with master
Each model is initialized with a communicator that connects all its assigned processes. The master (running Finam) is grouped with all model processes of rank 0; this communicator is also passed to all models. The models are responsible for communicating with the master to receive inputs and send outputs.
```mermaid
flowchart TB
    subgraph Formind
        F0 --- |B| F1
        F0 --- |B| F2
        F0 --- |B| F3
        F0 --- |B| F4
    end
    subgraph OGS
        O0 --- |C| O1
        O0 --- |C| O2
        O0 --- |C| O3
    end
    master --- |A| F0
    master --- |A| O0
```
Letters on connections denote communicators/groups/communication channels. Processes within each box can also communicate with each other; these connections are not shown.
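This layout could be realized with two `MPI_Comm_split` calls per process: one producing the per-model communicators (B, C) and one producing the master group (A). The following is a hypothetical sketch of the color assignment only; `None` stands in for `MPI_UNDEFINED` (rank excluded from that communicator), and the helper name is invented:

```python
# Hypothetical color assignment for two MPI_Comm_split calls realizing
# the "group rank 0 communicates with master" layout.
# None stands in for MPI_UNDEFINED (rank excluded from that communicator).

def split_colors(world_rank, model_sizes):
    """Return (model_color, master_color) for a given world rank.

    model_color:  index of the model group (B, C, ...); None for the
                  master, which is in no model communicator.
    master_color: 0 for the master and each model's local rank 0
                  (together forming communicator A); None otherwise.
    """
    if world_rank == 0:
        return (None, 0)  # master: only in communicator A
    offset = 1
    for color, size in enumerate(model_sizes.values()):
        if world_rank < offset + size:
            # The first rank of each model group also joins communicator A.
            return (color, 0 if world_rank == offset else None)
        offset += size
    raise ValueError(f"rank {world_rank} has no assigned model")


sizes = {"Formind": 4, "OGS": 3}
# F0 (world rank 1) is in the Formind group (color 0) and in A:
print(split_colors(1, sizes))  # (0, 0)
# F2 (world rank 3) is only in the Formind group:
print(split_colors(3, sizes))  # (0, None)
```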
### All processes communicate with master
Each model is initialized with a communicator for a group comprising its assigned processes and the master (as rank 0). Models are responsible for collecting their results on the master.
```mermaid
flowchart TB
    subgraph Formind
        F1
        F2
        F3
        F4
    end
    subgraph OGS
        O1
        O2
        O3
    end
    master --- |A| F1
    master --- |A| F2
    master --- |A| F3
    master --- |A| F4
    master --- |B| O1
    master --- |B| O2
    master --- |B| O3
```
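Since the master must be a member of every model group here, a single `MPI_Comm_split` cannot produce this layout; one split per model is needed, with the master participating in every call. Again a hypothetical sketch of the color/key assignment only (`None` stands in for `MPI_UNDEFINED`, and the helper name is invented):

```python
# Hypothetical color/key assignment for the "all processes communicate
# with master" layout: one MPI_Comm_split call per model, with the
# master taking part in every call and other ranks only in their own
# model's call. None stands in for MPI_UNDEFINED (rank excluded).

def layout_b_colors(world_rank, model_sizes):
    """Return {model_name: (color, key)} for one split call per model.

    The master uses key 0 so it becomes rank 0 of every model group;
    workers keep their world rank as key to preserve ordering.
    """
    result = {}
    offset = 1
    for name, size in model_sizes.items():
        if world_rank == 0:
            result[name] = (0, 0)  # master joins every group as rank 0
        elif offset <= world_rank < offset + size:
            result[name] = (0, world_rank)  # member of this model's group
        else:
            result[name] = (None, world_rank)  # excluded from this group
        offset += size
    return result


sizes = {"Formind": 4, "OGS": 3}
print(layout_b_colors(0, sizes))  # master: in both the Formind and OGS group
print(layout_b_colors(6, sizes))  # O2: only in the OGS group
```

A trade-off to note: this layout gives the master a direct channel to every worker, but also makes it a potential bottleneck, since all data collection runs through it.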