Currently, data passed may depend on module order (see #75 (closed)).
The problem is that on pull, the last pushed data entry is returned. For modules that have synchronized time steps, the data returned differs depending on update order:
Source updated first:
source o-----o
          .-'
target o
Here, the target gets data for the wrong time.
Target updated first:
source o
       |
target o
Here, the target gets the data for the correct time.
As a solution, outputs should be able to hold the two most recent data pushes. On pull, the data with the timestamp closest to the pull time should be returned, instead of simply the last data pushed.
In the above example, the first data entry to the source would be retrieved in both cases.
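A minimal sketch of this proposal, assuming numeric timestamps. The `Output` class and its `push`/`pull` methods are hypothetical names used for illustration only, not the existing API:

```python
from collections import deque


class Output:
    """Hypothetical output slot that keeps the two most recent pushes."""

    def __init__(self, capacity=2):
        # The oldest entry is dropped automatically once capacity is reached.
        self._entries = deque(maxlen=capacity)

    def push(self, time, data):
        self._entries.append((time, data))

    def pull(self, time):
        if not self._entries:
            raise LookupError("no data pushed yet")
        # Return the entry whose timestamp is closest to the pull time.
        # On a tie, min() keeps the first match, i.e. the older entry.
        _, data = min(self._entries, key=lambda entry: abs(entry[0] - time))
        return data


# Source updated first: it has already pushed data for t=1
# when the target, still at t=0, pulls.
out = Output()
out.push(0, "state at t=0")
out.push(1, "state at t=1")
source_first = out.pull(0)

# Target updated first: only the t=0 entry exists at pull time.
out = Output()
out.push(0, "state at t=0")
target_first = out.pull(0)
```

With this rule, both update orders hand the target the same entry, which is the behaviour asked for above.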
Alternatively, we could consider a rule like returning only data for the requested time or earlier. This would still result in the correct behaviour in the above example. It could, however, be problematic for source and target timestamps that are very close but do not match exactly.
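The alternative rule and its weak spot can be illustrated with a small sketch. The function name `pull_at_or_before` and the list-of-tuples storage are assumptions made for this example:

```python
def pull_at_or_before(entries, time):
    """Return the newest data whose timestamp is not later than `time`.

    `entries` is a list of (timestamp, data) tuples (hypothetical layout).
    """
    candidates = [(t, d) for t, d in entries if t <= time]
    if not candidates:
        raise LookupError(f"no data at or before t={time}")
    return max(candidates, key=lambda entry: entry[0])[1]


entries = [(0.0, "old"), (1.0, "new")]

# Exactly matching timestamps: the entry for the requested time is returned.
exact = pull_at_or_before(entries, 1.0)

# A pull time that is marginally earlier than the push time falls back
# to the previous entry, although the newer one is practically on time.
near_miss = pull_at_or_before(entries, 1.0 - 1e-9)
```

The second call shows the concern raised above: a tiny timestamp mismatch is enough to return the older entry.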
I still think that if the source has data for the time the module is updating to, the module should take the data for that time.
Also, this discussion is just about models that have the exact same time-stepping. So, following the IO chain, models that don't have any inputs (like file readers) should update first and provide data for the target time of the models that are connected to them.
It's like the internal calculations in mHM: processes that are needed by other processes are calculated first (soilMoisture -> runoff, for example: the runoff routine takes the soil moisture of the current time to calculate the runoff for the current time).
I don't see why this should increase the time. That would also mean, that after the first time-step, the calculated initial value is the same as the result of the first time-step, since we use the values from the previous time-step. That doesn't make sense to me.
In the last example, the arrow needs to go from top-right to bottom-left, or not? Otherwise, it is just what I proposed with the first time step not shown.
Also, I think we should not have any special handling for synchronized time steps. We need a solution that covers this case just like any other.
For circular or bi-directional coupling, at least one component needs to work with current/past data. Should that be decided by the user? By adding a PrevTime adapter?
Further, I am not sure how, or if at all, this can be compatible with time interpolation. Do we then always pull with the future time? I.e. 1. Increment time, 2. Pull, 3. Push? Think of Formind with 1y step. How could mHM provide data for 1 year in the future? Update order does not help here.
So I think to implement this in a consistent way, we have to completely rethink the scheduling algorithm. This would be fine for me. Inconsistent application of some rules in one case, but not in others, would not.
For circular or bi-directional coupling, at least one component needs to work with current/past data. Should that be decided by the user? By adding a PrevTime adapter?
Yes I think a PrevTime adapter should be added by the user.
For the other stuff:
I think this really implies that we need at least two time-steps saved (or a way to state how many should be saved, 2 by default).
We need a default behavior for how to get the data from the available time-steps. Options are:
nearest
linear interpolation of available time-steps with nearest interpolation, when out of available time-frame
linear interpolation with extrapolation
other interpolation
We already had no-branch adapters that store time-steps until data is pulled. These could help with big time-step differences like the ones you have shown.
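To make the first three options concrete, here is a rough sketch for two stored time-steps with scalar data and numeric timestamps. The function `retrieve` and the mode names are hypothetical and only serve the comparison:

```python
def retrieve(t0, v0, t1, v1, time, mode="nearest"):
    """Pick a value for `time` from two stored steps (t0, v0) and (t1, v1)."""
    if t0 >= t1:
        raise ValueError("expected t0 < t1")

    if mode == "nearest":
        return v0 if abs(time - t0) <= abs(time - t1) else v1

    if mode == "linear_clamped":
        # Outside the stored time frame, fall back to the nearest stored step.
        time = min(max(time, t0), t1)
    elif mode != "linear_extrapolate":
        raise ValueError(f"unknown mode: {mode}")

    weight = (time - t0) / (t1 - t0)
    return v0 + weight * (v1 - v0)


inside_nearest = retrieve(0.0, 10.0, 1.0, 20.0, 0.25, "nearest")
inside_linear = retrieve(0.0, 10.0, 1.0, 20.0, 0.25, "linear_clamped")
outside_clamped = retrieve(0.0, 10.0, 1.0, 20.0, 2.0, "linear_clamped")
outside_extrapolated = retrieve(0.0, 10.0, 1.0, 20.0, 2.0, "linear_extrapolate")
```

Inside the stored time frame the two linear modes agree. They only differ for pull times outside of it, which is where the choice of default matters.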
We should probably avoid extrapolation as much as possible. If not possible, it should be nearest. Also, I think extrapolation should require special user action, like adding an adapter.
That would also mean, that after the first time-step, the calculated initial value is the same as the result of the first time-step, since we use the values from the previous time-step. That doesn't make sense to me.
This is definitely a good point that we should keep in mind for our own mental consistency check.
Handling of the big time step diffs is still unclear to me. And we should also keep in mind that we can't plan with 2 components only. (Esp. regarding a run-ahead mechanism you proposed earlier)
Somehow I see our complete concept collapse if we can't use the scheme of 1) pull for current state, 2) advance state, 3) push for new state. But yeah, this was questioned multiple times already.
So to be honest, I am quite a bit concerned that we based all that stuff on unacceptable assumptions, and that I convinced the team of a concept that starts to fall apart now. 😞
And in addition: What about focusing on the cases we have at the moment? I think adding a time_step attribute could be a good idea. It could then be None by default to state that the model keeps its secret (like it is now). This could also be used to inherit time-stepping from other modules (like netcdf readers).
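One possible shape for such an attribute, as a sketch only. The class `Component`, the `inherit_from` parameter, and the method `effective_time_step` are hypothetical names, and the direction of inheritance is an assumption:

```python
from datetime import timedelta


class Component:
    """Hypothetical component with an optional, declared time step."""

    def __init__(self, name, time_step=None, inherit_from=None):
        self.name = name
        # None means the component does not reveal its time-stepping.
        self.time_step = time_step
        # Optional other component whose time-stepping is adopted.
        self.inherit_from = inherit_from

    def effective_time_step(self):
        if self.time_step is not None:
            return self.time_step
        if self.inherit_from is not None:
            return self.inherit_from.effective_time_step()
        return None


reader = Component("reader", time_step=timedelta(days=1))
model = Component("model", inherit_from=reader)
opaque = Component("opaque")

model_step = model.effective_time_step()
opaque_step = opaque.effective_time_step()
```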
And we could provide output buffers that are always no-branch adapters to provide buffering at each desired place.
I don't see anything falling apart.
For the updating routine, I now go with the following default scheme:
update time (to have a vertical arrow)
pull inputs
calculate data
push outputs
And in addition downstream inputs should be updated first (like readers). With this as default, we can now generate a list of possible constellations of timing and required available data.
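The default scheme and the update order could be sketched as follows, assuming synchronized integer time steps and no circular links. The class `Component`, its dictionary-based output, and the helper `update_order` are hypothetical. Only `graphlib.TopologicalSorter` is a real standard-library class (Python 3.9+):

```python
from graphlib import TopologicalSorter


class Component:
    """Hypothetical component following the four-step update scheme."""

    def __init__(self, name, inputs=()):
        self.name = name
        self.inputs = list(inputs)  # components this one pulls from
        self.time = 0
        self.output = {}  # pushed data, keyed by time

    def calculate(self, pulled):
        # Placeholder computation for illustration.
        return sum(pulled) + 1

    def update(self):
        self.time += 1                                       # 1. update time
        pulled = [c.output[self.time] for c in self.inputs]  # 2. pull inputs
        data = self.calculate(pulled)                        # 3. calculate data
        self.output[self.time] = data                        # 4. push outputs


def update_order(components):
    """Order components so that everything a component pulls from comes first."""
    by_name = {c.name: c for c in components}
    graph = {c.name: {src.name for src in c.inputs} for c in components}
    return [by_name[name] for name in TopologicalSorter(graph).static_order()]


reader = Component("reader")
soil_moisture = Component("soil_moisture", inputs=[reader])
runoff = Component("runoff", inputs=[soil_moisture])

order = update_order([runoff, soil_moisture, reader])
for component in order:
    component.update()

order_names = [c.name for c in order]
runoff_result = runoff.output[1]
```

Because components without inputs are updated first, each pull finds data for the new time already pushed. Circular or bi-directional links would make `TopologicalSorter` raise a `CycleError`, so they need separate handling.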