xarray for data storage
xarray provides a convenient framework for us to handle data, as an alternative to plain numpy arrays.
For example, we could use xarray.DataArray to attach information to a numpy array, such as the corresponding timestamp and the axis names:
import numpy as np
import xarray as xr
import pandas as pd
# assume these to be a data / time pair as we use it now
time = pd.Timestamp(year=2022, month=9, day=20)
data = np.arange(6).reshape(3,2)
These could be combined into a DataArray by prepending a length-1 dimension for the time:
da = xr.DataArray(data[np.newaxis, ...], name="data", dims=["time", "lon", "lat"], coords=dict(time=[time]))
print(da)
<xarray.DataArray 'data' (time: 1, lon: 3, lat: 2)>
array([[[0, 1],
        [2, 3],
        [4, 5]]])
Coordinates:
  * time     (time) datetime64[ns] 2022-09-20
Dimensions without coordinates: lon, lat
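To illustrate what the extra metadata buys us, here is a minimal sketch that rebuilds the array from above and reads the information back by name instead of by axis position. It only uses standard xarray calls; the variable names `step` and `raw` are just for illustration.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.Timestamp(year=2022, month=9, day=20)
data = np.arange(6).reshape(3, 2)
da = xr.DataArray(
    data[np.newaxis, ...],
    name="data",
    dims=["time", "lon", "lat"],
    coords=dict(time=[time]),
)

# axes are addressed by name, not by position
print(da.dims)          # ('time', 'lon', 'lat')
print(da.sizes["lon"])  # 3

# select a time-step by its timestamp; the "time" dimension is dropped
step = da.sel(time=time)
print(step.dims)        # ('lon', 'lat')

# the plain numpy array is still available when needed
raw = step.values
```

This way, code that receives the array does not need to know the axis order or keep the timestamp in a separate variable.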
This kind of representation fits well with the grid specifications presented in !74 (merged). For unstructured data, we could use "id" as the dimension next to "time".
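A rough sketch of how the unstructured case might look, assuming one value per cell and integer cell indices as the "id" coordinate. The number of cells, the values and the name `da_unstructured` are made up for illustration and do not come from the grid specification.

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.Timestamp(year=2022, month=9, day=20)

# assumption: an unstructured grid with 4 cells, one value per cell
values = np.array([0.1, 0.2, 0.3, 0.4])
cell_ids = np.arange(values.size)

da_unstructured = xr.DataArray(
    values[np.newaxis, ...],  # prepend the length-1 time dimension
    name="data",
    dims=["time", "id"],
    coords=dict(time=[time], id=cell_ids),
)

# look up the value of a single cell by its id
print(da_unstructured.sel(id=2).values)
```

The structured and unstructured cases then only differ in the names of the non-time dimensions.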
pint support for xarray was also recently introduced through the pint-xarray package (https://xarray.dev/blog/introducing-pint-xarray), so this could also settle our units discussion.
This also allows storing multiple time-steps in an Output object as a list: [da1, da2, da3]. When pulling data, we could then simply concatenate these DataArrays:
t1 = pd.Timestamp(year=2022, month=9, day=20)
d1 = np.arange(6).reshape(3,2)
t2 = pd.Timestamp(year=2022, month=9, day=21)
d2 = np.arange(6).reshape(3,2) + 10
da1 = xr.DataArray(d1[np.newaxis, ...], name="data", dims=["time", "lon", "lat"], coords=dict(time=[t1]))
da2 = xr.DataArray(d2[np.newaxis, ...], name="data", dims=["time", "lon", "lat"], coords=dict(time=[t2]))
da = xr.concat([da1, da2], dim="time")
print(da)
<xarray.DataArray 'data' (time: 2, lon: 3, lat: 2)>
array([[[ 0,  1],
        [ 2,  3],
        [ 4,  5]],

       [[10, 11],
        [12, 13],
        [14, 15]]])
Coordinates:
  * time     (time) datetime64[ns] 2022-09-20 2022-09-21
Dimensions without coordinates: lon, lat
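Putting the two ideas together, storing a list of DataArrays and concatenating on pull could be sketched roughly as follows. The `Output` class here is a hypothetical stand-in with made-up `push` and `pull` methods, not our actual implementation; it only assumes gridded 2D data with "lon" and "lat" dimensions.

```python
import numpy as np
import pandas as pd
import xarray as xr


class Output:
    """Hypothetical output that buffers one DataArray per time-step."""

    def __init__(self, name):
        self.name = name
        self.data = []  # list of DataArrays, each with a length-1 "time" dimension

    def push(self, array, time):
        """Wrap a 2D numpy array with its timestamp and store it."""
        da = xr.DataArray(
            array[np.newaxis, ...],
            name=self.name,
            dims=["time", "lon", "lat"],
            coords=dict(time=[time]),
        )
        self.data.append(da)

    def pull(self):
        """Return all stored time-steps as one DataArray along "time"."""
        return xr.concat(self.data, dim="time")


out = Output("data")
for day in (20, 21, 22):
    timestamp = pd.Timestamp(year=2022, month=9, day=day)
    out.push(np.arange(6).reshape(3, 2) + day, timestamp)

combined = out.pull()
print(combined.sizes["time"])  # 3
```

Since every stored item carries its own timestamp, the pulling side could also select or interpolate in time on the concatenated result instead of tracking the times separately.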