Finam-netCDF refactoring: Transition from Xarray to NetCDF4
Why transition from Xarray to NetCDF4?
Writing data to a NetCDF file with xarray is a two-step process:
- First, it creates an xarray DataArray or Dataset in memory, which holds all the information to be written in a NetCDF file.
- After preparing the data in memory, xarray writes the entire DataArray or Dataset to the NetCDF file in one go.
This means that all the data to be written must be available in memory at write time. This behaviour can be limiting for very large datasets that may not fit into memory, or when time is a constraint.
In contrast, the netCDF4 library allows data to be added to a NetCDF file incrementally, one time step or chunk at a time, without requiring the entire dataset to be present in memory. This is especially beneficial for streaming data or for data that is generated over time (as is the case in Finam) and needs to be appended to an existing NetCDF file without loading all the data into memory at once.
Highlights
Besides the advantage described above of using NetCDF4 instead of Xarray, other changes have been made to Finam-netCDF to improve the user experience:
- All NetCdfReader() input arguments except path are now optional, which means that NetCdfReader() input parameters can be given implicitly or explicitly:
Implicitly:
fm_nc.NetCdfReader(path="path/to/file.nc")
Explicitly:
fm_nc.NetCdfReader(
    path="path/to/file.nc",
    outputs={
        "VAR1": fm_nc.Layer(var="var1", xyz=("x", "y")),
        "VAR2": fm_nc.Layer(var="var2", xyz=("x", "y")),
        "VAR3": fm_nc.Layer(var="var3", xyz=("x", "y"), static=True),
    },
    time_var="time",
)
- When NetCdfReader() is used implicitly, Finam checks whether the input NetCDF file follows the CF conventions and raises an error if it does not. When NetCdfReader() is used explicitly, the variable, dimension, and coordinate names must match the names in the input NetCDF file.
- The Finam writer allows users to add global attributes to output NetCDF files if needed.
- The input argument start is no longer needed in NetCdfTimedWriter().