
Finam-netCDF refactoring: Transition from Xarray to NetCDF4

Jeisson Leal requested to merge Netcdf4_refactoring into main

Why transition from Xarray to NetCDF4?

When using xarray to write data to a NetCDF file, it follows a two-step process:

  1. First, it creates an xarray DataArray or Dataset in memory, which holds all the information to be written in a NetCDF file.
  2. After preparing the data in memory, xarray writes the entire DataArray or Dataset to the NetCDF file in one go.

This means that all the data to be written to the NetCDF file must be available in memory at the time of writing. This behaviour can be limiting when dealing with very large datasets that may not fit into memory, or in scenarios where time is a constraint.

In contrast, the netCDF4 library allows data to be added to a NetCDF file incrementally, one time step or chunk at a time, without requiring the entire dataset to be present in memory. This is especially beneficial for streaming data or for cases where data is generated over time (which is the case in Finam) and needs to be appended to an existing NetCDF file without loading everything into memory at once.

Highlights

Besides the previously mentioned advantage of using NetCDF4 instead of Xarray, other changes have been made to Finam-netCDF to improve the user experience:

  1. All NetCdfReader() input arguments except path are now optional, which means that the NetCdfReader() input parameters can be given implicitly or explicitly:

Implicitly:

fm_nc.NetCdfReader(path="path/to/file.nc")

Explicitly:

fm_nc.NetCdfReader(
    path="path/to/file.nc",
    outputs={
        "VAR1": fm_nc.Layer(var="var1", xyz=("x", "y")),
        "VAR2": fm_nc.Layer(var="var2", xyz=("x", "y")),
        "VAR3": fm_nc.Layer(var="var3", xyz=("x", "y"), static=True),
    },
    time_var="time",
)

  2. When NetCdfReader() is used implicitly, Finam checks whether the input NetCDF file follows the CF conventions and raises an error if it does not. When NetCdfReader() is used explicitly, the variable, dimension, and coordinate names must match the names in the input NetCDF file.

  3. The Finam writer allows users to add global attributes to output NetCDF files if needed.

  4. The input argument start is no longer needed in NetCdfTimedWriter().

Edited by Jeisson Leal
