Skip to content

[Bug]: CF compliance issues when regridding #747

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
DamienIrving opened this issue Mar 28, 2025 · 3 comments · Fixed by #736
Closed

[Bug]: CF compliance issues when regridding #747

DamienIrving opened this issue Mar 28, 2025 · 3 comments · Fixed by #736
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.

Comments

@DamienIrving
Copy link

What happened?

When I updated from an older version (0.5.0) of xcdat to the latest version (0.8.0) regridding of a particular (standard CORDEX-CMIP6) rotated pole grid stopped working.

What did you expect to happen? Are there are possible answers you came across?

I expected it to run successfully as it has in the past.

Minimal Complete Verifiable Example (MVCE)

import numpy as np
import xcdat as xc

ds = xc.open_dataset('https://thredds.nci.org.au/thredds/dodsC/zz63/NARCliM2-0/output/CMIP6/DD/AUS-18/NSW-Government/ACCESS-ESM1-5/historical/r6i1p1f1/NARCliM2-0-WRF412R3/v1-r1/day/tas/latest/tas_AUS-18_ACCESS-ESM1-5_historical_r6i1p1f1_NSW-Government_NARCliM2-0-WRF412R3_v1-r1_day_19510101-19511231.nc')

lats = xc.create_axis('lat', np.round(np.arange(-44.5, -9.99, 0.05), decimals=2))
lons = xc.create_axis('lon', np.round(np.arange(112, 156.26, 0.05), decimals=2))
grid = xc.create_grid(lats, lons)

ds_out = ds.regridder.horizontal('tas', grid, tool='xesmf', method='bilinear')

Relevant log output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/xesmf/frontend.py:75, in _get_lon_lat(ds)
     74 try:
---> 75     lon = ds.cf['longitude']
     76     lat = ds.cf['latitude']

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/cf_xarray/accessor.py:2375, in CFDatasetAccessor.__getitem__(self, key)
   2344 """
   2345 Index into a Dataset making use of CF attributes.
   2346 
   (...)   2373 Add additional keys by specifying "custom criteria". See :ref:`custom_criteria` for more.
   2374 """
-> 2375 return _getitem(self, key)

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/cf_xarray/accessor.py:959, in _getitem(accessor, key, skip)
    958 except KeyError:
--> 959     raise KeyError(
    960         f"{kind}.cf does not understand the key {k!r}. "
    961         f"Use 'repr({kind}.cf)' (or '{kind}.cf' in a Jupyter environment) to see a list of key names that can be interpreted."
    962     ) from None

KeyError: "Dataset.cf does not understand the key 'longitude'. Use 'repr(Dataset.cf)' (or 'Dataset.cf' in a Jupyter environment) to see a list of key names that can be interpreted."

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
Cell In[5], line 1
----> 1 ds_out = ds.regridder.horizontal('tas', grid, tool='xesmf', method='bilinear')

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/xcdat/regridder/accessor.py:205, in RegridderAccessor.horizontal(self, data_var, output_grid, tool, **options)
    203 input_grid = _get_input_grid(self._ds, data_var, ["X", "Y"])
    204 regridder = regrid_tool(input_grid, output_grid, **options)
--> 205 output_ds = regridder.horizontal(data_var, self._ds)
    207 return output_ds

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/xcdat/regridder/xesmf.py:158, in XESMFRegridder.horizontal(self, data_var, ds)
    153 if input_da is None:
    154     raise KeyError(
    155         f"The data variable '{data_var}' does not exist in the dataset."
    156     )
--> 158 regridder = xe.Regridder(
    159     self._input_grid,
    160     self._output_grid,
    161     method=self._method,
    162     **self._extra_options,
    163 )
    165 output_da = regridder(input_da, keep_attrs=True)
    167 output_ds = xr.Dataset({data_var: output_da}, attrs=ds.attrs)

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/xesmf/frontend.py:919, in Regridder.__init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, parallel, **kwargs)
    917     grid_in, shape_in, input_dims = ds_to_ESMFlocstream(ds_in)
    918 else:
--> 919     grid_in, shape_in, input_dims = ds_to_ESMFgrid(
    920         ds_in, need_bounds=need_bounds, periodic=periodic
    921     )
    922 if locstream_out:
    923     grid_out, shape_out, output_dims = ds_to_ESMFlocstream(ds_out)

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/xesmf/frontend.py:145, in ds_to_ESMFgrid(ds, need_bounds, periodic, append)
    115 """
    116 Convert xarray DataSet or dictionary to ESMF.Grid object.
    117 
   (...)    141 
    142 """
    143 # use np.asarray(dr) instead of dr.values, so it also works for dictionary
--> 145 lon, lat = _get_lon_lat(ds)
    146 if hasattr(lon, 'dims'):
    147     if lon.ndim == 1:

File ~/opt/anaconda3/envs/test-xcdat/lib/python3.13/site-packages/xesmf/frontend.py:80, in _get_lon_lat(ds)
     76     lat = ds.cf['latitude']
     77 except (KeyError, AttributeError, ValueError):
     78     # KeyError if cfxr doesn't detect the coords
     79     # AttributeError if ds is a dict
---> 80     raise ValueError('dataset must include lon/lat or be CF-compliant')
     82 return lon, lat

ValueError: dataset must include lon/lat or be CF-compliant

Anything else we need to know?

If I run it directly with xesmf it works...

regridder = xesmf.Regridder(ds, grid, "bilinear")
da_out = regridder(ds['tas'])

Environment

print(xcdat.__version__)
'0.8.0'

xr.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.13.2 | packaged by conda-forge | (main, Feb 17 2025, 14:02:48) [Clang 18.1.8 ]
python-bits: 64
OS: Darwin
OS-release: 24.3.0
machine: arm64
processor: arm
byteorder: little
LC_ALL: None
LANG: en_AU.UTF-8
LOCALE: ('en_AU', 'UTF-8')
libhdf5: 1.14.3
libnetcdf: 4.9.2

xarray: 2025.3.0
pandas: 2.2.3
numpy: 2.1.3
scipy: 1.15.2
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: 1.6.4
nc_time_axis: None
iris: None
bottleneck: None
dask: 2025.3.0
distributed: 2025.3.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2025.3.0
cupy: None
pint: None
sparse: 0.15.5
flox: None
numpy_groupies: None
setuptools: None
pip: 25.0.1
conda: None
pytest: None
mypy: None
IPython: None
sphinx: None
@DamienIrving DamienIrving added the type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Mar 28, 2025
@pochedls
Copy link
Collaborator

This issue is likely related to this issue and I think @jasonb5 understands what is going on and is working on a fix. I'll try to summarize (based on my recollection/understanding of what @jasonb5 found):

Latitude and longitude for curvilinear grids are stored in 2d arrays (e.g., lat[nlat, nlon]) with spatial indices (nlat and nlon). The data is stored as a function of the spatial indices (e.g., tas[time, nlat, nlon]). But it is difficult to generically distinguish between lat and nlat because both are Y coordinates.

This works in xesmf because they hard-coded in a search for lat and lon before using CF-conventions to look for the coordinate axes.

I think we will end up doing something similar (and maybe something like this was implemented in v0.5). This could be a problem if data is stored as something other than lat and lon (e.g., latitude / longitude), but other solutions are complicated and may create other problems.

I couldn't actually read the data in your comment (OSError: [Errno -68] NetCDF: I/O failure), but I think this is likely the issue you are facing. If you think it is something else, maybe you could post a subset of the data (e.g., one time step) for us to look at.

@tomvothecoder
Copy link
Collaborator

Thanks @pochedls. Yes, it is a similar issue to #718 and both will be addressed by #736 (which should be done soon).

@tomvothecoder
Copy link
Collaborator

tomvothecoder commented Mar 31, 2025

I was able to reproduce the error using the example code in the description.

I re-ran the code using the fix in #736 and successfully produced the regridded output. We're good to go once #736 is merged and v0.8.1 is released (probably soon!).

Example Code

import numpy as np
import xcdat as xc

ds = xc.open_dataset(
    "https://thredds.nci.org.au/thredds/dodsC/zz63/NARCliM2-0/output/CMIP6/DD/AUS-18/NSW-Government/ACCESS-ESM1-5/historical/r6i1p1f1/NARCliM2-0-WRF412R3/v1-r1/day/tas/latest/tas_AUS-18_ACCESS-ESM1-5_historical_r6i1p1f1_NSW-Government_NARCliM2-0-WRF412R3_v1-r1_day_19510101-19511231.nc"
)

lats = xc.create_axis("lat", np.round(np.arange(-44.5, -9.99, 0.05), decimals=2))
lons = xc.create_axis("lon", np.round(np.arange(112, 156.26, 0.05), decimals=2))
grid = xc.create_grid(lats, lons)

ds_out = ds.regridder.horizontal("tas", grid, tool="xesmf", method="bilinear")
print(ds_out)

Output

<xarray.Dataset> Size: 894MB
Dimensions:    (time: 365, lat: 691, lon: 886, bnds: 2)
Coordinates:
    height     float64 8B 2.0
  * time       (time) object 3kB 1951-01-01 12:00:00 ... 1951-12-31 12:00:00
  * lat        (lat) float64 6kB -44.5 -44.45 -44.4 ... -10.1 -10.05 -10.0
  * lon        (lon) float64 7kB 112.0 112.0 112.1 112.2 ... 156.2 156.2 156.2
Dimensions without coordinates: bnds
Data variables:
    tas        (time, lat, lon) float32 894MB 286.0 286.0 286.0 ... 302.7 302.7
    lon_bnds   (lon, bnds) float64 14kB 112.0 112.0 112.0 ... 156.2 156.2 156.3
    lat_bnds   (lat, bnds) float64 11kB -44.52 -44.48 -44.48 ... -10.03 -9.975
    time_bnds  (time, bnds) object 6kB ...
Attributes: (12/43)
    project_id:                      CORDEX
    comment:                         DPIE version of WRF4.1.2
    DPIE_WRF_HASH:                   a051fdc73749349fd244ce8e596088a372bdb0c5
    wrf_options:                     sst_update & tmn_update
    frequency:                       day
    git_url_postprocessing:          git@bitbucket.org:oehcas/cordex_postproc...
    ...                              ...
    wrf_schemes_sf_sfclay_physics:   Revised_MM5; Jimenez et al. (2012, MWR)
    wrf_schemes_sf_surface_physics:  Niu, Guo-Yue, Zong-Liang Yang, Kenneth E...
    wrf_schemes_mp_physics:          Thompson, Gregory, Paul R. Field, Roy M....
    wrf_schemes_bl_pbl_physics:      Nakanishi, M., and H. Niino, 2006: An im...
    wrf_schemes_cu_physics:          Betts-Miller-Janjic; Janjic (1994, MWR; ...
    history:                         Tue May  7 20:01:44 2024: ncrename -a gl...

Grid inspection (sanity check)

print(ds_out.regridder.grid)
<xarray.Dataset> Size: 38kB
Dimensions:   (lon: 886, bnds: 2, lat: 691)
Coordinates:
    height    float64 8B 2.0
  * lon       (lon) float64 7kB 112.0 112.0 112.1 112.2 ... 156.2 156.2 156.2
  * lat       (lat) float64 6kB -44.5 -44.45 -44.4 -44.35 ... -10.1 -10.05 -10.0
Dimensions without coordinates: bnds
Data variables:
    lon_bnds  (lon, bnds) float64 14kB 112.0 112.0 112.0 ... 156.2 156.2 156.3
    lat_bnds  (lat, bnds) float64 11kB -44.52 -44.48 -44.48 ... -10.03 -9.975
Attributes: (12/43)
    project_id:                      CORDEX
    comment:                         DPIE version of WRF4.1.2
    DPIE_WRF_HASH:                   a051fdc73749349fd244ce8e596088a372bdb0c5
    wrf_options:                     sst_update & tmn_update
    frequency:                       day
    git_url_postprocessing:          git@bitbucket.org:oehcas/cordex_postproc...
    ...                              ...
    wrf_schemes_sf_sfclay_physics:   Revised_MM5; Jimenez et al. (2012, MWR)
    wrf_schemes_sf_surface_physics:  Niu, Guo-Yue, Zong-Liang Yang, Kenneth E...
    wrf_schemes_mp_physics:          Thompson, Gregory, Paul R. Field, Roy M....
    wrf_schemes_bl_pbl_physics:      Nakanishi, M., and H. Niino, 2006: An im...
    wrf_schemes_cu_physics:          Betts-Miller-Janjic; Janjic (1994, MWR; ...
    history:                         Tue May  7 20:01:44 2024: ncrename -a gl...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Inconsistencies or issues which will cause an issue or problem for users or implementors.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants