-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhance coordinate handling and add curvilinear dataset support #736
Conversation
regridder.grid
property to ensure cf compliant coordinates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #736 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 16 16
Lines 1658 1680 +22
=========================================
+ Hits 1658 1680 +22 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Things are working now, but I think there is still an issue. Take a look at the map from xcdat (this PR): compared to calling xesmf directly: A vague idea/possibility: In curvilinear grids |
@pochedls Thanks Steve. It looks like we need more work on this. I'll push back the release of v0.8.0 depending on when the next E3SM Unified is released (which is probably being pushed back from 2/15)
Hmm this is a good question. I'll step through the code to see which values are being used for latitude. |
I explain the issue here. You're correct, This is mainly a metadata/coordinates issue rather than an issue in xCDAT's logic. We'll need to think more carefully about how we can retrieve |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @tomvothecoder @pochedls
More work needs to be done based on @jasonb5's comment here: #718 (comment) |
regridder.grid
property to ensure cf compliant coordinatesregridder.grid
property to return desired coordinates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @pochedls and @jasonb5, this PR is ready for review. Unit tests have been updated.
I verified this PR now works on the test example in #718.
# %%
# imports
import xcdat as xc
import numpy as np
# target grid
nlat = xc.create_axis("lat", np.arange(-88.5, 90, 2.5))
nlon = xc.create_axis("lon", np.arange(1.25, 360, 2.5))
ngrid = xc.create_grid(x=nlon, y=nlat)
# open / regrid
p = "/p/css03/esgf_publish/CMIP6/ScenarioMIP/NCAR/CESM2-WACCM/ssp585/r1i1p1f1/Omon/fgco2/gn/v20200702/"
ds = xc.open_mfdataset(p)
# %%
ds = ds.regridder.horizontal(
"fgco2", ngrid, tool="xesmf", method="conservative_normed", periodic=True
)
Output
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'fgco2' has multiple fill values {np.float32(1e+20), np.float64(1e+20)} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'fgco2' has multiple fill values {np.float32(1e+20), np.float64(1e+20)} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'fgco2' has multiple fill values {np.float32(1e+20), np.float64(1e+20)} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'fgco2' has multiple fill values {np.float32(1e+20), np.float64(1e+20)} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xarray/conventions.py:204: SerializationWarning: variable 'fgco2' has multiple fill values {np.float32(1e+20), np.float64(1e+20)} defined, decoding all values to NaN.
var = coder.decode(var, name=name)
2025-03-28 14:58:48,413 [WARNING]: bounds.py(add_missing_bounds:194) >> The nlat coord variable has a 'units' attribute that is not in degrees.
2025-03-28 14:58:48,413 [WARNING]: bounds.py(add_missing_bounds:194) >> The nlat coord variable has a 'units' attribute that is not in degrees.
I think we can ignore the below warnings since the grid
object does not include nlat_bnds
and nlon_bnds
(they aren't necessary). xESMF is still able to retrieve the correct bounds (lat_bnds
and lon_bnds
).
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xesmf/frontend.py:95: UserWarning: Variables {'nlon_bnds'} not found in object but are referred to in the CF attributes.
lon_bnds = ds.cf.get_bounds('longitude')
/home/vo13/miniforge3/envs/xcdat_dev_531/lib/python3.13/site-packages/xesmf/frontend.py:96: UserWarning: Variables {'nlon_bnds'} not found in object but are referred to in the CF attributes.
lat_bnds = ds.cf.get_bounds('latitude')
def generate_curvilinear_dataset() -> xr.Dataset: | ||
"""Generate a curvilinear Dataset with CF-compliant metadata. | ||
|
||
The dataset includes variables for time, latitude, longitude, and their | ||
respective bounds. It also contains a synthetic data variable (``test_var``) | ||
and additional coordinate information. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function generates a dummy curvilinear dataset, similar to the one used in the example in #718.
p = '/p/css03/esgf_publish/CMIP6/ScenarioMIP/NCAR/CESM2-WACCM/ssp585/r1i1p1f1/Omon/fgco2/gn/v20200702/'
def test_grid_curvilinear(self): | ||
ds = fixtures.generate_curvilinear_dataset() | ||
|
||
result = ds.regridder.grid | ||
expected = xr.Dataset( | ||
coords={"lat": ds.lat, "lon": ds.lon, "nlon": ds.nlon, "nlat": ds.nlat}, | ||
data_vars={ | ||
"lat_bnds": ds.lat_bnds, | ||
"lon_bnds": ds.lon_bnds, | ||
}, | ||
) | ||
|
||
xr.testing.assert_identical(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New unit test to cover curvilinear case
xcdat/regridder/accessor.py
Outdated
def _get_coord_by_name(self, name: CFAxisKey) -> xr.DataArray: | ||
"""Get the coordinate variable based on its name. | ||
|
||
This method is useful for returning the desired coordinates for an | ||
axis that has multiple sets of coordinates. For example, curvilinear | ||
grids can have multiple coordinates for the same axis (e.g., (nlat,lat) | ||
for X and (nlon,lon)) for Y. | ||
|
||
This method will find the desired coordinates based on its name while | ||
ensuring that the coordinate variable is 1D by dropping any extra | ||
dimensions (e.g., "nlat" or "nlon"). | ||
|
||
Parameters | ||
---------- | ||
name : CFAxisKey | ||
Name of the coordinate. | ||
|
||
Returns | ||
------- | ||
xr.DataArray | ||
The coordinate variable. | ||
|
||
Raises | ||
------ | ||
KeyError | ||
If the coordinate variable is not found in the dataset. | ||
""" | ||
coord_names = VAR_NAME_MAP[name] | ||
|
||
for coord_name in coord_names: | ||
if coord_name in self._ds.coords: | ||
coord = self._ds.coords[coord_name] | ||
|
||
return coord | ||
|
||
raise KeyError(f"Coordinate with name '{name}' not found in the dataset.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This new function retrieves X and Y axis coordinates using the xCDAT VAR_NAME_MAP
table (e.g., "lat" for latitude coordinates).
Lines 41 to 49 in a282117
# A dictionary that maps common variable names to coordinate variables. This | |
# map is used as fall-back when coordinate variables don't have CF attributes | |
# set for ``cf_xarray`` to interpret using `CF_ATTR_MAP`. | |
VAR_NAME_MAP: Dict[CFAxisKey, List[str]] = { | |
"X": ["longitude", "lon"], | |
"Y": ["latitude", "lat"], | |
"T": ["time"], | |
"Z": ["vertical", "height", "pressure", "lev", "plev"], | |
} |
It is more dynamic than the xESMF hard-coded retrieval of just "lat" and "lon" here.
ds = ds.bounds.add_missing_bounds(axes=["X", "Y", "Z"]) | ||
# Add bounds only for axes that do not already have them. This | ||
# prevents multiple sets of bounds being added for the same axis. | ||
# For example, curvilinear grids can have multiple coordinates for the | ||
# same axis (e.g., (nlat, lat) for X and (nlon, lon) for Y). We only | ||
# need lat_bnds and lon_bnds for the X and Y axes, respectively, and not | ||
# nlat_bnds and nlon_bnds. | ||
for axis, has_bounds in axis_has_bounds.items(): | ||
if not has_bounds: | ||
ds = ds.bounds.add_bounds(axis=axis) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new logic prevents unnecessary attempts to add bounds for axes if they already exist. Otherwise, I found that ds.bounds.add_missing_bounds()
will unnecessarily add nlon_bnds
while failing to add bounds for nlat
(raises warning about units not in degrees).
Thanks for putting this PR together so quickly, @tomvothecoder (and for the work you did in identifying/explaining the issue @jasonb5)! Have a great weekend! |
- CF-compliance is required for xESMF to properly interpret coordinates using CF-xarray
ab6761e
to
2e27396
Compare
- Update `accessor.py` to use `get_coords_by_name` and remove `_get_coord_by_name`
def get_coords_by_name(obj: xr.Dataset | xr.DataArray, axis: CFAxisKey) -> xr.DataArray: | ||
"""Retrieve the coordinate variable based on its name. | ||
|
||
This method is useful for returning the desired coordinate in the following | ||
cases: | ||
|
||
- Coordinates that are not CF-compliant (e.g., missing CF attributes like | ||
"axis", "standard_name", or "long_name") but use common names. | ||
- Axes with multiple sets of coordinates. For example, curvilinear grids may | ||
have multiple coordinates for the same axis (e.g., (nlat, lat) for X and | ||
(nlon, lon) for Y). In most cases, "lat" and "lon" are the desired | ||
coordinates, which this function will return. | ||
|
||
Common variable names for each axis (from ``VAR_NAME_MAP``): | ||
|
||
- "X" axis: ["longitude", "lon"] | ||
- "Y" axis: ["latitude", "lat"] | ||
- "T" axis: ["time"] | ||
- "Z" axis: ["vertical", "height", "pressure", "lev", "plev"] | ||
|
||
Parameters | ||
---------- | ||
axis : CFAxisKey | ||
The CF axis key ("X", "Y", "T", or "Z"). | ||
|
||
Returns | ||
------- | ||
xr.DataArray | ||
The coordinate variable. | ||
|
||
Raises | ||
------ | ||
KeyError | ||
If the coordinate variable is not found in the dataset. | ||
""" | ||
coord_names = VAR_NAME_MAP[axis] | ||
|
||
for coord_name in coord_names: | ||
if coord_name in obj.coords: | ||
coord = obj.coords[coord_name] | ||
|
||
return coord | ||
|
||
raise KeyError(f"Coordinate with name '{axis}' not found in the dataset.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pochedls I made this function a public API (formerly _get_coord_by_name()
in accessor.py
) because I found it was a general function and it might be useful for users who deal with the cases mentioned in the docstring.
def _get_axis_coord_and_bounds( | ||
self, axis: CFAxisKey | ||
) -> Tuple[xr.DataArray | None, xr.DataArray | None]: | ||
try: | ||
bounds_var = self._ds.bounds.get_bounds(name, coord_var.name) | ||
except KeyError: | ||
bounds_var = None | ||
coord_var = get_coords_by_name(self._ds, axis) | ||
if coord_var.size == 1: | ||
raise ValueError( | ||
f"Coordinate '{coord_var}' is a singleton and cannot be used." | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore singleton coordinates because we cannot generate bounds for a singleton coordinates yet (e.g., "height" with CF-compliant metadata mapping it to a Z axis).
regridder.grid
property to return desired coordinates
Description
This pull request focuses on improving coordinate handling and adding support for curvilinear datasets, specifically to ensure horizontal regridding capabilities with correct grid generation. The most important changes include the addition of the
get_coords_by_name
function, updates to theaxis
module to use type hinting, and new test cases for curvilinear datasets.Enhancements to coordinate handling:
get_coords_by_name
function to retrieve coordinate variables based on their names, which is useful for non-CF-compliant coordinates and axes with multiple sets of coordinates. (xcdat/axis.py
)axis
module to use modern type hinting and simplified type definitions. (xcdat/axis.py
) [1] [2] [3] [4] [5]Support for curvilinear datasets:
grid()
property inaccessor.py
to handle curvilinear grids with correct bound generation.generate_curvilinear_dataset
to create synthetic curvilinear datasets with CF-compliant metadata. (tests/fixtures.py
)tests/test_regrid.py
)Documentation and imports:
get_coords_by_name
function. (docs/api.rst
)get_coords_by_name
in various files. (xcdat/__init__.py
,tests/test_axis.py
,tests/fixtures.py
) [1] [2] [3]docs/demos/1-25-23-cwss-seminar/introduction-to-xcdat.ipynb
)xcdat.tutorial.open_dataset()
API docstring formattingChecklist
If applicable: