Question

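I have several decades of spatially varying wind data recorded every six hours. I need to average the two decades of data at each six-hour interval, so that I end up with 365 * 4 time steps. The data is in netCDF format.

The data looks like this: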

import xarray as xr
filename = 'V-01011999-01012019.nc'
ds = xr.open_dataset(filename)

print(ds)
<xarray.Dataset>
Dimensions:  (lat: 8, lon: 7, time: 29221)
Coordinates:
  * lat      (lat) float32 -2.5 -5.0 -7.5 -10.0 -12.5 -15.0 -17.5 -20.0
  * lon      (lon) float32 130.0 132.5 135.0 137.5 140.0 142.5 145.0
  * time     (time) datetime64[ns] 1999-01-01 1999-01-01T06:00:00 .. 2019-01-01
Data variables:
    vwnd     (time, lat, lon) float32 ...

# Remove Feb 29 from the records
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))

I have been able to group by day of the year, which gives me the two-decade average for each day of the year.

tsavg = ds.groupby('time.dayofyear').mean('time')

print(tsavg)
<xarray.Dataset>
Dimensions:    (dayofyear: 366, lat: 8, lon: 7)
Coordinates:
  * lat        (lat) float32 -2.5 -5.0 -7.5 -10.0 -12.5 -15.0 -17.5 -20.0
  * lon        (lon) float32 130.0 132.5 135.0 137.5 140.0 142.5 145.0
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 ... 360 361 362 363 364 365 366
Data variables:
    vwnd       (dayofyear, lat, lon) float32 -2.61605 -1.49012 ... -0.959997

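What I really want is a time coordinate of length 365 * 4 (four 6-hour intervals per day), where each time step is the average of that time step over the past 20 years. Also, for some reason, tsavg.dayofyear still has length 366 even though I removed February 29. I have not been able to apply or follow the answers in this post. I have researched the groupby resources extensively and tried many things, but I cannot figure it out. I am looking for help with the code.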

Answer 1

There is actually no well-documented way to do this. Also note that dayofyear may not be exactly what you expect it to be.
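To illustrate the dayofyear caveat, here is a minimal sketch using only the standard library. It assumes, as is the case for ordinal day-of-year values, that every date after February 28 in a leap year is numbered one higher than in a non-leap year. The helper name day_of_year is hypothetical and only used for illustration.

```python
from datetime import date, timedelta


def day_of_year(d):
    """Return the ordinal day of the year (1-based) for a date."""
    return d.timetuple().tm_yday


# The same calendar date gets a different label in a leap year.
mar1_nonleap = day_of_year(date(1999, 3, 1))
mar1_leap = day_of_year(date(2000, 3, 1))
print(mar1_nonleap, mar1_leap)

# Build every day of the leap year 2000, then drop Feb 29 as in the question.
days_2000 = [date(2000, 1, 1) + timedelta(days=i) for i in range(366)]
kept = [d for d in days_2000 if not (d.month == 2 and d.day == 29)]

# 365 days remain, but their labels still run up to 366 (Dec 31),
# and label 60 (Feb 29) is the one that is missing for this year.
labels = {day_of_year(d) for d in kept}
print(len(kept), max(labels), 60 in labels)
```

This is why removing February 29 does not shrink the dayofyear grouping to 365 entries: label 60 is still supplied by March 1 of the non-leap years, and label 366 by December 31 of the leap years.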

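Grouping over multiple levels (see this answer for how to do something similar to what you are asking in pandas) is not yet available in xarray. In lieu of that, a reasonably clean way to solve this type of problem is to define a new coordinate for grouping that represents the "time of year" of each time in the dataset.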
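For comparison, a multi-level groupby in pandas might look like the sketch below. The series is synthetic (a hypothetical 6-hourly record covering 1999 and 2000), and the variable names are illustrative rather than taken from the linked answer.

```python
import numpy as np
import pandas as pd

# Hypothetical 6-hourly series covering one non-leap and one leap year.
times = pd.date_range("1999-01-01", "2000-12-31 18:00",
                      freq=pd.Timedelta(hours=6))
series = pd.Series(np.arange(len(times), dtype=float), index=times)

# Drop Feb 29 so that every year contributes the same 365 * 4 slots.
is_feb29 = (series.index.month == 2) & (series.index.day == 29)
series = series[~is_feb29]

# Group on three levels at once: month, day and hour.
keys = [series.index.month, series.index.day, series.index.hour]
clim = series.groupby(keys).mean()
clim.index.names = ["month", "day", "hour"]

print(len(clim))
```

Each entry of clim is the mean over all years of one (month, day, hour) slot, which is the quantity the question asks for, only for a single series instead of a gridded dataset.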

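In your case, you want to group by "hour of the year" (i.e. matching month, day, and hour). To do this, you can create an array of strings that is basically just the string representation of the dates in the time coordinate, with the year dropped: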

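# Label each timestamp by month, day and hour with the year dropped, e.g. '01-01 06'
ds['hourofyear'] = xr.DataArray(ds.indexes['time'].strftime('%m-%d %H'), coords=ds.time.coords)
# Average all samples that share a label, i.e. the same 6-hour slot across all years
result = ds.groupby('hourofyear').mean('time')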
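The approach can be checked end to end on synthetic data. The following is a sketch under stated assumptions: the grid, the random values and the two-year span are made up, and the labels are attached with assign_coords as a non-index coordinate, which is a small variation on the assignment shown above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the wind file: 6-hourly data, 1999-01-01 to 2001-01-01.
times = pd.date_range("1999-01-01", "2001-01-01", freq=pd.Timedelta(hours=6))
lat = np.array([-2.5, -5.0], dtype="float32")
lon = np.array([130.0, 132.5], dtype="float32")
rng = np.random.default_rng(0)
values = rng.normal(size=(len(times), lat.size, lon.size)).astype("float32")

ds = xr.Dataset(
    {"vwnd": (("time", "lat", "lon"), values)},
    coords={"time": times, "lat": lat, "lon": lon},
)

# Remove Feb 29, as in the question.
ds = ds.sel(time=~((ds.time.dt.month == 2) & (ds.time.dt.day == 29)))

# Attach the "hour of year" labels along time (assumption: stored as a coordinate).
labels = ds.indexes["time"].strftime("%m-%d %H")
ds = ds.assign_coords(hourofyear=("time", np.asarray(labels)))

# One mean per (month, day, hour) slot, taken across all years.
result = ds.groupby("hourofyear").mean("time")

print(result.sizes["hourofyear"])
```

With February 29 removed, the grouped result should have 365 * 4 entries along hourofyear, one for each 6-hour slot of a non-leap year.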
