I want to swap two dimension coordinates for a multidimensional coordinate, so that I can group by time.month and then subtract another dataset.
import pandas as pd
import xarray as xr
ds = xr.Dataset()
# DataArray indexed by 'init_time' and an offset, 'tau'
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
# multidimensional coordinate 'time'
ds.coords['time'] = ds['init_time'] + ds['tau']
# what I would like to do (this call does not work as written)
ds.swap_dims({('init_time', 'tau'): 'time'})
ds
The result I am after looks something like this:
clim = pd.Series([2], index=[1]).rename_axis('month')
df = ds.to_dataframe().reset_index()
df['month'] = df['time'].dt.month
df = (
    pd.DataFrame(
        df.set_index(['init_time', 'tau', 'time', 'month'])['tst']
        - clim))
df
Answer 0 (score: 1)
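The problem with this is that swapping the dims would result in duplicate values in the index. Ideally, you would be able to group by a multidimensional coordinate directly. You can currently do this, but the functionality is incomplete (for example, you cannot do ds.groupby('time.month').mean(dim='time')). It looks like this may be in the works (see #324, #2525).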
For now, I think you have two options. You can do this in pandas:
df = ds.to_dataframe().reset_index()
monthly_mean = (
    df
    # df.other_dims is a placeholder for any other dimension column(s) you want to keep
    .groupby([df.other_dims, df.time.dt.month])
    .mean()[['tst']]
    .to_xarray())
clim = xr.DataArray([2], dims=['month'], coords=[[1]])
anom = monthly_mean.rename({'time': 'month'}) - clim
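To make the pandas route concrete, here is a minimal sketch run against the dataset from the question. It does not compute a monthly mean. Instead it looks up the climatology value for each row's month with `Series.map` and subtracts it, which is closer to the result shown in the question. The column name `anom` and the use of `map` are my own choices for illustration, not something the original snippet prescribes.

```python
import pandas as pd
import xarray as xr

# Rebuild the dataset from the question.
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

# Flatten to a table: one row per (init_time, tau) pair, with 'time' as a column.
df = ds.to_dataframe().reset_index()
df['month'] = df['time'].dt.month

# Climatology indexed by month (same toy values as in the question).
clim = pd.Series([2], index=[1]).rename_axis('month')

# Look up the climatology for each row's month and subtract it.
# 'anom' is an illustrative column name.
df['anom'] = df['tst'] - df['month'].map(clim)

# Back to xarray, indexed by the original dimensions.
anom = df.set_index(['init_time', 'tau'])['anom'].to_xarray()
print(anom)
```

All six valid times fall in January here, so every value is reduced by the single climatology value of 2. A month that is missing from `clim` would come out of `map` as NaN.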
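Alternatively, you can keep it in xarray by stacking init_time and tau (the session below assumes numpy is imported as np and uses a larger dataset than the one in the question, with 150 observations):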
In [35]: stacked = ds.stack(obs=('init_time', 'tau'))
In [36]: stacked.coords['obs_num'] = ('obs', ), np.arange(len(stacked.obs))
In [37]: stacked.coords['time'] = ('obs', ), stacked.init_time + stacked.tau
In [38]: swapped = stacked.swap_dims({'obs': 'obs_num'})
In [39]: swapped
Out[39]:
<xarray.Dataset>
Dimensions: (obs_num: 150)
Coordinates:
time (obs_num) datetime64[ns] 2017-01-01 2017-01-02 ... 2017-02-03
obs (obs_num) object (Timestamp('2017-01-01 00:00:00', freq='D'), Timedelta('0 days 00:00:00')) ... (Timestamp('2017-01-30 00:00:00', freq='D'), Timedelta('4 days 00:00:00'))
* obs_num (obs_num) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
Data variables:
tst (obs_num) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
In [47]: swapped.groupby(swapped.time.dt.month).mean(dim='obs_num')
Out[47]:
<xarray.Dataset>
Dimensions: (month: 2)
Coordinates:
* month (month) int64 1 2
Data variables:
tst (month) float64 71.56 145.0
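The session above stops at the monthly mean. As a rough sketch of how the subtraction from the question could be finished while staying in xarray, the block below flattens the question's small dataset by hand onto an integer `obs_num` dimension, groups by month, and subtracts the climatology. Note that this swaps `stack` plus `swap_dims` for a manual `ravel`, purely to avoid depending on how a MultiIndex is handled across xarray versions. The names `flat` and `anom_flat` are illustrative.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Rebuild the dataset from the question.
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

tst = ds['tst']

# Flatten data and valid times onto one integer dimension, 'obs_num'.
# Transposing 'time' to tst's dimension order keeps the two ravels aligned.
flat = xr.DataArray(
    tst.values.ravel(),
    dims='obs_num',
    coords={
        'obs_num': np.arange(tst.size),
        'time': ('obs_num', ds['time'].transpose(*tst.dims).values.ravel())},
    name='tst')

# Climatology indexed by month (same toy values as in the answer).
clim = xr.DataArray([2], dims=['month'], coords=[[1]])

# Group by the month of the valid time and subtract the matching climatology.
anom_flat = flat.groupby(flat['time'].dt.month) - clim

# Reshape back onto the original (init_time, tau) grid.
anom = tst.copy(data=anom_flat.values.reshape(tst.shape))
print(anom)
```

The groupby arithmetic returns values in the original observation order, which is what makes the final reshape back to (init_time, tau) safe.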