xarray Dataset swap_dims with a multidimensional coordinate

Posted: 2018-12-07 18:30:03

Tags: python-xarray

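I'd like to swap two coordinate dimensions for a single multidimensional coordinate, so that I can group by time.month and then subtract another dataset.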

import pandas as pd
import xarray as xr

ds = xr.Dataset()

# DataArray indexed by 'init_time' and an offset, 'tau'
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})

# multidimensional coordinate 'time'
ds.coords['time'] = ds['init_time'] + ds['tau']

# What I'd like to do; this raises ValueError because the key is not a single dimension
ds.swap_dims({('init_time', 'tau'): 'time'})

ds
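Why the `swap_dims` call fails: `swap_dims` maps an existing dimension name to a new name (usually a coordinate lying along that one dimension), so a tuple of two dimensions is rejected. A minimal check on the question's data, assuming a reasonably recent xarray (the exact error message differs between versions):

```python
import pandas as pd
import xarray as xr

# Rebuild the question's dataset
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

# 'time' spans both dimensions, so there is no single dimension to swap it onto
time_dims = ds['time'].dims

# swap_dims keys must be existing dimension names; a tuple of two is not one
try:
    ds.swap_dims({('init_time', 'tau'): 'time'})
    swap_failed = False
except ValueError as err:
    swap_failed = True
    print(err)
```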

I'm after something like the result of this pandas version:

clim = pd.Series([2], index=[1]).rename_axis('month')
df = ds.to_dataframe().reset_index()
df['month'] = df['time'].dt.month
df = (
    pd.DataFrame(
        df.set_index(['init_time', 'tau', 'time', 'month'])['tst']
        - clim))

df

1 answer:

Answer 0 (score: 1)

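The problem here is that swapping the dims would produce repeated values in the index. Ideally, you would be able to group by a multidimensional coordinate. You can currently do this, but the functionality is incomplete (for example, you can't do ds.groupby('time.month').mean(dim='time')). It looks like this may be in the works (see #324 and #2525).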
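To see the repeated values concretely with the question's toy data: each valid time is `init_time + tau`, and different (init_time, tau) pairs can land on the same date, so `time` cannot serve as a unique 1-D index. A small pandas sketch that mirrors `ds['init_time'] + ds['tau']`:

```python
import pandas as pd

init_time = pd.date_range('2017-01-01', periods=2)
tau = pd.to_timedelta([1, 2, 3], unit='days')

# Every (init_time, tau) pair -> valid time, row by row like the 2-D coordinate
valid_times = pd.DatetimeIndex([i + t for i in init_time for t in tau])

# Later occurrences of a date that already appeared
duplicated = valid_times[valid_times.duplicated()]
print(duplicated)
```

Here 2017-01-03 is reached both as 2017-01-01 + 2 days and as 2017-01-02 + 1 day, and 2017-01-04 likewise.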

For now, I think you have two options. You can do it in pandas:

df = ds.to_dataframe().reset_index()
# other_dims is a placeholder for whichever non-time column(s) you want to keep
monthly_mean = (
    df
    .groupby([df.other_dims, df.time.dt.month])[['tst']]
    .mean()
    .to_xarray())

clim = xr.DataArray([2], dims=['month'], coords=[[1]])

anom = monthly_mean.rename({'time': 'month'}) - clim
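A self-contained sketch of the pandas route on the question's toy data. It assumes `init_time` is the dimension you want to keep, standing in for the placeholder `other_dims` (substitute your own). Naming the month key `'month'` up front is a small design choice that makes the later `rename({'time': 'month'})` unnecessary:

```python
import pandas as pd
import xarray as xr

# Toy dataset from the question
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

# One row per (init_time, tau); the 2-D 'time' coordinate becomes a column
df = ds.to_dataframe().reset_index()

# Group by init_time (assumed "other dim") and calendar month of the valid time
month = df['time'].dt.month.rename('month')
monthly_mean = (
    df.groupby([df['init_time'], month])[['tst']]
    .mean()
    .to_xarray())

# Climatology indexed by month, subtracted with alignment on 'month'
clim = xr.DataArray([2], dims=['month'], coords=[[1]])
anom = monthly_mean - clim
print(anom)
```

All valid times fall in January, so the monthly means are 1 and 4 per init_time, and the anomalies are -1 and 2.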

Alternatively, you can keep it in xarray by stacking init_time and tau:

In [35]: stacked = ds.stack(obs=('init_time', 'tau'))

In [36]: stacked.coords['obs_num'] = ('obs', ), np.arange(len(stacked.obs))

In [37]: stacked.coords['time'] = ('obs', ), stacked.init_time + stacked.tau

In [38]: swapped = stacked.swap_dims({'obs': 'obs_num'})

In [39]: swapped
Out[39]:
<xarray.Dataset>
Dimensions:  (obs_num: 150)
Coordinates:
    time     (obs_num) datetime64[ns] 2017-01-01 2017-01-02 ... 2017-02-03
    obs      (obs_num) object (Timestamp('2017-01-01 00:00:00', freq='D'), Timedelta('0 days 00:00:00')) ... (Timestamp('2017-01-30 00:00:00', freq='D'), Timedelta('4 days 00:00:00'))
  * obs_num  (obs_num) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149
Data variables:
    tst      (obs_num) int64 0 1 2 3 4 5 6 7 ... 142 143 144 145 146 147 148 149

In [47]: swapped.groupby(swapped.time.dt.month).mean(dim='obs_num')
Out[47]:
<xarray.Dataset>
Dimensions:  (month: 2)
Coordinates:
  * month    (month) int64 1 2
Data variables:
    tst      (month) float64 71.56 145.0
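For completeness, a sketch of the stacking route on the question's own 2×3 data, carried through to the climatology subtraction the question asks for. It assumes a recent xarray and skips the `obs_num` swap: stacking already turns the 2-D `time` coordinate into a 1-D coordinate along `obs`, and grouping by a non-index 1-D coordinate should work there. If your version objects, fall back to the `swap_dims({'obs': 'obs_num'})` step shown above.

```python
import pandas as pd
import xarray as xr

# Toy dataset from the question
ds = xr.Dataset()
ds['tst'] = xr.DataArray(
    [[0, 1, 2], [3, 4, 5]],
    dims=('init_time', 'tau'),
    coords={
        'init_time': pd.date_range('2017-01-01', periods=2),
        'tau': pd.to_timedelta([1, 2, 3], unit='days')})
ds.coords['time'] = ds['init_time'] + ds['tau']

# Flatten (init_time, tau) into one 'obs' dimension; 'time' becomes 1-D along 'obs'
stacked = ds.stack(obs=('init_time', 'tau'))

# Group the valid times by calendar month and average over all observations
monthly = stacked.groupby(stacked['time'].dt.month).mean(dim='obs')

# Subtract the monthly climatology (aligned on 'month')
clim = xr.DataArray([2], dims=['month'], coords=[[1]])
anom = monthly - clim
print(anom)
```

With this data every valid time is in January, so the monthly mean is 2.5 and the anomaly is 0.5.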