Question

Is there a way to convert a Series (pandas) to a DataArray (xarray) while keeping the current order of the dimension values?

The problem occurs when there is more than one dimension. For example:

In [1]: import xarray as xr

In [2]: coord1 = ("city",["Las Perdices","Córdoba","General Deheza"])
   ...: coord2 = ("year",[2018,2019])

In [3]: da = xr.DataArray([[10,20],[30,40],[50,60]],coords=[coord1,coord2])
   ...: da

Out[3]:
<xarray.DataArray (city: 3, year: 2)>
array([[10, 20],
       [30, 40],
       [50, 60]])
Coordinates:
  * city     (city) <U14 'Las Perdices' 'Córdoba' 'General Deheza'
  * year     (year) int32 2018 2019

In [4]: se = da.to_series()
   ...: se

Out[4]:
city            year
Las Perdices    2018    10
                2019    20
Córdoba         2018    30
                2019    40
General Deheza  2018    50
                2019    60
dtype: int32

In [5]: newArr = se.to_xarray()
   ...: newArr

Out[5]:
<xarray.DataArray (city: 3, year: 2)>
array([[30, 40],
       [50, 60],
       [10, 20]])
Coordinates:
  * city     (city) object 'Córdoba' 'General Deheza' 'Las Perdices'
  * year     (year) int64 2018 2019

In this example, the dimension "city" has the following values:

'Las Perdices' 'Córdoba' 'General Deheza'

But after running .to_xarray() (converting from the Series to xarray), the order of the values changes to:

'Córdoba' 'General Deheza' 'Las Perdices'

Is there any way to prevent this behavior?

Answer 1

Many reshaping operations in pandas cause the index to be sorted, including to_xarray and, for example, unstack:

In [5]: se.unstack()
Out[5]:
year            2018  2019
city
Córdoba           30    40
General Deheza    50    60
Las Perdices      10    20
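To see where the sorting comes from, here is a small sketch that builds the same (city, year) index directly with pd.MultiIndex.from_product from plain strings, rather than via da.to_series(). It assumes a reasonably recent pandas; the variable names are only illustrative. The rows keep their insertion order, but the city level itself is stored sorted, and unstack() lays out its result in level order.

```python
import pandas as pd

cities = ["Las Perdices", "Córdoba", "General Deheza"]

# Same (city, year) index as in the question, built from plain strings.
idx = pd.MultiIndex.from_product([cities, [2018, 2019]], names=["city", "year"])
se = pd.Series([10, 20, 30, 40, 50, 60], index=idx)

# The rows themselves are still in insertion order...
print(se.index.get_level_values("city").unique().tolist())

# ...but the "city" level was stored sorted when the index was built.
print(se.index.levels[0].tolist())

# unstack() follows the level order, so the result comes back sorted.
print(se.unstack().index.tolist())
```

Because the reordering happens when the level is built, changing how the level is ordered, rather than post-processing the result, is what makes the fix below work.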

The only way to maintain the ordering is to use a CategoricalIndex for the list of cities:

In [1]: import numpy as np
   ...: import pandas as pd

In [2]: se = pd.Series(
   ...:     np.arange(10, 70, 10),
   ...:     index=pd.MultiIndex.from_product([
   ...:         pd.Categorical(
   ...:             ["Las Perdices","Córdoba","General Deheza"],
   ...:             categories=["Las Perdices","Córdoba","General Deheza"],
   ...:             ordered=True),
   ...:         [2018, 2019]],
   ...:         names=['city', 'year']))

The sort order is now explicitly preserved:

In [3]: se.sort_index()
Out[3]:
city            year
Las Perdices    2018    10
                2019    20
Córdoba         2018    30
                2019    40
General Deheza  2018    50
                2019    60
dtype: int64

Now your index order is preserved in xarray:

In [4]: se.to_xarray()
Out[4]:
<xarray.DataArray (city: 3, year: 2)>
array([[10, 20],
       [30, 40],
       [50, 60]])
Coordinates:
  * city     (city) object 'Las Perdices' 'Córdoba' 'General Deheza'
  * year     (year) int64 2018 2019
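For a self-contained comparison of both approaches end to end, here is a sketch assuming numpy, pandas and xarray are installed. The helper to_da is hypothetical, introduced only for this comparison. The only difference between the two Series is whether the city level is built from plain strings or from an ordered pd.Categorical.

```python
import numpy as np
import pandas as pd

cities = ["Las Perdices", "Córdoba", "General Deheza"]
years = [2018, 2019]
data = np.arange(10, 70, 10)

def to_da(city_level):
    """Hypothetical helper: build a (city, year) Series on city_level and call to_xarray()."""
    idx = pd.MultiIndex.from_product([city_level, years], names=["city", "year"])
    return pd.Series(data, index=idx).to_xarray()

# Plain strings: the city level is stored sorted, so the DataArray comes back sorted.
plain = to_da(cities)

# Ordered categorical: the explicit categories fix the level order to the original list.
ordered = to_da(pd.Categorical(cities, categories=cities, ordered=True))

print(plain["city"].values.tolist())
print(ordered["city"].values.tolist())
print(ordered.values.tolist())
```

Passing categories=cities explicitly is what pins the order. Without it, pd.Categorical infers the categories from the values and sorts them, which brings back the alphabetical order.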

The pandas documentation on Categorical data has helpful tips on creating categorical Series and indexes, along with usage notes.

If you want this to survive a round trip through xarray, just put the pd.Categorical() bit in your example where you create the city coordinate.
