Question

I am struggling to reindex a MultiIndex. Example code below:

import numpy as np
import pandas as pd

rng = pd.date_range('01/01/2000 00:00', '31/12/2004 23:00', freq='H')
ts = pd.Series([h.dayofyear for h in rng], index=rng)
daygrouped = ts.groupby(lambda x: x.dayofyear)
daymean = daygrouped.mean()
myindex = np.arange(1,367)
myindex = np.concatenate((myindex[183:],myindex[:183]))
daymean.reindex(myindex)

This gives (as expected):

184    184
185    185
186    186
187    187
...
180    180
181    181
182    182
183    183
Length: 366, dtype: int64

But if I create a MultiIndex:

hourgrouped = ts.groupby([lambda x: x.dayofyear, lambda x: x.hour])
hourmean = hourgrouped.mean()
myindex = np.arange(1,367)
myindex = np.concatenate((myindex[183:],myindex[:183]))
hourmean.reindex(myindex, level=1)

I get:

1  1     1
   2     1
   3     1
   4     1
...
366  20    366
     21    366
     22    366
     23    366
Length: 8418, dtype: int64

Any ideas about my mistake? - Thanks.

Bevan

Answer 1

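First, you have to specify level=0 instead of 1: the day of year is the first level, and levels are zero-indexed, so it is level 0. With level=1 the values in myindex are matched against the hours (0 to 23), so only hours 1 to 23 survive, which is why you get 366 x 23 = 8418 rows.
However, there is still an issue: the reindexing works, but in the case of a MultiIndex it does not seem to preserve the order of the provided index: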

In [54]: hourmean.reindex([5,4], level=0)
Out[54]:
4  0     4
   1     4
   2     4
   3     4
   4     4
   ...
   20    4
   21    4
   22    4
   23    4
5  0     5
   1     5
   2     5
   3     5
   4     5
   ...
   20    5
   21    5
   22    5
   23    5
dtype: int64

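So selecting a new subset of the index works, but the result keeps the order of the original index rather than the order of the newly provided one.
This is possibly a bug in reindex on a specified level (I opened an issue to discuss it: https://github.com/pydata/pandas/issues/8241)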
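As a quick way to check both points on your own pandas version, here is a minimal sketch. It rebuilds the series from the question with array group keys instead of lambdas (an assumption made only to keep the example version-independent) and deliberately does not state the resulting order, since that may differ between pandas releases.

```python
import numpy as np
import pandas as pd

# Hourly series covering 2000-2004, as in the question.
rng = pd.date_range("2000-01-01 00:00", "2004-12-31 23:00",
                    freq=pd.Timedelta(hours=1))
ts = pd.Series(np.asarray(rng.dayofyear), index=rng)

# Group by (day of year, hour); array keys stand in for the lambdas.
hourmean = ts.groupby([np.asarray(rng.dayofyear), np.asarray(rng.hour)]).mean()

# Level 0 holds the day of year, level 1 holds the hour.
day_level = hourmean.index.get_level_values(0)
hour_level = hourmean.index.get_level_values(1)
print(hourmean.index.nlevels)
print(day_level.min(), day_level.max())
print(hour_level.min(), hour_level.max())

# Reindex on level 0 with the days in reverse order and inspect
# the order in which they come back.
subset = hourmean.reindex(index=[5, 4], level=0)
kept_days = list(subset.index.get_level_values(0).unique())
print(kept_days)
```

If kept_days prints the days in ascending order rather than as [5, 4], your version shows the behaviour described above.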

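For now, a workaround is to build a full MultiIndex and reindex with that (so not on a specified level, but with the complete index, which does preserve the order). Since you already have myindex, this is very easy with MultiIndex.from_product: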

In [79]: myindex2 = pd.MultiIndex.from_product([myindex, range(24)])

In [82]: hourmean.reindex(myindex2)
Out[82]:
184  0     184
     1     184
     2     184
     3     184
     4     184
     5     184
     6     184
     7     184
     8     184
     9     184
     10    184
     11    184
     12    184
     13    184
     14    184
...
183  9     183
     10    183
     11    183
     12    183
     13    183
     14    183
     15    183
     16    183
     17    183
     18    183
     19    183
     20    183
     21    183
     22    183
     23    183
Length: 8784, dtype: int64
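The whole workaround can be put together as a self-contained sketch. The names days, full_index and reordered are only illustrative, and the group keys are passed as arrays rather than lambdas, which is an assumption made to keep the example version-independent.

```python
import numpy as np
import pandas as pd

# Hourly series covering 2000-2004, as in the question.
rng = pd.date_range("2000-01-01 00:00", "2004-12-31 23:00",
                    freq=pd.Timedelta(hours=1))
ts = pd.Series(np.asarray(rng.dayofyear), index=rng)
hourmean = ts.groupby([np.asarray(rng.dayofyear), np.asarray(rng.hour)]).mean()

# Days 184..366 followed by 1..183, same rotation as myindex.
days = np.arange(1, 367)
days = np.concatenate((days[183:], days[:183]))

# Cartesian product of the rotated days with the 24 hours,
# then reindex against the full two-level index.
full_index = pd.MultiIndex.from_product([days, range(24)])
reordered = hourmean.reindex(full_index)

first_level = reordered.index.get_level_values(0)
print(first_level[0], first_level[-1])
print(len(reordered))
```

Because the reindex target is the complete index, the result takes its order from full_index, so the first block of rows belongs to day 184 and the last to day 183.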
