Reindexing a pandas MultiIndex on one level

Time: 2019-02-14 23:01:16

Tags: python pandas

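I have several different data series stored as a pandas DataFrame with a two-level MultiIndex. I would like to know how to reindex this MultiIndex DataFrame so that I get an index containing all (hourly) timestamps between two existing index values.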

Here is a sample of my DataFrame:

                                   A     B     C     D
tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0

This is what I would like to get:

                                   A     B     C     D
tick       act
2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  16.0   9.0  92.0  53.0

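An important point to keep in mind is that the date range of the "act" index level differs per group: for 2019-01-10 it starts at 2019-01-09 20:00:00 and ends at 2019-01-10 02:00:00, whereas for 2019-01-16 it starts at 2019-01-09 22:00:00 and ends at 2019-01-10 03:00:00.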

I am mainly interested in whether there is a solution that uses pandas methods, without unnecessary outer loops.

1 Answer:

Answer 0: (score: 1)

First, call reset_index on the data.

d = df.reset_index()

d

         tick                 act     A     B     C     D
0  2019-01-10 2019-01-09 20:00:00   5.0   5.0   5.0   5.0
1  2019-01-10 2019-01-10 00:00:00  52.0  34.0   1.0   9.0
2  2019-01-10 2019-01-10 01:00:00  75.0  52.0  61.0   1.0
3  2019-01-10 2019-01-10 02:00:00  28.0  29.0  46.0  61.0
4  2019-01-16 2019-01-09 22:00:00  91.0  42.0   3.0  34.0
5  2019-01-16 2019-01-10 02:00:00   2.0  22.0  41.0  59.0
6  2019-01-16 2019-01-10 03:00:00  16.0   9.0  92.0  53.0

Group the data by tick and apply the interpolate function to each group.

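import pandas as pd

def interpolate(df):
    # build an hourly index from this group's first to last `act` timestamp
    new_index = pd.date_range(df.act.min(), df.act.max(), freq="h")
    # set `act` as the index and reindex to hourly steps; missing hours become
    # NaN rows (despite the function's name, no values are interpolated)
    return df.set_index("act").reindex(new_index)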

d.groupby("tick").apply(interpolate)

This gives:

                                      tick     A     B     C     D
tick                                                              
2019-01-10 2019-01-09 20:00:00  2019-01-10   5.0   5.0   5.0   5.0
           2019-01-09 21:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-09 22:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-09 23:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00  2019-01-10  52.0  34.0   1.0   9.0
           2019-01-10 01:00:00  2019-01-10  75.0  52.0  61.0   1.0
           2019-01-10 02:00:00  2019-01-10  28.0  29.0  46.0  61.0
2019-01-16 2019-01-09 22:00:00  2019-01-16  91.0  42.0   3.0  34.0
           2019-01-09 23:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 00:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 01:00:00         NaN   NaN   NaN   NaN   NaN
           2019-01-10 02:00:00  2019-01-16   2.0  22.0  41.0  59.0
           2019-01-10 03:00:00  2019-01-16  16.0   9.0  92.0  53.0
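The result above still carries a redundant tick column (NaN on the inserted rows) and leaves the second index level unnamed. A minimal self-contained sketch of one way to get closer to the layout asked for in the question is given here. It rebuilds the sample frame from the question, and it assumes that selecting only the value columns before apply is acceptable. The helper name fill_hours and the variable names are illustrative and do not come from the original answer.

```python
import pandas as pd

# Rebuild the sample frame from the question.
rows = [
    ("2019-01-10", "2019-01-09 20:00:00", 5.0, 5.0, 5.0, 5.0),
    ("2019-01-10", "2019-01-10 00:00:00", 52.0, 34.0, 1.0, 9.0),
    ("2019-01-10", "2019-01-10 01:00:00", 75.0, 52.0, 61.0, 1.0),
    ("2019-01-10", "2019-01-10 02:00:00", 28.0, 29.0, 46.0, 61.0),
    ("2019-01-16", "2019-01-09 22:00:00", 91.0, 42.0, 3.0, 34.0),
    ("2019-01-16", "2019-01-10 02:00:00", 2.0, 22.0, 41.0, 59.0),
    ("2019-01-16", "2019-01-10 03:00:00", 16.0, 9.0, 92.0, 53.0),
]
value_cols = ["A", "B", "C", "D"]
df = pd.DataFrame(rows, columns=["tick", "act"] + value_cols)
df["tick"] = pd.to_datetime(df["tick"])
df["act"] = pd.to_datetime(df["act"])
df = df.set_index(["tick", "act"])


def fill_hours(group):
    """Reindex one group (indexed by `act`) to every hour in its own range."""
    hours = pd.date_range(group.index.min(), group.index.max(), freq="h")
    return group.reindex(hours)


result = (
    df.reset_index("tick")        # keep `act` as the index, move `tick` to a column
    .groupby("tick")[value_cols]  # select value columns so `tick` is not duplicated
    .apply(fill_hours)            # each group gets its own hourly range
    .rename_axis(["tick", "act"])  # name both index levels explicitly
)
print(result)
```

Each group is still reindexed separately, so the differing start and end times per tick are respected, and the inserted hours appear as all-NaN rows.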