Pandas: conditional slicing after groupby mean

Date: 2019-02-28 12:07:46

Tags: python pandas dataframe pandas-groupby

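This has probably been asked before, but I couldn't find a solution, so apologies if it's a duplicate! I grouped a dataframe with a datetime index (named "time") by month and took the mean with df = df.groupby([df.index.year, df.index.month]).mean(), which gives the following: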

               0
time    time    

2000    1   0.245888
    2   0.579210
    3   0.519101
    4   1.724130
    5   2.909998
    6   6.754044
    7   5.654214
    8   0.972300
    9   0.207180
    10  -0.608038
    11  -2.271975
    12  -9.407542
2001    1   -4.206406
    2   0.339256
    3   2.447668
    4   2.159161
    5   2.014476
    6   4.495522
    7   2.130116
    8   4.280266
    9   2.329842
    10  -1.560461
    11  -2.232722
    12  -2.182392

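It has two index levels, both named "time", corresponding to year and month. I now want to slice by month (create a new dataframe with month = 1, or with months 6 through 8, and so on), but I'm not sure how to go about it.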

I would like to do something like:

df.loc[(df.index.month == 1)]
df.loc[(df.index.month == 1) | (df.index.month == 2)]
df.loc[(df.index.month >= 1) & (df.index.month <= 6)]

Doing this gives AttributeError: 'MultiIndex' object has no attribute 'month' (understandably). I tried renaming the index levels with df.rename(['year', 'month']), which fails with 'list' object is not callable. I thought I might need to reset the index to get it back into datetime form, but df.reset_index() gives ValueError: cannot insert time

df.index gives:

MultiIndex(levels=[[2000, 2001], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]],
           codes=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]],
           names=['time', 'time'])

Edit: 1. Edited to say that I want to slice more flexibly than just picking one particular month. 2. The original df looks like this:

             0
time    
2000-01-01  1.427332
2000-01-02  1.468405
2000-01-03  1.525916
2000-01-04  1.399915
2000-01-05  1.192117
2000-01-06  1.191234
2000-01-07  1.431109
2000-01-08  1.687709
2000-01-09  1.876527
2000-01-10  1.871062
2000-01-11  1.759002
2000-01-12  1.553009
2000-01-13  1.336487
2000-01-14  1.105376
2000-01-15  0.732866
2000-01-16  0.259119
2000-01-17  -0.003458
2000-01-18  -0.180170
2000-01-19  -0.275862
2000-01-20  -0.580456
2000-01-21  -0.800049
2000-01-22  -0.990277
2000-01-23  -1.139482
2000-01-24  -1.264528
2000-01-25  -1.378858
2000-01-26  -1.516954
2000-01-27  -1.394427
2000-01-28  -1.371782
2000-01-29  -1.337087
2000-01-30  -1.120146
... ...
2001-12-02  -4.521928
2001-12-03  -4.499393
2001-12-04  -4.425628
2001-12-05  -4.270720
2001-12-06  -4.286983
2001-12-07  -4.141410
2001-12-08  -3.886460
2001-12-09  -4.008633
2001-12-10  -3.772096
2001-12-11  -3.261724
2001-12-12  -3.271314
2001-12-13  -3.306891
2001-12-14  -3.111070
2001-12-15  -2.694092
2001-12-16  -2.063524
2001-12-17  -1.593670
2001-12-18  -1.279061
2001-12-19  -0.957185
2001-12-20  -0.616801
2001-12-21  -0.316757
2001-12-22  -0.292797
2001-12-23  -0.226818
2001-12-24  -0.196901
2001-12-25  -0.237203
2001-12-26  -0.221769
2001-12-27  -0.167911
2001-12-28  -0.050808
2001-12-29  -0.044765
2001-12-30  -0.384740
2001-12-31  -0.913277
730 rows × 1 columns

1 answer:

Answer 0: (score: 3)

First, you can name the levels while grouping by using rename:

df = df.groupby([df.index.year.rename('year'), 
                 df.index.month.rename('month')]).mean()

Or use rename_axis to set the MultiIndex level names after grouping:

df = df.groupby([df.index.year, df.index.month]).mean().rename_axis(('year','month'))
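Once the two levels have distinct names, reset_index no longer collides on "time", so ordinary column filtering becomes available. The following is a minimal sketch under assumptions: the daily data is synthetic, and the column name "value" and the variable names are made up for illustration (the question's frame has a single column named 0).

```python
import numpy as np
import pandas as pd

# Synthetic daily data standing in for the question's frame (values are made up).
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Monthly means, then give the two index levels distinct names.
monthly = daily.groupby([daily.index.year, daily.index.month]).mean()
monthly = monthly.rename_axis(["year", "month"])

# With distinct level names, reset_index works and yields plain columns.
flat = monthly.reset_index()

# Months 1 through 6 (inclusive) for every year.
first_half = flat[flat["month"].between(1, 6)]
print(first_half)
```

Whether to flatten the index like this or keep the MultiIndex is mostly a matter of taste; flattening tends to be convenient when several conditions on year and month are combined.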

To select a single month, use DataFrame.xs:

df1 = df.xs(1, axis=0, level=1)
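If the levels are named, xs can also take the level name instead of its position, and drop_level=False keeps the selected level in the result. A small sketch, again on synthetic data with a hypothetical "value" column:

```python
import numpy as np
import pandas as pd

# Synthetic daily data (assumption: stands in for the question's frame).
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

# Name the levels while grouping, as in the rename variant above.
monthly = daily.groupby(
    [daily.index.year.rename("year"), daily.index.month.rename("month")]
).mean()

# All Januaries; the "month" level is dropped, leaving an index of years.
jan = monthly.xs(1, level="month")

# Same selection, but the result keeps both index levels.
jan_keep = monthly.xs(1, level="month", drop_level=False)
print(jan)
print(jan_keep)
```

Note that xs selects one label per level, so it suits "month = 1" but not ranges such as months 6 to 8.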

If you need a boolean filter in your solution, use get_level_values to select the second level:

df.loc[(df.index.get_level_values(1) == 1)]
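The same idea extends to the more flexible slices from the question. The sketch below rebuilds a comparable frame from synthetic data (the "value" column and all variable names are assumptions, not from the original post) and mirrors the three attempted filters, using the level name rather than its position:

```python
import numpy as np
import pandas as pd

# Synthetic daily data (assumption: stands in for the question's frame).
idx = pd.date_range("2000-01-01", "2001-12-31", freq="D", name="time")
daily = pd.DataFrame({"value": np.arange(len(idx), dtype=float)}, index=idx)

monthly = (
    daily.groupby([daily.index.year, daily.index.month])
    .mean()
    .rename_axis(["year", "month"])
)

# One flat Index of month numbers, aligned row by row with `monthly`.
months = monthly.index.get_level_values("month")

jan = monthly.loc[months == 1]                       # month == 1
jan_feb = monthly.loc[months.isin([1, 2])]           # month 1 or 2
summer = monthly.loc[(months >= 6) & (months <= 8)]  # months 6 through 8

# Named index levels can also be referenced inside DataFrame.query.
summer_q = monthly.query("month >= 6 and month <= 8")
print(summer)
```

Each result keeps the (year, month) MultiIndex, so the selected rows can still be told apart by year.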