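Grouping a data series by day intervals with Pandas

Time: 2018-08-17 13:02:08

Tags: python pandas

I have to do some data analysis by season.

I have about a year and a half of hourly measurements, from the end of 2015 to the second half of 2017. What I want to do is split this data by season.

Here is a sample of the data I am working with: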

Date,Year,Month,Day,Day week,Hour,Holiday,Week Day,Impulse,Power (kW),Temperature (C)
04/12/2015,2015,12,4,6,18,0,6,2968,1781,16.2
04/12/2015,2015,12,4,6,19,0,6,2437,1462,16.2
19/04/2016,2016,4,19,3,3,0,3,1348,809,14.4
19/04/2016,2016,4,19,3,4,0,3,1353,812,14.1
11/06/2016,2016,6,11,7,19,0,7,1395,837,18.8
11/06/2016,2016,6,11,7,20,0,7,1370,822,17.4
11/06/2016,2016,6,11,7,21,0,7,1364,818,17
11/06/2016,2016,6,11,7,22,0,7,1433,860,17.5
04/12/2016,2016,12,4,1,17,0,1,1425,855,14.6
04/12/2016,2016,12,4,1,18,0,1,1466,880,14.4
07/03/2017,2017,3,7,3,14,0,3,3668,2201,14.2
07/03/2017,2017,3,7,3,15,0,3,3666,2200,14
24/04/2017,2017,4,24,2,5,0,2,1347,808,11.4
24/04/2017,2017,4,24,2,6,0,2,1816,1090,11.5
24/04/2017,2017,4,24,2,7,0,2,2918,1751,12.4
15/06/2017,2017,6,15,5,13,1,1,2590,1554,22.5
15/06/2017,2017,6,15,5,14,1,1,2629,1577,22.5
15/06/2017,2017,6,15,5,15,1,1,2656,1594,22.1
15/11/2017,2017,11,15,4,13,0,4,3765,2259,15.6
15/11/2017,2017,11,15,4,14,0,4,3873,2324,15.9
15/11/2017,2017,11,15,4,15,0,4,3905,2343,15.8
15/11/2017,2017,11,15,4,16,0,4,3861,2317,15.3

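As you can see, I have data from three different years.

My idea was to convert the first column with pd.to_datetime() and then group the rows by day/month, regardless of the year, using dd/mm intervals. For example, if winter runs from 21/12 to 21/03, I would create a new dataframe containing every row whose date falls within that interval, whatever the year. However, I have not managed to do this while ignoring the year, which makes things more complicated.

Edit: the desired output would be: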

df_spring
Date,Year,Month,Day,Day week,Hour,Holiday,Week Day,Impulse,Power (kW),Temperature (C)
19/04/2016,2016,4,19,3,3,0,3,1348,809,14.4
19/04/2016,2016,4,19,3,4,0,3,1353,812,14.1
07/03/2017,2017,3,7,3,14,0,3,3668,2201,14.2
07/03/2017,2017,3,7,3,15,0,3,3666,2200,14
24/04/2017,2017,4,24,2,5,0,2,1347,808,11.4
24/04/2017,2017,4,24,2,6,0,2,1816,1090,11.5
24/04/2017,2017,4,24,2,7,0,2,2918,1751,12.4

df_autumn
Date,Year,Month,Day,Day week,Hour,Holiday,Week Day,Impulse,Power (kW),Temperature (C)
04/12/2015,2015,12,4,6,18,0,6,2968,1781,16.2
04/12/2015,2015,12,4,6,19,0,6,2437,1462,16.2
04/12/2016,2016,12,4,1,17,0,1,1425,855,14.6
04/12/2016,2016,12,4,1,18,0,1,1466,880,14.4
15/11/2017,2017,11,15,4,13,0,4,3765,2259,15.6
15/11/2017,2017,11,15,4,14,0,4,3873,2324,15.9
15/11/2017,2017,11,15,4,15,0,4,3905,2343,15.8
15/11/2017,2017,11,15,4,16,0,4,3861,2317,15.3

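And so on for the remaining seasons.

2 Answers:

Answer 0: (score: 0)

Define each season by filtering the relevant rows on the Day and Month columns. For winter, for example: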

# Winter: 21 December through 21 March, regardless of year
df_winter = df.loc[
    ((df['Day'] >= 21) & (df['Month'] == 12))
    | (df['Month'] == 1) | (df['Month'] == 2)
    | ((df['Day'] <= 21) & (df['Month'] == 3))
]
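
The same idea can be extended to all four seasons by folding Month and Day into a single sortable key (month * 100 + day). This is only a minimal sketch and not part of the original answer: the boundary values and the `season_of` helper are assumptions chosen for illustration, and the frame holds just a few rows and columns from the question's sample. Note that these bounds put 21/03 in spring, whereas the filter above keeps it in winter.

```python
import pandas as pd

# Hypothetical season boundaries as month*100+day keys; adjust to your own definition.
SPRING_START, SUMMER_START, AUTUMN_START, WINTER_START = 321, 621, 921, 1221


def season_of(month_day):
    """Map a month*100+day key (e.g. 419 for 19 April) to a season name."""
    if SPRING_START <= month_day < SUMMER_START:
        return "spring"
    if SUMMER_START <= month_day < AUTUMN_START:
        return "summer"
    if AUTUMN_START <= month_day < WINTER_START:
        return "autumn"
    return "winter"  # everything else wraps around the year end


# A few rows taken from the question's sample (subset of columns).
df = pd.DataFrame({
    "Date": ["04/12/2015", "19/04/2016", "11/06/2016", "07/03/2017", "15/11/2017"],
    "Month": [12, 4, 6, 3, 11],
    "Day": [4, 19, 11, 7, 15],
    "Power (kW)": [1781, 809, 837, 2201, 2259],
})

# The key ignores the year, so rows from different years land in the same season.
month_day = df["Month"] * 100 + df["Day"]
df["Season"] = month_day.map(season_of)

# One dataframe per season that actually occurs in the data.
seasons = {name: group.drop(columns="Season") for name, group in df.groupby("Season")}
df_spring = seasons["spring"]
```

Here `seasons["spring"]` plays the role of df_spring. A season with no rows is simply absent from the dictionary.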

Answer 1: (score: 0)

You can simply filter the dataframe with isin() on the Month column:

# spring
df[df['Month'].isin([3,4])]

          Date  Year  Month  Day  Day week  Hour  Holiday  Week Day  Impulse  Power (kW)  Temperature (C)
 2  19/04/2016  2016      4   19         3     3        0         3     1348         809             14.4
 3  19/04/2016  2016      4   19         3     4        0         3     1353         812             14.1
10  07/03/2017  2017      3    7         3    14        0         3     3668        2201             14.2
11  07/03/2017  2017      3    7         3    15        0         3     3666        2200             14.0
12  24/04/2017  2017      4   24         2     5        0         2     1347         808             11.4
13  24/04/2017  2017      4   24         2     6        0         2     1816        1090             11.5
14  24/04/2017  2017      4   24         2     7        0         2     2918        1751             12.4


# autumn
df[df['Month'].isin([11,12])]

          Date  Year  Month  Day  Day week  Hour  Holiday  Week Day  Impulse  Power (kW)  Temperature (C)
 0  04/12/2015  2015     12    4         6    18        0         6     2968        1781             16.2
 1  04/12/2015  2015     12    4         6    19        0         6     2437        1462             16.2
 8  04/12/2016  2016     12    4         1    17        0         1     1425         855             14.6
 9  04/12/2016  2016     12    4         1    18        0         1     1466         880             14.4
18  15/11/2017  2017     11   15         4    13        0         4     3765        2259             15.6
19  15/11/2017  2017     11   15         4    14        0         4     3873        2324             15.9
20  15/11/2017  2017     11   15         4    15        0         4     3905        2343             15.8
21  15/11/2017  2017     11   15         4    16        0         4     3861        2317             15.3
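
If you would rather not depend on the precomputed Month column, the month can be derived from Date itself, along the lines of the pd.to_datetime() idea in the question. This is a rough sketch, assuming every date is in dd/mm/yyyy form. The `season_months` dictionary is a hypothetical name and holds only the two month lists used in this answer, so the other seasons would need their own entries.

```python
import pandas as pd

# A few rows taken from the question's sample (subset of columns).
df = pd.DataFrame({
    "Date": ["04/12/2015", "19/04/2016", "11/06/2016", "07/03/2017", "15/11/2017"],
    "Power (kW)": [1781, 809, 837, 2201, 2259],
})

# Give the format explicitly so 04/12/2015 is read as 4 December, not 12 April.
dates = pd.to_datetime(df["Date"], format="%d/%m/%Y")

# Month lists as used in this answer; extend with the remaining seasons as needed.
season_months = {"spring": [3, 4], "autumn": [11, 12]}

# Build one independent dataframe per season, ignoring the year entirely.
frames = {
    name: df[dates.dt.month.isin(months)].copy()
    for name, months in season_months.items()
}
df_spring = frames["spring"]
df_autumn = frames["autumn"]
```

The .copy() call makes each seasonal frame independent of df, so later column assignments on it do not trigger pandas' SettingWithCopyWarning.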