按周和月汇总熊猫数据框

时间:2018-06-29 10:21:47

标签: python pandas dataframe aggregate

下面的数据框包含有关 启动程序 的信息,其中只有一列日期:

                        indate
           2016-12-19 12:16:00
           2016-12-19 12:21:00
           2016-12-20 12:32:00
           2016-12-20 12:34:00
           2016-12-20 12:40:00
           2016-12-21 13:47:01
           2016-12-21 14:27:01
           2016-12-21 14:43:00
           2016-12-21 15:02:00
           2016-12-22 15:16:00
           2016-12-22 15:22:00
           2016-12-22 15:25:00
           2016-12-22 15:22:00
           2016-12-22 15:25:00
                ........

我想汇总以获取 每天的发射次数

          indate  number of launchings
      2016-12-19             2
      2016-12-20             3
      2016-12-21             4
      2016-12-22             5
                   ...

然后还可以获取发布日期的星期,发布日期和编号。发射数量:

              week                  day      number of launchings
    2016-12-19 - 2016-12-25         Mo                2
    2016-12-19 - 2016-12-25         Tu                3
    2016-12-19 - 2016-12-25         We                4
    2016-12-19 - 2016-12-25         Th                5
    2016-12-19 - 2016-12-25         Fr                n
    2016-12-19 - 2016-12-25         Su                n
    2016-12-19 - 2016-12-25         Sa                n
    2016-12-26 - 2017-01-01         Mo                n
    2016-12-26 - 2017-01-01         Tu                n
    2016-12-26 - 2017-01-01         We                n
              ....

我没有在Pandas中找到任何特殊方法来执行此操作。

1 个答案:

答案 0 :(得分:2)

首先将resample按天使用size,然后按strftime提取天的名称,最后持续数周,使用transform first每星期重采样和last值:

df1 = df.resample('d', on='indate').size().reset_index(name='number of launchings')

df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.resample('W', on='indate')['indate']
df1['week'] = g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' + 
              g.transform('last').dt.strftime('%Y-%m-%d')

另一种解决方案是使用Grouper

df1 = (df.groupby(pd.Grouper(freq='d', key='indate'))
        .size()
        .reset_index(name='number of launchings'))

df1['day'] = df1['indate'].dt.strftime('%a')
g = df1.groupby(pd.Grouper(freq='W', key='indate'))['indate']
df1['week'] = (g.transform('first').dt.strftime('%Y-%m-%d') + ' - ' + 
               g.transform('last').dt.strftime('%Y-%m-%d'))
print (df1)

       indate  number of launchings  day                     week
0  2016-12-19                     2  Mon  2016-12-19 - 2016-12-25
1  2016-12-20                     3  Tue  2016-12-19 - 2016-12-25
2  2016-12-21                     4  Wed  2016-12-19 - 2016-12-25
3  2016-12-22                     5  Thu  2016-12-19 - 2016-12-25
4  2016-12-23                     1  Fri  2016-12-19 - 2016-12-25
5  2016-12-24                     1  Sat  2016-12-19 - 2016-12-25
6  2016-12-25                     1  Sun  2016-12-19 - 2016-12-25
7  2016-12-26                     1  Mon  2016-12-26 - 2017-01-01
8  2016-12-27                     1  Tue  2016-12-26 - 2017-01-01
9  2016-12-28                     1  Wed  2016-12-26 - 2017-01-01
10 2016-12-29                     1  Thu  2016-12-26 - 2017-01-01
11 2016-12-30                     1  Fri  2016-12-26 - 2017-01-01
12 2016-12-31                     1  Sat  2016-12-26 - 2017-01-01
13 2017-01-01                     1  Sun  2016-12-26 - 2017-01-01

样本数据

print (df)
                indate
0  2016-12-19 12:16:00
1  2016-12-19 12:21:00
2  2016-12-20 12:32:00
3  2016-12-20 12:34:00
4  2016-12-20 12:40:00
5  2016-12-21 13:47:01
6  2016-12-21 14:27:01
7  2016-12-21 14:43:00
8  2016-12-21 15:02:00
9  2016-12-22 15:16:00
10 2016-12-22 15:22:00
11 2016-12-22 15:25:00
12 2016-12-22 15:22:00
13 2016-12-22 15:25:00
14 2016-12-23 12:16:00
15 2016-12-24 12:21:00
16 2016-12-25 12:32:00
17 2016-12-26 12:34:00
18 2016-12-27 12:40:00
19 2016-12-28 13:47:01
20 2016-12-29 14:27:01
21 2016-12-30 14:43:00
22 2016-12-31 15:02:00
23 2017-01-01 15:16:00