Python Pandas: group by month and year

Time: 2019-10-30 07:34:22

Tags: python-3.x pandas pandas-groupby

I have the following DataFrame:

import pandas as pd

data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 20],
        ['AAA','2019-02-01', 30], ['AAA','2019-02-02', 40],
        ['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
        ['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]

dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])

  NAME   TIMESTAMP  VALUE
0  AAA  2019-01-01     10
1  AAA  2019-01-02     20
2  AAA  2019-02-01     30
3  AAA  2019-02-02     40
4  BBB  2019-01-01     50
5  BBB  2019-01-02     60
6  BBB  2019-02-01     70
7  BBB  2019-02-02     80

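I am trying to sum the "VALUE" column, grouped by "NAME" and by the month and year of the "TIMESTAMP" column, and attach the group total to every row.

So the desired final output is: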

  NAME   TIMESTAMP  VALUE  SUM
0  AAA  2019-01-01     10   30
1  AAA  2019-01-02     20   30
2  AAA  2019-02-01     30   70
3  AAA  2019-02-02     40   70
4  BBB  2019-01-01     50  110
5  BBB  2019-01-02     60  110
6  BBB  2019-02-01     70  150
7  BBB  2019-02-02     80  150

How can I get this output?

Thanks.

1 Answer:

Answer 0 (score: 3)

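Use GroupBy.transform together with Series.dt.year and Series.dt.month. TIMESTAMP holds strings, so convert it with pd.to_datetime first; the .dt accessor only works on datetime-like values: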

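# Parse the string column once; dfx['TIMESTAMP'] itself is left unchanged
d = pd.to_datetime(dfx['TIMESTAMP'])
# transform('sum') returns one value per original row, aligned on the index
dfx['SUM'] = (dfx.groupby(['NAME', d.dt.year, d.dt.month])['VALUE']
                 .transform('sum'))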
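To see why transform is used here rather than a plain sum, the small sketch below compares the two on a cut-down version of the data. The frame contents and the key names YEAR and MONTH are made up for illustration and are not part of the question.

```python
import pandas as pd

# Cut-down illustrative data (not the full frame from the question)
dfx = pd.DataFrame(
    {
        "NAME": ["AAA", "AAA", "AAA", "BBB"],
        "TIMESTAMP": ["2019-01-01", "2019-01-02", "2019-02-01", "2019-01-01"],
        "VALUE": [10, 20, 30, 50],
    }
)

d = pd.to_datetime(dfx["TIMESTAMP"])
# Renaming the key Series is optional; it only gives the result readable level names
keys = ["NAME", d.dt.year.rename("YEAR"), d.dt.month.rename("MONTH")]

# Plain aggregation: one row per (NAME, YEAR, MONTH) group
per_group = dfx.groupby(keys)["VALUE"].sum()

# transform: the group total repeated on every original row, same index as dfx
per_row = dfx.groupby(keys)["VALUE"].transform("sum")

print(per_group)
print(per_row)
```

Because per_row shares the index of dfx, it can be assigned directly as a new column, which is what the requested output needs.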

Or group by a monthly period with Series.dt.to_period:

# 'M' = monthly period, which keeps year and month together in one key
dfx['SUM'] = (dfx.groupby(['NAME', d.dt.to_period('M')])['VALUE']
                 .transform('sum'))

print(dfx)
  NAME   TIMESTAMP  VALUE  SUM
0  AAA  2019-01-01     10   30
1  AAA  2019-01-02     20   30
2  AAA  2019-02-01     30   70
3  AAA  2019-02-02     40   70
4  BBB  2019-01-01     50  110
5  BBB  2019-01-02     60  110
6  BBB  2019-02-01     70  150
7  BBB  2019-02-02     80  150
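The sample data covers a single year, so it does not show why the year has to be part of the key. The sketch below uses a made-up frame that spans two years (an assumption for illustration only) to compare grouping by month number alone with grouping by a monthly period.

```python
import pandas as pd

# Hypothetical data spanning two years; not taken from the question
df = pd.DataFrame(
    {
        "NAME": ["AAA", "AAA", "AAA"],
        "TIMESTAMP": ["2019-01-01", "2019-01-15", "2020-01-01"],
        "VALUE": [10, 20, 40],
    }
)

d = pd.to_datetime(df["TIMESTAMP"])

# Month number only: January 2019 and January 2020 land in the same group
month_only = df.groupby(["NAME", d.dt.month])["VALUE"].transform("sum")

# Monthly period: 2019-01 and 2020-01 are distinct keys
by_period = df.groupby(["NAME", d.dt.to_period("M")])["VALUE"].transform("sum")

print(month_only.tolist())
print(by_period.tolist())
```

Grouping by both dt.year and dt.month, as in the first approach, separates the years in the same way as the period key.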