I have the following:
import pandas as pd
data = [['AAA','2019-01-01', 10], ['AAA','2019-01-02', 20],
['AAA','2019-02-01', 30], ['AAA','2019-02-02', 40],
['BBB','2019-01-01', 50], ['BBB','2019-01-02', 60],
['BBB','2019-02-01', 70],['BBB','2019-02-02', 80]]
dfx = pd.DataFrame(data, columns = ['NAME', 'TIMESTAMP','VALUE'])
NAME TIMESTAMP VALUE
0 AAA 2019-01-01 10
1 AAA 2019-01-02 20
2 AAA 2019-02-01 30
3 AAA 2019-02-02 40
4 BBB 2019-01-01 50
5 BBB 2019-01-02 60
6 BBB 2019-02-01 70
7 BBB 2019-02-02 80
I'm trying to sum the 'VALUE' column, grouped by 'NAME' and by the MONTH and YEAR of the 'TIMESTAMP' column.
So the desired final output is:
NAME TIMESTAMP VALUE SUM
0 AAA 2019-01-01 10 30
1 AAA 2019-01-02 20 30
2 AAA 2019-02-01 30 70
3 AAA 2019-02-02 40 70
4 BBB 2019-01-01 50 110
5 BBB 2019-01-02 60 110
6 BBB 2019-02-01 70 150
7 BBB 2019-02-02 80 150
How can I get this output?
Thanks.
Answer:
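Use GroupBy.transform together with Series.dt.year and Series.dt.month. The 'TIMESTAMP' column holds strings, so parse it with pd.to_datetime first; the .dt accessor only works on datetime values: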
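# parse the string dates so the .dt accessor is available
d = pd.to_datetime(dfx['TIMESTAMP'])
# group by NAME plus year and month; transform('sum') writes each group's total back to every row
dfx['SUM'] = (dfx.groupby(['NAME', d.dt.year, d.dt.month])['VALUE']
                 .transform('sum'))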
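To show why transform is used here rather than a plain aggregation, here is a small sketch on the same data. It contrasts .sum(), which collapses each group to one row, with .transform('sum'), which returns a Series aligned to the original rows. The renamed keys 'YEAR' and 'MONTH' are illustrative choices of mine, not part of the answer; they just give the group levels distinct names.

```python
import pandas as pd

data = [['AAA', '2019-01-01', 10], ['AAA', '2019-01-02', 20],
        ['AAA', '2019-02-01', 30], ['AAA', '2019-02-02', 40],
        ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
        ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 80]]
dfx = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])
d = pd.to_datetime(dfx['TIMESTAMP'])

# Group keys: NAME plus the year and month of the parsed timestamp.
# 'YEAR'/'MONTH' are illustrative names so the group levels are distinct.
keys = ['NAME', d.dt.year.rename('YEAR'), d.dt.month.rename('MONTH')]

# .sum() collapses each group to a single row (4 groups -> 4 rows) ...
per_group = dfx.groupby(keys)['VALUE'].sum()
# ... while .transform('sum') repeats each group's total on every row of that group
per_row = dfx.groupby(keys)['VALUE'].transform('sum')

print(per_group)
print(per_row.tolist())  # [30, 30, 70, 70, 110, 110, 150, 150]
```

Because per_row shares dfx's index, it can be assigned directly as a new column, which is what the answer does.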
Or group by monthly periods with Series.dt.to_period:
# a monthly period (e.g. 2019-01) encodes year and month in a single key
dfx['SUM'] = (dfx.groupby(['NAME', pd.to_datetime(dfx['TIMESTAMP']).dt.to_period('M')])['VALUE']
                 .transform('sum'))
print(dfx)
NAME TIMESTAMP VALUE SUM
0 AAA 2019-01-01 10 30
1 AAA 2019-01-02 20 30
2 AAA 2019-02-01 30 70
3 AAA 2019-02-02 40 70
4 BBB 2019-01-01 50 110
5 BBB 2019-01-02 60 110
6 BBB 2019-02-01 70 150
7 BBB 2019-02-02 80 150
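Another option, sketched here as an alternative of my own rather than something from the answer, is to build a single 'YYYY-MM' string key with Series.dt.strftime. It groups the same way as the year/month pair or the monthly period. The key name 'MONTH' is a hypothetical label chosen for readability.

```python
import pandas as pd

data = [['AAA', '2019-01-01', 10], ['AAA', '2019-01-02', 20],
        ['AAA', '2019-02-01', 30], ['AAA', '2019-02-02', 40],
        ['BBB', '2019-01-01', 50], ['BBB', '2019-01-02', 60],
        ['BBB', '2019-02-01', 70], ['BBB', '2019-02-02', 80]]
dfx = pd.DataFrame(data, columns=['NAME', 'TIMESTAMP', 'VALUE'])

# Assumption: a 'YYYY-MM' string (hypothetical key name 'MONTH') is an acceptable
# group key; it keeps year and month together, like to_period('M') does.
month_key = pd.to_datetime(dfx['TIMESTAMP']).dt.strftime('%Y-%m').rename('MONTH')

# transform('sum') broadcasts each (NAME, MONTH) total back to its rows
dfx['SUM'] = dfx.groupby(['NAME', month_key])['VALUE'].transform('sum')
print(dfx)
```

The string key is convenient if the month label is also needed for display, but the period or year/month keys avoid string formatting and sort chronologically by construction.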