I have a df that contains daily data, with several levels for each day:
date | value1 | value2 | level
2020-01-01 | 1 | 2 | "a"
2020-01-01 | 3 | 10 | "b"
2020-01-01 | 2 | 3 | "c"
2020-01-02 | 1 | 2 | "a"
2020-01-02 | 3 | 10 | "b"
2020-01-02 | 2 | 3 | "c"
... | ... | ... | ...
2021-02-01 | 10 | 1 | "a"
2021-02-01 | 8 | 4 | "b"
2021-02-01 | 1 | 5 | "c"
2021-02-03 | 10 | 1 | "a"
2021-02-03 | 8 | 4 | "b"
2021-02-03 | 1 | 5 | "c"
While keeping the daily rows, I need the monthly sums of value1 and value2 in new columns, for example:
date | value1 | value2 | level | value1_permonth | value2_permonth
2020-01-01 | 1 | 2 | "a" | 12 | 30
2020-01-01 | 3 | 10 | "b" | 12 | 30
2020-01-01 | 2 | 3 | "c" | 12 | 30
2020-01-02 | 1 | 2 | "a" | 12 | 30
2020-01-02 | 3 | 10 | "b" | 12 | 30
2020-01-02 | 2 | 3 | "c" | 12 | 30
... | ... | ... | ... | ... | ...
2021-02-01 | 10 | 1 | "a" | 38 | 20
2021-02-01 | 8 | 4 | "b" | 38 | 20
2021-02-01 | 1 | 5 | "c" | 38 | 20
2021-02-03 | 10 | 1 | "a" | 38 | 20
2021-02-03 | 8 | 4 | "b" | 38 | 20
2021-02-03 | 1 | 5 | "c" | 38 | 20
How can I do this with pandas?
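A minimal reproduction, assuming only the rows visible above (the rows elided by "..." are left out) and a recent pandas 2.x. It builds the frame, converts date to datetime, and runs a plain monthly sum. The totals match the expected 12/30 and 38/20, but the result has one row per month instead of one per daily row, so an aggregation alone does not give the requested shape.

```python
import pandas as pd

# Sample data: only the rows visible in the question (assumption).
dates = ["2020-01-01"] * 3 + ["2020-01-02"] * 3 + ["2021-02-01"] * 3 + ["2021-02-03"] * 3
df = pd.DataFrame({
    "date": pd.to_datetime(dates),  # monthly grouping needs a datetime column, not strings
    "value1": [1, 3, 2] * 2 + [10, 8, 1] * 2,
    "value2": [2, 10, 3] * 2 + [1, 4, 5] * 2,
    "level": ["a", "b", "c"] * 4,
})

# A plain aggregation: correct totals, but it collapses to one row per month.
monthly = df.groupby(df["date"].dt.to_period("M"))[["value1", "value2"]].sum()
print(monthly)
```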
Answer (score: 1)
Use Grouper with GroupBy.transform to fill the new columns with the aggregated values:
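import pandas as pd
df['date'] = pd.to_datetime(df['date'])  # monthly grouping needs a datetime column
cols = ['value1', 'value2']
df1 = df.groupby(pd.Grouper(freq='MS', key='date'))[cols].transform('sum')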
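A self-contained sketch of the Grouper approach, run on the visible sample rows only (assumed data, pandas 2.x assumed). freq='MS' buckets rows by calendar month, labelled by the month start, and transform('sum') writes each month's total back onto every row of that month, so df1 keeps df's index and can be joined directly.

```python
import pandas as pd

# Sample data: only the rows visible in the question (assumption).
dates = ["2020-01-01"] * 3 + ["2020-01-02"] * 3 + ["2021-02-01"] * 3 + ["2021-02-03"] * 3
df = pd.DataFrame({
    "date": pd.to_datetime(dates),
    "value1": [1, 3, 2] * 2 + [10, 8, 1] * 2,
    "value2": [2, 10, 3] * 2 + [1, 4, 5] * 2,
    "level": ["a", "b", "c"] * 4,
})

cols = ["value1", "value2"]
# Group by calendar month; transform returns one value per original row.
df1 = df.groupby(pd.Grouper(key="date", freq="MS"))[cols].transform("sum")
# Same index as df, so join aligns row by row.
df2 = df.join(df1.add_suffix("_permonth"))
print(df2)
```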
Or use DataFrame.resample with the on parameter:
cols = ['value1','value2']
df1 = df.resample('MS', on='date')[cols].transform('sum')
Or pass monthly periods created with Series.dt.to_period to groupby:
cols = ['value1','value2']
df1 = df.groupby(df['date'].dt.to_period('M'))[cols].transform('sum')
print(df1)
df2 = df.join(df1.add_suffix('_permonth'))
print(df2)
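The to_period variant end to end, again on the visible sample rows only (assumed data, pandas 2.x assumed). Each date is mapped to a monthly Period such as 2020-01, only months that actually occur in the data become groups, and transform broadcasts each group's total back to its rows.

```python
import pandas as pd

# Sample data: only the rows visible in the question (assumption).
dates = ["2020-01-01"] * 3 + ["2020-01-02"] * 3 + ["2021-02-01"] * 3 + ["2021-02-03"] * 3
df = pd.DataFrame({
    "date": pd.to_datetime(dates),
    "value1": [1, 3, 2] * 2 + [10, 8, 1] * 2,
    "value2": [2, 10, 3] * 2 + [1, 4, 5] * 2,
    "level": ["a", "b", "c"] * 4,
})

cols = ["value1", "value2"]
month = df["date"].dt.to_period("M")  # monthly Period key per row
df1 = df.groupby(month)[cols].transform("sum")  # one total per row, same index as df
df2 = df.join(df1.add_suffix("_permonth"))
print(df2)
```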
date value1 value2 level value1_permonth value2_permonth
0 2020-01-01 1 2 a 12 30
1 2020-01-01 3 10 b 12 30
2 2020-01-01 2 3 c 12 30
3 2020-01-02 1 2 a 12 30
4 2020-01-02 3 10 b 12 30
5 2020-01-02 2 3 c 12 30
6 2021-02-01 10 1 a 38 20
7 2021-02-01 8 4 b 38 20
8 2021-02-01 1 5 c 38 20
9 2021-02-03 10 1 a 38 20
10 2021-02-03 8 4 b 38 20
11 2021-02-03 1 5 c 38 20
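An alternative that is not part of the answer above: aggregate once per month, then merge the totals back onto the daily rows. It is sketched on the visible sample rows only, and month is a hypothetical helper column that is dropped at the end. On this data it produces the same columns; transform is usually shorter because it skips the merge and keeps df's index, while the merge version leaves a compact per-month table (monthly) that can be reused elsewhere.

```python
import pandas as pd

# Sample data: only the rows visible in the question (assumption).
dates = ["2020-01-01"] * 3 + ["2020-01-02"] * 3 + ["2021-02-01"] * 3 + ["2021-02-03"] * 3
df = pd.DataFrame({
    "date": pd.to_datetime(dates),
    "value1": [1, 3, 2] * 2 + [10, 8, 1] * 2,
    "value2": [2, 10, 3] * 2 + [1, 4, 5] * 2,
    "level": ["a", "b", "c"] * 4,
})

cols = ["value1", "value2"]
tmp = df.assign(month=df["date"].dt.to_period("M"))  # hypothetical helper column

# One row per month with the summed values.
monthly = (
    tmp.groupby("month", as_index=False)[cols]
       .sum()
       .rename(columns={c: f"{c}_permonth" for c in cols})
)

# A left merge keeps every daily row (in the original order) and attaches its month's totals.
df2 = tmp.merge(monthly, on="month", how="left").drop(columns="month")
print(df2)
```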