Get monthly totals from daily data into new columns while keeping the daily rows

Asked: 2020-09-23 10:37:51

Tags: pandas date sum

I have a df containing daily data, with several levels for each day:

date       | value1 | value2 | level
2020-01-01 | 1      | 2      | "a"   
2020-01-01 | 3      | 10     | "b"   
2020-01-01 | 2      | 3      | "c"   
2020-01-02 | 1      | 2      | "a"   
2020-01-02 | 3      | 10     | "b"  
2020-01-02 | 2      | 3      | "c"  
...        | ...    | ...    | ...   
2021-02-01 | 10     | 1      | "a"   
2021-02-01 | 8      | 4      | "b"   
2021-02-01 | 1      | 5      | "c"  
2021-02-03 | 10     | 1      | "a" 
2021-02-03 | 8      | 4      | "b"   
2021-02-03 | 1      | 5      | "c"   

While keeping the daily rows, I need the monthly sums of value1 and value2 in new columns, for example:

date       | value1 | value2 | level | value1_permonth | value2_permonth
2020-01-01 | 1      | 2      | "a"   | 12               | 30
2020-01-01 | 3      | 10     | "b"   | 12               | 30
2020-01-01 | 2      | 3      | "c"   | 12               | 30
2020-01-02 | 1      | 2      | "a"   | 12               | 30
2020-01-02 | 3      | 10     | "b"   | 12               | 30
2020-01-02 | 2      | 3      | "c"   | 12               | 30
...        | ...    | ...    | ...   | ...              | ...
2021-02-01 | 10     | 1      | "a"   | 38               | 20
2021-02-01 | 8      | 4      | "b"   | 38               | 20
2021-02-01 | 1      | 5      | "c"   | 38               | 20
2021-02-03 | 10     | 1      | "a"   | 38               | 20
2021-02-03 | 8      | 4      | "b"   | 38               | 20
2021-02-03 | 1      | 5      | "c"   | 38               | 20

How can I do this with pandas?

1 answer:

Answer 0 (score: 1)

Use Grouper together with GroupBy.transform to get new columns filled with the aggregated values:

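import pandas as pd
cols = ['value1','value2']
# 'MS' groups by month start; transform('sum') returns one monthly total per daily row
df1 = df.groupby(pd.Grouper(freq='MS', key='date'))[cols].transform('sum')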

Or use DataFrame.resample:

cols = ['value1','value2']
df1 = df.resample('MS', on='date')[cols].transform('sum')

Or pass monthly periods created with Series.dt.to_period to groupby:

cols = ['value1','value2']
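# group by calendar month; 'M' is the monthly period alias
df1 = df.groupby(df['date'].dt.to_period('M'))[cols].transform('sum')
print(df1)

# join aligns on the index, so each daily row receives its month's totals
df2 = df.join(df1.add_suffix('_permonth'))

print(df2)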
         date  value1  value2 level  value1_permonth  value2_permonth
0  2020-01-01       1       2     a               12               30
1  2020-01-01       3      10     b               12               30
2  2020-01-01       2       3     c               12               30
3  2020-01-02       1       2     a               12               30
4  2020-01-02       3      10     b               12               30
5  2020-01-02       2       3     c               12               30
6  2021-02-01      10       1     a               38               20
7  2021-02-01       8       4     b               38               20
8  2021-02-01       1       5     c               38               20
9  2021-02-03      10       1     a               38               20
10 2021-02-03       8       4     b               38               20
11 2021-02-03       1       5     c               38               20
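
For anyone who wants to reproduce the output above, here is a minimal end-to-end sketch. It rebuilds only the rows shown in the question (the elided "..." rows are left out) and uses the to_period variant. The pd.to_datetime step is an assumption on my part: the question does not say whether the date column is already datetime64, and both pd.Grouper(key='date') and the .dt accessor need it to be.

```python
import pandas as pd

# Sample rows copied from the question; the elided "..." rows are omitted.
df = pd.DataFrame({
    "date": ["2020-01-01"] * 3 + ["2020-01-02"] * 3
            + ["2021-02-01"] * 3 + ["2021-02-03"] * 3,
    "value1": [1, 3, 2, 1, 3, 2, 10, 8, 1, 10, 8, 1],
    "value2": [2, 10, 3, 2, 10, 3, 1, 4, 5, 1, 4, 5],
    "level": ["a", "b", "c"] * 4,
})

# Assumption: dates may arrive as strings, so convert them to datetime64 first.
df["date"] = pd.to_datetime(df["date"])

cols = ["value1", "value2"]

# transform('sum') returns a frame indexed like df, one row per daily row,
# where each row holds the total of its calendar month.
monthly = df.groupby(df["date"].dt.to_period("M"))[cols].transform("sum")

# join aligns on the index; add_suffix renames the new columns.
df2 = df.join(monthly.add_suffix("_permonth"))
print(df2)
```

Because transform keeps the original index, the join needs no key column, and the number of daily rows stays unchanged.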