I have a DataFrame like this:
Customer Id Start Date End Date Count
1403120020 2014-03-13 2014-03-17 38.0
1403120020 2014-03-18 2014-04-16 283.0
1403120020 2014-04-17 2014-04-25 100.0
1403120020 2014-04-26 2014-05-15 50.0
1812040169 2018-12-07 2018-12-19 122.0
1812040169 2018-12-19 2018-12-20 10.0
1812040169 2018-12-21 2019-01-18 365.0
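Here, a single customer has several start dates within a given month, and one of that month's end dates falls in the following month. I want to collapse these rows so that each customer has one start date and one end date per starting month, with Count summed, like this: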
Customer Id Start Date End Date Count
1403120020 2014-03-13 2014-04-16 321.0
1403120020 2014-04-17 2014-05-15 150.0
1812040169 2018-12-07 2019-01-18 497.0
Answer 0 (score: 3)
Use groupby.agg:
# 'first'/'last' assume each customer's rows are already in date order
df = (df.groupby('Customer_Id')
        .agg({'Start_Date': 'first', 'End_Date': 'last', 'Count': 'sum'})
        .reset_index())
print(df)
Customer_Id Start_Date End_Date Count
0 1403120020 2014-03-13 2014-04-16 321.0
1 1812040169 2018-12-07 2019-01-18 497.0
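Edit: to get one row per customer and start month, group on an extra helper column holding the month of Start_Date (this requires Start_Date to be a datetime column):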
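# Helper key: calendar month of the start date.
# Note: dt.month ignores the year, so the same month in different years shares a key.
df['grp'] = df['Start_Date'].dt.month
df = (df.groupby(['Customer_Id', 'grp'])
        .agg({'Start_Date': 'first', 'End_Date': 'last', 'Count': 'sum'})
        .reset_index()
        .drop('grp', axis=1))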
print(df)
Customer_Id Start_Date End_Date Count
0 1403120020 2014-03-13 2014-04-16 321.0
1 1403120020 2014-04-17 2014-05-15 150.0
2 1812040169 2018-12-07 2019-01-18 497.0
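For anyone who wants to run this end to end, here is a minimal self-contained sketch. It is a variation on the approach above, not the answer's exact code, and it rests on a few assumptions: the columns are named with underscores (Customer_Id, Start_Date, End_Date), the dates arrive as strings and need parsing, and the grouping key is the year-month period (`dt.to_period('M')`) rather than the bare month number, so that the same month in different years is not merged. It also uses `min`/`max` instead of `first`/`last`, so the result does not depend on row order.

```python
import pandas as pd

# Sample rows copied from the question; the underscore column names are an assumption.
df = pd.DataFrame({
    "Customer_Id": [1403120020] * 4 + [1812040169] * 3,
    "Start_Date": ["2014-03-13", "2014-03-18", "2014-04-17", "2014-04-26",
                   "2018-12-07", "2018-12-19", "2018-12-21"],
    "End_Date": ["2014-03-17", "2014-04-16", "2014-04-25", "2014-05-15",
                 "2018-12-19", "2018-12-20", "2019-01-18"],
    "Count": [38.0, 283.0, 100.0, 50.0, 122.0, 10.0, 365.0],
})

# The .dt accessor only works on datetime columns, so parse the strings first.
df["Start_Date"] = pd.to_datetime(df["Start_Date"])
df["End_Date"] = pd.to_datetime(df["End_Date"])

result = (
    # Year-month period of the start date, used only as a grouping key.
    df.assign(grp=df["Start_Date"].dt.to_period("M"))
      .groupby(["Customer_Id", "grp"], as_index=False)
      # min/max are order-independent, unlike first/last.
      .agg(Start_Date=("Start_Date", "min"),
           End_Date=("End_Date", "max"),
           Count=("Count", "sum"))
      .drop(columns="grp")
)
print(result)
```

On the sample data this yields three rows, with Count values 321.0, 150.0 and 497.0, matching the desired output in the question.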