I have two DataFrames:
print(df1)
userid  reg_date
1       2015-07-21
2       2015-07-11
3       2015-07-14

print(df2)
userid  date        status    amount
1       2015-07-22  CHARGED   11.68
1       2015-07-29  CHARGED   21.4
2       2015-07-13  CHARGED   18.98
2       2015-07-15  DECLINED  10.96
For each userid in df1, I need the sum of amount from df2 over rows where status == "CHARGED" and reg_date + 7 days > date.
# result
userid  amount
1       11.68
2       18.98
3       0
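I built the solution as shown here. However, with this approach, a userid with no rows in df2 that satisfy the conditions is missing from the result entirely (it should be returned with 0).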
import pandas as pd
from datetime import timedelta

df1 = pd.read_csv('Task2_data1.csv', sep=',', parse_dates=['reg_date'])
df2 = pd.read_csv('Task2_data2.csv', sep=',', parse_dates=['date'])
# Replace any decimal commas with dots before converting to float.
df2['amount'] = df2['amount'].replace(',', '.', regex=True).astype(float)

df3 = pd.merge(df1, df2, how='outer', on='userid')
# This filter drops users with no qualifying rows, so they never reach the groupby.
df3 = df3[(df3.status == 'CHARGED') &
          (df3.reg_date + timedelta(days=7) > df3.date)]
print(df3.groupby(['userid'])['amount'].sum())
Is there another way to do this?
Answer 0 (score: 1)
Merge the two frames, filter, sum per user, then reindex on the user IDs from df1 with fill_value=0:
In [4974]: dff = df2.merge(df1)
In [4975]: (dff[dff['status'].eq('CHARGED') & (dff['date']-dff['reg_date']).dt.days.lt(7)]
              .groupby('userid')['amount'].sum()
              .reindex(df1['userid'].unique(), fill_value=0)
              .reset_index())
Out[4975]:
   userid  amount
0       1   11.68
1       2   18.98
2       3    0.00
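
For anyone who wants to try this without the CSV files, here is a minimal self-contained sketch in Python 3 with the sample data from the question inlined. It is a variation on the approach above rather than the same code: the date condition is written as the strict comparison from the question (reg_date + 7 days > date), and the zero-fill is done by mapping the per-user sums back onto df1 with fillna(0) instead of reindex. The names merged, mask, sums and result are illustrative and not from the original post.

```python
import io
import pandas as pd

# Sample data copied from the question, inlined so the sketch runs without CSV files.
df1 = pd.read_csv(io.StringIO(
    "userid,reg_date\n"
    "1,2015-07-21\n"
    "2,2015-07-11\n"
    "3,2015-07-14\n"
), parse_dates=["reg_date"])

df2 = pd.read_csv(io.StringIO(
    "userid,date,status,amount\n"
    "1,2015-07-22,CHARGED,11.68\n"
    "1,2015-07-29,CHARGED,21.4\n"
    "2,2015-07-13,CHARGED,18.98\n"
    "2,2015-07-15,DECLINED,10.96\n"
), parse_dates=["date"])

# Inner merge attaches each user's reg_date to every payment row.
merged = df2.merge(df1, on="userid")

# Keep charged payments dated strictly less than 7 days after registration
# (reg_date + 7 days > date, as stated in the question).
mask = (merged["status"] == "CHARGED") & (
    merged["date"] < merged["reg_date"] + pd.Timedelta(days=7)
)

# Sum per user, then map the sums back onto df1 so that users
# with no qualifying rows (userid 3 here) get 0 instead of vanishing.
sums = merged[mask].groupby("userid")["amount"].sum()
result = df1[["userid"]].assign(
    amount=df1["userid"].map(sums).fillna(0.0)
)
print(result)
```

On the sample data this should give the same three rows as the output above. Starting from df1 when building the result is what guarantees one row per registered user, whatever the filter removes.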