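I have a pandas DataFrame structured like this:

Application | Account | Application_Date
1 | 444444 | 10/01/2018
2 | 444444 | 09/01/2018
3 | 555555 | 10/01/2018

And another DataFrame structured like this:

Case | Account | Case_Date
1 | 444444 | 09/01/2018
2 | 444444 | 11/01/2018
3 | 444444 | 10/01/2018
4 | 555555 | 07/01/2018

I want to check whether each Account in the first DataFrame exists in the second DataFrame, counting only cases where Case_Date is greater than or equal to Application_Date. The result should go into a new column of the first DataFrame, together with the matching case numbers.

Could you please advise?

Thanks!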
Answer
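Here is a somewhat convoluted solution, but it will get you there:

1. Convert both date columns to datetime and merge the two DataFrames on Account.
2. Use loc to keep only the rows where Case_Date is greater than or equal to Application_Date.
3. Group on Application and Account and get the unique cases.
4. Merge the result back onto df1.
5. Assign Y to the non-null values (where a case was found) and N to the rest.

```
>>> df1
   Application  Account Application_Date
0            1   444444       10/01/2018
1            2   444444       09/01/2018
2            3   555555       10/01/2018
>>> df2
   Case  Account   Case_Date
0     1   444444  09/01/2018
1     2   444444  11/01/2018
2     3   444444  10/01/2018
3     4   555555  07/01/2018
```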
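To reproduce this, the two sample DataFrames can be built directly. This is a minimal sketch that assumes the dates are the month-first MM/DD/YYYY strings shown above; they are left as strings here because the answer's code converts them with pd.to_datetime itself.

```python
import pandas as pd

# Application table from the question (dates kept as MM/DD/YYYY strings)
df1 = pd.DataFrame({
    'Application': [1, 2, 3],
    'Account': [444444, 444444, 555555],
    'Application_Date': ['10/01/2018', '09/01/2018', '10/01/2018'],
})

# Case table from the question (dates kept as MM/DD/YYYY strings)
df2 = pd.DataFrame({
    'Case': [1, 2, 3, 4],
    'Account': [444444, 444444, 444444, 555555],
    'Case_Date': ['09/01/2018', '11/01/2018', '10/01/2018', '07/01/2018'],
})

print(df1)
print(df2)
```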
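```python
import pandas as pd

# set to datetime so the dates compare chronologically
df1['Application_Date'] = pd.to_datetime(df1['Application_Date'])
df2['Case_Date'] = pd.to_datetime(df2['Case_Date'])

# first merge: Account is the only shared column, so this is an inner join on Account
merged = df2.merge(df1)

# loc and groupby: keep cases dated on or after the application, collect unique case numbers
cases = (merged.loc[merged['Case_Date'] >= merged['Application_Date']]
         .groupby(['Account', 'Application'])['Case']
         .unique())

# merge back: the outer join keeps applications with no matching case (Case_Number is NaN)
final = (cases.to_frame('Case_Number').merge(df1, left_index=True,
                                              right_on=['Account', 'Application'],
                                              how='outer')
         # Following line is just to re-adjust column order
         [['Application', 'Account', 'Application_Date', 'Case_Number']])

# assign Y where a case was found and N otherwise
final['Case_Exists'] = final.Case_Number.notnull().map({True: 'Y', False: 'N'})
```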
```
>>> final
   Application  Account Application_Date Case_Number Case_Exists
0            1   444444       2018-10-01      [2, 3]           Y
1            2   444444       2018-09-01   [1, 2, 3]           Y
2            3   555555       2018-10-01         NaN           N
```
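For comparison, here is a more compact variant of the same merge, filter and collect idea. It is a sketch rather than the answer's code. It builds the sample frames inline, uses groupby(...).agg(list) to get plain Python lists instead of NumPy arrays, and left-joins with DataFrame.join so every application keeps its row in the original order. The use of numpy.where for the Y/N column is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Sample data from the question, parsed as month-first dates
df1 = pd.DataFrame({
    'Application': [1, 2, 3],
    'Account': [444444, 444444, 555555],
    'Application_Date': pd.to_datetime(['10/01/2018', '09/01/2018', '10/01/2018']),
})
df2 = pd.DataFrame({
    'Case': [1, 2, 3, 4],
    'Account': [444444, 444444, 444444, 555555],
    'Case_Date': pd.to_datetime(['09/01/2018', '11/01/2018', '10/01/2018', '07/01/2018']),
})

# Pair every application with every case on the same account
merged = df1.merge(df2, on='Account')

# Keep cases dated on or after the application date and
# collect their case numbers into a list per (Account, Application)
cases = (merged[merged['Case_Date'] >= merged['Application_Date']]
         .groupby(['Account', 'Application'])['Case']
         .agg(list)
         .rename('Case_Number'))

# Left join onto df1: applications without a matching case get NaN
final = df1.join(cases, on=['Account', 'Application'])
final['Case_Exists'] = np.where(final['Case_Number'].notna(), 'Y', 'N')
print(final)
```

Grouping on both Account and Application mirrors the answer. If Application IDs are unique, grouping on Application alone would give the same result.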