I have two DataFrames, as shown below:
import pandas as pd

df1 = pd.DataFrame({'serialNo':['aaaa','bbbb','cccc','ffff','aaaa','bbbb','aaaa'],
                    'Name':['Sayonti','Ruchi','Tony','Gowtam','Toffee','Tom','Sayonti'],
                    'testName': [4402, 3747, 5555, 8754, 1234, 9876, 3602],
                    'moduleName': ['singing', 'dance', 'booze', 'vocals', 'drama', 'paint', 'singing'],
                    'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED', 'WARNING', 'FAILED', 'WARNING'],
                    'Date':['2018-10-5','2018-10-6','2018-10-7','2018-10-8','2018-10-9','2018-10-10','2018-10-8']})
df2 = pd.DataFrame({'serialNo':['aaaa','bbbb','aaaa','ffff','xyzy','aaaa'],
'Food':['Strawberry','Coke','Pepsi','Nuts','Apple','Candy'],
'Work': ['AP', 'TC','OD', 'PU','NO','PM'],
'Date':['2018-10-1','2018-10-6','2018-10-2','2018-10-3','2018-10-5','2018-10-10']
})
I want to join the two, which I can do like this:
result = pd.merge(df1,df2,on=['serialNo','Date'],how='inner')
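However, I want to change this so that the merge also checks the date column: a row should match only if df2['Date'] is within three days of df1['Date']. I don't want to add a separate column to check this condition; I would like the check applied on the fly, as part of the join. How can I achieve this?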
Answer 0 (score: 2)
You can join on serialNo alone and then filter the joined result:
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1,df2,on='serialNo' ,how='inner')
result = result[result.Date_x.sub(result.Date_y).abs().dt.days.le(3)]
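For reference, here is a self-contained sketch of the join-then-filter approach on a trimmed copy of the question's data (only three columns per frame are kept, and the dates are written zero-padded; both are simplifications made here). It compares the date gap against a pd.Timedelta directly rather than going through .dt.days; for pure dates the two should agree, but the Timedelta comparison does not round when the columns also carry a time of day.

```python
import pandas as pd

# Trimmed copies of the question's frames (assumption: extra columns dropped).
df1 = pd.DataFrame({
    "serialNo": ["aaaa", "bbbb", "cccc", "ffff", "aaaa", "bbbb", "aaaa"],
    "Name": ["Sayonti", "Ruchi", "Tony", "Gowtam", "Toffee", "Tom", "Sayonti"],
    "Date": ["2018-10-05", "2018-10-06", "2018-10-07", "2018-10-08",
             "2018-10-09", "2018-10-10", "2018-10-08"],
})
df2 = pd.DataFrame({
    "serialNo": ["aaaa", "bbbb", "aaaa", "ffff", "xyzy", "aaaa"],
    "Food": ["Strawberry", "Coke", "Pepsi", "Nuts", "Apple", "Candy"],
    "Date": ["2018-10-01", "2018-10-06", "2018-10-02", "2018-10-03",
             "2018-10-05", "2018-10-10"],
})

# Convert the date strings so that subtraction yields timedeltas.
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])

# Join on the key only; the shared "Date" column becomes Date_x / Date_y.
merged = pd.merge(df1, df2, on="serialNo", how="inner")

# Keep pairs whose dates are at most three days apart, in either direction.
gap = (merged["Date_x"] - merged["Date_y"]).abs()
result = merged[gap <= pd.Timedelta(days=3)]

print(result[["serialNo", "Name", "Food", "Date_x", "Date_y"]])
```

Note that the intermediate merge holds every pair of rows sharing a serialNo, so it can grow large when keys repeat often; the filter only trims it afterwards.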
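Per the comment below, drop the chained .abs() call and use .between() instead of .le(), so that only rows where the df1 date is 0 to 3 days after the df2 date are kept: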
result = result[result.Date_x.sub(result.Date_y).dt.days.between(0,3)]
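If both the symmetric and the one-sided window are needed, the filter can be wrapped in a small function. The sketch below is only an illustration: merge_within_days, its parameters, the "_left"/"_right" suffixes, and the toy frames are all hypothetical names and data introduced here, not part of the original answer. It assumes the date column has the same name in both frames and is already of datetime dtype.

```python
import pandas as pd

def merge_within_days(left, right, key, date_col, min_days=0, max_days=3):
    """Hypothetical helper: inner-join on `key`, then keep rows where
    (left date - right date) lies in [min_days, max_days] days, inclusive."""
    merged = left.merge(right, on=key, how="inner",
                        suffixes=("_left", "_right"))
    gap = merged[f"{date_col}_left"] - merged[f"{date_col}_right"]
    # Series.between is inclusive on both ends by default.
    mask = gap.between(pd.Timedelta(days=min_days),
                       pd.Timedelta(days=max_days))
    return merged[mask].reset_index(drop=True)

# Toy data (made up for this sketch).
left = pd.DataFrame({
    "serialNo": ["aaaa", "aaaa", "bbbb"],
    "Date": pd.to_datetime(["2018-10-05", "2018-10-09", "2018-10-06"]),
})
right = pd.DataFrame({
    "serialNo": ["aaaa", "aaaa", "bbbb"],
    "Date": pd.to_datetime(["2018-10-02", "2018-10-10", "2018-10-06"]),
})

# One-sided: the left date is 0 to 3 days after the right date.
one_sided = merge_within_days(left, right, "serialNo", "Date")

# Two-sided: the dates are within 3 days of each other in either direction.
two_sided = merge_within_days(left, right, "serialNo", "Date", min_days=-3)

print(one_sided)
print(two_sided)
```

Setting min_days=-3 reproduces the .abs() version of the filter, while the default min_days=0 corresponds to the .between(0,3) version.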