I have two DataFrames, as follows:
df1=
date        company          userDomain       keyword  pageViews  category
2015-12-02  1-800 Contacts   glasses.com      SAN      2          STORAGE
2015-12-02  1-800 Contacts   rhgi.com         SAN      3          STORAGE
2015-12-02  100 Percent Fun  dialogdesign.ca  SAN      1          STORAGE
2015-12-02  101netlink       101netlink.com   SAN      8          STORAGE
2015-12-02  1020             nlc.bc.ca        SAN      4          STORAGE
df2=
Outcome                      Job Title                     Wave  Wave Date            userDomain
Created Opportunity          IT Manager                    1.0   2016-02-16 15:07:05  dialogdesign.ca
Closed Out Prospect/Contact  Infrastructure Manager        1.0   2016-02-16 15:07:05  rhgi.com
NaN                          IT Director                   1.0   2016-02-16 15:07:05  surefire.com
NaN                          Supervisor Technical Support  1.0   2016-02-16 15:07:05  isd2144.org
Created Opportunity          Director of IT Services       1.0   2016-02-16 15:07:05  nlc.bc.ca
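I want to add a column named wave_date to df1 that holds the date from df2['Wave Date'] wherever df1['userDomain'] matches df2['userDomain']. If a userDomain does not appear in both frames, the value should be NaN. Sorry if this is a very naive question, but my failed attempts are frustrating me. What I'm doing is something like this: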
df1['wave_date'] = df1.apply(lambda x: df2['Wave Date'] if x['userDomain'].isin(df2['userDomain']) else np.nan)
I keep getting
IndexError: ('userDomain', 'occurred at index date')
Can you point me to the right way to do this? Thanks a lot.
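Why the attempt fails: with the default axis=0, DataFrame.apply calls the lambda once per *column* (the "occurred at index date" part shows it was handed the date column), so x['userDomain'] looks up a row label that does not exist. Even row-wise, x['userDomain'] would be a single string, which has no .isin method, and df2['Wave Date'] would return the whole column rather than the matching date. The sketch below is a minimal corrected row-wise version, assuming a cut-down copy of the sample data; the dict name wave_by_domain is a hypothetical helper, not something from the question. It works, but a vectorized lookup is usually faster.

```python
import numpy as np
import pandas as pd

# Cut-down reconstruction of the question's frames (only the columns needed here)
df1 = pd.DataFrame({
    "userDomain": ["glasses.com", "rhgi.com", "dialogdesign.ca",
                   "101netlink.com", "nlc.bc.ca"],
})
df2 = pd.DataFrame({
    "Wave Date": ["2016-02-16 15:07:05"] * 5,
    "userDomain": ["dialogdesign.ca", "rhgi.com", "surefire.com",
                   "isd2144.org", "nlc.bc.ca"],
})

# Hypothetical helper: plain dict mapping each domain to its wave date
wave_by_domain = dict(zip(df2["userDomain"], df2["Wave Date"]))

# axis=1 passes one row at a time, so row["userDomain"] is a single string.
# dict.get returns the matching date, or np.nan when the domain is absent.
df1["wave_date"] = df1.apply(
    lambda row: wave_by_domain.get(row["userDomain"], np.nan),
    axis=1,
)
print(df1)
```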
Answer 0 (score: 1)
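# Build a lookup from userDomain to Wave Date
m = dict(zip(df2['userDomain'], df2['Wave Date']))
# map() gives NaN for domains missing from the lookup; assign() returns a new DataFrame
df1.assign(wave_date=df1.userDomain.map(m))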
   date        company          userDomain       keyword  pageViews  category  wave_date
0  2015-12-02  1-800 Contacts   glasses.com      SAN      2          STORAGE   NaN
1  2015-12-02  1-800 Contacts   rhgi.com         SAN      3          STORAGE   2016-02-16 15:07:05
2  2015-12-02  100 Percent Fun  dialogdesign.ca  SAN      1          STORAGE   2016-02-16 15:07:05
3  2015-12-02  101netlink       101netlink.com   SAN      8          STORAGE   NaN
4  2015-12-02  1020             nlc.bc.ca        SAN      4          STORAGE   2016-02-16 15:07:05
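A self-contained sketch of the same map() approach, rebuilt from the sample data (Wave Date is kept as strings here for simplicity; real data may hold datetimes). Because assign() returns a copy, the result has to be bound to a name. The example also shows a left merge as an alternative, which is a swapped-in technique rather than part of the answer. The names with_map and with_merge are illustrative only.

```python
import pandas as pd

df1 = pd.DataFrame({
    "date": ["2015-12-02"] * 5,
    "company": ["1-800 Contacts", "1-800 Contacts", "100 Percent Fun",
                "101netlink", "1020"],
    "userDomain": ["glasses.com", "rhgi.com", "dialogdesign.ca",
                   "101netlink.com", "nlc.bc.ca"],
    "keyword": ["SAN"] * 5,
    "pageViews": [2, 3, 1, 8, 4],
    "category": ["STORAGE"] * 5,
})
df2 = pd.DataFrame({
    "Wave Date": ["2016-02-16 15:07:05"] * 5,
    "userDomain": ["dialogdesign.ca", "rhgi.com", "surefire.com",
                   "isd2144.org", "nlc.bc.ca"],
})

# Vectorized lookup: unmatched domains become NaN.
lookup = dict(zip(df2["userDomain"], df2["Wave Date"]))
with_map = df1.assign(wave_date=df1["userDomain"].map(lookup))

# Alternative: a left merge keeps every df1 row in its original order.
# drop_duplicates(keep="last") mirrors dict(zip(...)), where a later
# duplicate domain overwrites an earlier one, and stops duplicates in df2
# from multiplying df1 rows.
with_merge = df1.merge(
    df2[["userDomain", "Wave Date"]].drop_duplicates("userDomain", keep="last"),
    on="userDomain",
    how="left",
).rename(columns={"Wave Date": "wave_date"})

print(with_map)
```

map() only adds one column and never changes the row count, which makes it a good fit for a single-value lookup. merge() becomes more convenient when several df2 columns need to be carried over at once.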