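I am trying to do the following:
Given a row in df1, if str(row['code']) is contained in df2['code'] for any row of df2, then I want df2['lamer_url_1'] and df2['shopee_url_1'] in all of those rows to take the corresponding values from df1. Then move on to the next row of df1['code'], and so on.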
'''
==============
Initial tables:
df1
         code lamer_url_1 shopee_url_1
0  L61B18H089           b            a
1  L61S19H014           e            d
2  L61S19H015           z            y
df2
               code  lamer_url_1  shopee_url_1  lamer_url_2  shopee_url_2
0  L61B18H089-F1424          NaN           NaN          NaN           NaN
1  L61S19H014-S1500          NaN           NaN          NaN           NaN
2  L61B18H089-F1424          NaN           NaN          NaN           NaN
==============
Expected output:
df2
               code  lamer_url_1  shopee_url_1  lamer_url_2  shopee_url_2
0  L61B18H089-F1424            b             a          NaN           NaN
1  L61S19H014-S1500            e             d          NaN           NaN
2  L61B18H089-F1424            b             a          NaN           NaN
'''
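Answer 0 (score: 1)
I am assuming that the common part of "code" in "df2" is the characters before the "-". I am also assuming that from "df1" we want "lamer_url_1" and "shopee_url_1", and from "df2" we want "lamer_url_2" and "shopee_url_2" (if I got that wrong, please correct me in the comments so I can refine the code):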
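import pandas as pd

# Index df1 by its code so it can be joined on the index.
df1.set_index(df1['code'], inplace=True)
# Index df2 by the part of its code before the '-'.
df2.set_index(df2['code'].apply(lambda x: x.split('-')[0]), inplace=True)
df2.index.names = ['code_join']
# Take the *_2 columns from df2 and the *_1 columns from df1, matched on the index.
df3 = pd.merge(df2[['code', 'lamer_url_2', 'shopee_url_2']],
               df1[['lamer_url_1', 'shopee_url_1']],
               left_index=True, right_index=True)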
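
If you would rather fill df2 in place and keep its original row order and index, the same lookup can be sketched with Series.map instead of a merge. This is only a minimal sketch under the same assumption as above (the join key is the part of df2['code'] before the first "-"), and it additionally assumes the codes in df1 are unique. The sample frames are rebuilt from the question, and the names key and lookup are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "code": ["L61B18H089", "L61S19H014", "L61S19H015"],
    "lamer_url_1": ["b", "e", "z"],
    "shopee_url_1": ["a", "d", "y"],
})
df2 = pd.DataFrame({
    "code": ["L61B18H089-F1424", "L61S19H014-S1500", "L61B18H089-F1424"],
    "lamer_url_1": np.nan,
    "shopee_url_1": np.nan,
    "lamer_url_2": np.nan,
    "shopee_url_2": np.nan,
})

# Assumption: the join key is the part of df2['code'] before the first '-'.
key = df2["code"].str.split("-").str[0]

# Assumption: df1['code'] is unique, so it can serve as a lookup index.
lookup = df1.set_index("code")

# Copy the matching df1 values into df2; keys with no match stay NaN.
for col in ["lamer_url_1", "shopee_url_1"]:
    df2[col] = key.map(lookup[col])

print(df2)
```

Compared with the merge, this leaves df2's index and column order untouched and does not drop rows whose code has no match in df1.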