Match two columns in Excel files and get the other column values - Python Pandas

Posted: 2018-05-11 06:24:51

Tags: python excel pandas

I have two Excel files, e.g. wb1.xlsx and wb2.xlsx:

wb1.xlsx

adsl    svc_no    port_stat    adsl.1    Comparison result
2/17
2/24
2/27
2/33
2/37
3/12

wb2.xlsx

caller_id    status    adsl    Comparison result
n/a          SP        2/37    Not Match
n/a          RE        2/24    Not Match
n/a          SP        2/27    Match
n/a          SP        2/33    Not Match
n/a          SP        2/17    Match

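What I want to do is match the adsl column of wb2.xlsx against the adsl column of wb1.xlsx, and fill the other columns of wb1.xlsx with the corresponding values from wb2.xlsx.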

My expected output is wb1.xlsx updated with the values from wb2.xlsx:
adsl    svc_no    port_stat    adsl.1    Comparison result
2/17    n/a       SP           2/17      Match
2/24    n/a       RE           2/24      Not Match
2/27    n/a       SP           2/27      Match
2/33    n/a       SP           2/33      Not Match
2/37    n/a       SP           2/37      Not Match
3/12 

While searching, I found that pd.merge() can do this kind of matching.

I tried it this way:

result = pd.merge(df2, pri_df, on=['adsl', 'adsl'])

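Unfortunately, it creates new columns instead of updating the existing ones. Also, it only keeps the rows it was able to match and drops the others.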
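Both symptoms come from pd.merge's defaults: how='inner' keeps only keys present in both frames, and a column that exists in both frames (other than the key) is kept twice with the suffixes _x and _y rather than overwritten. A minimal sketch, assuming the sheets are loaded as pri_df (wb1) and df2 (wb2) as in the question, with the sample data typed in from the tables above and "n/a" read as NaN (read_excel's default na_values include "n/a"): keep only wb1's key column, rename wb2's columns to wb1's names, and do a left merge on adsl.

```python
import numpy as np
import pandas as pd

# wb1.xlsx: only adsl is filled, the other columns are empty (NaN)
pri_df = pd.DataFrame(
    {'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']}
).reindex(columns=['adsl', 'svc_no', 'port_stat', 'adsl.1', 'Comparison result'])

# wb2.xlsx, with "n/a" already turned into NaN
df2 = pd.DataFrame({
    'caller_id': [np.nan] * 5,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# Default merge: inner join, and the shared column gets suffixes
naive = pd.merge(df2, pri_df, on='adsl')
print(len(naive))                               # 5: the row for 3/12 is gone
print('Comparison result_x' in naive.columns)   # True: a new column, not an update

# Rename wb2's columns to wb1's names and add the adsl.1 copy of the key
updates = df2.rename(columns={'caller_id': 'svc_no', 'status': 'port_stat'})
updates['adsl.1'] = updates['adsl']

# Left join from wb1's key column so every wb1 row survives, in wb1's order
result = (pri_df[['adsl']]
          .merge(updates, on='adsl', how='left')
          .reindex(columns=pri_df.columns))
print(result)
```

Unlike the combine_first approach in Answer 0, this discards whatever wb1 already holds in those columns, which is harmless here because they are empty.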

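I also tried getting the index of the columns in wb2.xlsx and assigning them to the columns of wb1.xlsx, but it just copied the values over literally instead of matching them by adsl.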
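The literal copy happens because assigning one column to another aligns on the row index (0, 1, 2, ...), not on adsl. A sketch of a key-based lookup instead, assuming the same pri_df/df2 names and sample data as the question; column_map is a hypothetical helper name. Series.map with a Series argument looks each wb1 adsl up in the index of the wb2 column, leaving NaN where there is no match (3/12).

```python
import numpy as np
import pandas as pd

pri_df = pd.DataFrame(
    {'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']}
).reindex(columns=['adsl', 'svc_no', 'port_stat', 'adsl.1', 'Comparison result'])
df2 = pd.DataFrame({
    'caller_id': [np.nan] * 5,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# Plain assignment aligns on row labels: wb1 row 0 (2/17) receives
# wb2 row 0, which belongs to 2/37, and wb1 row 5 gets NaN
wrong = pri_df.copy()
wrong['Comparison result'] = df2['Comparison result']

# Look values up by adsl instead (adsl must be unique in wb2)
lookup = df2.set_index('adsl')
column_map = {'caller_id': 'svc_no', 'status': 'port_stat',
              'Comparison result': 'Comparison result'}  # wb2 name -> wb1 name

fixed = pri_df.copy()
for src, dst in column_map.items():
    fixed[dst] = fixed['adsl'].map(lookup[src])
# adsl.1 repeats adsl only for rows that were found in wb2
fixed['adsl.1'] = fixed['adsl'].where(fixed['adsl'].isin(lookup.index))
print(fixed)
```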

Any helpful references would be appreciated.

2 answers:

Answer 0 (score: 1)

I suggest using intersection together with combine_first:
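For reference, combine_first aligns the two frames on the union of their index and columns, keeps the calling frame's values, and falls back to the other frame wherever the caller has NaN or lacks the row entirely. That is why the steps below index both frames by adsl, and why intersection is used first, so only wb1's columns are carried over from wb2. A tiny illustration with made-up values (hypothetical data, not from the question):

```python
import numpy as np
import pandas as pd

# Caller: a value for 2/17, a missing value (NaN) for 2/24
new = pd.DataFrame({'port_stat': ['SP', np.nan]}, index=['2/17', '2/24'])
# Fallback source, with an extra key 3/12
old = pd.DataFrame({'port_stat': ['XX', 'YY', 'ZZ']},
                   index=['2/17', '2/24', '3/12'])

combined = new.combine_first(old)
# 2/17 keeps the caller's 'SP'; 2/24 falls back to 'YY';
# 3/12 exists only in the fallback, so it is added with 'ZZ'
print(combined)
```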

print (df1)
   adsl  svc_no  port_stat  adsl.1  Comparison result
0  2/17     NaN        NaN     NaN                NaN
1  2/24     NaN        NaN     NaN                NaN
2  2/27     NaN        NaN     NaN                NaN
3  2/33     NaN        NaN     NaN                NaN
4  2/37     NaN        NaN     NaN                NaN
5  3/12     NaN        NaN     NaN                NaN

print (df2)
   caller_id status  adsl Comparison result
0        NaN     SP  2/37         Not Match
1        NaN     RE  2/24         Not Match
2        NaN     SP  2/27             Match
3        NaN     SP  2/33         Not Match
4        NaN     SP  2/17             Match
# rename status so it lines up with wb1's port_stat column
df2 = df2.rename(columns={'status':'port_stat'})
# add an adsl.1 column that copies adsl
d = {'adsl.1': lambda x: x['adsl']}
df2 = df2.assign(**d)
print (df2)
   caller_id port_stat  adsl Comparison result adsl.1
0        NaN        SP  2/37         Not Match   2/37
1        NaN        RE  2/24         Not Match   2/24
2        NaN        SP  2/27             Match   2/27
3        NaN        SP  2/33         Not Match   2/33
4        NaN        SP  2/17             Match   2/17

# keep only the columns that also exist in df1 (drops caller_id)
df22 = df2[df2.columns.intersection(df1.columns)]
print (df22)
  port_stat  adsl Comparison result adsl.1
0        SP  2/37         Not Match   2/37
1        RE  2/24         Not Match   2/24
2        SP  2/27             Match   2/27
3        SP  2/33         Not Match   2/33
4        SP  2/17             Match   2/17

# prefer df22's values, fall back to df1 for cells/rows df22 lacks (e.g. 3/12)
result = (df22.set_index('adsl')
              .combine_first(df1.set_index('adsl'))
              .reset_index()
              .reindex(columns=df1.columns))
print (result)
   adsl  svc_no port_stat adsl.1 Comparison result
0  2/17     NaN        SP   2/17             Match
1  2/24     NaN        RE   2/24         Not Match
2  2/27     NaN        SP   2/27             Match
3  2/33     NaN        SP   2/33         Not Match
4  2/37     NaN        SP   2/37         Not Match
5  3/12     NaN       NaN    NaN               NaN
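The steps of this answer can be collected into one runnable script. A sketch, assuming the sheets were read with pd.read_excel (the sample data is typed in from the question here). Renaming caller_id to svc_no is an addition that is not in the answer: it follows the expected output, where svc_no takes wb2's caller_id values. With the sample data it changes nothing, because caller_id is all n/a.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    {'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']}
).reindex(columns=['adsl', 'svc_no', 'port_stat', 'adsl.1', 'Comparison result'])
df2 = pd.DataFrame({
    'caller_id': [np.nan] * 5,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# Rename wb2's columns to wb1's names (caller_id -> svc_no is the addition)
# and add adsl.1 as a copy of the key
df2 = (df2.rename(columns={'status': 'port_stat', 'caller_id': 'svc_no'})
          .assign(**{'adsl.1': lambda x: x['adsl']}))

# Keep only the columns that also exist in wb1
df22 = df2[df2.columns.intersection(df1.columns)]

# Prefer wb2's values; fall back to wb1's for anything wb2 lacks
result = (df22.set_index('adsl')
              .combine_first(df1.set_index('adsl'))
              .reset_index()
              .reindex(columns=df1.columns))
print(result)

# To write back (requires an Excel engine such as openpyxl):
# result.to_excel('wb1.xlsx', index=False)
```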

Answer 1 (score: 1)

You can use pandas' isin function:

result = df2.loc[df2['adsl'].isin(pri_df['adsl'])]
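isin returns a boolean mask, so the line above filters wb2 down to the rows whose adsl also appears in wb1; on its own it does not write anything into wb1's columns. A small sketch with the question's sample keys (only the columns needed are typed in), also showing the negated mask, which lists the wb1 rows that have no counterpart in wb2:

```python
import pandas as pd

pri_df = pd.DataFrame({'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']})
df2 = pd.DataFrame({
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
})

mask = df2['adsl'].isin(pri_df['adsl'])  # one True/False per row of df2
result = df2.loc[mask]                   # wb2 rows that have a match in wb1
print(len(result))  # 5

# wb1 rows with no match in wb2
unmatched = pri_df.loc[~pri_df['adsl'].isin(df2['adsl'])]
print(unmatched['adsl'].tolist())  # ['3/12']
```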

Hope this works for you.