I have two Excel files, for example wb1.xlsx and wb2.xlsx.
wb1.xlsx
adsl svc_no port_stat adsl.1 Comparison result
2/17
2/24
2/27
2/33
2/37
3/12
wb2.xlsx
caller_id status adsl Comparison result
n/a SP 2/37 Not Match
n/a RE 2/24 Not Match
n/a SP 2/27 Match
n/a SP 2/33 Not Match
n/a SP 2/17 Match
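What I want to do is match the adsl column of wb2.xlsx against the adsl column of wb1.xlsx, and fill the other columns of wb1.xlsx with the corresponding values from wb2.xlsx.
My expected output is wb1.xlsx updated with the values from wb2.xlsx:
adsl svc_no port_stat adsl.1 Comparison result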
2/17 n/a SP 2/17 Match
2/24 n/a RE 2/24 Not Match
2/27 n/a SP 2/27 Match
2/33 n/a SP 2/33 Not Match
2/37 n/a SP 2/37 Not Match
3/12
While searching, I found that pd.merge() might be able to do the matching.
I tried it this way:
result = pd.merge(df2, pri_df, on=['adsl', 'adsl'])
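Unfortunately, it creates new columns instead of updating the existing ones. It also returns only the rows it was able to match and drops the others.
I also tried getting the index of the column in wb2.xlsx and assigning it to the column in wb1.xlsx, but that just copied the values over literally, without matching on adsl.
Any helpful references would be appreciated.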
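The merge behavior described above can be reproduced with a small sketch. The frames here are in-memory stand-ins for the two workbooks (an assumption, since the real files are not shown), trimmed to a few columns. By default pd.merge() does an inner join, which explains both the dropped row and the duplicated columns. Passing how='left' is one way to keep every row of the left frame.

```python
import pandas as pd

# Stand-in for wb1.xlsx: only the key column is filled in.
df1 = pd.DataFrame({
    "adsl": ["2/17", "2/24", "2/27", "2/33", "2/37", "3/12"],
    "svc_no": [None] * 6,
    "port_stat": [None] * 6,
})

# Stand-in for wb2.xlsx: a subset of the keys, in a different order.
df2 = pd.DataFrame({
    "adsl": ["2/37", "2/24", "2/27", "2/33", "2/17"],
    "port_stat": ["SP", "RE", "SP", "SP", "SP"],
})

# Default how="inner": the row without a partner (3/12) is dropped, and the
# shared non-key column is duplicated as port_stat_x / port_stat_y.
inner = pd.merge(df1, df2, on="adsl")

# how="left" keeps every row of the left frame; unmatched keys get NaN.
# The empty port_stat column is left out so it is not duplicated.
left = pd.merge(df1[["adsl", "svc_no"]], df2, on="adsl", how="left")

print(inner.columns.tolist())
print(left)
```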
Answer 0 (score: 1)
I suggest using intersection together with combine_first:
print (df1)
adsl svc_no port_stat adsl.1 Comparison result
0 2/17 NaN NaN NaN NaN
1 2/24 NaN NaN NaN NaN
2 2/27 NaN NaN NaN NaN
3 2/33 NaN NaN NaN NaN
4 2/37 NaN NaN NaN NaN
5 3/12 NaN NaN NaN NaN
print (df2)
caller_id status adsl Comparison result
0 NaN SP 2/37 Not Match
1 NaN RE 2/24 Not Match
2 NaN SP 2/27 Match
3 NaN SP 2/33 Not Match
4 NaN SP 2/17 Match
df2 = df2.rename(columns={'status':'port_stat'})
d = {'adsl.1': lambda x: x['adsl']}
df2 = df2.assign(**d)
print (df2)
caller_id port_stat adsl Comparison result adsl.1
0 NaN SP 2/37 Not Match 2/37
1 NaN RE 2/24 Not Match 2/24
2 NaN SP 2/27 Match 2/27
3 NaN SP 2/33 Not Match 2/33
4 NaN SP 2/17 Match 2/17
df22 = df2[df2.columns.intersection(df1.columns)]
print (df22)
port_stat adsl Comparison result adsl.1
0 SP 2/37 Not Match 2/37
1 RE 2/24 Not Match 2/24
2 SP 2/27 Match 2/27
3 SP 2/33 Not Match 2/33
4 SP 2/17 Match 2/17
result = (df22.set_index('adsl')
              .combine_first(df1.set_index('adsl'))
              .reset_index()
              .reindex(columns=df1.columns))
print (result)
adsl svc_no port_stat adsl.1 Comparison result
0 2/17 NaN SP 2/17 Match
1 2/24 NaN RE 2/24 Not Match
2 2/27 NaN SP 2/27 Match
3 2/33 NaN SP 2/33 Not Match
4 2/37 NaN SP 2/37 Not Match
5 3/12 NaN NaN NaN NaN
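For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. It builds the two frames in memory from the sample data in the question instead of reading the workbooks (an assumption; with real files they would presumably come from pd.read_excel). The steps themselves are the ones shown above.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for wb1.xlsx: only adsl is filled in.
df1 = pd.DataFrame({
    "adsl": ["2/17", "2/24", "2/27", "2/33", "2/37", "3/12"],
    "svc_no": np.nan,
    "port_stat": np.nan,
    "adsl.1": np.nan,
    "Comparison result": np.nan,
})

# In-memory stand-in for wb2.xlsx.
df2 = pd.DataFrame({
    "caller_id": np.nan,
    "status": ["SP", "RE", "SP", "SP", "SP"],
    "adsl": ["2/37", "2/24", "2/27", "2/33", "2/17"],
    "Comparison result": ["Not Match", "Not Match", "Match", "Not Match", "Match"],
})

# Align df2's column names with df1 and add the adsl.1 copy of the key.
df2 = (df2.rename(columns={"status": "port_stat"})
          .assign(**{"adsl.1": lambda x: x["adsl"]}))

# Keep only the columns that also exist in df1.
df22 = df2[df2.columns.intersection(df1.columns)]

# Values from df22 take priority; df1 supplies the rows and columns that
# df22 lacks (here the 3/12 row and the svc_no column).
result = (df22.set_index("adsl")
              .combine_first(df1.set_index("adsl"))
              .reset_index()
              .reindex(columns=df1.columns))

print(result)
```

Note that combine_first returns the union of both indexes in sorted order, which is why the rows come back ordered by adsl rather than in df2's order.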
Answer 1 (score: 1)
You can use the pandas isin method:
result = df2.loc[df2['adsl'].isin(pri_df['adsl'])]
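Worth noting: isin only filters rows, so on its own it returns a subset of df2 and does not update pri_df. A minimal sketch of what it gives you follows, using in-memory stand-ins for the two frames (an assumption). The last step uses Series.map to fill one column, which is a different technique from the line above and is shown only as one possible follow-up.

```python
import pandas as pd

# Stand-ins for the two frames, reduced to the columns that matter here.
pri_df = pd.DataFrame({"adsl": ["2/17", "2/24", "2/27", "2/33", "2/37", "3/12"]})
df2 = pd.DataFrame({
    "status": ["SP", "RE", "SP", "SP", "SP"],
    "adsl": ["2/37", "2/24", "2/27", "2/33", "2/17"],
})

# isin only filters: the subset of df2 whose key also occurs in pri_df.
matched = df2.loc[df2["adsl"].isin(pri_df["adsl"])]

# The same test from the other side flags which pri_df rows have a partner.
has_partner = pri_df["adsl"].isin(df2["adsl"])

# One way to actually fill a column (not part of the answer above): look each
# key up in a Series indexed by adsl. Keys without a partner become NaN.
# This assumes the adsl values in df2 are unique.
pri_df["port_stat"] = pri_df["adsl"].map(df2.set_index("adsl")["status"])

print(matched)
print(pri_df)
```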
Hope this works for you.