I have two DataFrames with the following columns:
df1
name  cell  marks
tom   2     21862
df2
name  cell  marks   passwd
tom   2     11111   2548
matt  2     158416  2483
      2     21862   26846
How can I compare df2 against df1 and get a DataFrame of the closest matching rows?
expected_output:
df2
name  cell  marks   passwd
tom   2     11111   2548
      2     21862   26846
I tried merge, but the data is dynamic: in one case name may change, and in another case marks may change.
Answer 0 (score: 1)
You can try the following:
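import pandas as pd

dict1 = {'name': ['tom'], 'cell': [2], 'marks': [21862]}
dict2 = {'name': ['tom', 'matt'], 'cell': [2, 2],
         'marks': [21862, 158416], 'passwd': [2548, 2483]}
df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

# isin() with a DataFrame argument compares cells at the same index and column labels
compare = df2.isin(df1)
# keep the rows of df2 with at least one matching cell; .loc selects by index label
df2 = df2.loc[df2.where(compare).dropna(how='all').index]
print(df2)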
Output:
  name  cell  marks  passwd
0  tom     2  21862    2548
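Because `isin` with a DataFrame argument matches on index position as well as column label, the snippet above only finds rows that sit at the same index in both frames. The following is a minimal sketch of an index-independent variant that scores each row of df2 by how many of its columns match df1. It assumes df1 holds a single reference row, represents the blank name from the question as `None`, and uses a hypothetical helper `closest_rows` with an assumed threshold of two matching columns.

```python
import pandas as pd

# Data from the question; the blank name in the third row is assumed to be None
df1 = pd.DataFrame({"name": ["tom"], "cell": [2], "marks": [21862]})
df2 = pd.DataFrame({
    "name": ["tom", "matt", None],
    "cell": [2, 2, 2],
    "marks": [11111, 158416, 21862],
    "passwd": [2548, 2483, 26846],
})


def closest_rows(reference, candidates, min_matches=2):
    """Keep candidate rows with at least `min_matches` columns found in reference.

    Hypothetical helper: passing a dict to isin() checks membership per column
    and ignores the index. With more than one reference row this mixes values
    from different rows, so it is only a row-wise comparison for a one-row frame.
    """
    mask = candidates.isin(reference.to_dict("list"))
    score = mask.sum(axis=1)  # number of matching columns per row
    return candidates[score >= min_matches]


closest = closest_rows(df1, df2)
print(closest)
```

With the question's data this keeps the rows at index 0 and 2 and drops the matt row, which matches only on cell.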
Answer 1 (score: 1)
You can use pandas.merge with the option indicator=True and filter for the 'both' results:
import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tom', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483]
                    ], columns=["name", "cell", "marks", "passwd"])

def compare_dataframes(df1, df2):
    """Find rows which are similar between two DataFrames."""
    # merge on all shared columns; indicator=True adds a '_merge' column
    comparison_df = df1.merge(df2,
                              indicator=True,
                              how='outer')
    return comparison_df[comparison_df['_merge'] == 'both'].drop(columns=["_merge"])

print(compare_dataframes(df1, df2))
Returns:
  name  cell  marks  passwd
0  tom     2  11111    2548
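A merge on all shared columns only returns rows where name, cell and marks all agree, so it will not pick up a row where one of them has changed. One possible way to relax this, sketched below, is to merge on several subsets of the key columns and combine the results. The helper `match_on_any_key_set` and the choice of key sets are assumptions made for illustration, and the blank name from the question is represented as `None`.

```python
import pandas as pd

# Data from the question; the blank name in the third row is assumed to be None
df1 = pd.DataFrame({"name": ["tom"], "cell": [2], "marks": [21862]})
df2 = pd.DataFrame({
    "name": ["tom", "matt", None],
    "cell": [2, 2, 2],
    "marks": [11111, 158416, 21862],
    "passwd": [2548, 2483, 26846],
})


def match_on_any_key_set(reference, candidates, key_sets):
    """Return candidate rows that match reference on at least one key set.

    Hypothetical helper: each key set is a list of column names that must all
    agree for a row to count as a match.
    """
    parts = []
    for keys in key_sets:
        # Only the key columns of the reference are needed for the lookup
        ref_keys = reference[keys].drop_duplicates()
        parts.append(candidates.merge(ref_keys, on=keys, how="inner"))
    # A row matching several key sets appears once in the result
    return pd.concat(parts, ignore_index=True).drop_duplicates().reset_index(drop=True)


# Either marks may have changed (match on name + cell)
# or name may have changed (match on cell + marks)
result = match_on_any_key_set(df1, df2, [["name", "cell"], ["cell", "marks"]])
print(result)
```

Listing the key sets explicitly keeps the definition of "close" visible: a row qualifies only if it agrees with df1 on one of the stated column combinations.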