Compare two DataFrames and get the closest matching rows

Time: 2020-06-29 18:59:20

Tags: python python-3.x pandas dataframe compare

I have two DataFrames with the following columns:

df1

name    cell    marks
tom     2       21862

df2

name    cell    marks     passwd
tom     2       11111     2548
matt    2       158416    2483
        2       21862     26846

How can I compare df2 with df1 and get the closest matching rows?

Expected output:

df2

name    cell    marks     passwd
tom     2       11111     2548
        2       21862     26846

I tried merge, but the data is dynamic: in some cases name may differ, and in other cases marks may differ.

2 Answers:

Answer 0: (score: 1)

You can try the following:

import pandas as pd

dict1 = {'name': ['tom'], 'cell': [2], 'marks': [21862]}
dict2 = {'name': ['tom', 'matt'], 'cell': [2, 2],
         'marks': [21862, 158416], 'passwd': [2548, 2483]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

# isin with a DataFrame compares cells that share the same index and column labels
compare = df2.isin(df1)
# Keep the rows of df2 in which at least one cell matched
df2 = df2.loc[df2.where(compare).dropna(how='all').index]
print(df2)

Output:

  name  cell  marks  passwd
0  tom     2  21862    2548
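
One caveat with the snippet above: when isin is given a DataFrame, pandas compares cells by index and column label, so the result depends on row position. The sketch below illustrates this with the same two rows in reversed order (the reordering is my own assumption, not data from the question), and shows an inner merge on the shared columns as a position-independent alternative for exact matches.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['tom'], 'cell': [2], 'marks': [21862]})
# Same rows as in the answer, but the matching row now sits at index 1 (assumed ordering).
df2 = pd.DataFrame({'name': ['matt', 'tom'], 'cell': [2, 2],
                    'marks': [158416, 21862], 'passwd': [2483, 2548]})

# Label-aligned comparison: row 0 of df2 (matt) is checked against row 0 of df1 (tom),
# and row 1 of df2 has no counterpart in df1, so nothing in it can match.
aligned = df2.isin(df1)
print(aligned)

# Position-independent alternative: inner merge on the columns both frames share.
shared = [c for c in df1.columns if c in df2.columns]
matched = df2.merge(df1[shared], on=shared, how='inner')
print(matched)
```

Here the isin approach misses the tom row because it is no longer at index 0, while the merge still finds it.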

Answer 1: (score: 1)

You can use pandas.merge with the option indicator=True and keep only the rows marked 'both':

import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])

df2 = pd.DataFrame([['tom', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483]
                    ], columns=["name", "cell", "marks", "passwd"])


def compare_dataframes(df1, df2):
    """Return rows present in both DataFrames, matched on their shared columns."""
    comparison_df = df1.merge(df2,
                              indicator=True,
                              how='outer')
    return comparison_df[comparison_df['_merge'] == 'both'].drop(columns=["_merge"])


print(compare_dataframes(df1, df2))

Returns:

  name  cell  marks  passwd
0  tom     2  11111    2548
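
Both answers above require every shared column to match exactly, so neither reproduces the expected output in the question, where either name or marks may differ. The following is a minimal sketch of one possible reading of "closest match": keep the rows of df2 that agree with some row of df1 on at least two of the shared columns. The function name closest_rows and the min_matches threshold are hypothetical choices for illustration, not part of pandas or of the original answers.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['tom'], 'cell': [2], 'marks': [21862]})
# Data from the question; the third row has no name.
df2 = pd.DataFrame({'name': ['tom', 'matt', None],
                    'cell': [2, 2, 2],
                    'marks': [11111, 158416, 21862],
                    'passwd': [2548, 2483, 26846]})


def closest_rows(reference, candidates, min_matches=2):
    """Keep candidate rows agreeing with some reference row on >= min_matches shared columns.

    min_matches is an assumed threshold; adjust it to your own definition of "closest".
    """
    shared = [c for c in reference.columns if c in candidates.columns]
    best = pd.Series(0, index=candidates.index)
    for _, ref_row in reference[shared].iterrows():
        # Count, per candidate row, how many shared columns equal this reference row.
        # Missing values never compare equal, so they simply do not add to the count.
        hits = sum((candidates[col] == ref_row[col]).astype(int) for col in shared)
        # Remember the best score seen so far for each candidate row.
        best = best.where(best >= hits, hits)
    return candidates[best >= min_matches]


result = closest_rows(df1, df2)
print(result)
```

With the question's data this keeps the rows at index 0 and 2 and drops the matt row, which agrees with df1 only on cell. The loop visits every row of the reference frame, so it is meant for a small df1; a larger reference would need a different approach.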