Calculating the match percentage between two DataFrames in Python

Time: 2020-07-05 13:22:47

Tags: python pandas

How can I calculate the percentage of df1 that can be matched to df2 when joining on First_Name, Last_Name and Email?

df1:

  First_Name Last_Name                     Email  Value1
0      Aaron    Potter     aaronpotter@gmail.com      10
1      Bella   Granger    bellagranger@gmail.com       2
2        Ron     Black         black@hotmail.com      20
3      Harry   Weasley  harryweasley@hotmail.com      11

df2

For example, in this case the match percentage is 2 out of 4, i.e. 50%.
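To reproduce the example, the sample data can be built roughly as follows. The df1 values come from the table in the question. The df2 table is not shown in the question, so the df2 rows here (including their order) are an assumption, read back from the merged output in the first answer.

```python
import pandas as pd

# df1 exactly as shown in the question.
df1 = pd.DataFrame({
    "First_Name": ["Aaron", "Bella", "Ron", "Harry"],
    "Last_Name": ["Potter", "Granger", "Black", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "bellagranger@gmail.com",
              "black@hotmail.com", "harryweasley@hotmail.com"],
    "Value1": [10, 2, 20, 11],
})

# ASSUMPTION: df2 is not shown in the question; these rows and their
# order are inferred from the merge output in the answers.
df2 = pd.DataFrame({
    "First_Name": ["Aaron", "Bella", "Ronald", "Harry"],
    "Last_Name": ["Potter", "Granger", "Black", "Weasley"],
    "Email": ["aaronpotter@gmail.com", "bellagranger@gmail.com",
              "ronaldblack@hotmail.com", "tomriddle@hotmail.com"],
    "Value2": [10, 2, 5, 20],
})
```

Only the first two rows agree on all three key columns, which is where the expected "2 out of 4" comes from.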

2 Answers:

Answer 0 (score: 2)

@anky has a good solution to this problem. I would add the indicator parameter to merge so that the matches can be inspected visually.

df_out = df1.merge(df2, on=['First_Name', 'Last_Name', 'Email'],
                   indicator='Matched', how='outer')
df_out

Output:

  First_Name Last_Name                     Email  Value1  Value2     Matched
0      Aaron    Potter     aaronpotter@gmail.com    10.0    10.0        both
1      Bella   Granger    bellagranger@gmail.com     2.0     2.0        both
2        Ron     Black         black@hotmail.com    20.0     NaN   left_only
3      Harry   Weasley  harryweasley@hotmail.com    11.0     NaN   left_only
4     Ronald     Black   ronaldblack@hotmail.com     NaN     5.0  right_only
5      Harry   Weasley     tomriddle@hotmail.com     NaN    20.0  right_only

Or, with a left join:

df_out = df1.merge(df2, on = ['First_Name', 'Last_Name', 'Email'], 
          indicator='Matched', how='left')
print(df_out)

Output:

  First_Name Last_Name                     Email  Value1  Value2    Matched
0      Aaron    Potter     aaronpotter@gmail.com      10    10.0       both
1      Bella   Granger    bellagranger@gmail.com       2     2.0       both
2        Ron     Black         black@hotmail.com      20     NaN  left_only
3      Harry   Weasley  harryweasley@hotmail.com      11     NaN  left_only

Then, applying @anky's solution to the left-joined frame:

(df_out['Matched'] == 'both').sum()/df_out.shape[0]

Output:

0.5
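The left join and the ratio can be wrapped into one small helper. This is only a sketch: the name match_fraction and its keys parameter are made up for illustration, and dropping duplicate key rows from the right-hand frame is an added assumption, made so that the left join cannot produce extra rows and distort the denominator. The df2 rows are likewise inferred from the merge output above.

```python
import pandas as pd

def match_fraction(left, right, keys):
    """Fraction of rows in `left` whose key columns also appear in `right`."""
    # One row per distinct key on the right, so the left join keeps
    # exactly len(left) rows.
    right_keys = right[keys].drop_duplicates()
    merged = left.merge(right_keys, on=keys, how="left", indicator="Matched")
    # The mean of a boolean Series is the share of True values.
    return (merged["Matched"] == "both").mean()

keys = ["First_Name", "Last_Name", "Email"]
df1 = pd.DataFrame(
    [("Aaron", "Potter", "aaronpotter@gmail.com"),
     ("Bella", "Granger", "bellagranger@gmail.com"),
     ("Ron", "Black", "black@hotmail.com"),
     ("Harry", "Weasley", "harryweasley@hotmail.com")],
    columns=keys,
)
# ASSUMPTION: df2 rows inferred from the merge output shown above.
df2 = pd.DataFrame(
    [("Aaron", "Potter", "aaronpotter@gmail.com"),
     ("Bella", "Granger", "bellagranger@gmail.com"),
     ("Ronald", "Black", "ronaldblack@hotmail.com"),
     ("Harry", "Weasley", "tomriddle@hotmail.com")],
    columns=keys,
)

print(match_fraction(df1, df2, keys))  # 0.5
```

Using .mean() on the boolean comparison is equivalent to dividing .sum() by the row count, as in the expression above.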

Answer 1 (score: 1)

@Scott Boston's answer is perfect! If you only have the 'First_Name', 'Last_Name' and 'Email' columns, you can use the following code.

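import pandas as pd

# Stack the key columns of both frames; a key present in both appears twice
df = pd.concat([df1[['First_Name','Last_Name','Email']],df2[['First_Name','Last_Name','Email']]])
df = df.reset_index(drop=True)
gb = df.groupby(list(df.columns))
# Keep the first row index of every group that contains exactly two rows
idx = [x[0] for x in gb.groups.values() if len(x) == 2]
df.reindex(idx)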

Output:

  First_Name Last_Name                   Email
0      Aaron    Potter   aaronpotter@gmail.com
1      Bella   Granger  bellagranger@gmail.com
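The snippet above lists the matching rows but stops short of the percentage asked for in the question. One way to get there, sketched under the assumption that each key combination is unique within df1 and within df2 (otherwise a duplicate inside a single frame would also form a group of size 2), is to count the groups of size two and divide by the number of rows in df1. The variable names and the df2 rows below are illustrative assumptions.

```python
import pandas as pd

keys = ["First_Name", "Last_Name", "Email"]
df1 = pd.DataFrame(
    [("Aaron", "Potter", "aaronpotter@gmail.com"),
     ("Bella", "Granger", "bellagranger@gmail.com"),
     ("Ron", "Black", "black@hotmail.com"),
     ("Harry", "Weasley", "harryweasley@hotmail.com")],
    columns=keys,
)
# ASSUMPTION: df2 rows inferred from the merge output in the first answer.
df2 = pd.DataFrame(
    [("Aaron", "Potter", "aaronpotter@gmail.com"),
     ("Bella", "Granger", "bellagranger@gmail.com"),
     ("Ronald", "Black", "ronaldblack@hotmail.com"),
     ("Harry", "Weasley", "tomriddle@hotmail.com")],
    columns=keys,
)

# Stack the key columns; a key present in both frames appears twice.
# ASSUMPTION: keys are unique within each individual frame.
stacked = pd.concat([df1[keys], df2[keys]], ignore_index=True)
group_sizes = stacked.groupby(keys).size()
n_matched = int((group_sizes == 2).sum())

match_pct = n_matched / len(df1)
print(match_pct)  # 0.5
```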