I have two DataFrames:
import pandas as pd
df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tonmmm', 2, 11111, 2549]],
                   columns=["name", "cell", "marks", "passwd"])
Input
df1
  name  cell  marks
0  tom     2  11111
df2
     name  cell   marks  passwd
0    tomm     2   11111    2548
1    matt     2  158416    2483
2  tonmmm     2   11111    2549
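Map two DataFrames with similar columns
I want the rows of df2 in which at least 2 columns match df1. Here, cell and marks match the 2 corresponding values in df1.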
Expected output:
     name  cell  marks  passwd
0    tomm     2  11111    2548
1  tonmmm     2  11111    2549
Answer 0 (score: 2)
You can try the following:
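import pandas as pd
df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tonmmm', 2, 11111, 2549]],
                   columns=["name", "cell", "marks", "passwd"])
# Values of the single df1 row to match against.
target = list(df1.iloc[0, :])
# For each df2 row (ignoring the last column, passwd), check whether at least
# 2 values appear in target. to_records() puts the index first, hence [1:].
temp = [sum(v in target for v in list(row)[1:]) >= 2
        for row in df2[df2.columns[:-1]].to_records()]
newdf = df2[temp]
print(newdf)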
Output:
     name  cell  marks  passwd
0    tomm     2  11111    2548
2  tonmmm     2  11111    2549
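Note that the list comprehension above tests whether each df2 value appears anywhere in the df1 row, regardless of column. If the match should be column by column, a vectorized comparison may be easier to read. The following is only a sketch under that assumption; the names shared, match_count and result are illustrative and not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tonmmm', 2, 11111, 2549]],
                   columns=["name", "cell", "marks", "passwd"])

# Columns present in both frames; only these are compared (passwd is skipped).
shared = df1.columns.intersection(df2.columns)

# Compare every df2 row with the single df1 row, column by column.
# Assumption: df1 has exactly one row, as in the question.
matches = df2[shared].eq(df1.iloc[0][shared], axis=1)

# Number of matching columns per df2 row.
match_count = matches.sum(axis=1)

# Keep the rows where at least 2 columns match.
result = df2[match_count >= 2]
print(result)
```

With the sample data this keeps the rows with index 0 and 2, the same rows as the answer above. Add .reset_index(drop=True) if the index should be renumbered as in the expected output.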
Edit: if you want to sort the result by the number of matches, you can try:
import pandas as pd
df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tom', 2, 11111, 2549]],
                   columns=["name", "cell", "marks", "passwd"])
# Count, for each df2 row (excluding passwd), how many values appear in the df1 row.
target = list(df1.iloc[0, :])
temp = [sum(v in target for v in list(row)[1:])
        for row in df2[df2.columns[:-1]].to_records()]
# Sort by match count (highest first), then keep rows with at least 2 matches.
newdf = df2.assign(val=temp).sort_values(by='val', ascending=False)
newdf = newdf[newdf.val.ge(2)].drop(columns='val').reset_index(drop=True)
print(newdf)
Output:
   name  cell  marks  passwd
0   tom     2  11111    2549
1  tomm     2  11111    2548
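The same ordering can also be sketched with a column-by-column match count instead of the membership test. This is an assumption about the intended semantics, not the original answer's method; match_count, order and result are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame([['tom', 2, 11111]], columns=["name", "cell", "marks"])
df2 = pd.DataFrame([['tomm', 2, 11111, 2548],
                    ['matt', 2, 158416, 2483],
                    ['tom', 2, 11111, 2549]],
                   columns=["name", "cell", "marks", "passwd"])

# Compare only the columns both frames share, column by column.
shared = df1.columns.intersection(df2.columns)
match_count = df2[shared].eq(df1.iloc[0][shared], axis=1).sum(axis=1)

# Keep counts of at least 2 and order them from most to fewest matches.
# mergesort is stable, so rows with equal counts keep their df2 order.
order = (match_count[match_count >= 2]
         .sort_values(ascending=False, kind="mergesort")
         .index)

# Select the rows in that order and renumber the index.
result = df2.loc[order].reset_index(drop=True)
print(result)
```

Here the tom row matches on three columns and tomm on two, so tom is listed first.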