Whenever the values in columns A, B, C and D of a row in DF1 match a row in DF2, write "True" into a new column "Test" in DF1; otherwise write "False".
My code:
import pandas as pd
import numpy as np
DF1= pd.DataFrame({'A':['a','a','b','b','b','c','c','d'],
'A1':['a11','a11','b11','b11','c11','c11','c11','d11'],
'B':['a1','a2','b1','b2','b3','c1','c2','d1'],
'C':[1,0,np.nan,2,3,1,3,5],
'D':[2.5,4.3,7.2,13.1,3.0,6.2,3.8,78.5]})
DF2= pd.DataFrame({'A':['a','b','b','c','c'],
'B':['a1','b1','b3','c1','c2'],
'C':[1,np.nan,3,1,3],
'D':[2.5,7.2,14.5,6.2,59.2]})
a_s = DF1['A'].unique().tolist()
b_s = DF1['B'].unique().tolist()
c_s = DF1['C'].unique().tolist()
for i in a_s:
    for j in b_s:
        for k in c_s:
            a1=DF1.loc[(DF1.A==i) & (DF1.B==j) & (DF1.C==k),'D'].values
            b1=DF2.loc[(DF2.A==i) & (DF2.B==j) & (DF2.C==k),'D'].values
            if (len(a1)!=0) & (len(b1)!=0):
                if a1==b1:
                    DF1.loc[(DF1.A==i) & (DF1.B==j) & (DF1.C==k),'Test'] = "True"
                else:
                    DF1.loc[(DF1.A==i) & (DF1.B==j) & (DF1.C==k),'Test'] = "False"
Result:
A A1 B C D Test
0 a a11 a1 1.0 2.5 True
1 a a11 a2 0.0 4.3 NaN
2 b b11 b1 NaN 7.2 NaN
3 b b11 b2 2.0 13.1 NaN
4 b c11 b3 3.0 3.0 False
5 c c11 c1 1.0 6.2 True
6 c c11 c2 3.0 3.8 False
7 d d11 d1 5.0 78.5 NaN
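Problems:
1. The code is slow.
2. The code is hard to extend if more columns have to be tested (for example, 15 columns).
Edit: the result I am looking for: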
A A1 B C D Test
0 a a11 a1 1.0 2.5 True
1 a a11 a2 0.0 4.3 False
2 b b11 b1 NaN 7.2 True
3 b b11 b2 2.0 13.1 False
4 b c11 b3 3.0 3.0 False
5 c c11 c1 1.0 6.2 True
6 c c11 c2 3.0 3.8 False
7 d d11 d1 5.0 78.5 False
Answer 0 (score: 3)
All you need is merge:
DF1=DF1.merge(DF2,indicator=True,how='left')
DF1._merge=DF1._merge.eq('both')
DF1
A A1 B C D _merge
0 a a11 a1 1.0 2.5 True
1 a a11 a2 0.0 4.3 False
2 b b11 b1 NaN 7.2 True
3 b b11 b2 2.0 13.1 False
4 b c11 b3 3.0 3.0 False
5 c c11 c1 1.0 6.2 True
6 c c11 c2 3.0 3.8 False
7 d d11 d1 5.0 78.5 False
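For the "15 columns" case, the same merge idea can be wrapped in a small helper that takes the list of columns to compare. This is only a sketch: `flag_matches` and its parameters are names I made up, and it assumes DF1 has no existing column called `_merge`. Dropping duplicates from the lookup side first keeps the left merge from multiplying rows of DF1.

```python
import numpy as np
import pandas as pd

DF1 = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                    'A1': ['a11', 'a11', 'b11', 'b11', 'c11', 'c11', 'c11', 'd11'],
                    'B': ['a1', 'a2', 'b1', 'b2', 'b3', 'c1', 'c2', 'd1'],
                    'C': [1, 0, np.nan, 2, 3, 1, 3, 5],
                    'D': [2.5, 4.3, 7.2, 13.1, 3.0, 6.2, 3.8, 78.5]})
DF2 = pd.DataFrame({'A': ['a', 'b', 'b', 'c', 'c'],
                    'B': ['a1', 'b1', 'b3', 'c1', 'c2'],
                    'C': [1, np.nan, 3, 1, 3],
                    'D': [2.5, 7.2, 14.5, 6.2, 59.2]})


def flag_matches(left, right, cols, name='Test'):
    """Return a copy of `left` with a boolean column `name` that is True
    where the values in `cols` also occur together in some row of `right`."""
    # Keep each key combination once so the merge returns one row per left row.
    lookup = right[cols].drop_duplicates()
    merged = left.merge(lookup, on=cols, how='left', indicator=True)
    out = left.copy()
    # to_numpy() assigns by position, so a non-default index on `left` is fine.
    out[name] = merged['_merge'].eq('both').to_numpy()
    return out


result = flag_matches(DF1, DF2, ['A', 'B', 'C', 'D'])
print(result)
```

Testing more columns then only means passing a longer list as `cols`. DF1 itself is left unchanged; the flag lives in the returned copy.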
Answer 1 (score: 2)
I wrote some code to solve this problem, give it a try:
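DF3 = DF1[['A', 'B', 'C', 'D']].fillna('None')
DF4 = DF2[['A', 'B', 'C', 'D']].fillna('None')
DF1['Test'] = 'F'
for i in range(DF3.shape[0]):
    for k in range(DF4.shape[0]):
        # Compare row i of DF1 with row k of DF2, value by value
        if list(DF3.iloc[i]) == list(DF4.iloc[k]):
            # .loc avoids the chained assignment DF1['Test'][i] = 'T'
            DF1.loc[i, 'Test'] = 'T'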
Output:
DF1
Out[190]:
A A1 B C D Test
0 a a11 a1 1.0 2.5 T
1 a a11 a2 0.0 4.3 F
2 b b11 b1 NaN 7.2 T
3 b b11 b2 2.0 13.1 F
4 b c11 b3 3.0 3.0 F
5 c c11 c1 1.0 6.2 T
6 c c11 c2 3.0 3.8 F
7 d d11 d1 5.0 78.5 F
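The nested loop compares every row of DF1 with every row of DF2, which gets slow as the frames grow. A possible variation on the same idea, as a sketch: turn each row into a tuple, put the DF2 tuples in a set, and look each DF1 row up once. `row_keys` and `SENTINEL` are hypothetical names; the sentinel plays the role of the 'None' placeholder above, since NaN never compares equal to itself.

```python
import numpy as np
import pandas as pd

DF1 = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                    'A1': ['a11', 'a11', 'b11', 'b11', 'c11', 'c11', 'c11', 'd11'],
                    'B': ['a1', 'a2', 'b1', 'b2', 'b3', 'c1', 'c2', 'd1'],
                    'C': [1, 0, np.nan, 2, 3, 1, 3, 5],
                    'D': [2.5, 4.3, 7.2, 13.1, 3.0, 6.2, 3.8, 78.5]})
DF2 = pd.DataFrame({'A': ['a', 'b', 'b', 'c', 'c'],
                    'B': ['a1', 'b1', 'b3', 'c1', 'c2'],
                    'C': [1, np.nan, 3, 1, 3],
                    'D': [2.5, 7.2, 14.5, 6.2, 59.2]})

cols = ['A', 'B', 'C', 'D']
SENTINEL = 'None'  # placeholder for missing values, as in the loop version


def row_keys(df, cols):
    """Return one hashable tuple per row, with missing values replaced."""
    keys = []
    for row in df[cols].itertuples(index=False, name=None):
        keys.append(tuple(SENTINEL if pd.isna(v) else v for v in row))
    return keys


# Build the lookup set once, then test each DF1 row against it.
seen = set(row_keys(DF2, cols))
flags = ['T' if key in seen else 'F' for key in row_keys(DF1, cols)]

result = DF1.assign(Test=flags)
print(result)
```

This walks each frame once instead of once per pair of rows, and adding more columns only changes `cols`.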
Answer 2 (score: 1)
The DataFrames do not have the same shape:
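required_columns = ['A', 'B', 'C', 'D']  # assumed: the columns named in the question
# Only the first max_length rows exist in both DataFrames
max_length = min(len(DF1), len(DF2))
new_df = DF1.copy()
# Compare those rows position by position; any remaining rows of DF1 are padded with False
new_df['Test'] = (DF1[required_columns].iloc[:max_length].values == DF2[required_columns].iloc[:max_length].values).all(1).tolist() + [False] * (len(DF1) - max_length)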