Comparing two DataFrames of different sizes on multiple conditions

Time: 2019-01-13 14:45:56

Tags: python pandas

Whenever the values in columns A, B, C and D of a row in DF1 match a row in DF2, write "True" in a new column "Test" in DF1; otherwise write "False".

My code:

import pandas as pd
import numpy as np

DF1= pd.DataFrame({'A':['a','a','b','b','b','c','c','d'],
                   'A1':['a11','a11','b11','b11','c11','c11','c11','d11'],
                   'B':['a1','a2','b1','b2','b3','c1','c2','d1'],
                   'C':[1,0,np.nan,2,3,1,3,5],
                   'D':[2.5,4.3,7.2,13.1,3.0,6.2,3.8,78.5]})

DF2= pd.DataFrame({'A':['a','b','b','c','c'],
                  'B':['a1','b1','b3','c1','c2'],
                  'C':[1,np.nan,3,1,3],
                  'D':[2.5,7.2,14.5,6.2,59.2]})

a_s = DF1['A'].unique().tolist()
b_s = DF1['B'].unique().tolist()
c_s = DF1['C'].unique().tolist()

for i in a_s:
    for j in b_s:
        for k in c_s:

            a1=DF1.loc[(DF1.A==i) & (DF1.B==j) & (DF1.C==k),'D'].values
            b1=DF2.loc[(DF2.A==i) & (DF2.B==j) & (DF2.C==k),'D'].values

            if (len(a1)!=0) & (len(b1)!=0):

                if a1==b1:
                    DF1.loc[(DF1.A==i) & (DF1.B==j) & (DF1.C==k),'Test'] = "True"
                else:
                    DF1.loc[(DF1.A==i) & (DF1.B==j) & (DF1.C==k),'Test'] = "False"

Result:

    A   A1  B   C   D   Test
0   a   a11 a1  1.0 2.5 True
1   a   a11 a2  0.0 4.3 NaN
2   b   b11 b1  NaN 7.2 NaN
3   b   b11 b2  2.0 13.1    NaN
4   b   c11 b3  3.0 3.0 False
5   c   c11 c1  1.0 6.2 True
6   c   c11 c2  3.0 3.8 False
7   d   d11 d1  5.0 78.5    NaN

Problems:

1. The code is slow.
2. The code is hard to scale if more columns have to be tested (e.g. 15 columns).

Edit: the desired output is:

    A   A1  B   C   D   Test
0   a   a11 a1  1.0 2.5 True
1   a   a11 a2  0.0 4.3 False
2   b   b11 b1  NaN 7.2 True
3   b   b11 b2  2.0 13.1    False
4   b   c11 b3  3.0 3.0 False
5   c   c11 c1  1.0 6.2 True
6   c   c11 c2  3.0 3.8 False
7   d   d11 d1  5.0 78.5    False
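Two things in the loop explain why the first result differs from the desired one. Rows with no counterpart in DF2 are never assigned, so their "Test" stays NaN. And the row with C = NaN is never selected, because NaN does not compare equal to anything, including itself, so `DF1.C == k` is all False when k is NaN (and `unique()` does include NaN). A minimal illustration, using a small Series that only mimics DF1['C']:

```python
import numpy as np
import pandas as pd

C = pd.Series([1.0, 0.0, np.nan, 2.0])  # illustrative values, like DF1['C']
k = np.nan                               # NaN is one of DF1['C'].unique()

# NaN == NaN is False, so this mask selects no rows at all
mask = C == k
print(mask.tolist())       # [False, False, False, False]

# isna() is the way to pick out missing values
print(C.isna().tolist())   # [False, False, True, False]
```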

3 Answers:

Answer 0 (score: 3)

You only need merge:

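DF1 = DF1.merge(DF2, on=['A', 'B', 'C', 'D'], how='left', indicator=True)
# indicator=True adds a '_merge' column: 'both' if the row was found in DF2, 'left_only' otherwise
DF1['_merge'] = DF1['_merge'].eq('both')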
DF1
   A   A1   B    C     D  _merge
0  a  a11  a1  1.0   2.5    True
1  a  a11  a2  0.0   4.3   False
2  b  b11  b1  NaN   7.2    True
3  b  b11  b2  2.0  13.1   False
4  b  c11  b3  3.0   3.0   False
5  c  c11  c1  1.0   6.2    True
6  c  c11  c2  3.0   3.8   False
7  d  d11  d1  5.0  78.5   False
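To address the scaling concern, the same merge idea can be wrapped so that the key columns are just a list. This is a sketch; `flag_matches` is a hypothetical helper name, not a pandas function. It merges only the key columns of DF2 after `drop_duplicates()`, because a left merge against duplicate key rows in DF2 would otherwise repeat rows of DF1. Note that pandas matches NaN keys with each other in a merge (unlike SQL joins), which is why the NaN row comes out True.

```python
import numpy as np
import pandas as pd

def flag_matches(df1, df2, cols):
    """Hypothetical helper: True where df1's values in `cols` occur as a row of df2."""
    # deduplicate the key rows so the left merge cannot add rows to df1
    keys = df2[cols].drop_duplicates()
    merged = df1[cols].merge(keys, on=cols, how='left', indicator=True)
    # a left merge keeps df1's row order; 'both' means the key combination was found
    return pd.Series(merged['_merge'].eq('both').to_numpy(), index=df1.index)

DF1 = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                    'A1': ['a11', 'a11', 'b11', 'b11', 'c11', 'c11', 'c11', 'd11'],
                    'B': ['a1', 'a2', 'b1', 'b2', 'b3', 'c1', 'c2', 'd1'],
                    'C': [1, 0, np.nan, 2, 3, 1, 3, 5],
                    'D': [2.5, 4.3, 7.2, 13.1, 3.0, 6.2, 3.8, 78.5]})
DF2 = pd.DataFrame({'A': ['a', 'b', 'b', 'c', 'c'],
                    'B': ['a1', 'b1', 'b3', 'c1', 'c2'],
                    'C': [1, np.nan, 3, 1, 3],
                    'D': [2.5, 7.2, 14.5, 6.2, 59.2]})

# testing 15 columns would only mean passing a longer list here
DF1['Test'] = flag_matches(DF1, DF2, ['A', 'B', 'C', 'D'])
print(DF1)
```

Floats in D are compared for exact equality, as in the question's own code.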

Answer 1 (score: 2)

I wrote some code to solve this, give it a try:

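# compare only the key columns; replace NaN with a placeholder so missing values compare equal
DF3 = DF1[['A', 'B', 'C', 'D']].fillna('None')
DF4 = DF2[['A', 'B', 'C', 'D']].fillna('None')
DF1['Test'] = 'F'
for i in range(DF3.shape[0]):
    for k in range(DF4.shape[0]):
        if list(DF3.iloc[i]) == list(DF4.iloc[k]):
            # .loc avoids chained assignment (DF1['Test'][i] = ...), which may not write to DF1
            DF1.loc[DF1.index[i], 'Test'] = 'T'
            break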

Output:

DF1
Out[190]: 
   A   A1   B    C     D Test
0  a  a11  a1  1.0   2.5    T
1  a  a11  a2  0.0   4.3    F
2  b  b11  b1  NaN   7.2    T
3  b  b11  b2  2.0  13.1    F
4  b  c11  b3  3.0   3.0    F
5  c  c11  c1  1.0   6.2    T
6  c  c11  c2  3.0   3.8    F
7  d  d11  d1  5.0  78.5    F   
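The double loop compares every row of DF1 with every row of DF2 through `iloc`, which is what makes it slow. A sketch of the same idea (placeholder for NaN, then compare whole rows) that builds a set of DF2's rows once and looks each DF1 row up in it; `flag_by_tuples` is a hypothetical name, and the sketch assumes the string 'None' never occurs as a real value in the key columns:

```python
import numpy as np
import pandas as pd

def flag_by_tuples(df1, df2, cols):
    """Hypothetical helper: 'T' where a df1 row (restricted to cols) appears in df2, else 'F'."""
    # same trick as the answer: a placeholder makes missing values compare equal
    left = df1[cols].fillna('None')
    right = df2[cols].fillna('None')
    # one pass over df2 to build a set of row tuples, then a set lookup per df1 row
    seen = set(right.itertuples(index=False, name=None))
    return ['T' if row in seen else 'F'
            for row in left.itertuples(index=False, name=None)]

DF1 = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                    'B': ['a1', 'a2', 'b1', 'b2', 'b3', 'c1', 'c2', 'd1'],
                    'C': [1, 0, np.nan, 2, 3, 1, 3, 5],
                    'D': [2.5, 4.3, 7.2, 13.1, 3.0, 6.2, 3.8, 78.5]})
DF2 = pd.DataFrame({'A': ['a', 'b', 'b', 'c', 'c'],
                    'B': ['a1', 'b1', 'b3', 'c1', 'c2'],
                    'C': [1, np.nan, 3, 1, 3],
                    'D': [2.5, 7.2, 14.5, 6.2, 59.2]})

DF1['Test'] = flag_by_tuples(DF1, DF2, ['A', 'B', 'C', 'D'])
print(DF1)
```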

Answer 2 (score: 1)

Since the DataFrames have different shapes:

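required_columns = ['A', 'B', 'C', 'D']  # the columns that must match

# compares row i of DF1 with row i of DF2 (by position) over the shorter length
min_length = min(len(DF1), len(DF2))
new_df = DF1.copy()
matches = (DF1[required_columns].iloc[:min_length].values
           == DF2[required_columns].iloc[:min_length].values).all(axis=1)
# rows of DF1 past the shorter frame have no counterpart, so they are False
new_df['Test'] = matches.tolist() + [False] * (len(DF1) - min_length)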
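Because this answer compares rows by position, it only agrees with the desired output when matching rows happen to sit at the same index in both frames; with `==`, NaN also never equals NaN. A quick check on the question's sample data, contrasting the positional comparison with a key-based lookup via `merge` (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

DF1 = pd.DataFrame({'A': ['a', 'a', 'b', 'b', 'b', 'c', 'c', 'd'],
                    'B': ['a1', 'a2', 'b1', 'b2', 'b3', 'c1', 'c2', 'd1'],
                    'C': [1, 0, np.nan, 2, 3, 1, 3, 5],
                    'D': [2.5, 4.3, 7.2, 13.1, 3.0, 6.2, 3.8, 78.5]})
DF2 = pd.DataFrame({'A': ['a', 'b', 'b', 'c', 'c'],
                    'B': ['a1', 'b1', 'b3', 'c1', 'c2'],
                    'C': [1, np.nan, 3, 1, 3],
                    'D': [2.5, 7.2, 14.5, 6.2, 59.2]})
cols = ['A', 'B', 'C', 'D']

# positional: row i of DF1 is compared only with row i of DF2
n = min(len(DF1), len(DF2))
positional = (DF1[cols].iloc[:n].values == DF2[cols].iloc[:n].values).all(axis=1)
positional = positional.tolist() + [False] * (len(DF1) - n)

# key-based: each DF1 row is looked up among all DF2 rows (NaN keys match in merge)
key_based = (DF1[cols]
             .merge(DF2[cols].drop_duplicates(), on=cols, how='left', indicator=True)
             ['_merge'].eq('both').tolist())

print(positional)  # [True, False, False, False, False, False, False, False]
print(key_based)   # [True, False, True, False, False, True, False, False]
```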