Question

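I have two dfs with the same columns but different numbers of rows.

I want to add a column to df1 that counts the rows in df2 matching conditions on several (not all) of df1's columns.

Preferably the fastest/most efficient way, since I will have N pairs of dfs to process.

Pseudocode: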

count = number of df2 rows where
              (df1.one == df2.one) AND
              (df1.two BETWEEN df2.two * 0.9 AND df2.two * 1.1) AND
              (df1.three == df2.three) AND
              (df1.four == df2.four)

I have tried many variations:

df1['count'] = np.count_nonzero((df1.one==df2.one)&(con2)..) 
               and df1['count'] = sum((con1)&(con2)..)
               and df1['count'] = len(df1.loc((con1)&(con2)..))

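I also tried .isin(), .values and so on, but I get a ValueError that basically tells me the dfs are different sizes. Based on other answers I found here, I also tried resetting the index, etc.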
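For context, here is a minimal sketch (not part of the original post) of where that ValueError comes from. The names `s1` and `s2` are hypothetical stand-ins for `df1.one` and `df2.one`. pandas compares two Series element by element and refuses when their indexes differ, so an "every row of df1 against every row of df2" comparison has to be spelled out explicitly, for example with NumPy broadcasting:

```python
import pandas as pd

# Hypothetical stand-ins for df1.one and df2.one (different lengths)
s1 = pd.Series(['', 'A', 'B'])
s2 = pd.Series(['', '', 'A', 'B'])

try:
    s1 == s2  # element-wise comparison requires identically-labeled Series
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# Pairwise comparison via broadcasting: result has shape (len(s1), len(s2)),
# where entry [i, j] is True when s1[i] equals s2[j]
pairwise = s1.to_numpy()[:, None] == s2.to_numpy()[None, :]

# Number of matching s2 values for each s1 value
matches_per_s1_row = pairwise.sum(axis=1)
print(matches_per_s1_row)
```

Note that the pairwise array holds len(s1) * len(s2) entries, so this may get memory-hungry for large frames.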

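I have seen similar questions answered with merge and groupby, but I am not sure whether that works with the number of conditions I have, especially since some of my conditions are "between"/range lookups.

Thanks!

Example dfs: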

import pandas as pd

first = [(1001,'', 10, 'KK', 5),
         (1002,'A', 9, 'QK' , 7),
         (1003,'B', 11, 'QQ', 11) 
        ]

second = [(1004,'', 10.5, 'KK', 5),
          (1005,'', 9.9, 'KK', 5),
          (1006,'', 10, 'KK', 5),
          (1007,'', 10, 'KQ', 5),
          (1008,'A', 7, 'QK' , 9),
          (1009,'A', 9.1, 'QK' , 7),
          (1010,'A', 9, 'QK' , 7),
          (1011,'A', 9, 'KK' , 7),
          (1012,'B', 12, 'KQ', 9),
          (1013,'B', 11, 'QQ', 11),
          (1014,'B', 11, 'QK', 12),
          (1015,'B', 1, 'QQ', 11)
         ]

df1 = pd.DataFrame(first, columns=['ID', 'one', 'two', 'three','four'])
df2 = pd.DataFrame(second, columns=['ID', 'one', 'two', 'three','four'])

df1:

     ID one  two three  four
0  1001       10    KK     5
1  1002   A    9    QK     7
2  1003   B   11    QQ    11

df2:

      ID one   two three  four
0   1004      10.5    KK     5
1   1005       9.9    KK     5
2   1006      10.0    KK     5
3   1007      10.0    KQ     5
4   1008   A   7.0    QK     9
5   1009   A   9.1    QK     7
6   1010   A   9.0    QK     7
7   1011   A   9.0    KK     7
8   1012   B  12.0    KQ     9
9   1013   B  11.0    QQ    11
10  1014   B  11.0    QK    12
11  1015   B   1.0    QQ    11

Desired output (df1):

     ID one  two three  four  count
0  1001       10    KK     5     3
1  1002   A    9    QK     7     2
2  1003   B   11    QQ    11     1

Answer 1

You can try the following:

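def condition(x):
    # x is a single row of df1; count the df2 rows that match it
    return df2[(x.one == df2.one) & (x.three == df2.three) &
               (x.four == df2.four) & (x.two > (df2.two*0.9)) &
               (x.two < (df2.two*1.1))].shape[0]

# The bounds are strict; use >= and <= if the range should be inclusive
df1['count'] = df1.apply(condition, axis=1)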

df1
     ID one  two three  four  count
0  1001       10    KK     5      3
1  1002   A    9    QK     7      2
2  1003   B   11    QQ    11      1

Add a column to a df containing the count of rows in a separate df that meet multiple conditions
