Question

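What is the best way to compute the difference between two DataFrames based on a combination of multiple columns? Suppose I have the following: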

DF1:

  A B C
0 1 2 3
1 3 4 2

DF2:

  A B C
0 1 2 3
1 3 5 2

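I want to show every row where there is a difference, e.g. (3, 4, 2) vs. (3, 5, 2) in the example above. I tried pd.merge(), thinking that an outer join using all columns as the join keys would give me a DataFrame that helps me get what I want, but it did not turn out that way.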
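For reference, a plain outer merge on all columns puts rows from both frames into one result without recording where each row came from, which may be why it did not help. A minimal sketch of one way to make the merge usable, assuming the two frames from the example above, is to pass `indicator=True` and filter on the resulting `_merge` column (the name `diff` is only illustrative):

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df2 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Outer join on every column; indicator=True adds a "_merge" column
# whose value is "left_only", "right_only", or "both" for each row.
merged = df1.merge(df2, how="outer", on=["A", "B", "C"], indicator=True)

# Keep only the rows that are not present in both frames
diff = merged[merged["_merge"] != "both"]
print(diff)
```

Here `left_only` rows exist only in `df1` and `right_only` rows only in `df2`, so (3, 4, 2) and (3, 5, 2) both appear in `diff`.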

Thanks to EdChum, I was able to use the mask from a boolean comparison, but first I had to make sure the indexes were comparable.

df1 = df1.set_index('A')
df2 = df2.set_index('A')  # this gave me a nice index using one of the keys
# if df2 has rows that df1 lacks, reindexing fills them with nulls
df1 = df1.reindex_like(df2)
df1[~(df1 == df2).all(axis=1)]  # this gave me all rows that differed
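To see both versions of each differing row next to each other, the same mask can be applied to both frames. This is only a sketch, assuming the sample data from the question and that column A is unique in each frame; the names `left`, `right`, and `side_by_side` are illustrative:

```python
import pandas as pd

# Sample frames from the question
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df2 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Align both frames on the key column so they can be compared element-wise
left = df1.set_index("A")
right = df2.set_index("A")
left = left.reindex_like(right)

# True for every row where at least one column differs
mask = ~(left == right).all(axis=1)

# Put the differing rows from each frame side by side under labelled column groups
side_by_side = pd.concat([left[mask], right[mask]], axis=1, keys=["df1", "df2"])
print(side_by_side)
```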

Answer 1

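We can use .all with axis=1 to perform a row-wise comparison, and then use the resulting boolean Series as an index, negating it with ~, to show the rows that differ: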

In [43]:

df[~(df==df1).all(axis=1)]
Out[43]:
   A  B  C
1  3  4  2

Breaking this down:

In [44]:

df==df1
Out[44]:
      A      B     C
0  True   True  True
1  True  False  True
In [45]:

(df==df1).all(axis=1)
Out[45]:
0     True
1    False
dtype: bool

We can then pass the result above to df as a boolean index, inverting it with ~.
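The steps above can be reproduced end to end with the sample data from the question. The sketch below assumes `df` and `df1` are the two frames being compared and that they share the same index and columns. As a possible alternative, `DataFrame.compare` (available in pandas 1.1 and later) reports only the cells that differ:

```python
import pandas as pd

# df and df1 play the same roles as in the transcript above
df = pd.DataFrame({"A": [1, 3], "B": [2, 4], "C": [3, 2]})
df1 = pd.DataFrame({"A": [1, 3], "B": [2, 5], "C": [3, 2]})

# Row-wise check: True where every column matches
same = (df == df1).all(axis=1)

# Invert the mask to select the rows that differ
different_rows = df[~same]
print(different_rows)

# Alternative (pandas >= 1.1): show only the differing cells,
# with "self" (df) and "other" (df1) values under each column
changes = df.compare(df1)
print(changes)
```

Both approaches require the two frames to be identically labelled; otherwise pandas raises a ValueError on the comparison.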
