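Merge 2 dataframes with different column names to show the common elements and the differences between the dataframes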

Question

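I have a script with 2 connections to 2 different databases. I need to compare the results of the queries and display the common elements as well as the differences between the results.

I have a function that compares the dataframes and should give the differences and the common elements, but it raises an error. I think it is because the column names in the two queries are different.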

def compare(a,b):
    if a.equals(b):
       print("SAME!")
    else:
        df = a.merge(b, how='outer',indicator=True)
        x = df.loc[df['_merge'] == 'both', 'm.id']
        y = df.loc[df['_merge'] == 'left_only', 'm.id']
        z = df.loc[df['_merge'] == 'right_only', 'm.id']
        print (f'Display Common Elements contained in Neo4j and MySQL: {", ".join(x)}')
        print (f'Elements found only in Neo4j: {", ".join(y)}')
        print (f'Elements found only in MySQL: {", ".join(z)}')

I expect:

Common elements: C0012345
Elements found only in Neo4j: C027415, C189274
Elements found only in MySQL: C086356, C098876

Answer 1

This should work:

import pandas as pd

df1 = pd.DataFrame({"a": ["1", "2", "3", "4", "5", "6", "7"]})
df2 = pd.DataFrame({"b": ["1", "3", "2", "9", "11", "23", "4"]})

def compare(df1, df2):
    # Outer join on the differently named key columns; a value with no match
    # on the other side gets NaN in that side's column.
    result = pd.merge(df1, df2, how='outer', left_on='a', right_on='b')
    missing_from_a = result.loc[result['a'].isna(), 'b']  # only in df2
    missing_from_b = result.loc[result['b'].isna(), 'a']  # only in df1
    have_both = result.loc[result['a'].notna() & result['b'].notna(), 'a']
    print(", ".join(missing_from_b))
    print(", ".join(missing_from_a))
    print(", ".join(have_both))

compare(df1, df2)
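The `indicator=True` idea from the question can also be kept once both key columns share a name. The following is only a sketch under assumptions: the helper `split_by_indicator`, the shared column name `id`, and the MySQL column name `mysql_id` are hypothetical and not from the original post; the sample IDs come from the expected output in the question.

```python
import pandas as pd

def split_by_indicator(left, right, left_col, right_col):
    """Return IDs found in both frames, only in left, and only in right."""
    # Rename each key column to a shared, hypothetical name ("id") so that
    # merge has a common column to join on.
    l = left[[left_col]].rename(columns={left_col: "id"}).drop_duplicates()
    r = right[[right_col]].rename(columns={right_col: "id"}).drop_duplicates()
    merged = l.merge(r, on="id", how="outer", indicator=True)

    def pick(flag):
        # "_merge" is the column added by indicator=True
        return sorted(merged.loc[merged["_merge"] == flag, "id"])

    return pick("both"), pick("left_only"), pick("right_only")

# Sample IDs taken from the expected output in the question.
neo4j = pd.DataFrame({"m.id": ["C0012345", "C027415", "C189274"]})
mysql = pd.DataFrame({"mysql_id": ["C0012345", "C086356", "C098876"]})

both, only_neo4j, only_mysql = split_by_indicator(neo4j, mysql, "m.id", "mysql_id")
print(f"Common elements: {', '.join(both)}")
print(f"Elements found only in Neo4j: {', '.join(only_neo4j)}")
print(f"Elements found only in MySQL: {', '.join(only_mysql)}")
```

Renaming before the merge avoids the original problem: without a shared column, `merge` has nothing to join on.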

Answer 2

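In addition to the merge approach already described by @Anna Semjen above, you can also use the isin() method to check which values are present in the other dataframe: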

import pandas as pd

df1 = pd.DataFrame({0: ["1", "2", "3", "4", "5", "6", "7"]})  # as MySQL
df2 = pd.DataFrame({"m.id": ["1", "3", "2", "9", "11", "23", "4"]})  # as Neo4j

# Boolean masks: which values of one frame also occur in the other
in_neo4j = df1[0].isin(df2['m.id'])
in_mysql = df2['m.id'].isin(df1[0])
print('Elements found only in MySQL: ' + ','.join(df1.loc[~in_neo4j, 0]))
print('Elements found only in Neo4j: ' + ','.join(df2.loc[~in_mysql, 'm.id']))
print('Elements found in both Neo4j & MySQL: ' + ','.join(df1.loc[in_neo4j, 0]))

Output:

Elements found only in MySQL: 5,6,7
Elements found only in Neo4j: 9,11,23
Elements found in both Neo4j & MySQL: 1,2,3,4

Hope this helps as a reference for another approach :)
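If only the distinct IDs matter, plain Python sets give the same three groups as the isin() masks. This is a rough sketch rather than part of the answer; the helper name `split_ids` is hypothetical, and the results are sorted as strings, so the order can differ from the output above.

```python
import pandas as pd

def split_ids(a, b):
    """Return (only in a, only in b, in both) as sorted lists of unique values."""
    set_a, set_b = set(a), set(b)
    return sorted(set_a - set_b), sorted(set_b - set_a), sorted(set_a & set_b)

df1 = pd.DataFrame({0: ["1", "2", "3", "4", "5", "6", "7"]})  # as MySQL
df2 = pd.DataFrame({"m.id": ["1", "3", "2", "9", "11", "23", "4"]})  # as Neo4j

only_mysql, only_neo4j, in_both = split_ids(df1[0], df2["m.id"])
print("Elements found only in MySQL: " + ",".join(only_mysql))
print("Elements found only in Neo4j: " + ",".join(only_neo4j))
print("Elements found in both Neo4j & MySQL: " + ",".join(in_both))
```

Sets drop duplicates and the original row order, so the isin() version is preferable when either of those needs to be preserved.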

Answer 3

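Update for pandas 1.1.0+

pandas provides a `pd.DataFrame.compare` method.

It compares dataframes that have the same index and column labels, so the column has to be renamed first (using `df1` and `df2` as defined in Answer 1):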

df1.compare(df2.rename(columns={'b':'a'}))

Output:

     a      
  self other
1    2     3
2    3     2
3    4     9
4    5    11
5    6    23
6    7     4

Or we can use `pd.Series.compare` like this:

df1['a'].compare(df2['b'])

Output:

  self other
1    2     3
2    3     2
3    4     9
4    5    11
5    6    23
6    7     4
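Note that `compare` works row by row on matching index labels, so it reports positional differences rather than set membership. The sketch below, which reuses the sample frames from Answer 1 and adds a hypothetical `in_both` set purely for illustration, shows that a value present in both columns can still be listed as a difference.

```python
import pandas as pd

df1 = pd.DataFrame({"a": ["1", "2", "3", "4", "5", "6", "7"]})
df2 = pd.DataFrame({"b": ["1", "3", "2", "9", "11", "23", "4"]})

# Rows where the values at the same index label differ
diff = df1["a"].compare(df2["b"])
print(diff)

# "2" occurs in both columns, but at different positions,
# so row 1 still shows up in the comparison result.
in_both = set(df1["a"]) & set(df2["b"])
print(sorted(in_both))
```

For the question as asked (common elements versus elements unique to each source), the merge or isin() approaches are therefore likely the better fit.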
