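I have a script with two connections to two different databases. I need to compare the query results and display the common elements as well as the differences between the results.
I have a function that compares the two DataFrames and should give me the differences and the common elements, but it raises an error. I think it is because the column names in the two queries are different.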
def compare(a, b):
    if a.equals(b):
        print("SAME!")
    else:
        df = a.merge(b, how='outer', indicator=True)
        x = df.loc[df['_merge'] == 'both', 'm.id']
        y = df.loc[df['_merge'] == 'left_only', 'm.id']
        z = df.loc[df['_merge'] == 'right_only', 'm.id']
        print(f'Display Common Elements contained in Neo4j and MySQL: {", ".join(x)}')
        print(f'Elements found only in Neo4j: {", ".join(y)}')
        print(f'Elements found only in MySQL: {", ".join(z)}')
I expect:
Common elements: C0012345
Elements found only in Neo4j: C027415, C189274
Elements found only in MySQL: C086356, C098876
Answer 0 (score: 2)
This works:
import pandas as pd

df1 = pd.DataFrame({"a": ["1", "2", "3", "4", "5", "6", "7"]})
df2 = pd.DataFrame({"b": ["1", "3", "2", "9", "11", "23", "4"]})

def compare(df1, df2):
    # Outer join on the differently named columns; unmatched rows get NaN on the other side
    result = pd.merge(df1, df2, how='outer', left_on='a', right_on='b')
    missing_from_a = result.loc[pd.isna(result.a)].b  # values only in df2
    missing_from_b = result.loc[pd.isna(result.b)].a  # values only in df1
    have_both = result.loc[~pd.isna(result.b)].a.dropna()  # values in both
    print(", ".join(missing_from_b))
    print(", ".join(missing_from_a))
    print(", ".join(have_both))
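The same `left_on`/`right_on` idea can probably be combined with the `indicator=True` flag from the question, so membership is read from the `_merge` column rather than inferred from NaN. This is only a sketch under assumptions: each query result holds one ID column, the function name `compare_ids` and the MySQL column name `cui` are hypothetical, and the sample IDs are taken from the expected output in the question.

```python
import pandas as pd

def compare_ids(left, right, left_col, right_col):
    """Return (common, left_only, right_only) lists of IDs."""
    # Join on differently named columns; _merge records where each row came from.
    merged = left.merge(
        right, how="outer", left_on=left_col, right_on=right_col, indicator=True
    )
    common = merged.loc[merged["_merge"] == "both", left_col].tolist()
    left_only = merged.loc[merged["_merge"] == "left_only", left_col].tolist()
    # Rows that exist only on the right have NaN in left_col, so read right_col.
    right_only = merged.loc[merged["_merge"] == "right_only", right_col].tolist()
    return common, left_only, right_only

# Sample data; "cui" is a hypothetical column name for the MySQL result.
neo4j_df = pd.DataFrame({"m.id": ["C0012345", "C027415", "C189274"]})
mysql_df = pd.DataFrame({"cui": ["C0012345", "C086356", "C098876"]})

common, neo4j_only, mysql_only = compare_ids(neo4j_df, mysql_df, "m.id", "cui")
print(f"Common elements: {', '.join(common)}")
print(f"Elements found only in Neo4j: {', '.join(neo4j_only)}")
print(f"Elements found only in MySQL: {', '.join(mysql_only)}")
```

Returning lists instead of printing inside the function keeps the comparison reusable; the printing can then be adapted to whatever labels you need.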
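Answer 1 (score: 2)
In addition to the merge approach that @Anna Semjen already described above, you can also use the isin() method to check which values are present in the other DataFrame: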
import pandas as pd

df1 = pd.DataFrame({0: ["1", "2", "3", "4", "5", "6", "7"]})  # as MySQL
df2 = pd.DataFrame({"m.id": ["1", "3", "2", "9", "11", "23", "4"]})  # as Neo4j
# ~ negates the boolean mask, keeping only values that are NOT in the other frame
print('Elements found only in MySQL: ' + ','.join(df1[~df1[0].isin(df2['m.id'])].iloc[:, 0].tolist()))
print('Elements found only in Neo4j: ' + ','.join(df2[~df2['m.id'].isin(df1[0])].iloc[:, 0].tolist()))
print('Elements found in both Neo4j & MySQL: ' + ','.join(df1[df1[0].isin(df2['m.id'])].iloc[:, 0].tolist()))
Output:
Elements found only in MySQL: 5,6,7
Elements found only in Neo4j: 9,11,23
Elements found in both Neo4j & MySQL: 1,2,3,4
Hope this helps as a reference for another approach :)
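If only the distinct IDs matter (not row order or duplicates), plain Python sets might be a simpler way to express the same three groups. A minimal sketch reusing the sample frames from this answer; the variable names are just for illustration, and `key=int` is only there because the sample values are numeric strings (drop it for IDs such as C0012345):

```python
import pandas as pd

df1 = pd.DataFrame({0: ["1", "2", "3", "4", "5", "6", "7"]})  # as MySQL
df2 = pd.DataFrame({"m.id": ["1", "3", "2", "9", "11", "23", "4"]})  # as Neo4j

# Convert each ID column to a set; duplicates and ordering are discarded.
mysql_ids = set(df1[0])
neo4j_ids = set(df2["m.id"])

# Set difference and intersection give the three groups directly.
only_mysql = sorted(mysql_ids - neo4j_ids, key=int)
only_neo4j = sorted(neo4j_ids - mysql_ids, key=int)
in_both = sorted(mysql_ids & neo4j_ids, key=int)

print("Elements found only in MySQL: " + ",".join(only_mysql))
print("Elements found only in Neo4j: " + ",".join(only_neo4j))
print("Elements found in both Neo4j & MySQL: " + ",".join(in_both))
```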
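Answer 2 (score: 0)
The pd.DataFrame.compare method: we can compare DataFrames that have the same index like this (the column is renamed so that both frames are identically labeled):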
df1.compare(df2.rename(columns={'b':'a'}))
Output:
     a
  self other
1    2     3
2    3     2
3    4     9
4    5    11
5    6    23
6    7     4
The pd.Series.compare method:
df1['a'].compare(df2['b'])
Output:
  self other
1    2     3
2    3     2
3    4     9
4    5    11
5    6    23
6    7     4
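One caveat worth checking before relying on this: `compare` works element-wise by index label, so it reports positions where the two values differ, not which values are missing from the other side. The sketch below, using the same sample values as the earlier answers (the names `s1`, `s2`, `flagged` and `shared` are illustrative), shows values that are flagged as different even though they occur in both Series:

```python
import pandas as pd

s1 = pd.Series(["1", "2", "3", "4", "5", "6", "7"], name="a")
s2 = pd.Series(["1", "3", "2", "9", "11", "23", "4"], name="b")

# Element-wise comparison: one row per index label where the values differ.
diff = s1.compare(s2)
print(diff)

# Values reported on the "self" side that nevertheless exist in both Series.
flagged = set(diff["self"])
shared = set(s1) & set(s2)
print(sorted(flagged & shared))
```

So for the "common elements / only in Neo4j / only in MySQL" output the question asks for, the merge or isin() approaches are likely a better fit; `compare` suits row-aligned results.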