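I have two DataFrames containing string values. They also differ in size. I want to display the elements the two DataFrames have in common and the elements that differ.
My approach: I wrote a function compare(DataFrame1, DataFrame2) that compares the two DataFrames with the equals method. If they are identical, there is no need to look for differences. I need a second function that actually displays the differences between the DataFrames. Can someone help me continue?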
import pandas as pd

# `graph` is the existing Neo4j connection object (its setup is not shown in the question)
def test2_expansion():
    # Run the Cypher query and load the result rows into a DataFrame
    test1 = graph.run('match (n:Disease)-[:HAS_CHILD]->(m:Disease) where n.id ="C0039446" return distinct m.id order by m.id;')
    test1 = pd.DataFrame(test1.data())
    return test1

g = test2_expansion()
g = g.to_dict(orient='list')
print("The result of test 2 for expansion in Neo4j is ")
for key, value in g.items():
    for n in value:
        print(n)

def compareResults(a, b):
    # DataFrame.equals already returns a bool
    return a.equals(b)

def takeDifferences(a, b):
    # The two DataFrames have to be passed through to compareResults
    if compareResults(a, b):
        return "Amazing!"
    else:
        return "Search differences"
DataFrame1
C0494228
C0272078
C2242772
DataFrame2
C2242772
C1882062
C1579212
C1541065
C1306459
C0442867
C0349036
C0343748
C0027651
C0272078
Display Common Elements: C0272078 C2242772
Elements found only in DataFrame1: C0494228
Elements found only in DataFrame2:
C1882062
C1579212
C1541065
C1306459
C0442867
C0349036
C0343748
C0027651
Answer 0 (score: 1)
If the DataFrames have the same column, e.g. m.id, use DataFrame.merge with the indicator parameter:
df = df1.merge(df2, how='outer', indicator=True)
print(df)
        m.id      _merge
0   C0494228   left_only
1   C0272078        both
2   C2242772        both
3   C1882062  right_only
4   C1579212  right_only
5   C1541065  right_only
6   C1306459  right_only
7   C0442867  right_only
8   C0349036  right_only
9   C0343748  right_only
10  C0027651  right_only
Then filter with boolean indexing:
a = df.loc[df['_merge'] == 'both', 'm.id']
b = df.loc[df['_merge'] == 'left_only', 'm.id']
c = df.loc[df['_merge'] == 'right_only', 'm.id']
Finally, join the values with an f-string:
print (f'Display Common Element: {", ".join(a)}')
Display Common Element: C0272078, C2242772
print (f'Elements found only in DataFrame1: {", ".join(b)}')
Elements found only in DataFrame1: C0494228
print (f'Elements found only in DataFrame2: {", ".join(c)}')
Elements found only in DataFrame2: C1882062, C1579212, C1541065,
C1306459, C0442867, C0349036,
C0343748, C0027651
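To try the steps above end to end without a Neo4j connection, here is a minimal sketch that rebuilds the two sample DataFrames by hand from the values in the question. The helper name split_by_merge and its col parameter are made up for illustration, and the results are sorted so the output does not depend on the row order the merge happens to produce.

```python
import pandas as pd

# Sample values copied from the question; the real frames come from Neo4j.
df1 = pd.DataFrame({"m.id": ["C0494228", "C0272078", "C2242772"]})
df2 = pd.DataFrame({"m.id": ["C2242772", "C1882062", "C1579212", "C1541065",
                             "C1306459", "C0442867", "C0349036", "C0343748",
                             "C0027651", "C0272078"]})


def split_by_merge(left, right, col="m.id"):
    """Hypothetical helper: return (common, left_only, right_only) as sorted lists."""
    # indicator=True adds a _merge column telling where each row came from
    merged = left.merge(right, on=col, how="outer", indicator=True)
    common = sorted(merged.loc[merged["_merge"] == "both", col])
    left_only = sorted(merged.loc[merged["_merge"] == "left_only", col])
    right_only = sorted(merged.loc[merged["_merge"] == "right_only", col])
    return common, left_only, right_only


common, only1, only2 = split_by_merge(df1, df2)
print(f'Display Common Element: {", ".join(common)}')
print(f'Elements found only in DataFrame1: {", ".join(only1)}')
print(f'Elements found only in DataFrame2: {", ".join(only2)}')
```

This assumes the ids are unique within each frame, as the `distinct` in the query suggests. With duplicated ids the merge would repeat the matching rows.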
Answer 1 (score: 0)
I can now show you my generic function, which answers my question:
def compare(a, b):
    if a.equals(b):
        print("SAME!")
    else:
        # Outer merge with indicator=True marks where each row came from
        df = a.merge(b, how='outer', indicator=True)
        x = df.loc[df['_merge'] == 'both', 'm.id']
        y = df.loc[df['_merge'] == 'left_only', 'm.id']
        z = df.loc[df['_merge'] == 'right_only', 'm.id']
        print(f'Display Common Element: {", ".join(x)}')
        print(f'Elements found only in DataFrame1: {", ".join(y)}')
        print(f'Elements found only in DataFrame2: {", ".join(z)}')
At the moment my function returns None, because I don't know whether it should return anything, but it works well. Thanks @jezrael
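On the question of what to return: one option is to hand the three groups back to the caller instead of printing them, so the caller decides what to display. The sketch below is only an illustration. The name compare_ids, the col parameter and the dictionary keys are all made up, and it uses plain Python sets rather than merge, which is a different technique that is enough when only a single id column is compared.

```python
import pandas as pd


def compare_ids(a, b, col="m.id"):
    """Hypothetical variant of compare() that returns its findings."""
    ids_a, ids_b = set(a[col]), set(b[col])
    return {
        "same": a.equals(b),                      # strict DataFrame equality
        "common": sorted(ids_a & ids_b),          # ids present in both frames
        "only_in_first": sorted(ids_a - ids_b),   # ids only in a
        "only_in_second": sorted(ids_b - ids_a),  # ids only in b
    }


# Shortened subset of the question's values, just to show the result shape
a = pd.DataFrame({"m.id": ["C0494228", "C0272078", "C2242772"]})
b = pd.DataFrame({"m.id": ["C2242772", "C0272078", "C0027651"]})
result = compare_ids(a, b)
print(result["common"])
print(result["only_in_first"])
print(result["only_in_second"])
```

Returning a dictionary keeps the printing separate from the comparison, which also makes the function easier to test. Note that sets drop duplicates and ordering, so "same" can be False even when the two sets of ids match.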