Question

我目前有两个熊猫数据框：

sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)

我想要实现的是仅显示'test_2'中的差异，而不是'test_1'中的差异。

我目前拥有的代码将两个数据帧连接起来，并向我显示了两个数据帧的总差异，但是我想看看是否所有的内容都由'test_2'到'test_1'而不是相反：

def compare_dataframes(df1, df2):

    print 'Comparing dataframes...'
    df = pd.concat([df1, df2])
    df = df.reset_index(drop=True)
    df_gpby = df.groupby(list(df.columns))
    idx = [x[0] for x in df_gpby.groups.values() if len(x) == 1]
    compared_data = df.reindex(idx)
    if len(compared_data) > 1:
        print 'No new sales on site!'
    else:
        print 'New sales on site!'
        print(compared_data)

如何使当前功能适应这样的工作？

Answer 1

将merge与外部联接和indicator参数一起使用：

df = test_1.merge(test_2, how='outer', indicator=True)
print (df)
   Feb  Jan  Mar    account      _merge
0  200  150  140  Jones LLC        both
1  210  200  215   Alpha Co        both
2   90   50   95   Blue Inc  right_only

然后仅按boolean indexing过滤right_only行：

only2 = df[df['_merge'] == 'right_only']
print (only2)
   Feb  Jan  Mar   account      _merge
2   90   50   95  Blue Inc  right_only

感谢@Jon Clements提供带回调的一线解决方案：

only2 = test_1.merge(test_2, how='outer', indicator=True)[lambda r: r._merge == 'right_only']
print (only2)
   Feb  Jan  Mar   account      _merge
2   90   50   95  Blue Inc  right_only

或使用query：

only2 = test_1.merge(test_2, how='outer', indicator=True).query("_merge == 'right_only'")

Answer 2

import pandas as pd
import numpy as np
sales = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215}]
sales2 = [{'account': 'Jones LLC', 'Jan': 150, 'Feb': 200, 'Mar': 140},
         {'account': 'Alpha Co',  'Jan': 200, 'Feb': 210, 'Mar': 215},
         {'account': 'Blue Inc',  'Jan': 50,  'Feb': 90,  'Mar': 95 }]
test_1 = pd.DataFrame(sales)
test_2 = pd.DataFrame(sales2)
test_3 = test_1.append(test_2).drop_duplicates(keep=False)
print (test_3)

它打印不同的行

如何比较两个熊猫数据框并在数据框2中显示差异

2 个答案: