Pandas: DataFrame difference function

Time: 2015-08-14 02:36:47

Tags: python pandas

I am trying to solve the following problem using Pandas:

DataFrame 1:

Apple  Banana Orange
Orange Banana Apple
Kiwi   Lime   Apple
Banana Apple  Orange

DataFrame 2:

Orange Banana Apple
Apple  Banana Orange
Apple  Orange Apple
Kiwi   Apple  Apple

Operation:

DataFrame 1 - DataFrame 2

Output:

Kiwi   Lime   Apple
Banana Apple  Orange

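Essentially, I am working with categorical variables spread across multiple columns, and I want to find the rows that are in DataFrame 1 but not in DataFrame 2. I also want to keep the rows in their original order, as in the output above. That is, not like this: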

Banana Apple  Orange
Kiwi   Lime   Apple

1 answer:

Answer 0 (score: 1)

Consider using pandas.merge to find the rows the two frames have in common, then drop those rows from DataFrame 1.

#!/usr/bin/python
import pandas as pd

df1 = pd.DataFrame({'Categ1':['Apple', 'Orange', 'Kiwi', 'Banana'],
                    'Categ2':['Banana', 'Banana', 'Lime', 'Apple'],
                    'Categ3':['Orange', 'Apple', 'Apple', 'Orange']})

df2 = pd.DataFrame({'Categ1':['Orange', 'Apple', 'Apple', 'Kiwi'],
                    'Categ2':['Banana', 'Banana', 'Orange', 'Apple'],
                    'Categ3':['Apple', 'Orange', 'Apple', 'Apple']})

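# MERGE BOTH DATA FRAMES (INNER JOIN: ROWS PRESENT IN BOTH)
cols = ['Categ1', 'Categ2', 'Categ3']
merged = pd.merge(df1, df2, on=cols)

# DROP FROM ORIGINAL DF1 ANY ITEMS IN MERGED
# merged gets a fresh 0..n-1 index, so its labels are not df1's labels;
# match on the row values instead of on the index
in_merged = pd.MultiIndex.from_frame(df1[cols]).isin(
    pd.MultiIndex.from_frame(merged[cols]))
df1 = df1[~in_merged]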

DataFrame output:

ORIGINAL DF1
   Categ1  Categ2  Categ3
0   Apple  Banana  Orange
1  Orange  Banana   Apple
2    Kiwi    Lime   Apple
3  Banana   Apple  Orange

MERGED DF
   Categ1  Categ2  Categ3
0   Apple  Banana  Orange
1  Orange  Banana   Apple

FINAL DF1
   Categ1 Categ2  Categ3
2    Kiwi   Lime   Apple
3  Banana  Apple  Orange
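
As a possible alternative (a sketch, not part of the original answer), the same "in DataFrame 1 but not in DataFrame 2" result can be obtained with a left merge and the `indicator=True` option of `merge`, which adds a `_merge` column saying where each row was found. The helper name `rows_not_in` is made up for this illustration, and it assumes both frames have the same column names. It keeps df1's row order and original index labels.

```python
import pandas as pd


def rows_not_in(left, right):
    """Return the rows of `left` that have no identical row in `right`.

    Hypothetical helper: assumes both frames share the same column names.
    The row order and index labels of `left` are preserved.
    """
    cols = list(left.columns)
    # De-duplicate `right` so the left merge cannot multiply rows of `left`.
    flagged = left.merge(right[cols].drop_duplicates(), on=cols,
                         how="left", indicator=True)
    # A left merge keeps the left frame's row order, so the flags line up
    # with `left` by position; to_numpy() sidesteps index alignment.
    keep = (flagged["_merge"] == "left_only").to_numpy()
    return left[keep]


df1 = pd.DataFrame({'Categ1': ['Apple', 'Orange', 'Kiwi', 'Banana'],
                    'Categ2': ['Banana', 'Banana', 'Lime', 'Apple'],
                    'Categ3': ['Orange', 'Apple', 'Apple', 'Orange']})

df2 = pd.DataFrame({'Categ1': ['Orange', 'Apple', 'Apple', 'Kiwi'],
                    'Categ2': ['Banana', 'Banana', 'Orange', 'Apple'],
                    'Categ3': ['Apple', 'Orange', 'Apple', 'Apple']})

result = rows_not_in(df1, df2)
print(result)
```

For the sample data this returns the Kiwi and Banana rows, still labelled 2 and 3. Because the filter is positional rather than label-based, it should also behave when the matching rows are not the first rows of df1.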