I am trying to solve the following problem using Pandas:
DataFrame 1:
Apple Banana Orange
Orange Banana Apple
Kiwi Lime Apple
Banana Apple Orange
DataFrame 2:
Orange Banana Apple
Apple Banana Orange
Apple Orange Apple
Kiwi Apple Apple
Operation:
DataFrame 1 - DataFrame 2
Output:
Kiwi Lime Apple
Banana Apple Orange
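Essentially, I am working with categorical variables spread across multiple columns, and I want to find the rows that are in DataFrame 1 but not in DataFrame 2. I also want to keep the rows in their original order, as in the output above, i.e. not like this: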
Banana Apple Orange
Kiwi Lime Apple
Answer 0 (score: 1)
Consider using pandas.merge, then dropping from DataFrame 1 any rows that matched in the join.
#!/usr/bin/python
import pandas as pd
df1 = pd.DataFrame({'Categ1':['Apple', 'Orange', 'Kiwi', 'Banana'],
'Categ2':['Banana', 'Banana', 'Lime', 'Apple'],
'Categ3':['Orange', 'Apple', 'Apple', 'Orange']})
df2 = pd.DataFrame({'Categ1':['Orange', 'Apple', 'Apple', 'Kiwi'],
'Categ2':['Banana', 'Banana', 'Orange', 'Apple'],
'Categ3':['Apple', 'Orange', 'Apple', 'Apple']})
# MERGE BOTH DATA FRAMES
merged = pd.merge(df1, df2, on=['Categ1', 'Categ2', 'Categ3'])
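# DROP FROM ORIGINAL DF1 ANY ROWS THAT ALSO APPEAR IN MERGED
# (merged.index is a fresh 0..n-1 range, not df1's labels, so match on row values)
cols = ['Categ1', 'Categ2', 'Categ3']
in_both = pd.MultiIndex.from_frame(df1[cols]).isin(pd.MultiIndex.from_frame(merged[cols]))
df1 = df1[~in_both]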
DataFrame output:
ORIGINAL DF1
   Categ1  Categ2  Categ3
0   Apple  Banana  Orange
1  Orange  Banana   Apple
2    Kiwi    Lime   Apple
3  Banana   Apple  Orange
MERGED DF
   Categ1  Categ2  Categ3
0   Apple  Banana  Orange
1  Orange  Banana   Apple
FINAL DF1
   Categ1  Categ2  Categ3
2    Kiwi    Lime   Apple
3  Banana   Apple  Orange
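As an alternative sketch of the same merge idea, a left join with indicator=True labels every row of DataFrame 1 as either 'both' or 'left_only', so the rows to keep can be picked without relying on index labels. The helper name rows_only_in_left is hypothetical, introduced here only for illustration, and the sketch assumes both frames share the same column names.

```python
import pandas as pd

df1 = pd.DataFrame({'Categ1': ['Apple', 'Orange', 'Kiwi', 'Banana'],
                    'Categ2': ['Banana', 'Banana', 'Lime', 'Apple'],
                    'Categ3': ['Orange', 'Apple', 'Apple', 'Orange']})
df2 = pd.DataFrame({'Categ1': ['Orange', 'Apple', 'Apple', 'Kiwi'],
                    'Categ2': ['Banana', 'Banana', 'Orange', 'Apple'],
                    'Categ3': ['Apple', 'Orange', 'Apple', 'Apple']})


def rows_only_in_left(left, right):
    """Return the rows of `left` with no exact match in `right` (hypothetical helper)."""
    cols = list(left.columns)  # assumption: `right` has the same columns
    # Deduplicate `right` so the left join cannot multiply rows of `left`.
    flagged = left.merge(right[cols].drop_duplicates(),
                         on=cols, how='left', indicator=True)
    # A left join keeps the row order of `left`, so a positional
    # boolean mask built from `flagged` lines up with `left` row by row.
    keep = (flagged['_merge'] == 'left_only').to_numpy()
    return left[keep]


result = rows_only_in_left(df1, df2)
print(result)
```

With the sample data this keeps the Kiwi and Banana rows (index labels 2 and 3) in their original order, which is the ordering the question asks for.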